# Model Inference with Online Features

This notebook demonstrates how to:
1. **Load a model** from the MLflow Model Registry
2. **Get online features** from Feast for real-time prediction
3. **Make predictions** and visualize the results

## Prerequisites
- Completed `02-feast-features.ipynb` (features materialized)
- Completed `03-training.ipynb` (model registered in MLflow)


---
## 1. Setup


In [None]:
import os
import warnings
warnings.filterwarnings('ignore')

from pathlib import Path
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import mlflow
import mlflow.pytorch
from feast import FeatureStore
import joblib

# Configuration - aligned with example manifests
NAMESPACE = os.environ.get("NAMESPACE", "feast-mlops-demo")
MLFLOW_TRACKING_URI = f"http://mlflow.{NAMESPACE}.svc.cluster.local:5000"
SHARED_DIR = os.environ.get("SHARED_DIR", "/shared")
os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

print(f"""
Configuration:
  Namespace: {NAMESPACE}
  MLflow: {MLFLOW_TRACKING_URI}
  Shared Storage: {SHARED_DIR}
""")


---
## 2. Load Model from MLflow


In [None]:
# Load production model from MLflow
model_name = "sales-forecast-model"
model_uri = f"models:/{model_name}/Production"

try:
    model = mlflow.pytorch.load_model(model_uri)
    print(f"✅ Loaded model: {model_name} (Production)")
except Exception as e:
    print(f"⚠️ Could not load from MLflow: {e}")
    print("Loading from local file...")
    
    # Define model architecture
    class SalesForecastModel(nn.Module):
        def __init__(self, input_dim=9):
            super().__init__()
            self.network = nn.Sequential(
                nn.Linear(input_dim, 128),
                nn.BatchNorm1d(128),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(128, 64),
                nn.BatchNorm1d(64),
                nn.ReLU(),
                nn.Linear(64, 32),
                nn.ReLU(),
                nn.Linear(32, 1)
            )
        def forward(self, x):
            return self.network(x).squeeze(-1)
    
    model = SalesForecastModel()
    # map_location="cpu" lets weights saved on a GPU load on a CPU-only machine
    model.load_state_dict(
        torch.load(f"{SHARED_DIR}/models/sales_forecast_model.pt", map_location="cpu")
    )
    print("✅ Loaded model from local file")

model.eval()


In [None]:
# Load scalers
scalers = joblib.load(f"{SHARED_DIR}/models/scalers.joblib")
scaler_X = scalers["scaler_X"]
scaler_y = scalers["scaler_y"]
print("✅ Loaded scalers")
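
For intuition about what the two scalers do at inference time, here is a minimal sketch. It assumes they are z-score scalers (for example scikit-learn's `StandardScaler`); the notebook does not state the scaler type, and the mean and standard deviation below are made up for illustration.

```python
# Sketch only: assumes z-score scaling. The statistics are hypothetical,
# not values read from the saved scalers.

def scale(value, mean, std):
    """Forward transform, as a z-score scaler would apply to one input column."""
    return (value - mean) / std


def unscale(value, mean, std):
    """Inverse transform, mapping a scaled model output back to original units."""
    return value * std + mean


# Hypothetical training statistics for weekly sales
y_mean, y_std = 16000.0, 4000.0

scaled_prediction = 0.5  # model output in scaled space
dollars = unscale(scaled_prediction, y_mean, y_std)
print(dollars)  # 18000.0

# A round trip through both transforms returns the original value
round_trip = unscale(scale(12345.0, y_mean, y_std), y_mean, y_std)
print(abs(round_trip - 12345.0) < 1e-9)  # True
```

The model is trained on scaled targets, so its raw output is only meaningful after the inverse transform.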


---
## 3. Get Online Features from Feast


In [None]:
# Initialize Feast
REPO_DIR = Path(SHARED_DIR) / "feature_repo"
fs = FeatureStore(repo_path=str(REPO_DIR))
print(f"Feast project: {fs.project}")


In [None]:
# Simulate real-time prediction request
# In production, these would come from incoming requests
entity_rows = [
    {"store_id": 1, "dept_id": 1},
    {"store_id": 1, "dept_id": 5},
    {"store_id": 10, "dept_id": 3},
    {"store_id": 25, "dept_id": 7},
    {"store_id": 45, "dept_id": 10},
]

# Get online features (low-latency from PostgreSQL)
print("🚀 Fetching online features...")
feature_vector = fs.get_online_features(
    features=[
        "sales_features:lag_1",
        "sales_features:lag_2",
        "sales_features:lag_4",
        "sales_features:rolling_mean_4w",
        "store_features:store_size",
        "store_features:temperature",
        "store_features:fuel_price",
        "store_features:cpi",
        "store_features:unemployment",
    ],
    entity_rows=entity_rows
).to_df()

print(f"✅ Retrieved features for {len(feature_vector)} entities")
feature_vector
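
The same nine feature references are needed for both online and batch retrieval, and the model expects the columns in a fixed order. One way to keep them in sync is to define them once, as sketched below. `FEATURE_VIEWS`, `feature_refs`, and `feature_columns` are hypothetical names introduced for illustration; the view and feature names are the ones used in this notebook.

```python
# Sketch only: a single source of truth for feature references.
# Dict order is preserved, so the column order stays stable.
FEATURE_VIEWS = {
    "sales_features": ["lag_1", "lag_2", "lag_4", "rolling_mean_4w"],
    "store_features": ["store_size", "temperature", "fuel_price", "cpi", "unemployment"],
}


def feature_refs(views):
    """Build "view:feature" reference strings from a {view: [features]} mapping."""
    return [f"{view}:{name}" for view, names in views.items() for name in names]


def feature_columns(refs):
    """Strip the view prefix, leaving the bare feature names."""
    return [ref.split(":", 1)[1] for ref in refs]


FEATURE_REFS = feature_refs(FEATURE_VIEWS)
FEATURE_COLS = feature_columns(FEATURE_REFS)

print(FEATURE_REFS[0])    # sales_features:lag_1
print(len(FEATURE_COLS))  # 9
```

`FEATURE_REFS` could then be passed as `features=` to both retrieval calls, with `FEATURE_COLS` used to select model inputs, assuming the retrieved DataFrame uses bare feature names as it does in this notebook.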


---
## 4. Make Predictions


In [None]:
# Prepare features for prediction
feature_cols = ["lag_1", "lag_2", "lag_4", "rolling_mean_4w",
                "store_size", "temperature", "fuel_price", "cpi", "unemployment"]

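# Handle missing values with median imputation. The median is computed over
# this request batch only, so a column that is missing for every row stays NaN.
X = feature_vector[feature_cols].fillna(feature_vector[feature_cols].median()).values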

# Scale features
X_scaled = scaler_X.transform(X)

# Make predictions
with torch.no_grad():
    X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
    predictions_scaled = model(X_tensor).numpy()

# Inverse transform to get actual values
predictions = scaler_y.inverse_transform(predictions_scaled.reshape(-1, 1)).flatten()

# Create results dataframe
results = pd.DataFrame({
    "store_id": [e["store_id"] for e in entity_rows],
    "dept_id": [e["dept_id"] for e in entity_rows],
    "predicted_weekly_sales": predictions.round(2)
})

print("📊 Predictions:")
results
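
Median imputation over a handful of request rows depends on which entities happen to share the request. One alternative is to fill gaps from fixed per-feature defaults and record which fields were filled. The sketch below assumes such defaults were saved at training time; `TRAINING_DEFAULTS` and its numbers are placeholders, not values from this project.

```python
import math

# Hypothetical fallback values. In practice these would be computed on the
# training set and stored alongside the scalers.
TRAINING_DEFAULTS = {"lag_1": 15000.0, "temperature": 60.0}


def is_missing(value):
    """True for None or NaN, the two ways a missing feature may show up."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_row(row, defaults):
    """Return (filled_row, imputed_names) without mutating the input row."""
    filled = dict(row)
    imputed = []
    for name, default in defaults.items():
        if is_missing(row.get(name)):
            filled[name] = default
            imputed.append(name)
    return filled, imputed


row = {"lag_1": float("nan"), "temperature": 71.3}
filled, imputed = impute_row(row, TRAINING_DEFAULTS)
print(filled)   # {'lag_1': 15000.0, 'temperature': 71.3}
print(imputed)  # ['lag_1']
```

Returning the list of imputed names makes it possible to log or flag predictions that relied on fallback values.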


In [None]:
# Visualize predictions
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))

# Create labels
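# Zip the ID columns directly: iterrows() upcasts each mixed-dtype row to
# float, which would render the labels as "Store 1.0".
labels = [f"Store {s}\nDept {d}" for s, d in zip(results["store_id"], results["dept_id"])]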
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(results)))

bars = ax.barh(labels, results["predicted_weekly_sales"], color=colors)
ax.set_xlabel("Predicted Weekly Sales ($)")
ax.set_title("Real-Time Sales Forecasts")

# Add value labels
for bar, val in zip(bars, results["predicted_weekly_sales"]):
    ax.text(val + 50, bar.get_y() + bar.get_height()/2, 
            f"${val:,.0f}", va='center', fontweight='bold')

ax.set_xlim(0, max(results["predicted_weekly_sales"]) * 1.2)
plt.tight_layout()
plt.show()


---
## 5. Batch Inference (Optional)

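For large-scale batch predictions, retrieve features from the offline store with `get_historical_features`, which joins features to each entity row as of its `event_timestamp`.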
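
To illustrate the idea behind a point-in-time lookup, the sketch below picks, for a given timestamp, the most recent feature value recorded at or before it. This is a simplified illustration, not Feast's implementation: it ignores TTLs and handles a single entity, and the `history` values are made up.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def point_in_time_lookup(feature_rows, as_of):
    """Return the value of the latest row with timestamp <= as_of, or None.

    feature_rows must be a list of (timestamp, value) sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, as_of)
    return feature_rows[idx - 1][1] if idx else None


def utc(year, month, day):
    """Small helper for timezone-aware dates."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical weekly lag_1 values for one (store, dept) pair
history = [
    (utc(2024, 1, 5), 100.0),
    (utc(2024, 1, 12), 120.0),
    (utc(2024, 1, 19), 90.0),
]

print(point_in_time_lookup(history, utc(2024, 1, 15)))  # 120.0
print(point_in_time_lookup(history, utc(2024, 1, 1)))   # None
```

Using the current time as `event_timestamp`, as the next cell does, therefore selects the latest available value for every entity.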


In [None]:
from datetime import datetime, timezone

# Create entity DataFrame for batch inference
# All store-dept combinations for current week
stores = list(range(1, 51))
depts = list(range(1, 13))
now = datetime.now(timezone.utc)

batch_entities = pd.DataFrame([
    {"store_id": s, "dept_id": d, "event_timestamp": now}
    for s in stores for d in depts
])

print(f"Batch inference for {len(batch_entities):,} entity combinations")


In [None]:
# Fetch batch features (via Ray for distributed processing)
print("🚀 Fetching batch features via Ray...")

batch_features = fs.get_historical_features(
    entity_df=batch_entities,
    features=[
        "sales_features:lag_1",
        "sales_features:lag_2",
        "sales_features:lag_4",
        "sales_features:rolling_mean_4w",
        "store_features:store_size",
        "store_features:temperature",
        "store_features:fuel_price",
        "store_features:cpi",
        "store_features:unemployment",
    ]
).to_df()

print(f"✅ Retrieved {len(batch_features):,} feature rows")


In [None]:
# Batch predictions
X_batch = batch_features[feature_cols].fillna(batch_features[feature_cols].median()).values
X_batch_scaled = scaler_X.transform(X_batch)

with torch.no_grad():
    batch_preds_scaled = model(torch.tensor(X_batch_scaled, dtype=torch.float32)).numpy()

batch_preds = scaler_y.inverse_transform(batch_preds_scaled.reshape(-1, 1)).flatten()

batch_features["predicted_sales"] = batch_preds

# Aggregate by store
store_forecast = batch_features.groupby("store_id")["predicted_sales"].sum().reset_index()
store_forecast.columns = ["store_id", "total_predicted_sales"]
store_forecast = store_forecast.sort_values("total_predicted_sales", ascending=False)

print("🏪 Top 10 Stores by Predicted Weekly Sales:")
store_forecast.head(10)
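
The 600 rows here fit comfortably in a single forward pass. For much larger entity sets, predictions could be computed in fixed-size chunks to bound memory use. The sketch below shows the chunking pattern only: `predict_in_chunks` is a hypothetical helper, and `fake_predict` stands in for the scale, model, and inverse-scale steps.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def predict_in_chunks(rows, predict_fn, chunk_size=256):
    """Apply predict_fn to each chunk and concatenate the results in order."""
    predictions = []
    for chunk in iter_chunks(rows, chunk_size):
        predictions.extend(predict_fn(chunk))
    return predictions


def fake_predict(chunk):
    """Stand-in for the real model: returns one number per input row."""
    return [sum(row) for row in chunk]


rows = [[i, i + 1] for i in range(600)]
preds = predict_in_chunks(rows, fake_predict, chunk_size=256)

print(len(preds))                                 # 600
print([len(c) for c in iter_chunks(rows, 256)])   # [256, 256, 88]
```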


---
## Summary

✅ **What we accomplished:**
1. Loaded production model from MLflow Model Registry
2. Fetched online features from Feast (low-latency PostgreSQL)
3. Made real-time predictions for specific store-department combinations
4. Performed batch inference using historical features (distributed via Ray)

---

## 🎉 End-to-End Pipeline Complete!

| Stage | Component | What Happened |
|-------|-----------|---------------|
| **Features** | Feast + Ray | Distributed feature computation |
| **Training** | Kubeflow + MLflow | Distributed training, experiment tracking |
| **Serving** | MLflow Registry | Model versioning, deployment staging |
| **Inference** | Feast Online Store | Low-latency feature serving |

**Production considerations:**
- Deploy Feast feature server for HTTP-based feature serving
- Use KServe for model serving with autoscaling
- Set up MLflow Model Registry webhooks for CI/CD
- Configure monitoring dashboards for feature drift detection
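
As a sketch of the first point: Feast's Python feature server (started with `feast serve`) accepts a POST to `/get-online-features` with a JSON body that lists feature references and entity values by column. The snippet below only builds that body from the row-oriented `entity_rows` format used earlier. `build_online_request` is a hypothetical helper, and the host and port in the comment are placeholders that may differ in your deployment.

```python
import json


def build_online_request(entity_rows, features):
    """Convert row-oriented entities into a column-oriented request body."""
    if not entity_rows:
        raise ValueError("entity_rows must not be empty")
    keys = list(entity_rows[0])
    entities = {key: [row[key] for row in entity_rows] for key in keys}
    return {"features": list(features), "entities": entities}


payload = build_online_request(
    entity_rows=[{"store_id": 1, "dept_id": 1}, {"store_id": 10, "dept_id": 3}],
    features=["sales_features:lag_1", "store_features:cpi"],
)
body = json.dumps(payload)
print(body)

# The body could then be sent with any HTTP client, for example:
#   POST http://<feature-server-host>:6566/get-online-features
```

Serving features over HTTP lets non-Python clients share the same online store without embedding the Feast SDK.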
