# Model Deployment & Prediction Pipeline

### Introduction
This notebook wraps up the demand forecasting pipeline by turning our best-performing trained model into deployable assets. The LightGBM model and its feature columns are already saved, so the remaining work is to build inference functions around them and simulate their use on the saved test set.

### Project Goals
• Load best-performing LightGBM model  
• Create scalable, modular prediction function  
• Load feature columns for consistent prediction inputs  
• Modularize logic for reproducibility and future expansion    

### Importing Python Modules

In [1]:
# Core packages
import numpy as np
import pandas as pd

# Model and feature-list loading
import json
import joblib

# Evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Plotting
import matplotlib.pyplot as plt

# Suppress Warnings for Clean Output
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

### Inference Function

In [2]:
# Quick predict function (testing / exploration)

def predict_lgb(input_df: pd.DataFrame, feature_columns: list, model) -> np.ndarray:
    """
    Predict Units_Sold using a LightGBM model for a given input DataFrame.

    Args:
        input_df (pd.DataFrame): New data
        feature_columns (list): Features used in model training
        model: Loaded LightGBM model object

    Returns:
        np.ndarray: Predicted Units_Sold values
    """
    missing_cols = set(feature_columns) - set(input_df.columns)
    if missing_cols:
        raise ValueError(f"Missing columns in input data: {missing_cols}")

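    # Select the features in training order; extra columns in input_df are ignored.
    X = input_df[feature_columns].copy()
    # The model output is treated as log1p-scale, so expm1 maps it back to units.
    preds_log = model.predict(X)
    preds_actual = np.expm1(preds_log)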
    return preds_actual
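
The behavior of `predict_lgb` can be checked without the saved model file. The sketch below is only an illustration: `StubModel` and the demo columns `a`, `b`, and `extra` are hypothetical and not part of the project. The function body is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the loaded model: returns log1p of column 'a'."""

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return np.log1p(X["a"].to_numpy(dtype=float))


def predict_lgb(input_df, feature_columns, model):
    # Same logic as the notebook function, repeated so this block is self-contained.
    missing_cols = set(feature_columns) - set(input_df.columns)
    if missing_cols:
        raise ValueError(f"Missing columns in input data: {missing_cols}")
    X = input_df[feature_columns].copy()
    return np.expm1(model.predict(X))


# Hypothetical demo frame: columns out of order, plus one unused column.
demo_df = pd.DataFrame({"b": [0.5, 0.1], "a": [9.0, 99.0], "extra": [1, 2]})

# Extra columns are ignored, and expm1 undoes the stub's log1p.
demo_preds = predict_lgb(demo_df, ["a", "b"], StubModel())

# A missing feature raises before the model is ever called.
try:
    predict_lgb(demo_df.drop(columns=["b"]), ["a", "b"], StubModel())
    missing_error = None
except ValueError as exc:
    missing_error = str(exc)
```

A stub like this keeps the column check and the inverse transform testable even where LightGBM is not installed.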

### Validation on New Inputs

In [3]:
# Load saved LightGBM model and feature columns
lgb_model = joblib.load("../models/lgb_comb_model.pkl")
with open("../models/lgb_comb_features.json", "r") as f:
    feature_columns_lgb = json.load(f)
  
# Load test data
test_lgb = pd.read_parquet("../data/X_test_lgb_comb.parquet")

# Predict on sample input
sample_preds = predict_lgb(test_lgb, feature_columns_lgb, lgb_model)

print(sample_preds[:5])

[74.85999584 67.42318876 64.77749185 78.45253508 99.88417163]
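
The imports cell brings in `mean_squared_error`, `mean_absolute_error`, and `r2_score`, which suggests scoring the predictions against actual values. A minimal sketch of that step follows. The `y_true` values are invented purely for illustration and do not come from the project data. The metrics are computed with plain NumPy using the standard definitions, to keep the block dependency-light.

```python
import numpy as np

# First five predictions printed above.
y_pred = np.array([74.85999584, 67.42318876, 64.77749185, 78.45253508, 99.88417163])
# HYPOTHETICAL actuals, invented only to make the example runnable.
y_true = np.array([80.0, 65.0, 60.0, 75.0, 105.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))            # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error
# R^2 = 1 - (residual sum of squares / total sum of squares)
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

With real targets, the same three numbers could be obtained from the scikit-learn functions already imported in the first cell.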


### Modularization and Structure

In [4]:
# Full Modularized Predict Function

def predict_lgb_from_dataframe(df: pd.DataFrame, feature_cols_lgb: list, model_path: str) -> np.ndarray:
    """
    Modularized prediction function for loading model, selecting features, and predicting Units_Sold.

    Args:
        df (pd.DataFrame): Input data
        feature_cols_lgb (list): Trained feature list
        model_path (str): Path to LightGBM model (.pkl)

    Returns:
        np.ndarray: Predicted Units_Sold
    """
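    # The model is read from disk on every call: fine for batch runs,
    # but costly if this function is called once per request.
    model = joblib.load(model_path)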

    missing_cols = set(feature_cols_lgb) - set(df.columns)
    if missing_cols:
        raise ValueError(f"Missing columns in input data: {missing_cols}")

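    # Select the features in training order; extra columns in df are ignored.
    X = df[feature_cols_lgb].copy()
    # The model output is treated as log1p-scale, so expm1 maps it back to units.
    preds_log = model.predict(X)
    preds_actual = np.expm1(preds_log)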
    return preds_actual
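
Because `predict_lgb_from_dataframe` calls `joblib.load` each time it runs, repeated calls re-read the same file. One possible way to avoid that, sketched here as an assumption rather than as part of the project, is to wrap the loader in `functools.lru_cache`. The names `make_cached_loader` and `fake_load` are hypothetical; `fake_load` stands in for `joblib.load` so the block runs without a model file.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a loader so each path is loaded at most once per process."""

    @lru_cache(maxsize=None)
    def _load(path: str) -> Any:
        return load_fn(path)

    return _load


# Stand-in loader that records its calls; in the notebook this would be joblib.load.
load_calls = []


def fake_load(path: str) -> dict:
    load_calls.append(path)
    return {"loaded_from": path}


load_model = make_cached_loader(fake_load)
first = load_model("../models/lgb_comb_model.pkl")
second = load_model("../models/lgb_comb_model.pkl")  # served from the cache
```

The trade-off is that a cached model is not refreshed if the file on disk changes, so a long-running process would need to clear the cache after redeploying a model.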

In [5]:
# Example modularized usage
preds_lgb = predict_lgb_from_dataframe(
    df=test_lgb,
    feature_cols_lgb=feature_columns_lgb,
    model_path="../models/lgb_comb_model.pkl"
)
# Output preview
print(preds_lgb[:5])

[74.85999584 67.42318876 64.77749185 78.45253508 99.88417163]


### Final Integration: Loading Production-Ready Prediction Functions

In [6]:
# 1. Add the project root to sys.path so the pipeline package can be imported
import sys
import os
import json
import joblib
import pandas as pd

current_dir = os.getcwd()
project_root = os.path.abspath(os.path.join(current_dir, ".."))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# 2. Import production module
from pipeline.predict import predict_lgb_from_dataframe

# 3. Define asset paths
model_path = "../models/lgb_comb_model.pkl"
feature_path = "../models/lgb_comb_features.json"
data_path = "../data/X_test_lgb_comb.parquet"

# The model is passed by path and loaded inside predict_lgb_from_dataframe,
# so only the feature list needs to be read here.
with open(feature_path, "r") as f:
    feature_columns_lgb = json.load(f)

# 4. Load production data (the saved test set stands in for it here)
df_production = pd.read_parquet(data_path)

# 5. Predict
production_preds = predict_lgb_from_dataframe(
    df=df_production,
    feature_cols_lgb=feature_columns_lgb,
    model_path=model_path
)

# Optional: preview predictions
print(production_preds[:5])

[74.85999584 67.42318876 64.77749185 78.45253508 99.88417163]
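
Before predictions are handed to downstream consumers, it may help to attach them to the input rows and guard against slightly negative values, since `np.expm1` returns a value between -1 and 0 whenever the log-scale prediction is below zero. The sketch below rests on assumptions: the `store_id` column, the `Units_Sold_pred` column name, and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL raw predictions, including one small negative value.
raw_preds = np.array([74.86, -0.12, 99.88])
# HYPOTHETICAL identifier column standing in for the production rows.
df_out = pd.DataFrame({"store_id": [1, 2, 3]})

# Unit counts cannot be negative, so floor the predictions at zero.
df_out["Units_Sold_pred"] = np.clip(raw_preds, 0, None)
```

Whether to clip, round, or leave the raw values is a business decision; the point of the sketch is only that the output stays aligned with its input rows.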


#### Author  
*Eszter Varga – Data Scientist*  
*GitHub: Timensider*  

*I worked through this project independently, collaborating with AI tools and documentation along the way — just as I would in a real-world workflow.*