### **Model Deployment and Inference**

This notebook validates that the trained **sales prediction pipeline** can be loaded and used to make predictions on new, unseen user input.

We load a single artifact:
- `models/sales_pipeline_latest.pkl`

#### **Model Selection Justification (Why LightGBM over LSTM)**

We experimented with:
1. Machine Learning (LightGBM)
2. Time Series Forecasting
3. Deep Learning (LSTM)

The LightGBM model was selected for deployment because:
- Better validation performance (lower RMSE, higher R²)
- Instant inference (low latency)
- Requires only current business inputs (no 30-day history)
- Lightweight and suitable for real-time usage
- Easier for business users

LSTM requires sequential historical inputs and is slower for real-time business decisions.
Therefore, LightGBM is more suitable for production deployment.
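
For reference, the two validation metrics named above can be computed as in the minimal sketch below. The helper names and the toy numbers are illustrative assumptions, not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: closer to 1 is better."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy numbers for illustration only; not results from this project.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

print(rmse(actual, predicted))
print(r2_score(actual, predicted))
```

A model with lower RMSE and higher R² on the same validation split is the better fit by these two measures.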

In [None]:
import os
import pandas as pd
import joblib
import logging
import warnings

#### **Silencing Warnings**
The `UserWarning` messages raised in this notebook are expected and harmless, so they are suppressed to keep the output readable. If you need to see them (for example, while debugging), comment out or remove the cell below.

In [None]:
warnings.filterwarnings("ignore", category=UserWarning)
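
If suppressing every `UserWarning` for the whole session feels too broad, one alternative is to scope the filter to a single call with `warnings.catch_warnings()`. The sketch below assumes a hypothetical `noisy_step` function standing in for any call that emits a `UserWarning`; it is not part of this notebook.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a UserWarning."""
    warnings.warn("feature names mismatch", UserWarning)
    return 42


def run_quietly(func):
    """Call func with UserWarning suppressed only for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return func()


result = run_quietly(noisy_step)
```

The previous filter state is restored when the `with` block exits, so warnings raised elsewhere in the notebook stay visible.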

#### **Logger Setup**
Logs are written to the `logs/` folder. Use these logs to debug long training runs or unexpected behavior; they're helpful for reproducing issues later.

In [146]:
# Create the log directory if it does not already exist.
os.makedirs("logs", exist_ok=True)

logging.basicConfig(
    filename="logs/model_deployment_inference.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    force=True
)

logger = logging.getLogger(__name__)

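def print_and_log(message, level='info'):
    """Print a message and also write it to the log file at the given level."""
    print(message)
    if level == 'info':
        logger.info(message)
    elif level == 'warning':
        logger.warning(message)
    elif level == 'error':
        logger.error(message)
    else:
        # Any other level is logged as DEBUG, which the INFO-level config above does not write to the file.
        logger.debug(message)
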
print_and_log("===== MODEL DEPLOYMENT AND INFERENCE SAMPLE PROCESS STARTED =====")

===== MODEL DEPLOYMENT AND INFERENCE SAMPLE PROCESS STARTED =====
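
As an alternative to a print-plus-log helper, the standard `logging` module can send each record to both the console and a file by attaching two handlers. The sketch below is one possible arrangement; `build_logger` and its arguments are hypothetical names, not something this notebook defines.

```python
import logging
import os


def build_logger(name, log_path):
    """Return a logger that writes each record to both a file and the console."""
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # guard against adding handlers again on cell re-runs
        formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        for handler in (logging.FileHandler(log_path), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

With this in place, a call such as `build_logger("deployment", "logs/demo.log").info("started")` would write the message to the file and show it in the cell output in one step.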


#### **Loading the trained model pipeline (sample)**

In [147]:
logger.info("Loading the trained model pipeline...")
pipeline = joblib.load("models/sales_pipeline_latest.pkl")
print_and_log("Pipeline loaded successfully!")

Pipeline loaded successfully!


#### **Creating sample input data for inference**

In [148]:
logger.info("Creating sample input data for inference...")
sample_input = pd.DataFrame([{
    "storetype": "a",
    "assortment": "c",
    "stateholiday": "0",
    "customers": 650,
    "competitiondistance": 450,
    "promo": 1
}])
print_and_log("Sample input data created successfully!")
sample_input

Sample input data created successfully!


|   | storetype | assortment | stateholiday | customers | competitiondistance | promo |
|---|---|---|---|---|---|---|
| 0 | a | c | 0 | 650 | 450 | 1 |


#### **Performing inference using the loaded pipeline**

In [149]:
logger.info("Performing inference using the loaded pipeline...")
prediction = pipeline.predict(sample_input)
print_and_log(f"Predicted Store Sales: {int(prediction[0])}")

Predicted Store Sales: 6077





### **Final Summary: Model Deployment and Inference**

In this notebook, we validated the deployment readiness of the Rossmann Store Sales Prediction system.

The objective was to confirm that the trained model can operate independently of the training notebooks and correctly generate predictions from new, unseen input data.


#### **What Was Done**

1. The final trained machine learning pipeline (`sales_pipeline_latest.pkl`) was loaded.
2. A real-world business input scenario was simulated (store conditions provided manually).
3. The pipeline automatically performed:
   - categorical encoding
   - numerical scaling
   - feature alignment
4. The model generated a predicted sales value successfully.

This confirms that the preprocessing and modeling steps are fully integrated inside a single reusable pipeline.


#### **Why a Pipeline Was Used**

During earlier experiments, the model depended on separate components:
- label encoders
- scalers
- feature transformations

This caused deployment issues due to schema mismatch between training and inference.

To resolve this, a scikit-learn `Pipeline` was created that combines preprocessing (`ColumnTransformer`) with the LightGBM model.

This ensures:
- identical preprocessing during training and prediction
- no manual feature handling during deployment
- reduced risk of errors
- easier integration with applications
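
To make the idea concrete, here is a minimal sketch of how preprocessing and a regressor can be bundled into one scikit-learn `Pipeline`. It is not the project's training code: the categorical/numeric column split is an assumption based on the sample input, the training rows are synthetic, and `GradientBoostingRegressor` is a stand-in for the LightGBM model so that the sketch needs only scikit-learn and pandas.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column split is an assumption based on the sample input in this notebook.
CATEGORICAL = ["storetype", "assortment", "stateholiday"]
NUMERIC = ["customers", "competitiondistance", "promo"]

preprocess = ColumnTransformer(
    transformers=[
        # Unknown categories at inference time are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

demo_pipeline = Pipeline(
    steps=[
        ("preprocess", preprocess),
        # Stand-in regressor; the notebook's pipeline uses LightGBM here.
        ("model", GradientBoostingRegressor(n_estimators=20, random_state=0)),
    ]
)

# Synthetic rows, invented purely so the sketch can be fitted.
train = pd.DataFrame({
    "storetype": ["a", "b", "a", "c", "d", "b"],
    "assortment": ["a", "c", "c", "a", "b", "a"],
    "stateholiday": ["0", "0", "a", "0", "0", "b"],
    "customers": [500, 800, 650, 300, 900, 400],
    "competitiondistance": [1200, 450, 300, 5000, 150, 2000],
    "promo": [0, 1, 1, 0, 1, 0],
})
target = [4200, 7900, 6100, 2500, 9000, 3300]

demo_pipeline.fit(train, target)

# Raw inputs go straight into predict; encoding and scaling happen inside.
new_row = pd.DataFrame([{
    "storetype": "a", "assortment": "c", "stateholiday": "0",
    "customers": 650, "competitiondistance": 450, "promo": 1,
}])
demo_prediction = demo_pipeline.predict(new_row)
```

Because the fitted encoder and scaler travel with the model, saving `demo_pipeline` with `joblib.dump` would capture every step in a single file.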

#### **Model Selection Justification**

Multiple approaches were explored:
- Machine Learning (LightGBM)
- Time Series Forecasting
- Deep Learning (LSTM)

Although the LSTM captured sequential patterns, it required historical input sequences and had a longer inference time.

The LightGBM pipeline was selected for deployment because:
- better performance on tabular retail data
- faster prediction time
- requires only current store parameters
- suitable for real business users

#### **Deployment Readiness**

The successful prediction confirms:
- model serialization works correctly
- preprocessing pipeline functions properly
- feature schema consistency is maintained
- the system is ready for real-time usage
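
Schema consistency can also be checked explicitly before calling `predict`. The sketch below is one hedged way to do it: `EXPECTED_SCHEMA` and `validate_input` are hypothetical names, and the column types are assumptions inferred from the sample input rather than from the training code.

```python
# Assumed schema, inferred from the sample input used in this notebook.
EXPECTED_SCHEMA = {
    "storetype": str,
    "assortment": str,
    "stateholiday": str,
    "customers": int,
    "competitiondistance": int,
    "promo": int,
}


def validate_input(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems found in record; an empty list means it matches."""
    problems = []
    for name in schema:
        if name not in record:
            problems.append(f"missing column: {name}")
    for name in record:
        if name not in schema:
            problems.append(f"unexpected column: {name}")
    for name, expected in schema.items():
        if name in record and not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


sample_record = {
    "storetype": "a", "assortment": "c", "stateholiday": "0",
    "customers": 650, "competitiondistance": 450, "promo": 1,
}
print(validate_input(sample_record))
```

A check like this can reject a malformed request with a readable message before it reaches the pipeline.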

The same pipeline is now integrated into a Streamlit web application (`app.py`) where a user can enter store
