## **PROJECT TITLE**

## MODEL DEPLOYMENT


# **🎯 PROJECT OBJECTIVE**

The objective of this project is to:

- train a machine learning regression model on historical sales data,
- evaluate its performance, and
- deploy the trained model as a web application that lets users predict sales for future dates.

This project demonstrates end-to-end data analytics skills:

data preparation → model training → evaluation → deployment.

# **PART 1: MODEL TRAINING & SAVING**

In [1]:
# @title Import Required Libraries

import pandas as pd
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

import joblib
print("imported")

imported


In [2]:
# @title Load Dataset
df = pd.read_csv("train.csv")
df.head()


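| | id | date | store_nbr | family | sales | onpromotion |
|---|---|---|---|---|---|---|
| 0 | 0 | 2013-01-01 | 1 | AUTOMOTIVE | 0.0 | 0 |
| 1 | 1 | 2013-01-01 | 1 | BABY CARE | 0.0 | 0 |
| 2 | 2 | 2013-01-01 | 1 | BEAUTY | 0.0 | 0 |
| 3 | 3 | 2013-01-01 | 1 | BEVERAGES | 0.0 | 0 |
| 4 | 4 | 2013-01-01 | 1 | BOOKS | 0.0 | 0 |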


In [3]:
# @title Data Cleaning & Preparation

# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

# Sort by date
df = df.sort_values('date')

# Fill missing promotion values.
# Assign the result back to the column: calling fillna(..., inplace=True)
# on df['onpromotion'] acts on an intermediate copy and stops working in pandas 3.0.
df['onpromotion'] = df['onpromotion'].fillna(0)


In [4]:
# @title Feature Engineering

# Time-based features
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.weekday
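
To make the result of this cell concrete, here is a minimal sketch that applies the same `.dt` accessors to a hypothetical two-row frame (`demo` is an illustrative stand-in, not the project data). `dt.weekday` counts from Monday = 0, so a Tuesday maps to 1 and a Sunday to 6.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real training data
demo = pd.DataFrame({"date": pd.to_datetime(["2013-01-01", "2017-08-13"])})

# Same time-based features as in the cell above
demo["year"] = demo["date"].dt.year
demo["month"] = demo["date"].dt.month
demo["day"] = demo["date"].dt.day
demo["weekday"] = demo["date"].dt.weekday  # Monday = 0, Sunday = 6

print(demo)
```

All four features are plain integers, which is why they can be passed directly to the regression model without further encoding.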


In [5]:
# @title Define Features & Target

X = df[['year', 'month', 'day', 'weekday', 'onpromotion']]
y = df['sales']

In [6]:
# @title Train–Test Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
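
Because the frame was sorted by date earlier and `shuffle=False` is passed here, the split is chronological: the first 80% of rows become the training set and the last 20% the test set. The sketch below illustrates this on a hypothetical 10-day series (`demo`, `train`, and `test` are illustrative names, not part of the project).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-day series, already sorted by date
demo = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=10, freq="D"),
    "sales": range(10),
})

train, test = train_test_split(demo, test_size=0.2, shuffle=False)

# With shuffle=False the test set is the tail of the frame,
# so every training date precedes every test date
print(train["date"].max(), "<", test["date"].min())
```

Keeping the test period strictly after the training period avoids evaluating the model on dates that are interleaved with the ones it was fitted on.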

In [7]:
# @title Polynomial Regression Model

poly = PolynomialFeatures(degree=2, include_bias=False)

X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

model = LinearRegression()
model.fit(X_train_poly, y_train)
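
As an alternative to keeping `poly` and `model` as two separate objects, the two steps could be chained in a scikit-learn pipeline so that a single estimator handles both the feature expansion and the regression. The following is a minimal sketch on synthetic data (the quadratic `y = 1 + 2x + 3x^2` and the name `pipe` are assumptions for illustration; this is not how the notebook trains its model).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data following y = 1 + 2x + 3x^2 (illustrative only)
x = np.arange(6, dtype=float).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

# One estimator that expands the features and then fits the regression
pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipe.fit(x, y)

# Predict at x = 2; the true value of the quadratic there is 17
pred = pipe.predict(np.array([[2.0]]))[0]
print(round(pred, 6))
```

With a pipeline, only one object would need to be saved and loaded, which removes the risk of pairing a model with the wrong transformer at prediction time.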


In [8]:
# @title Model Evaluation

y_pred = model.predict(X_test_poly)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("MAE:", mae)
print("RMSE:", rmse)


MAE: 426.1042092738523
RMSE: 1043.1315057732306
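
To show how the two metrics differ, here is a small worked example on made-up numbers (the arrays `y_true` and `y_hat` are hypothetical). MAE averages the absolute errors, while RMSE squares them first, so a few large misses raise RMSE more than MAE. An RMSE well above the MAE, as in the output above, may therefore point to a minority of large errors.

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.0, 5.0, 4.0])

errors = y_true - y_hat                # [1, 0, -2]
mae = np.mean(np.abs(errors))          # (1 + 0 + 2) / 3
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 0 + 4) / 3)

print(mae, rmse)
```

Here the single error of 2 pushes the RMSE (about 1.29) above the MAE (1.0), even though two of the three predictions are within 1 unit of the target.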


In [9]:
# @title Save Model & Transformer

joblib.dump(model, "sales_model.pkl")
joblib.dump(poly, "poly_transformer.pkl")


['poly_transformer.pkl']
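
`joblib.dump` returns the list of file names it wrote, which is why the cell prints `['poly_transformer.pkl']`. A quick way to check that saved artifacts are usable is to reload them and compare predictions. The sketch below does this with a tiny stand-in model and temporary files (the file names `model.pkl` and `poly.pkl` and the data are assumptions, not the project's artifacts).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Tiny stand-in model (not the project's trained model)
X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    poly_path = os.path.join(tmp, "poly.pkl")
    joblib.dump(model, model_path)
    joblib.dump(poly, poly_path)

    # Reload both artifacts, as the deployment code would
    model2 = joblib.load(model_path)
    poly2 = joblib.load(poly_path)

# The reloaded pair should reproduce the original predictions
same = np.allclose(
    model.predict(poly.transform(X)),
    model2.predict(poly2.transform(X)),
)
print(same)
```

Pickled scikit-learn objects are generally expected to be loaded with the same library version that created them, so pinning versions in the requirements file is a reasonable precaution.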

# **PART 2: DEPLOYMENT (Streamlit App)**

In [10]:
!pip install -r reuirements.txt


Collecting streamlit (from -r reuirements.txt (line 1))
  Downloading streamlit-1.53.1-py3-none-any.whl.metadata (10 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit->-r reuirements.txt (line 1))
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.53.1-py3-none-any.whl (9.1 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Installing collected packages: pydeck, streamlit
Successfully installed pydeck-0.9.1 streamlit-1.53.1


In [11]:
import streamlit as st
import pandas as pd
import joblib

# Load trained model and transformer
model = joblib.load("sales_model.pkl")
poly = joblib.load("poly_transformer.pkl")

st.set_page_config(page_title="Sales Forecasting App")

st.title("📈 Sales Forecasting ML App")
st.write("Predict future sales using a trained machine learning model.")

# User inputs
year = st.number_input("Year", min_value=2013, max_value=2030, value=2017)
month = st.number_input("Month", min_value=1, max_value=12, value=1)
day = st.number_input("Day", min_value=1, max_value=31, value=1)
weekday = st.number_input("Weekday (0=Mon, 6=Sun)", min_value=0, max_value=6, value=0)
onpromotion = st.number_input("Items on Promotion", min_value=0, value=0)

# Prediction
if st.button("Predict Sales"):
    input_data = pd.DataFrame(
        [[year, month, day, weekday, onpromotion]],
        columns=['year', 'month', 'day', 'weekday', 'onpromotion']
    )

    input_poly = poly.transform(input_data)
    prediction = model.predict(input_poly)

    st.success(f"💰 Predicted Sales: {prediction[0]:.2f}")


2026-01-30 12:36:52.144 
  command:

    streamlit run /usr/local/lib/python3.12/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2026-01-30 12:36:52.183 Session state does not function when running a script without `streamlit run`
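
The app asks for year, month, day, and weekday separately, so a user can enter combinations that do not exist (for example day 31 in February) or a weekday that does not match the date. One possible refinement, sketched here under the assumption of a hypothetical helper named `build_features`, is to take a single date and derive all four values from it.

```python
from datetime import date


def build_features(d: date, onpromotion: int) -> list:
    """Return [year, month, day, weekday, onpromotion] for one date.

    Hypothetical helper; the order matches the training feature columns.
    """
    # date.weekday() uses Monday = 0, the same convention as pandas dt.weekday
    return [d.year, d.month, d.day, d.weekday(), onpromotion]


# 2017-08-16 was a Wednesday, so the weekday feature is 2
row = build_features(date(2017, 8, 16), onpromotion=3)
print(row)  # [2017, 8, 16, 2, 3]
```

In the app, the date could come from `st.date_input`, which returns a `datetime.date`, and the resulting row could be wrapped in the same `pd.DataFrame` that is passed to `poly.transform`.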


# **Key Notes**
- Historical sales data was prepared using time-based feature engineering.
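- A degree-2 polynomial regression on the date and promotion features was used to capture non-linear trends and seasonality.
- The trained model was evaluated on the held-out final 20% of rows (a chronological split), giving an MAE of about 426.10 and an RMSE of about 1043.13.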
- The final model was saved and deployed using a Streamlit web application.
- The deployment enables real-time sales prediction through user inputs.


## **Conclusion**
This project demonstrates a complete end-to-end machine learning workflow,
from data preparation and model training to deployment.
By converting the trained model into a web application, the project shows how
data analytics solutions can be transformed into practical, business-ready tools
for decision-making.
