# 🛒 Retail Sales Prediction Project
### End-to-End Machine Learning Deployment


## 📌 Overview  
This notebook walks through a complete **ML pipeline**: data preprocessing, model training and evaluation, and deployment of the trained model behind a Flask API packaged with Docker.  

### 🔥 Steps Covered:  
1️⃣ Data Cleaning & Exploration  
2️⃣ Model Training & Evaluation  
3️⃣ Save the Model (`.pkl` file)  
4️⃣ Build a Flask API for Predictions  
5️⃣ Deploy using Docker 🚀  


In [None]:

# 📚 Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import pickle


## 📂 Load & Explore Dataset

In [None]:

# Load Dataset
file_path = "Warehouse_and_Retail_Sales.csv"
df = pd.read_csv(file_path)

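# Show dataset info (df.info() prints directly and returns None,
# so it is called on its own line; df.head() is displayed as the cell output)
df.info()
df.head()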


## 🧹 Data Cleaning

In [None]:

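# Handle missing values
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which is deprecated chained assignment in recent pandas versions.
df["SUPPLIER"] = df["SUPPLIER"].fillna("UNKNOWN_SUPPLIER")
df["ITEM TYPE"] = df["ITEM TYPE"].fillna("UNKNOWN_TYPE")
df["RETAIL SALES"] = df["RETAIL SALES"].fillna(df["RETAIL SALES"].median())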

# Confirm no missing values
df.isnull().sum()


## 📊 Exploratory Data Analysis (EDA)

In [None]:

# Plot sales distributions
sns.set_style("whitegrid")
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

sns.histplot(df["RETAIL SALES"], bins=50, kde=True, ax=axes[0], color="blue")
axes[0].set_title("Retail Sales Distribution")

sns.histplot(df["RETAIL TRANSFERS"], bins=50, kde=True, ax=axes[1], color="green")
axes[1].set_title("Retail Transfers Distribution")

sns.histplot(df["WAREHOUSE SALES"], bins=50, kde=True, ax=axes[2], color="red")
axes[2].set_title("Warehouse Sales Distribution")

plt.tight_layout()
plt.show()


## 🏗️ Feature Engineering & Model Training

In [None]:

# Feature Selection
features = ["YEAR", "MONTH", "SUPPLIER", "ITEM TYPE", "RETAIL TRANSFERS", "WAREHOUSE SALES"]
target = "RETAIL SALES"

# Encode categorical features
label_encoders = {}
for col in ["SUPPLIER", "ITEM TYPE"]:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Train-Test Split (10% for training due to memory constraints)
X = df[features]
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.9, random_state=42)

# Train Linear Regression Model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Model Evaluation
y_pred_lr = lr_model.predict(X_test)
mae_lr = mean_absolute_error(y_test, y_pred_lr)
mse_lr = mean_squared_error(y_test, y_pred_lr)
r2_lr = r2_score(y_test, y_pred_lr)

mae_lr, mse_lr, r2_lr
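
The MSE is expressed in squared units of `RETAIL SALES`, which makes it hard to compare with the MAE. A minimal sketch of deriving the RMSE alongside the other two metrics follows. It uses made-up numbers rather than this dataset, and plain Python instead of scikit-learn so it runs on its own.

```python
import math

# Hypothetical true values and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 7.0]

# Per-row errors, then the three summary metrics.
errors = [t - p for t, p in zip(y_true, y_hat)]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)
rmse = math.sqrt(mse)  # same units as the target, unlike the MSE

print(mae, mse, rmse)
```

In the notebook itself, `np.sqrt(mse_lr)` would give the corresponding value for the fitted model.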


## 💾 Save Model for Deployment

In [None]:

# Save the trained model as a pickle file
model_filename = "retail_sales_model.pkl"
with open(model_filename, "wb") as file:
    pickle.dump(lr_model, file)

print(f"Model saved as {model_filename}")
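
The pickle above holds only the regression model. The `label_encoders` fitted earlier are not saved, so a service that loads this file has no way to turn supplier or item-type names into the integer codes the model was trained on. One possible approach is to pickle both objects in a single bundle, sketched below. The sketch uses plain stand-in values so that it runs without the notebook state. The `bundle` layout and the file name are assumptions, not part of the original project.

```python
import os
import pickle
import tempfile

# Stand-ins for the notebook's objects: in the notebook these would be
# lr_model and label_encoders. Plain values keep the sketch self-contained.
model_stand_in = {"coef": [0.1, 0.2]}
encoders_stand_in = {"SUPPLIER": ["A", "B"], "ITEM TYPE": ["BEER", "WINE"]}

bundle = {"model": model_stand_in, "label_encoders": encoders_stand_in}

# Hypothetical file name, written to a temp directory to avoid clobbering files.
bundle_path = os.path.join(tempfile.mkdtemp(), "retail_sales_bundle.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump(bundle, f)

# Reload and confirm both parts survive the round trip.
with open(bundle_path, "rb") as f:
    restored = pickle.load(f)

print(sorted(restored.keys()))
```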


## 🚀 Flask API for Model Deployment

In [None]:
'''

from flask import Flask, request, jsonify
import pickle
import numpy as np

# Load Model
with open("retail_sales_model.pkl", "rb") as file:
    model = pickle.load(file)

# Initialize Flask App
app = Flask(__name__)

@app.route("/")
def home():
    return "Retail Sales Prediction API is Running!"

@app.route("/predict", methods=["POST"])
def predict():
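    data = request.get_json()
    # Feature order must match training. SUPPLIER and ITEM TYPE are expected
    # as the integer codes produced by the LabelEncoders in the notebook.
    features = np.array([
        data["YEAR"], data["MONTH"], data["SUPPLIER"],
        data["ITEM TYPE"], data["RETAIL TRANSFERS"], data["WAREHOUSE SALES"]
    ], dtype=float).reshape(1, -1)
    # Convert the NumPy scalar to a plain float before serializing
    prediction = float(model.predict(features)[0])
    return jsonify({"predicted_retail_sales": round(prediction, 2)})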

if __name__ == "__main__":
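    # debug=True enables Werkzeug's interactive debugger, which must not be
    # reachable when the app listens on all interfaces (0.0.0.0).
    app.run(host="0.0.0.0", port=5000, debug=False)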

'''
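
The `/predict` handler indexes the JSON body directly, so a missing key surfaces as a server error rather than a clear message. Below is a minimal sketch of validating the payload before building the feature row. `parse_payload` and `FEATURE_ORDER` are hypothetical names, and the sketch assumes `SUPPLIER` and `ITEM TYPE` arrive already label-encoded as numbers, as the handler above expects.

```python
# Feature names in the order used for training in the notebook.
FEATURE_ORDER = ["YEAR", "MONTH", "SUPPLIER", "ITEM TYPE",
                 "RETAIL TRANSFERS", "WAREHOUSE SALES"]


def parse_payload(data):
    """Return (row, error): a list of floats in training order, or a message."""
    if not isinstance(data, dict):
        return None, "request body must be a JSON object"
    missing = [k for k in FEATURE_ORDER if k not in data]
    if missing:
        return None, "missing fields: " + ", ".join(missing)
    try:
        row = [float(data[k]) for k in FEATURE_ORDER]
    except (TypeError, ValueError):
        return None, "all fields must be numeric"
    return row, None


# Example: an incomplete payload yields an error message instead of a row.
print(parse_payload({"YEAR": 2020}))
```

Inside the Flask handler, a non-empty error could be returned with a 400 status instead of being passed to the model.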

## 🐳 Dockerize Flask API

In [None]:
'''

# Use Python base image
FROM python:3.9

# Set working directory
WORKDIR /app

# Copy files to container
COPY . /app

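# Install dependencies (use the scikit-learn version the model was trained
# with, since pickled estimators are not guaranteed to load across versions)
RUN pip install --no-cache-dir flask numpy pandas scikit-learn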

# Expose port
EXPOSE 5000

# Run Flask app
CMD ["python", "app.py"]

'''
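
Once the container is running with port 5000 published (the run command is not shown in this notebook), the endpoint can be exercised with a POST request. The sketch below builds such a request using only the standard library. The URL and the feature values are assumptions, and the call that sends the request is left commented out because it needs a live server.

```python
import json
import urllib.request

# Assumed URL: the API listening locally on the port exposed above.
url = "http://localhost:5000/predict"

# Made-up feature values; SUPPLIER and ITEM TYPE are label-encoded integers.
payload = {"YEAR": 2020, "MONTH": 1, "SUPPLIER": 12, "ITEM TYPE": 3,
           "RETAIL TRANSFERS": 4.0, "WAREHOUSE SALES": 10.0}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def send(request):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Uncomment once the server is up:
# print(send(req))
```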