<a href="https://colab.research.google.com/github/HarbdhulQuadri/IndabaX2025-Deploying-with-streamlit-demo/blob/main/nigeria_sme_sales_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predicting Daily Sales for Nigerian SMEs using Real Data + Streamlit

## Step 1: Install Required Packages

```python
!pip install --quiet kagglehub pandas scikit-learn joblib openpyxl
```


## Step 2: Download Dataset from Kaggle

```python
import os

import kagglehub

# Download the dataset from its public Kaggle page
dataset_path = kagglehub.dataset_download("babajidedairo/nigerian-ecommerce-sales-dataset")

# Confirm the download and list the files it contains
print("Dataset path:", dataset_path)
print("Files inside:", os.listdir(dataset_path))
```

```
Dataset path: /kaggle/input/nigerian-ecommerce-sales-dataset
Files inside: ['Nigerian E-Commerce Dataset.xlsx']
```
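Step 3 hard-codes the Excel file name. If the dataset's contents ever change, a small lookup like the sketch below fails with a clear message instead of a confusing path error. `find_excel_file` is a hypothetical helper, not part of `kagglehub`, and it assumes the folder contains exactly one `.xlsx` file.

```python
import glob
import os


def find_excel_file(folder: str) -> str:
    """Return the path of the single .xlsx file in `folder` (hypothetical helper)."""
    matches = sorted(glob.glob(os.path.join(folder, "*.xlsx")))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"Expected exactly one .xlsx file in {folder}, found {len(matches)}"
        )
    return matches[0]


# Usage in the notebook (assumes `dataset_path` from the cell above):
# file_path = find_excel_file(dataset_path)
```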


## Step 3: Load Excel Dataset

```python
import pandas as pd

# Build the full path to the Excel file
file_path = os.path.join(dataset_path, "Nigerian E-Commerce Dataset.xlsx")

# Load the .xlsx file with the openpyxl engine
df = pd.read_excel(file_path, engine="openpyxl")

# Preview the raw data
df.head()
```


| | Order ID | Branch Location | Branch Name | Business Name | Is Deleted | Item ID | Item Name | Item Price | Order Item Number | Item Status | Packed Quantity | Quantity | Total Price | Order Date | Order Region | Order Local Area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4672 | Lagos | Generic Store | Generic Stores | False | 60a7b0242498ec1dd380508c | Golden Penny Spaghetti - 500g | 4950.0 | MLPLOCN1FAHUIYK50S0W9YUQ | Cancelled | 1 | 1 | 4950.0 | 2021-05-31 | Lagos | Ifako-Ijaye |
| 1 | 4672 | Lagos | Multipro Consumer Product Limited | MUL | False | 6076c792a6000742949a819c | DANO COOLCOW SACHET - 12X380g | 3392.75 | ML1DN3SZT8R02DKKNKBLXDXA | Cancelled | 2 | 2 | 6785.5 | 2021-05-31 | Lagos | Ifako-Ijaye |
| 2 | 4671 | Lagos | Multipro Consumer Product Limited | MUL | False | 6076c792a6000742949a819c | DANO COOLCOW SACHET - 12X380g | 3392.75 | ML2UMJU6I2P0O958PKZ9AMDQ | Cancelled | 1 | 1 | 3392.75 | 2021-05-31 | Lagos | Ifako-Ijaye |
| 3 | 4670 | Lagos | TDILIFE | TDILIFE | False | 608045d069c51b4e80e70343 | HOLLANDIA EVAP MILK FULL CREAM 60g X 48 | 3370.0 | MLDFDZKVPFV0SHDGGA2KFNRG | Delivered | 1 | 1 | 3370.0 | 2021-05-31 | Lagos | Ifako-Ijaye |
| 4 | 4670 | Lagos | TDILIFE | TDILIFE | False | 608042a469c51b4e80e702f7 | HOLLANDIA EVAP MILK FULL CREAM 190g X 24 | 4845.0 | MLFLBFFM0O5UAS0MROFAL0QA | Cancelled | 1 | 1 | 4845.0 | 2021-05-31 | Lagos | Ifako-Ijaye |
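Step 4 recomputes revenue as quantity times item price, while the raw data already has a `Total Price` column. In the five preview rows the two agree (for example, 2 × 3392.75 = 6785.5). The sketch below checks whether that holds for every row before you pick one as the target; run it before Step 4 renames the columns. `count_price_mismatches` is a hypothetical helper, and the 0.01 tolerance is an assumption.

```python
import numpy as np
import pandas as pd


def count_price_mismatches(frame: pd.DataFrame, atol: float = 0.01) -> int:
    """Count rows where Total Price differs from Item Price * Quantity (hypothetical helper)."""
    expected = frame["Item Price"] * frame["Quantity"]
    # np.isclose tolerates small floating-point differences; NaN rows count as mismatches
    matches = np.isclose(frame["Total Price"], expected, atol=atol)
    return int((~matches).sum())


# Usage on the raw DataFrame loaded above:
# print("Rows where Total Price != Item Price * Quantity:", count_price_mismatches(df))
```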


## Step 4: Clean & Engineer Features

```python
# Normalize column names, e.g. "Order Date" -> "order_date"
df.columns = df.columns.str.strip().str.replace(" ", "_").str.lower()

# Parse dates; unparseable values become NaT so the dropna below removes them
df['order_date'] = pd.to_datetime(df['order_date'], errors='coerce')

# Drop rows with missing or invalid sales data (cancelled order lines are kept)
df = df.dropna(subset=['order_date', 'quantity', 'item_price'])
df = df[(df['item_price'] > 0) & (df['quantity'] > 0)].copy()

# Calendar features
df['day'] = df['order_date'].dt.day
df['month'] = df['order_date'].dt.month
df['is_weekend'] = (df['order_date'].dt.weekday >= 5).astype(int)  # Saturday=5, Sunday=6
df['promo_day'] = df['day'].isin([1, 15, 30]).astype(int)

# Placeholder "weather" derived from the day of the month, NOT real weather data
# 0=Sunny, 1=Cloudy, 2=Rainy
df['weather_encoded'] = df['day'] % 3

# Target variable: revenue per order line (quantity * item_price), not a daily total
df['total_sales'] = df['quantity'] * df['item_price']

# Preview the engineered columns
df[['day', 'month', 'is_weekend', 'promo_day', 'weather_encoded', 'total_sales']].head()
```


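| | day | month | is_weekend | promo_day | weather_encoded | total_sales |
|---|---|---|---|---|---|---|
| 0 | 31 | 5 | 0 | 0 | 1 | 4950.0 |
| 1 | 31 | 5 | 0 | 0 | 1 | 6785.5 |
| 2 | 31 | 5 | 0 | 0 | 1 | 3392.75 |
| 3 | 31 | 5 | 0 | 0 | 1 | 3370.0 |
| 4 | 31 | 5 | 0 | 0 | 1 | 4845.0 |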
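The Streamlit app has to build exactly the same five features from a user's input, so it can help to keep this step's logic in one function that both the notebook and the app import. A minimal sketch; `make_features` is a hypothetical name, and `weather_encoded` reproduces the notebook's day-of-month placeholder rather than real weather.

```python
import pandas as pd

FEATURES = ["day", "month", "is_weekend", "promo_day", "weather_encoded"]
PROMO_DAYS = [1, 15, 30]  # same promo days as the notebook


def make_features(dates: pd.Series) -> pd.DataFrame:
    """Build the notebook's five model features from a Series of dates (hypothetical helper)."""
    dates = pd.to_datetime(dates)
    day = dates.dt.day
    out = pd.DataFrame({
        "day": day,
        "month": dates.dt.month,
        "is_weekend": (dates.dt.weekday >= 5).astype(int),  # Saturday=5, Sunday=6
        "promo_day": day.isin(PROMO_DAYS).astype(int),
        "weather_encoded": day % 3,  # placeholder, mirrors the notebook
    })
    return out[FEATURES]


# Usage: X = make_features(df["order_date"])
```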


## Step 5: Define Features & Target

```python
# Model inputs: the five engineered features
X = df[['day', 'month', 'is_weekend', 'promo_day', 'weather_encoded']]
# Target: revenue per order line
y = df['total_sales']
```
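As defined above, each value of `y` is the revenue of one order line rather than a day's total. One way to get a daily target that matches the notebook's title is to sum line revenue per date and then rebuild the Step 4 features from the daily dates. This sketch assumes cancelled lines should not count as sales (the `item_status` column contains values such as `Cancelled` and `Delivered`); `daily_sales` is a hypothetical helper.

```python
import pandas as pd


def daily_sales(frame: pd.DataFrame, exclude_status: tuple = ("Cancelled",)) -> pd.DataFrame:
    """Sum line revenue per calendar day (hypothetical helper).

    Expects the cleaned columns from Step 4: order_date, item_status, total_sales.
    """
    # Assumption: lines with an excluded status are not real sales
    kept = frame[~frame["item_status"].isin(exclude_status)]
    return (
        kept.groupby(kept["order_date"].dt.normalize())["total_sales"]
        .sum()
        .rename("daily_sales")
        .reset_index()
    )


# Usage: daily = daily_sales(df), then engineer features from daily["order_date"]
```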


## Step 6: Train the Model

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Random 80/20 train/test split (rows from the same date can land in both sets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
```

```
MAE: 618261.2168149918
R² Score: -0.11574686424655223
```
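An R² below zero means the forest's test-set predictions are worse than a constant prediction of the test set's mean. Two quick checks help put that number in context: compare against a predict-the-mean baseline, and, because orders are time-stamped, try a chronological split so the test set only holds dates after the training period. This is a sketch; `chronological_split` and `evaluate_with_baseline` are hypothetical helpers, and the 80/20 cut by date is an assumption.

```python
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score


def chronological_split(X: pd.DataFrame, y: pd.Series, dates: pd.Series, test_frac: float = 0.2):
    """Train on earlier dates, test on later ones (hypothetical helper).

    All rows sharing the cut-off date go to the test set, so no date appears
    in both sets; the test share can therefore be a little above `test_frac`.
    """
    ordered = dates.sort_values()
    cutoff = ordered.iloc[int(round(len(ordered) * (1 - test_frac)))]
    is_train = dates < cutoff
    return X[is_train], X[~is_train], y[is_train], y[~is_train]


def evaluate_with_baseline(X_train, X_test, y_train, y_test, random_state=42):
    """Score the notebook's random forest next to a predict-the-mean baseline."""
    candidates = {
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "mean_baseline": DummyRegressor(strategy="mean"),
    }
    results = {}
    for name, estimator in candidates.items():
        estimator.fit(X_train, y_train)
        pred = estimator.predict(X_test)
        results[name] = {
            "mae": mean_absolute_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return results


# Usage with the notebook's variables:
# X_tr, X_te, y_tr, y_te = chronological_split(X, y, df["order_date"])
# print(evaluate_with_baseline(X_tr, X_te, y_tr, y_te))
```

If the forest's MAE is not clearly below the baseline's, the calendar features are not yet carrying much signal about order size.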


## Step 7: Export Model for Streamlit App

```python
import joblib

# Save the trained model to disk for the Streamlit app to load
joblib.dump(model, "model.pkl")
print("model.pkl saved, ready for Streamlit")
```

```
model.pkl saved, ready for Streamlit
```
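Before wiring `model.pkl` into Streamlit, it is worth confirming that the file loads with `joblib.load` and reproduces the in-memory model's predictions. In the sketch below, `reload_matches` is a hypothetical helper and the four training rows are made up purely so the example runs on its own. scikit-learn does not support loading a pickled model with a different scikit-learn version than the one that saved it, so pinning that version for the app is a sensible precaution.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def reload_matches(model, X, path: str) -> bool:
    """Load the model saved at `path` and check it predicts like `model` (hypothetical helper)."""
    reloaded = joblib.load(path)
    return bool(np.allclose(model.predict(X), reloaded.predict(X)))


# Self-contained demo on made-up rows; in the notebook you would instead call
# reload_matches(model, X_test, "model.pkl")
demo_X = pd.DataFrame(
    [[1, 5, 0, 1, 1], [15, 6, 0, 1, 0], [20, 6, 0, 0, 2], [31, 5, 0, 0, 1]],
    columns=["day", "month", "is_weekend", "promo_day", "weather_encoded"],
)
demo_y = np.array([5000.0, 7000.0, 3000.0, 4000.0])
demo_model = RandomForestRegressor(n_estimators=10, random_state=42).fit(demo_X, demo_y)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "model.pkl")
    joblib.dump(demo_model, demo_path)
    demo_ok = reload_matches(demo_model, demo_X, demo_path)

print("Reloaded model matches:", demo_ok)
```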


```python
# Create one sample row with the same feature columns (and order) used in training,
# mirroring the fields of the Streamlit form
sample_input = pd.DataFrame([{
    'day': 15,
    'month': 7,
    'is_weekend': 0,
    'promo_day': 1,
    'weather_encoded': 2  # Rainy
}])

sample_input.to_csv("sample_form_input.csv", index=False)
print("sample_form_input.csv created")
```

```
sample_form_input.csv created
```
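If the Streamlit form collects a date and a weather label (for example via `st.date_input` and `st.selectbox`) instead of the five raw numbers, the app needs to turn those into a row shaped like `sample_form_input.csv`. A sketch under that assumption; `form_to_features` is hypothetical, and the label-to-code mapping is taken from the notebook's `0=Sunny, 1=Cloudy, 2=Rainy` comment. Since the model learned `weather_encoded` from the day-of-month placeholder, the chosen weather does not carry real weather information.

```python
import datetime as dt

import pandas as pd

WEATHER_CODES = {"Sunny": 0, "Cloudy": 1, "Rainy": 2}  # from the notebook's comment
PROMO_DAYS = {1, 15, 30}
COLUMNS = ["day", "month", "is_weekend", "promo_day", "weather_encoded"]


def form_to_features(date: dt.date, weather: str) -> pd.DataFrame:
    """Turn a chosen date and weather label into one model-ready row (hypothetical helper)."""
    row = {
        "day": date.day,
        "month": date.month,
        "is_weekend": int(date.weekday() >= 5),  # Saturday=5, Sunday=6
        "promo_day": int(date.day in PROMO_DAYS),
        "weather_encoded": WEATHER_CODES[weather],
    }
    return pd.DataFrame([row], columns=COLUMNS)


# In the app: model.predict(form_to_features(chosen_date, chosen_weather))
```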


## ✅ Next Steps

You now have:
- `model.pkl`: the trained model, ready for prediction in Streamlit
- `sample_form_input.csv`: a one-row sample input for testing predictions

We'll now build the Streamlit app together. Once it runs, push it to a GitHub repository along with these files and a `requirements.txt`, then deploy it from [https://share.streamlit.io](https://share.streamlit.io).