StockSense

StockSense is a machine learning-powered app that helps small and medium-sized businesses make data-driven inventory decisions by predicting product demand and recommending optimal restocking quantities.

Overview

Inventory mismanagement is one of the biggest causes of lost revenue for small businesses. StockSense addresses this by:

Analyzing historical sales data
Forecasting future product demand
Generating actionable restocking recommendations

The goal is to reduce stockouts, minimize overstocking, and improve cash flow efficiency.

Dataset

The dataset used for this project is a fictional dataset, generated by ChatGPT. The dataset aims to mimic real-world inventory dynamics and includes product details, stock levels, sales data, supplier performance, and restocking schedules. It contains 901 rows and 9 columns of data reflecting the inventory and sales patterns of a typical supermarket, spanning a timeframe from January 1, 2024, to June 28, 2024. The dataset is available in an csv format and can be accessed from the provided link.

Features

CSV Upload (sales data input)
Sales Insights & Exploratory Analysis
Demand Prediction (per product)
Restock Recommendation Engine
Lightweight and fast (Streamlit-based UI)

Current Progress

1. Data Cleaning and Processing

The function for data loading was created and test in the notebook
Unwanted columns removal function implemented
Function for converting date to datetime format was implemented
Check for duplication and null value
The data was sorted by date and product ID
Check for any negative value in the numeric features but none was found
Quantity_sold_lag feature was added to the dataset, needed for the model for future sales demand prediction
Some null values was discovered in the lag feature, and a function was created to drop them

2. EDA (Exploratory Data Analysis)

EDA was performed, and meaningful insights was gain from the correlation analysis for the numerical data most especially the lag features and the product quantity sold, which shows that yesterday sales partially influence today's sales.

3. Data Splitting

After the data cleaning, the dataset now consists of 895 rows and 7 columns
The dataset was splitted into training, validation and testing set
training set consists of 626 columns, validation set consists of 134 and testing set consists of 135 rows
The dataset was further splitted into features and target variables across all the 3 splitted datasets
The model will be train on the training set and the baseline testing will be done on the validation test, and the testing will not be used until after the tuning and final model training.

4. Model Training

The product name was dropped, while the product id was encoded, both give the same information and one has to be dropped to avoid redundancy
The model was trained using the RandomForestRegressor model

5. Prediction

The initial baseline model was evaluated using the following regression metrics:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R² Score

The baseline prediction scores are:
- MAE - 2.8
- MSE - 12.4
- R2 Score - -0.07
Evaluation Summary
- The baseline model achieved an MAE of approximately 2.8 units, indicating that predictions deviate from actual sales by an average of about 3 units.
- The negative R² score suggests that the current model performs slightly below a naive baseline predictor that estimates the average sales value.
- While the baseline metrics are acceptable for an initial MVP prototype, the model currently has limited predictive power due to minimal feature engineering and the simplified nature of the dataset.
Future Improvements
- To improve forecasting performance, additional feature engineering and model optimization will be required, including:
- Additional lag-based features
- Rolling statistics (moving averages)
- Date-derived features (day of week, month, seasonality)
- Enhanced demand patterns and external business signals

6. Error Analysis

An error analysis was conducted to evaluate how closely the model predictions aligned with actual sales values.

Key Observations

The predicted sales values generally followed the same trend pattern as the actual sales data.
The prediction curve exhibited a similar zig-zag movement to the actual demand series, indicating that the model was able to capture part of the underlying sales behavior.
Although the model learned short-term demand patterns, it struggled with prediction precision and larger fluctuations in demand.

The error analysis suggests that the baseline model is capable of identifying general sales movement trends but still has limited forecasting accuracy. This behavior is expected at the MVP stage due to:

limited feature engineering
relatively small dataset size
synthetic data generation
absence of advanced temporal and seasonal features

The current model provides a functional baseline forecasting system suitable for MVP development. Further improvements in feature engineering and data realism are expected to improve model performance and predictive reliability.

7. Features Engineering

Rolling mean Feature
- A 7-day rolling mean feature (rolling_mean_7) was introduced to capture short-term sales trends and recent demand behavior.
- The feature was generated using historical sales values grouped by product and shifted appropriately to prevent data leakage.
- The rolling feature was intended to help the model learn:
  - short-term demand trends
  - smoother sales movement patterns
  - recent purchasing behavior over multiple days
- After retraining the model with the additional rolling feature, model performance declined across all evaluation metrics.
Metric Baseline Model With Rolling Feature

MAE 2.81 2.98

MSE 12.41 12.96

R² Score -0.07 -0.21
- The decrease in performance suggests that the rolling mean feature did not provide meaningful predictive information for the current dataset. Possible reasons include:
  - limited dataset size
  - weak temporal patterns in the synthetic data
  - smoothing of useful short-term variations already captured by lag_1
- This experiment demonstrates that adding more features does not always improve model performance and highlights the importance of validating feature engineering decisions using evaluation metrics.
- The rolling feature was removed from the final baseline model configuration, and the simpler feature set produced more stable forecasting performance for the MVP stage.
Days of the Week Features

A day_of_week feature was extracted from the date column to capture weekly sales patterns.
After adding the feature, the model performance improved slightly:

Metric	Score
MAE	2.75
MSE	12.5
R² Score	-0.098

The feature was retained in the final baseline model because it improved prediction performance compared to the previous model configuration.

8. Hyperparameter Tuning

Hyperparameter tuning was performed using GridSearchCV to improve the performance of the Random Forest Regressor model.

The tuning process optimized key model parameters such as:

n_estimators
max_depth
min_samples_split
min_samples_leaf

Tuned Model Performance

Metric	Score
MAE	2.66
MSE	11.47
R² Score	-0.006

After hyperparameter tuning:

prediction error reduced further
model performance improved across all evaluation metrics
the R² score became positive, indicating that the model was able to explain part of the variation in sales demand

The tuned Random Forest model was selected as the final model configuration for the StockSense MVP.

9. Production Pipeline Development

A production-style machine learning pipeline was developed to automate the preprocessing, training, and prediction workflow of the StockSense demand forecasting system.

The production pipeline includes:

automated feature engineering using pandas
categorical feature encoding
numerical preprocessing
Random Forest model training
hyperparameter tuning using GridSearchCV

Custom time-series features such as lag features and day-of-week extraction were handled outside the sklearn pipeline to ensure stability and prevent transformation issues during training and inference.

Pipeline Model Performance

Metric	Score
MAE	3.14
MSE	15.39
R² Score	0.062

The production pipeline successfully automated the end-to-end machine learning workflow while maintaining a positive R² score.

Although the error metrics slightly increased compared to manual experimentation, the pipeline provided:

consistent preprocessing
reusable training workflow
improved maintainability
production-ready architecture

The final pipeline serves as the foundation for future deployment and real-time prediction integration in the StockSense MVP.

10. Final Model Testing

The final trained pipeline was evaluated on an unseen test dataset to measure the model’s generalization performance and validate its readiness for deployment.

The test set was kept completely separate from the training and validation stages to ensure unbiased evaluation.

Metric Score

Metric	Score
MAE	2.79
MSE	12.22
R² Score	0.076

The model maintained stable performance on unseen data and achieved a positive R² score during final evaluation.

This indicates that the model was able to learn meaningful demand patterns without significant overfitting.

The final evaluation confirmed that the StockSense forecasting pipeline is suitable for MVP-level deployment and further production integration.

11. Model Serialization and Artifact Packaging

The final trained StockSense forecasting pipeline was serialized and packaged for production-level usage.

The complete trained pipeline, including preprocessing and the Random Forest model, was saved as a reusable artifact using joblib.

Saved Artifacts

Artifact	Description
`stocksense_v1.pkl`	Serialized trained forecasting pipeline
`metrics.json`	Final model evaluation metrics

Full Production Metrics

{ "Mean Squared Error": 12.218739274599537, "Mean Absolute Error": 2.794756301376071, "R2 Score": 0.07627316392652639 }

The final model pipeline was successfully packaged into a deployable artifact capable of:

loading trained preprocessing logic
performing automated prediction
reusing consistent feature transformations
supporting future deployment workflows

This stage completed the transition of the StockSense project from an experimental notebook workflow into a production-style machine learning system.

12. Test Development

Basic unit and integration tests were introduced to validate the reliability and stability of the StockSense machine learning pipeline.

The testing phase focused on verifying the correctness of:

feature engineering functions
datetime conversion
preprocessing workflow
pipeline training
prediction generation

The tests were designed to ensure that:

engineered features are created successfully
datetime columns are properly converted
preprocessing functions return expected outputs
the training pipeline fits without errors
the trained model can generate predictions correctly

The project used pytest for automated testing and validation of the machine learning workflow.

The first set of tests covered:

add_engineered_features()
datetime validation
run_pipeline()
prediction workflow validation

The tests confirmed that the preprocessing and machine learning pipeline components were functioning correctly and producing consistent outputs.

This testing phase improved the reliability, maintainability, and production readiness of the StockSense forecasting system.

13. Streamlit Application Development & Deployment Preparation

A Streamlit web application was developed to provide an interactive interface for the StockSense demand forecasting system.

The application was designed to simulate a real-world machine learning deployment workflow by allowing users to upload raw inventory datasets and automatically generate demand forecasts using the trained pipeline model.

Application Features

The Streamlit application includes:

CSV file upload functionality
automatic data preprocessing
trained model loading
real-time demand prediction
forecast result visualization
downloadable prediction reports

Production Architecture

The deployment workflow was separated into modular components for:

preprocessing
inference
model loading
prediction handling

This structure improved maintainability, scalability, and production readiness.

Model Inference

The deployed application used the serialized tuned pipeline model saved as a production artifact using joblib.

The inference workflow automatically handled:

feature engineering
categorical encoding
preprocessing transformations
prediction generation

without requiring manual intervention.

The Streamlit application successfully transformed the StockSense forecasting pipeline into a functional machine learning MVP capable of performing end-to-end demand forecasting through a user-friendly interface.

The final system demonstrated:

automated inference workflow
reusable production pipeline
modular deployment architecture
interactive forecasting capability

Recommendation System

The recommendation module generates inventory decisions based on predicted product demand. It helps identify which products should be restocked, maintained, or reduced in supply.

The system returns a dataframe with:

product_id
product_name
predicted_quantity_sold
recommendation

Example Output

product_id	product_name	predicted_quantity_sold	recommendation
P001	Rice	25	Restock
P002	Sugar	10	Maintain
P003	Soap	4	Reduce Stock

14 Data Input Validation & Dynamic Column Mapping

The application includes a robust data validation and mapping layer in the Streamlit interface to handle inconsistent dataset formats from different sources (CSV/Excel uploads). This ensures that the model receives standardized input regardless of variations in column naming conventions.

Different users or systems may provide datasets with varying column names such as:

product, product_name, item_name
qty, quantity, sales_qty
date, transaction_date, time

These inconsistencies can break the ML pipeline and cause prediction errors.

Validation Checks

Before passing data to the model, the system ensures:

Required columns exist
Data types are correct
No missing critical fields (e.g., product_id, quantity_sold)
Date columns are properly parsed

Automatic Data Cleaning

The pipeline automatically:

Converts date columns to datetime format
Handles missing values where applicable
Ensures numeric consistency for prediction features
Removes or flags invalid rows

Fail-Safe Error Handling

If required columns are missing or cannot be mapped, the app:

Displays a clear error message in Streamlit
Suggests the expected column format to the user

15 Model Iteration & Improvement

Initial Baseline Model

The first version of the forecasting model was trained using a basic feature set with:

price
stock_after
categorical product features

Baseline Performance Score:

Metric	Score
MAE	2.79
MSE	12.22
R² Score	0.076

The negative R² score indicated that the model struggled to capture meaningful sales patterns using the initial feature configuration.

Feature Engineering Enhancement

Additional engineered features were later introduced, including:

lag features
rolling statistics
product average sales
price ratio
trend
temporal features (e.g., day of week)

These improvements significantly enhanced the model’s ability to learn demand patterns. Updated Performance:

Metric	Score
MAE	2.61
MSE	10.26
R² Score	0.374

Feature engineering substantially improved model performance, particularly the R² score, demonstrating the importance of temporal and historical sales features in retail demand forecasting.

XGBoost Model Evaluation (Version 2)

To further improve forecasting performance, the project was upgraded from the baseline Random Forest model to an XGBoost regression model.

The improved version incorporated additional engineered features such as:

lag features,
rolling statistics,
and temporal sales features.

The trained model was serialized and saved as: stocksense_v2.pkl to support reusable inference and deployment within the application pipeline.

XGBoost Performance Results

Metric	Score
Mean Squared Error (MSE)	76.88
Mean Absolute Error (MAE)	6.39
R² Score	0.558

Performance Interpretation

The XGBoost model achieved a significantly stronger R² score compared to previous experiments, indicating a much better ability to capture underlying sales patterns within the dataset.

Key observations:

The model explained approximately 55.8% of the variance in product demand.
The updated dataset introduced stronger retail demand patterns and variability, enabling the model to learn more meaningful relationships.
Feature engineering significantly improved the model’s ability to capture historical sales behavior and temporal trends.

Although the error metrics (MAE and MSE) increased numerically due to the more complex and realistic dataset distribution, the higher R² score indicates that the model generalized much better to broader retail demand patterns.

Model Development Progression

Version	Dataset	Model	Key Improvement	R² Score
V1	Initial synthetic dataset	Random Forest	Basic preprocessing pipeline	-0.006
V1.1	Initial synthetic dataset	Random Forest	Added engineered temporal features	0.374
V2	Improved retail simulation dataset	XGBoost	Enhanced dataset realism + engineered features	0.558

Artifact Management

The trained XGBoost model was exported and versioned as:

artifacts/models/stocksense_v2.pkl

This supports:

reusable inference,
deployment-ready prediction workflow,
and model version tracking across experiments.

The transition to a more realistic retail sales dataset, combined with advanced feature engineering and the XGBoost model, produced the strongest forecasting performance achieved during development.

This experiment highlighted the importance of:

realistic retail demand simulation,
iterative model experimentation,
and structured feature engineering in inventory forecasting systems.

The current XGBoost model serves as the primary forecasting engine powering the StockSense MVP recommendation system.

StockSense v3 — Model Improvement Progress Documentation

This stage of the StockSense project focused on improving the forecasting model performance through better temporal feature engineering, correction of grouping logic, and comparative evaluation of ensemble machine learning models.

The improvements made during this phase significantly increased the forecasting capability of the system and resolved several important modeling issues discovered during experimentation.

Product Grouping Logic Correction

Initial Problem The original forecasting pipeline used: product_id for:

lag feature generation
rolling statistics
temporal grouping

Example: df.groupby("product_id")

However, during dataset analysis, it was discovered that:

each row contained a unique product_id

even for the same product.

This caused a major issue because:

lag features could not capture historical continuity
rolling windows became ineffective
temporal forecasting patterns were broken

Effectively, each product row was treated as an isolated observation rather than part of a historical product sequence.

Solution Implemented

The grouping logic was redesigned to use:

product_name

instead of:

product_id

Updated grouping example:

df.groupby("product_name")

This correction restored:

proper historical continuity
valid lag calculations
meaningful rolling window statistics
product-level temporal learning

XGboost Model Performance (Version 3)

Metric	Score
MAE	6.32
MSE	74.63
R² Score	0.637

Project Steps

Data collection and loading
Data cleaning and preprocessing
Exploratory data analysis (EDA)
Data splitting
Feature engineering
Baseline modeling
Error analysis
hyperparameter tuning and model optimization
Final evaluation (test set)
Build full preprocessing + model pipeline
Artifact creation
Unit testing
Model packaging and deployment readiness
Monitoring and maintenance plan

Development Status

🚧 MVP in active development

Contribution

This is currently a personal MVP project. Contributions, ideas, and feedback are welcome.

License

This project is open-source and available under the MIT License.

👤 Author

Built by Issa Muiz Machine Learning & Data Science Enthusiast

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.devcontainer		.devcontainer
app		app
artifacts		artifacts
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
set.cfg		set.cfg
setup.py		setup.py

Metric	Baseline Model	With Rolling Feature
MAE	2.81	2.98
MSE	12.41	12.96
R² Score	-0.07	-0.21

Folders and files

Latest commit

History

Repository files navigation

StockSense

Overview

Dataset

Features

Current Progress

1. Data Cleaning and Processing

2. EDA (Exploratory Data Analysis)

3. Data Splitting

4. Model Training

5. Prediction

6. Error Analysis

Key Observations

7. Features Engineering

8. Hyperparameter Tuning

Tuned Model Performance

9. Production Pipeline Development

Pipeline Model Performance

10. Final Model Testing

Metric Score

11. Model Serialization and Artifact Packaging

Saved Artifacts

Full Production Metrics

12. Test Development

13. Streamlit Application Development & Deployment Preparation

Application Features

Production Architecture

Model Inference

Recommendation System

Example Output

14 Data Input Validation & Dynamic Column Mapping

15 Model Iteration & Improvement

Initial Baseline Model

Feature Engineering Enhancement

XGBoost Model Evaluation (Version 2)

XGBoost Performance Results

Performance Interpretation

Key observations:

Model Development Progression

Artifact Management

StockSense v3 — Model Improvement Progress Documentation

Product Grouping Logic Correction

Solution Implemented

XGboost Model Performance (Version 3)

Project Steps

Development Status

Contribution

License

👤 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages