StockSense is a machine learning-powered app that helps small and medium-sized businesses make data-driven inventory decisions by predicting product demand and recommending optimal restocking quantities.
Inventory mismanagement is one of the biggest causes of lost revenue for small businesses. StockSense addresses this by:
- Analyzing historical sales data
- Forecasting future product demand
- Generating actionable restocking recommendations
The goal is to reduce stockouts, minimize overstocking, and improve cash flow efficiency.
The dataset used for this project is fictional and was generated by ChatGPT. It aims to mimic real-world inventory dynamics and includes product details, stock levels, sales data, supplier performance, and restocking schedules. It contains 901 rows and 9 columns reflecting the inventory and sales patterns of a typical supermarket, spanning January 1, 2024, to June 28, 2024. The dataset is available in CSV format and can be accessed from the provided link.
- CSV Upload (sales data input)
- Sales Insights & Exploratory Analysis
- Demand Prediction (per product)
- Restock Recommendation Engine
- Lightweight and fast (Streamlit-based UI)
- The data loading function was created and tested in the notebook
- A function for removing unwanted columns was implemented
- A function for converting the date column to datetime format was implemented
- The data was checked for duplicates and null values
- The data was sorted by date and product ID
- The numeric features were checked for negative values; none were found
- A `Quantity_sold_lag` feature was added to the dataset; the model needs it to predict future sales demand
- Some null values were discovered in the lag feature, and a function was created to drop them
- EDA was performed, and the correlation analysis of the numerical data yielded meaningful insights, especially between the lag feature and the quantity sold: yesterday's sales partially influence today's sales
- After data cleaning, the dataset consists of 895 rows and 7 columns
- The dataset was split into training, validation, and testing sets
- The training set consists of 626 rows, the validation set of 134 rows, and the testing set of 135 rows
- Each of the three splits was further divided into features and target variables
- The model is trained on the training set and the baseline is evaluated on the validation set; the testing set is not used until after tuning and final model training
- The product name was dropped and the product ID was encoded; both carry the same information, so one had to be dropped to avoid redundancy
- The model was trained using the RandomForestRegressor model
- The initial baseline model was evaluated using the following regression metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R² Score
The baseline prediction scores are:
- MAE: 2.8
- MSE: 12.4
- R² Score: -0.07
Evaluation Summary
- The baseline model achieved an MAE of approximately 2.8 units, indicating that predictions deviate from actual sales by an average of about 3 units.
- The negative R² score suggests that the current model performs slightly below a naive baseline predictor that estimates the average sales value.
- While the baseline metrics are acceptable for an initial MVP prototype, the model currently has limited predictive power due to minimal feature engineering and the simplified nature of the dataset.
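As a rough illustration of how the three metrics behave, the sketch below computes them by hand on made-up numbers (the values are not taken from the StockSense dataset). It shows that a predictor which always outputs the mean scores an R² of exactly 0, so any model that does worse than that ends up with a negative R².

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, where SS_tot is the error of the mean predictor."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up daily units sold, for illustration only.
actual = [4, 8, 6, 10, 2]

# Naive baseline: always predict the average of the actual values.
mean_pred = [sum(actual) / len(actual)] * len(actual)

# A hypothetical model whose errors are larger than the naive baseline's.
poor_pred = [9, 3, 8, 5, 6]

print("mean predictor R2:", r2(actual, mean_pred))
print("poor model MAE:", mae(actual, poor_pred))
print("poor model MSE:", mse(actual, poor_pred))
print("poor model R2:", r2(actual, poor_pred))
```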
Future Improvements
- To improve forecasting performance, additional feature engineering and model optimization will be required, including:
- Additional lag-based features
- Rolling statistics (moving averages)
- Date-derived features (day of week, month, seasonality)
- Enhanced demand patterns and external business signals
An error analysis was conducted to evaluate how closely the model predictions aligned with actual sales values.
- The predicted sales values generally followed the same trend pattern as the actual sales data.
- The prediction curve exhibited a similar zig-zag movement to the actual demand series, indicating that the model was able to capture part of the underlying sales behavior.
- Although the model learned short-term demand patterns, it struggled with prediction precision and larger fluctuations in demand.
The error analysis suggests that the baseline model is capable of identifying general sales movement trends but still has limited forecasting accuracy. This behavior is expected at the MVP stage due to:
- limited feature engineering
- relatively small dataset size
- synthetic data generation
- absence of advanced temporal and seasonal features
The current model provides a functional baseline forecasting system suitable for MVP development. Further improvements in feature engineering and data realism are expected to improve model performance and predictive reliability.
Rolling Mean Feature
A 7-day rolling mean feature (`rolling_mean_7`) was introduced to capture short-term sales trends and recent demand behavior.
The feature was generated using historical sales values grouped by product and shifted appropriately to prevent data leakage.
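One way to build such a feature with pandas is sketched below. The column names `date`, `product_id`, and `quantity_sold` follow the names used elsewhere in this README; the sample values and the `min_periods=1` setting are assumptions made for illustration. The `shift(1)` call is what keeps the current day's sales out of its own rolling window.

```python
import pandas as pd

# Tiny made-up sample: two products, four days each.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"] * 2
    ),
    "product_id": ["P001"] * 4 + ["P002"] * 4,
    "quantity_sold": [10, 12, 14, 16, 3, 5, 7, 9],
})
df = df.sort_values(["product_id", "date"]).reset_index(drop=True)

# Shift by one day first so each row only sees past sales,
# then average over up to seven previous days within the same product.
df["rolling_mean_7"] = df.groupby("product_id")["quantity_sold"].transform(
    lambda s: s.shift(1).rolling(window=7, min_periods=1).mean()
)

print(df)
```

The first row of each product has no history, so its rolling mean is missing and would need to be dropped or filled before training.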
The rolling feature was intended to help the model learn:
- short-term demand trends
- smoother sales movement patterns
- recent purchasing behavior over multiple days
After retraining the model with the additional rolling feature, model performance declined across all evaluation metrics.
| Metric | Baseline Model | With Rolling Feature |
|---|---|---|
| MAE | 2.81 | 2.98 |
| MSE | 12.41 | 12.96 |
| R² Score | -0.07 | -0.21 |
The decrease in performance suggests that the rolling mean feature did not provide meaningful predictive information for the current dataset. Possible reasons include:
- limited dataset size
- weak temporal patterns in the synthetic data
- smoothing of useful short-term variations already captured by `lag_1`
This experiment demonstrates that adding more features does not always improve model performance and highlights the importance of validating feature engineering decisions using evaluation metrics.
The rolling feature was removed from the final baseline model configuration, and the simpler feature set produced more stable forecasting performance for the MVP stage.
Days of the Week Features
A `day_of_week` feature was extracted from the `date` column to capture weekly sales patterns.
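A minimal sketch of the extraction, assuming the `date` column arrives as ISO-formatted strings (the sample dates are made up). pandas numbers weekdays from Monday = 0 to Sunday = 6.

```python
import pandas as pd

# Made-up dates: a Monday, a Saturday, and a Sunday.
df = pd.DataFrame({"date": ["2024-01-01", "2024-01-06", "2024-01-07"]})

# Convert to datetime first so the .dt accessor is available.
df["date"] = pd.to_datetime(df["date"])

# Integer day of week: Monday = 0, ..., Sunday = 6.
df["day_of_week"] = df["date"].dt.dayofweek

print(df)
```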
After adding the feature, MAE improved slightly relative to the baseline (2.81 to 2.75), while MSE and R² were marginally worse:
| Metric | Score |
|---|---|
| MAE | 2.75 |
| MSE | 12.5 |
| R² Score | -0.098 |
- The feature was retained in the final baseline model because it lowered the average prediction error (MAE) compared to the previous model configuration.
Hyperparameter tuning was performed using GridSearchCV to improve the performance of the Random Forest Regressor model.
The tuning process optimized key model parameters such as:
- `n_estimators`
- `max_depth`
- `min_samples_split`
- `min_samples_leaf`
| Metric | Score |
|---|---|
| MAE | 2.66 |
| MSE | 11.47 |
| R² Score | -0.006 |
After hyperparameter tuning:
- prediction error reduced further
- model performance improved across all evaluation metrics
- the R² score rose to approximately zero (-0.006), indicating that the model performed roughly on par with a predictor that always outputs the mean sales value
The tuned Random Forest model was selected as the final model configuration for the StockSense MVP.
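The tuning step described above could look roughly like the sketch below. The four parameter names come from this README; the candidate values, the scoring metric, the number of folds, and the random training data are all assumptions chosen to keep the example small and fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Random stand-in data; the real project uses the engineered sales features.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 3))
y = 5 + 2 * X[:, 0] + rng.normal(scale=0.5, size=120)

# Candidate values are illustrative, not the grid used in the project.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",  # higher is better, so MAE is negated
    cv=3,
)
search.fit(X, y)

best_model = search.best_estimator_  # refit on all of X, y with the best params
print(search.best_params_)
```

For time-ordered sales data, passing a `TimeSeriesSplit` instance as `cv` may be a safer choice than the default shuffled-free K-fold, since it never validates on data that precedes the training fold.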
A production-style machine learning pipeline was developed to automate the preprocessing, training, and prediction workflow of the StockSense demand forecasting system.
The production pipeline includes:
- automated feature engineering using pandas
- categorical feature encoding
- numerical preprocessing
- Random Forest model training
- hyperparameter tuning using `GridSearchCV`
Custom time-series features such as lag features and day-of-week extraction were handled outside the sklearn pipeline to ensure stability and prevent transformation issues during training and inference.
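The split between pandas feature engineering and the sklearn pipeline might be organized as in the sketch below. The helper `add_time_features`, the choice of `OneHotEncoder` and `StandardScaler`, and the sample data are assumptions for illustration; the project's actual functions and transformers may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def add_time_features(df):
    """Hypothetical helper: lag and day-of-week features built with pandas."""
    out = df.sort_values(["product_id", "date"]).copy()
    out["lag_1"] = out.groupby("product_id")["quantity_sold"].shift(1)
    out["day_of_week"] = out["date"].dt.dayofweek
    # Rows without a previous day have no lag value and are dropped.
    return out.dropna(subset=["lag_1"])


# Made-up sample data.
raw = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"] * 2
    ),
    "product_id": ["P001"] * 4 + ["P002"] * 4,
    "price": [2.5] * 4 + [4.0] * 4,
    "quantity_sold": [10, 12, 14, 16, 3, 5, 7, 9],
})

# Step 1: time-series features are created outside the sklearn pipeline.
data = add_time_features(raw)

categorical = ["product_id"]
numeric = ["price", "lag_1", "day_of_week"]

# Step 2: encoding, scaling, and the model live inside one pipeline object.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(n_estimators=20, random_state=0)),
])

pipeline.fit(data[categorical + numeric], data["quantity_sold"])
preds = pipeline.predict(data[categorical + numeric])
print(preds)
```

Keeping the lag logic outside the pipeline means the same helper has to be called at inference time before `pipeline.predict`.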
| Metric | Score |
|---|---|
| MAE | 3.14 |
| MSE | 15.39 |
| R² Score | 0.062 |
The production pipeline successfully automated the end-to-end machine learning workflow while maintaining a positive R² score.
Although the error metrics slightly increased compared to manual experimentation, the pipeline provided:
- consistent preprocessing
- reusable training workflow
- improved maintainability
- production-ready architecture
The final pipeline serves as the foundation for future deployment and real-time prediction integration in the StockSense MVP.
The final trained pipeline was evaluated on an unseen test dataset to measure the model’s generalization performance and validate its readiness for deployment.
The test set was kept completely separate from the training and validation stages to ensure unbiased evaluation.
| Metric | Score |
|---|---|
| MAE | 2.79 |
| MSE | 12.22 |
| R² Score | 0.076 |
The model maintained stable performance on unseen data and achieved a positive R² score during final evaluation.
This indicates that the model was able to learn meaningful demand patterns without significant overfitting.
The final evaluation confirmed that the StockSense forecasting pipeline is suitable for MVP-level deployment and further production integration.
The final trained StockSense forecasting pipeline was serialized and packaged for production-level usage.
The complete trained pipeline, including preprocessing and the Random Forest model, was saved as a reusable artifact using joblib.
| Artifact | Description |
|---|---|
| `stocksense_v1.pkl` | Serialized trained forecasting pipeline |
| `metrics.json` | Final model evaluation metrics |
{ "Mean Squared Error": 12.218739274599537, "Mean Absolute Error": 2.794756301376071, "R2 Score": 0.07627316392652639 }
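Saving and reloading artifacts like these can be sketched as follows. The file names and JSON keys mirror the ones listed above; the toy model, the random data, and the temporary output directory are assumptions, and the metric values produced here are unrelated to the project's results.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy model on random data, standing in for the trained pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=50)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
preds = model.predict(X)

# Temporary directory so the example does not touch the project tree.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "stocksense_v1.pkl"
metrics_path = out_dir / "metrics.json"

# Persist the fitted estimator and its evaluation metrics side by side.
joblib.dump(model, model_path)
metrics = {
    "Mean Squared Error": float(mean_squared_error(y, preds)),
    "Mean Absolute Error": float(mean_absolute_error(y, preds)),
    "R2 Score": float(r2_score(y, preds)),
}
metrics_path.write_text(json.dumps(metrics, indent=2))

# Reload later, for example inside the app, and predict without retraining.
restored = joblib.load(model_path)
print(restored.predict(X[:3]))
```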
The final model pipeline was successfully packaged into a deployable artifact capable of:
- loading trained preprocessing logic
- performing automated prediction
- reusing consistent feature transformations
- supporting future deployment workflows
This stage completed the transition of the StockSense project from an experimental notebook workflow into a production-style machine learning system.
Basic unit and integration tests were introduced to validate the reliability and stability of the StockSense machine learning pipeline.
The testing phase focused on verifying the correctness of:
- feature engineering functions
- datetime conversion
- preprocessing workflow
- pipeline training
- prediction generation
The tests were designed to ensure that:
- engineered features are created successfully
- datetime columns are properly converted
- preprocessing functions return expected outputs
- the training pipeline fits without errors
- the trained model can generate predictions correctly
The project used pytest for automated testing and validation of the machine learning workflow.
The first set of tests covered:
- `add_engineered_features()`
- datetime validation
- `run_pipeline()`
- prediction workflow validation
The tests confirmed that the preprocessing and machine learning pipeline components were functioning correctly and producing consistent outputs.
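Tests of this kind might look like the sketch below. The body of `add_engineered_features` here is a hypothetical stand-in that only borrows the project's function name so the example runs on its own; it is not the project's implementation, and the sample data is made up. pytest collects any function whose name starts with `test_`.

```python
import pandas as pd


def add_engineered_features(df):
    """Hypothetical stand-in for the project's feature engineering function."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["product_id", "date"])
    out["lag_1"] = out.groupby("product_id")["quantity_sold"].shift(1)
    out["day_of_week"] = out["date"].dt.dayofweek
    return out


def make_sample():
    """Small made-up input frame shared by the tests."""
    return pd.DataFrame({
        "date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
        "product_id": ["P001", "P001", "P002", "P002"],
        "quantity_sold": [10, 12, 3, 5],
    })


def test_engineered_columns_are_created():
    result = add_engineered_features(make_sample())
    assert {"lag_1", "day_of_week"} <= set(result.columns)


def test_date_column_is_datetime():
    result = add_engineered_features(make_sample())
    assert pd.api.types.is_datetime64_any_dtype(result["date"])


def test_lag_does_not_cross_products():
    result = add_engineered_features(make_sample())
    # The first row of every product has no previous day to look back on.
    first_rows = result.groupby("product_id").head(1)
    assert first_rows["lag_1"].isna().all()
```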
This testing phase improved the reliability, maintainability, and production readiness of the StockSense forecasting system.
A Streamlit web application was developed to provide an interactive interface for the StockSense demand forecasting system.
The application was designed to simulate a real-world machine learning deployment workflow by allowing users to upload raw inventory datasets and automatically generate demand forecasts using the trained pipeline model.
The Streamlit application includes:
- CSV file upload functionality
- automatic data preprocessing
- trained model loading
- real-time demand prediction
- forecast result visualization
- downloadable prediction reports
The deployment workflow was separated into modular components for:
- preprocessing
- inference
- model loading
- prediction handling
This structure improved maintainability, scalability, and production readiness.
The deployed application used the serialized tuned pipeline model saved as a production artifact using joblib.
The inference workflow automatically handled:
- feature engineering
- categorical encoding
- preprocessing transformations
- prediction generation
without requiring manual intervention.
The Streamlit application successfully transformed the StockSense forecasting pipeline into a functional machine learning MVP capable of performing end-to-end demand forecasting through a user-friendly interface.
The final system demonstrated:
- automated inference workflow
- reusable production pipeline
- modular deployment architecture
- interactive forecasting capability
The recommendation module generates inventory decisions based on predicted product demand. It helps identify which products should be restocked, maintained, or reduced in supply.
The system returns a dataframe with:
- `product_id`
- `product_name`
- `predicted_quantity_sold`
- `recommendation`
| product_id | product_name | predicted_quantity_sold | recommendation |
|---|---|---|---|
| P001 | Rice | 25 | Restock |
| P002 | Sugar | 10 | Maintain |
| P003 | Soap | 4 | Reduce Stock |
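A simple threshold rule that reproduces the sample output above is sketched below. The two threshold values are assumptions picked only to match the example rows; the project's actual recommendation logic may use different cut-offs or also take current stock levels into account.

```python
# Hypothetical cut-offs, chosen only to match the sample table.
RESTOCK_THRESHOLD = 20
REDUCE_THRESHOLD = 5


def recommend(predicted_quantity_sold):
    """Map a demand forecast to one of the three recommendation labels."""
    if predicted_quantity_sold >= RESTOCK_THRESHOLD:
        return "Restock"
    if predicted_quantity_sold < REDUCE_THRESHOLD:
        return "Reduce Stock"
    return "Maintain"


forecasts = [
    {"product_id": "P001", "product_name": "Rice", "predicted_quantity_sold": 25},
    {"product_id": "P002", "product_name": "Sugar", "predicted_quantity_sold": 10},
    {"product_id": "P003", "product_name": "Soap", "predicted_quantity_sold": 4},
]

# Attach a recommendation to every forecast row.
report = [
    {**row, "recommendation": recommend(row["predicted_quantity_sold"])}
    for row in forecasts
]

for row in report:
    print(row)
```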
The application includes a robust data validation and mapping layer in the Streamlit interface to handle inconsistent dataset formats from different sources (CSV/Excel uploads). This ensures that the model receives standardized input regardless of variations in column naming conventions.
Different users or systems may provide datasets with varying column names such as:
- product, product_name, item_name
- qty, quantity, sales_qty
- date, transaction_date, time
These inconsistencies can break the ML pipeline and cause prediction errors.
- Validation Checks
Before passing data to the model, the system ensures:
- Required columns exist
- Data types are correct
- No missing critical fields (e.g., product_id, quantity_sold)
- Date columns are properly parsed
- Automatic Data Cleaning
The pipeline automatically:
- Converts date columns to datetime format
- Handles missing values where applicable
- Ensures numeric consistency for prediction features
- Removes or flags invalid rows
- Fail-Safe Error Handling
If required columns are missing or cannot be mapped, the app:
- Displays a clear error message in Streamlit
- Suggests the expected column format to the user
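The mapping and fail-safe behavior described above could be sketched as follows. The alias lists come from the examples in this section; the `COLUMN_ALIASES` table, the `build_rename_map` function, and the choice of `product_name`, `quantity_sold`, and `date` as the standard names are assumptions for illustration.

```python
# Standard column name -> accepted spellings (all lowercase).
COLUMN_ALIASES = {
    "product_name": ["product", "product_name", "item_name"],
    "quantity_sold": ["qty", "quantity", "sales_qty", "quantity_sold"],
    "date": ["date", "transaction_date", "time"],
}


def build_rename_map(columns):
    """Return {uploaded_name: standard_name}, or raise if a column is missing."""
    # Compare case-insensitively and ignore surrounding whitespace.
    normalized = {col.strip().lower(): col for col in columns}

    rename_map = {}
    missing = []
    for standard, aliases in COLUMN_ALIASES.items():
        match = next((normalized[a] for a in aliases if a in normalized), None)
        if match is None:
            missing.append(standard)
        else:
            rename_map[match] = standard

    if missing:
        expected = {name: COLUMN_ALIASES[name] for name in missing}
        raise ValueError(
            f"Missing required columns {missing}. Accepted names: {expected}"
        )
    return rename_map


uploaded_columns = ["Item_Name", "QTY", "transaction_date"]
rename_map = build_rename_map(uploaded_columns)
print(rename_map)
```

The returned dictionary can be passed to `DataFrame.rename(columns=rename_map)`, and in the Streamlit app the `ValueError` message could be shown to the user with `st.error`.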
The first version of the forecasting model was trained using a basic feature set with:
- price
- stock_after
- categorical product features
Baseline Performance Score:
| Metric | Score |
|---|---|
| MAE | 2.79 |
| MSE | 12.22 |
| R² Score | 0.076 |
The low R² score (0.076) indicated that the model struggled to capture meaningful sales patterns using the initial feature configuration.
Additional engineered features were later introduced, including:
- lag features
- rolling statistics
- product average sales
- price ratio
- trend
- temporal features (e.g., day of week)
These improvements significantly enhanced the model’s ability to learn demand patterns.
Updated Performance:
| Metric | Score |
|---|---|
| MAE | 2.61 |
| MSE | 10.26 |
| R² Score | 0.374 |
Feature engineering substantially improved model performance, particularly the R² score, demonstrating the importance of temporal and historical sales features in retail demand forecasting.
To further improve forecasting performance, the project was upgraded from the baseline Random Forest model to an XGBoost regression model.
The improved version incorporated additional engineered features such as:
- lag features,
- rolling statistics,
- and temporal sales features.
The trained model was serialized and saved as `stocksense_v2.pkl` to support reusable inference and deployment within the application pipeline.
| Metric | Score |
|---|---|
| Mean Squared Error (MSE) | 76.88 |
| Mean Absolute Error (MAE) | 6.39 |
| R² Score | 0.558 |
The XGBoost model achieved a significantly stronger R² score compared to previous experiments, indicating a much better ability to capture underlying sales patterns within the dataset.
- The model explained approximately 55.8% of the variance in product demand.
- The updated dataset introduced stronger retail demand patterns and variability, enabling the model to learn more meaningful relationships.
- Feature engineering significantly improved the model’s ability to capture historical sales behavior and temporal trends.
Although the error metrics (MAE and MSE) increased numerically due to the more complex and realistic dataset distribution, the higher R² score indicates that the model generalized much better to broader retail demand patterns.
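The sketch below uses made-up numbers (not the project's data) to illustrate why MAE and R² can move in opposite directions when the dataset changes. MAE is measured in sales units, so it grows with the scale of demand, whereas R² compares the model's error with the variance of the target.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Low-variance demand: every prediction is off by only 1 unit,
# yet that is worse than simply predicting the mean.
small_actual = [4, 5, 6, 5, 4, 6]
small_pred = [5, 6, 5, 4, 5, 5]

# High-variance demand: every prediction is off by 6 units,
# but the errors are small relative to the spread of the target.
large_actual = [10, 40, 70, 30, 90, 60]
large_pred = [16, 34, 76, 24, 96, 54]

print("small scale:", mae(small_actual, small_pred), r2(small_actual, small_pred))
print("large scale:", mae(large_actual, large_pred), r2(large_actual, large_pred))
```

Because of this scale dependence, error metrics from the two datasets are not directly comparable.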
| Version | Dataset | Model | Key Improvement | R² Score |
|---|---|---|---|---|
| V1 | Initial synthetic dataset | Random Forest | Basic preprocessing pipeline | -0.006 |
| V1.1 | Initial synthetic dataset | Random Forest | Added engineered temporal features | 0.374 |
| V2 | Improved retail simulation dataset | XGBoost | Enhanced dataset realism + engineered features | 0.558 |
The trained XGBoost model was exported and versioned as:
`artifacts/models/stocksense_v2.pkl`
This supports:
- reusable inference,
- deployment-ready prediction workflow,
- and model version tracking across experiments.
The transition to a more realistic retail sales dataset, combined with advanced feature engineering and the XGBoost model, produced the strongest forecasting performance achieved during development.
This experiment highlighted the importance of:
- realistic retail demand simulation,
- iterative model experimentation,
- and structured feature engineering in inventory forecasting systems.
The current XGBoost model serves as the primary forecasting engine powering the StockSense MVP recommendation system.
This stage of the StockSense project focused on improving the forecasting model performance through better temporal feature engineering, correction of grouping logic, and comparative evaluation of ensemble machine learning models.
The improvements made during this phase significantly increased the forecasting capability of the system and resolved several important modeling issues discovered during experimentation.
Initial Problem
The original forecasting pipeline used `product_id` for:
- lag feature generation
- rolling statistics
- temporal grouping
Example: `df.groupby("product_id")`
However, during dataset analysis, it was discovered that each row contained a unique `product_id`, even for the same product.
This caused a major issue because:
- lag features could not capture historical continuity
- rolling windows became ineffective
- temporal forecasting patterns were broken
Effectively, each product row was treated as an isolated observation rather than part of a historical product sequence.
The grouping logic was redesigned to use `product_name` instead of `product_id`.
Updated grouping example: `df.groupby("product_name")`
This correction restored:
- proper historical continuity
- valid lag calculations
- meaningful rolling window statistics
- product-level temporal learning
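The effect of the grouping key can be reproduced on a small made-up frame, as sketched below. The sample values and the `lag_1_by_id` and `lag_1_by_name` column names are assumptions for illustration. With a unique `product_id` per row, every group has a single member, so the lag is always missing.

```python
import pandas as pd

# Made-up data where product_id is unique per row, as found in the dataset.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "product_id": ["P001", "P002", "P003", "P004", "P005", "P006"],
    "product_name": ["Rice"] * 3 + ["Sugar"] * 3,
    "quantity_sold": [10, 12, 14, 3, 5, 7],
})
df = df.sort_values(["product_name", "date"]).reset_index(drop=True)

# Broken: each group contains one row, so there is never a previous value.
df["lag_1_by_id"] = df.groupby("product_id")["quantity_sold"].shift(1)

# Fixed: rows of the same product form one sequence ordered by date.
df["lag_1_by_name"] = df.groupby("product_name")["quantity_sold"].shift(1)

print(df[["product_name", "quantity_sold", "lag_1_by_id", "lag_1_by_name"]])
```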
| Metric | Score |
|---|---|
| MAE | 6.32 |
| MSE | 74.63 |
| R² Score | 0.637 |
- Data collection and loading
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Data splitting
- Feature engineering
- Baseline modeling
- Error analysis
- Hyperparameter tuning and model optimization
- Final evaluation (test set)
- Build full preprocessing + model pipeline
- Artifact creation
- Unit testing
- Model packaging and deployment readiness
- Monitoring and maintenance plan
🚧 MVP in active development
This is currently a personal MVP project. Contributions, ideas, and feedback are welcome.
This project is open-source and available under the MIT License.
Built by Issa Muiz, Machine Learning & Data Science Enthusiast