# Inventory Forecasting for Kikka Group Ltd.

Kikka Group Ltd. is an internationally recognized B2B provider of stylish, functional, and affordable baby products. As the company scales across 70+ countries, it faces a growing challenge in managing stock effectively.

Long production lead times, seasonal demand shifts, and product diversity make **inventory forecasting a critical task**. Historically, this has been handled manually by the CEO and product managers — a process that is time-consuming, error-prone, and increasingly unsustainable at scale.

To address these challenges, this project introduces a **machine learning-driven demand prediction system**, aimed at improving inventory decisions through:

- Accurate monthly sales forecasting
- Clear visualizations and error diagnostics
- Explainable AI to build trust with decision-makers
- Ethical safeguards to ensure human oversight

---

### Notebook Focus: Summary and Key insights gained from the project

This notebook summarizes the project's **EDA, machine learning, and explainability** work, showcasing how data science can solve real operational problems in a dynamic business environment.

## What Does the Model Do?

The purpose of this machine learning model is to **predict the future monthly sales quantity** for each baby stroller product sold by Kikka Group Ltd. This prediction supports strategic inventory planning by answering the core operational question:

> “**How many units of each product should we reorder for next month?**”

Instead of relying solely on human judgment or past averages, the model analyzes a variety of inputs — recent sales trends, product type, time of year, and product lifecycle stage — to forecast demand. These insights are then used to recommend whether a product should be reordered and in what quantity.

The end goal is to:
- Avoid **stockouts** of fast-selling products (preventing lost sales)
- Minimize **overstocking** of slow-movers (reducing storage and write-off costs)
- Free up decision-makers from manual sales reviews, enabling **faster and more reliable restocking decisions**

---

## Data Source: ERP.net System

All input data used for training the model was extracted from **ERP.net**, Kikka Group’s centralized enterprise resource planning system. This platform contains detailed, time-stamped records of:

- Product sales by SKU and product group
- Inventory levels at each warehouse
- Documented invoices (used to verify actual completed sales)
- Seasonal purchasing behavior over a 2-year historical period

Direct access to ERP.net was granted by the company's leadership, allowing the development team to work with **real operational data** rather than artificial or academic datasets. This ensures the model is grounded in actual business workflows and reflects the **true complexity of Kikka Group’s B2B environment**.

By building a forecasting model on top of ERP data, we bridge the gap between raw business information and data-driven decision-making — a foundational step toward digital transformation at Kikka Group.

In addition, the extracted data contained no missing or unreliable records, which simplified data preparation.
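
Before any features can be built, invoice-level records have to be collapsed to one row per product per month. The sketch below shows one way to do that with pandas. The column names (`SKU`, `Invoice_Date`, `Quantity`, `Units_Sold`) and the sample rows are illustrative assumptions, not the actual ERP.net schema or data.

```python
import pandas as pd

# Hypothetical invoice lines; column names are assumptions, not the ERP.net schema.
invoices = pd.DataFrame({
    "SKU": ["A1", "A1", "A1", "B2", "B2"],
    "Invoice_Date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-01-11", "2024-03-15"]
    ),
    "Quantity": [10, 5, 8, 3, 7],
})

# Map every invoice date to its calendar month.
invoices["Month"] = invoices["Invoice_Date"].dt.to_period("M")

# Sum quantities so each (SKU, Month) pair becomes a single row.
monthly_sales = (
    invoices.groupby(["SKU", "Month"], as_index=False)["Quantity"]
    .sum()
    .rename(columns={"Quantity": "Units_Sold"})
)

print(monthly_sales)
```

Months in which a product sold nothing do not appear in this output (for example, `B2` in February), so a real pipeline would likely need to fill those gaps with zeros before computing lag features.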

---

##  Key Insights: What Really Affects Demand?

Through a combination of Exploratory Data Analysis (EDA) and model interpretation techniques, we identified the most influential factors driving demand predictions for baby stroller products. These insights were not just statistically relevant — they were also **aligned with domain knowledge from Kikka Group’s product managers and sales patterns**.

###  1. **Recent Sales Trends (Lag Features & Rolling Averages)**
Historical sales in the last 1–3 months proved to be **strong predictors of future demand**, especially for stable, well-established products. 
- `Lag_1`, `Lag_2`, `Lag_3`: Capture short-term momentum.
- `Rolling_Mean_3`, `Rolling_Mean_6`: Smooth out month-to-month noise over 3- and 6-month windows.

These features helped the model “remember” demand cycles and adapt to temporary dips or surges.
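
As a minimal sketch of how such features could be derived with pandas, the example below builds `Lag_1` to `Lag_3` and `Rolling_Mean_3` per product. The `SKU`, `Month`, and `Units_Sold` column names and the sample values are assumptions. The source does not state whether the rolling averages include the current month; this sketch assumes they do not, so that the target never leaks into its own predictor.

```python
import pandas as pd

# Hypothetical monthly sales for a single product.
monthly = pd.DataFrame({
    "SKU": ["A1"] * 6,
    "Month": pd.period_range("2024-01", periods=6, freq="M"),
    "Units_Sold": [10, 12, 14, 16, 18, 20],
}).sort_values(["SKU", "Month"])

# Group by product so one SKU's history never spills into another's.
by_sku = monthly.groupby("SKU")["Units_Sold"]

# Lag_k = units sold k months earlier for the same SKU.
for k in (1, 2, 3):
    monthly[f"Lag_{k}"] = by_sku.shift(k)

# Mean of the previous 3 months; shift(1) excludes the current month.
monthly["Rolling_Mean_3"] = by_sku.transform(
    lambda s: s.shift(1).rolling(3).mean()
)

print(monthly)
```

The first rows of each product contain `NaN` because there is not yet enough history; those rows have to be dropped or imputed before training.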


###  2. **Seasonality & Calendar Effects**
Certain months — especially **November** — showed consistent spikes due to events like **Black Friday** and holiday preparations. 
- `Month_Num` and a binary `Is_November` flag were added to account for this.

This allows the model to **expect surges** even if the recent months were slower.
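
Both calendar features can be read directly off the month of each row. The snippet below is a small sketch assuming a `Month` column stored as a pandas period; that storage format is an assumption.

```python
import pandas as pd

# Hypothetical rows covering September through December.
monthly = pd.DataFrame({
    "Month": pd.period_range("2024-09", periods=4, freq="M"),
})

# Month number 1-12, plus a 0/1 flag for November.
monthly["Month_Num"] = monthly["Month"].dt.month
monthly["Is_November"] = (monthly["Month_Num"] == 11).astype(int)

print(monthly)
```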


###  3. **Product Group Type**
The `Product Group Label`, such as *“Bebeshki letni kolichki”* or *“3 v 1 Kombinirani kolichki”*, was critical for understanding baseline demand levels.
- Some groups are inherently more popular or seasonal.
- Others are linked to specific customer segments or regions.

Encoding this correctly helped the model distinguish between product lines.
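
The source does not say which encoding scheme was used to produce `Group_Code_Encoded`. One simple option, sketched here as an assumption, is to map each group label to an integer code with pandas categoricals.

```python
import pandas as pd

# Hypothetical products with their group labels.
products = pd.DataFrame({
    "SKU": ["A1", "B2", "C3"],
    "Product Group Label": [
        "Bebeshki letni kolichki",
        "3 v 1 Kombinirani kolichki",
        "Bebeshki letni kolichki",
    ],
})

# Categories are sorted alphabetically, then numbered from 0.
products["Group_Code_Encoded"] = (
    products["Product Group Label"].astype("category").cat.codes
)

print(products)
```

Integer codes impose an arbitrary ordering on the groups. Tree-based models usually tolerate this because they split on thresholds, but one-hot encoding is a common alternative when the ordering could mislead the model.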


###  4. **Lifecycle Stage Flags**
We introduced business logic flags:
- `Is_New`: New products typically have **unpredictable early demand**.
- `Is_Discontinued`: Old products should show a **decline** and require caution before restocking.
- `Months_Since_Last_Sale`: Helps filter out inactive SKUs.

These features prevent the model from wrongly forecasting demand for **obsolete or just-launched products**.
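
The following sketch shows how `Months_Since_Last_Sale` and `Is_New` might be computed from sales history. The reference date, the 3-month cutoff for "new", and the column names are illustrative assumptions; the actual business rules are not specified in this notebook. `Is_Discontinued` is omitted because it would most likely come from a product status field rather than from sales history.

```python
import pandas as pd

# Hypothetical reference month for which the flags are computed.
as_of = pd.Timestamp("2024-06-01")

# Hypothetical months (as month-start dates) in which each SKU had sales.
history = pd.DataFrame({
    "SKU": ["A1", "A1", "B2", "C3"],
    "Month": pd.to_datetime(
        ["2023-01-01", "2024-05-01", "2024-04-01", "2023-09-01"]
    ),
})


def months_between(later, earlier):
    """Whole calendar months from each date in `earlier` to `later`."""
    return (later.year - earlier.dt.year) * 12 + (later.month - earlier.dt.month)


# First and last month with sales per SKU.
span = history.groupby("SKU")["Month"].agg(First_Sale="min", Last_Sale="max")

flags = pd.DataFrame(index=span.index)
flags["Months_Since_Last_Sale"] = months_between(as_of, span["Last_Sale"])
# Assumed rule: a product is "new" if it first sold within the last 3 months.
flags["Is_New"] = (months_between(as_of, span["First_Sale"]) <= 3).astype(int)

print(flags)
```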


###  5. **Belongs to Top 3 Group**
Some product groups consistently generate the most revenue. The boolean `Belongs_to_Top_3` flag marks these groups so the model can **treat top-performing lines differently**, focusing attention where it matters most.
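
A flag like this can be derived by ranking groups on total revenue. The sketch below assumes hypothetical `Group` and `Revenue` columns and made-up figures.

```python
import pandas as pd

# Hypothetical sales rows with revenue per product group.
sales = pd.DataFrame({
    "Group": ["G1", "G2", "G3", "G4", "G1", "G4"],
    "Revenue": [100.0, 80.0, 60.0, 10.0, 50.0, 5.0],
})

# Total revenue per group, then keep the three largest.
top3 = sales.groupby("Group")["Revenue"].sum().nlargest(3).index

# 1 if the row's group is among the top three, else 0.
sales["Belongs_to_Top_3"] = sales["Group"].isin(top3).astype(int)

print(sales)
```

If the ranking is computed on the full dataset, information from the test period can leak into the feature; computing it on the training period only avoids that.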


###  Summary
> The model doesn’t just predict demand based on raw numbers — it reflects how real-world operational and marketing factors influence product movement. By combining time series features, calendar intelligence, product metadata, and business rules, we achieved a more context-aware and realistic forecasting system.

## Building the Prediction Model: From Raw Data to Forecasts

Once the data was cleaned and key business signals were engineered, we proceeded to train a machine learning model capable of forecasting monthly sales quantities. The goal was not just accuracy, but also **interpretability**, **robustness to outliers**, and **alignment with real business constraints**.

---

### Step-by-Step Modeling Process

1. **Train-Test Split**  
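   The dataset was split into training and test sets using a random 80/20 split. This evaluates the model on **unseen data**, but because rows from all months are shuffled together, it only approximates forecasting future months; a chronological split would simulate that more directly.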

2. **Model Selection: Random Forest Regressor**  
   We selected **Random Forest** as our primary algorithm because:
   - It handles **non-linear relationships** well.
   - It is robust to **outliers** and **missing values**.
   - It provides **feature importance** out of the box.
   - It performs well even without aggressive hyperparameter tuning.

3. **Target Variable**  
   The model predicts the **exact number of units sold per product per month**. This continuous value allows downstream systems to determine **how many units to order**, not just whether to order or not.

4. **Features Used**  
   The model was trained on a range of inputs:
   - Time-based: `Lag_1`, `Rolling_Mean_3`, `Month_Num`
   - Product-based: `Product Group Label`, `Is_New`, `Is_Discontinued`
   - Metadata: `Current_Stock`, `Group_Code_Encoded`, `Belongs_to_Top_3`
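
The steps above can be sketched end to end with scikit-learn. This is an illustration on synthetic data, not the project's actual training code: the generated values, the reduced feature set, and the hyperparameters (`n_estimators=100`, `random_state=42`) are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature table.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Lag_1": rng.integers(0, 200, n),
    "Rolling_Mean_3": rng.uniform(0, 200, n),
    "Month_Num": rng.integers(1, 13, n),
    "Is_New": rng.integers(0, 2, n),
})
# Synthetic target: mostly driven by recent sales, plus noise.
y = 0.6 * X["Lag_1"] + 0.3 * X["Rolling_Mean_3"] + rng.normal(0, 10, n)

# Random 80/20 split, as described in step 1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the forest on the training portion only.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Impurity-based importances, one value per feature, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

On this synthetic data `Lag_1` ranks first simply because the target was constructed that way; on real data the ranking reflects whatever the forest actually learned.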

---

### Model Evaluation

We used **two main metrics** to evaluate performance:

- **R² (R-squared):** How much of the variance in actual sales is explained by the model.
- **MAE (Mean Absolute Error):** On average, how far off the predictions are.

These were chosen because:
- R² gives us a high-level view of performance.
- MAE gives us a business-relevant sense of “how wrong” the forecasts are.
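
To make the two metrics concrete, here is a small worked example using scikit-learn's metric functions. The four actual and predicted values are invented for illustration and are unrelated to the project's results.

```python
from sklearn.metrics import mean_absolute_error, r2_score

# Invented monthly unit sales and forecasts for four products.
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 210, 230]

# MAE: mean of |actual - predicted| = (10 + 10 + 10 + 20) / 4
mae = mean_absolute_error(y_true, y_pred)

# R²: 1 - SS_res / SS_tot = 1 - 700 / 12500
r2 = r2_score(y_true, y_pred)

print(f"MAE = {mae:.1f} units, R² = {r2:.3f}")
```

MAE is expressed in units sold, which makes it easy to compare against typical order sizes, while R² is unitless and compares the model against always predicting the mean.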

In Iteration 2, our Random Forest model achieved:

- **R² = 0.30**: the model explains roughly 30% of the variance in monthly sales, capturing general trends while leaving most of the variation unexplained.
- **MAE ≈ [insert value here]** — reasonable error tolerance given the product variety and seasonality.

---

### Output Format

The final prediction output is a **real number** (e.g., 114.6 units). This allows for rounding and business adjustment. In our Gradio demo, we also include:

- **Prediction Uncertainty (±)**: Standard deviation across all trees in the forest, helping managers understand confidence levels.
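
A minimal sketch of this uncertainty estimate, assuming a scikit-learn `RandomForestRegressor`: each fitted tree is available through the `estimators_` attribute, so the spread of their individual predictions can be measured directly. The training data and the input row below are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data: three features, target driven by the first.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 5, 200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# One hypothetical product-month to forecast.
x_new = np.array([[40.0, 10.0, 70.0]])

# Collect the prediction of every individual tree.
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])

prediction = per_tree.mean()   # equals forest.predict(x_new)[0]
uncertainty = per_tree.std()   # spread across trees, shown as "±"

print(f"{prediction:.1f} ± {uncertainty:.1f} units")
```

This spread reflects disagreement between trees rather than a calibrated confidence interval, so it is best read as a relative signal: a wide spread suggests the forecast deserves a closer look.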

---

## Summary

This model provides a realistic, explainable, and business-informed way to forecast monthly demand across Kikka Group's product lines. While Iteration 2 focused on internal features, future versions may integrate **external factors** like pricing, competition, and holiday calendars for even better accuracy.

## Ethical Consideration

As with any AI-driven system, the predictions generated by this model should be used to support—not replace—human decision-making. While the model can identify patterns and suggest restocking quantities, it is not infallible, especially when facing new products, market shifts, or incomplete data. To ensure responsible use, product managers retain full control and can override any forecast when necessary. Additionally, the model provides an uncertainty estimate with each prediction to signal when confidence is low. This approach balances the efficiency of automation with the judgment of experienced professionals, ensuring that ethical standards and business context remain central to every inventory decision.

An important ethical concern in our project is the potential for bias toward high-selling or well-established products. Since the model learns from past sales, it may systematically under-prioritize newer items or products with less historical data—especially those targeted at niche or emerging markets. In a global B2B business like Kikka Group, this could lead to unequal product availability across regions, disadvantaging some distributors or limiting choice for customers in less represented markets. To address this, we flag low-data products for manual review and recommend that model predictions are always balanced with strategic and regional insights from product managers.
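
Flagging low-data products can be as simple as a threshold on the amount of sales history. The sketch below is illustrative only: the `MIN_MONTHS` cutoff of 6 and the per-SKU counts are assumptions, not the rule used in the project.

```python
import pandas as pd

# Hypothetical number of months with recorded sales per SKU.
months_with_sales = pd.Series({"A1": 24, "B2": 2, "C3": 7})

# Assumed cutoff: fewer than 6 months of history triggers manual review.
MIN_MONTHS = 6

needs_review = months_with_sales[months_with_sales < MIN_MONTHS].index.tolist()

print(needs_review)
```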


## Recommendations to client

As we prepare this model for operational use, it’s important to reflect on its practical strengths and limitations. Below are key pros and cons, followed by our recommendations for implementation.

### Pros
- Built on **real ERP sales data**, ensuring full business relevance.
- Accounts for **seasonality, lifecycle stage, and product types**, not just raw sales.
- Provides a **confidence range** (±) alongside each prediction, supporting better risk judgment.
- Designed with **business user interaction in mind** (Gradio UI, override logic).
- Model can be updated continuously as more sales data becomes available.

### Cons / Limitations
- Predictions for **new or rarely sold products** are less reliable due to limited historical data.
- External events like **promotions or shipping delays** are not yet incorporated.
- Some forecasts may **overfit past behavior** without recognizing market shifts or upcoming trends.

---

### Recommendations to Kikka Group

- Use the model as a **decision support tool**, not a fully automated ordering system — always allow manager overrides.
- Plan to **enrich future versions** with external data: promotions, competitor trends, holiday calendars.
- Monitor model performance quarterly and **retrain periodically** as inventory behavior evolves.
- Provide basic model literacy training to product managers to build trust and understanding.
- Consider gradually integrating the model into **ERP or BI tools** to streamline access and adoption.

While this project successfully delivered a working forecasting model grounded in real ERP data, its continued development should be considered carefully. In my professional opinion, even with additional investment in time, data integration, and tooling, the model may not consistently outperform the current manual process, which draws on the experience of, and strategic discussion between, the product managers and the CEO. Human intuition, cross-departmental insights, and market awareness still play a critical role in demand planning at Kikka Group.

That said, the model holds clear value as a **decision support tool**. It can automate routine analysis, flag anomalies, and provide a structured baseline that managers can challenge or refine. For companies aiming to scale operations or reduce dependency on a few key individuals, such tools are worth exploring — but should not be expected to fully replace collaborative, experience-based decision-making. If continued, this project should focus on **blending AI insight with human expertise**, not replacing it.

## Final Conclusion + Self Reflection

This project successfully delivered a functioning machine learning predictor that estimates monthly sales quantities for baby stroller products at Kikka Group Ltd. Built on real data from the company’s ERP.net system, the model captures important business dynamics such as product seasonality, sales momentum, and product lifecycle stage. The predictor is designed not to replace human judgment but to assist product managers by offering a reliable, explainable starting point for restocking decisions. With a practical Gradio interface and model confidence outputs, the tool bridges the gap between data science and business value.

###  Key Takeaways

- We built a **regression model** that predicts exact unit quantities per product per month.
- Feature engineering was driven by **domain understanding**, including flags for `Is_November`, `Is_New`, and `Product Group Label`.
- The model achieved an **R² score of 0.30** in Iteration 2, capturing general trends in sales behavior.
- **Explainability and uncertainty** were built into the system to support ethical, transparent decision-making.
- The Gradio app allows business users to test predictions without needing coding skills.
- The model is ready for real-world testing but can be further improved by integrating **external signals** such as pricing, competitor actions, and promotion calendars.

### Reflection


Having previously worked at Kikka Group, I initially assumed I had a strong understanding of the company’s operations, challenges, and workflows. This gave me a sense of confidence at the start of the project — I thought building a prediction model based on familiar sales data would be straightforward.

However, working with machine learning and AI turned out to be an entirely different challenge. The technical side of predictive modeling — especially **selecting the right features**, understanding **how they influence predictions**, and making the model outputs **interpretable and useful** — was far more complex than I had anticipated. Concepts like feature importance, overfitting, error metrics, and the subtleties of forecasting behavior forced me to think deeper and more critically.

This project taught me that even when you know the business, **turning operational knowledge into a working model is a completely separate skillset**. It requires both data literacy and humility — recognizing that AI is powerful, but not magic, and good results come only with iteration, experimentation, and reflection.

I leave this project with a much greater respect for what it takes to build trustworthy, usable AI tools in the real world.

## Sources

- Notebook (master project) and all previous iterations: `master-kikka-boo-predictor.ipynb`
- The research paper, project proposal, and all sources outlined in those documents: `research paper_v2-1.pdf`, `Project proposal_V3.pdf`
- Stakeholder interviews