# Capstone: Customer Demand Forecasting

**Notebook 1 - Contents**<br>
[Introduction](#Introduction)<br>
[Problem Statement](#Problem-Statement)<br>
[Methodology](#Methodology)<br>
[Model Selection](#Model-Selection)<br>
[Risks](#Risks)<br>
[Metrics](#Metrics)<br>

## Introduction

[Demand forecasting](https://en.wikipedia.org/wiki/Demand_forecasting) is the process of estimating customer demand for a specific product or service over a given time period. It is a crucial business process because it provides both quantitative and qualitative insight into a product's position in its target market, which helps companies make informed decisions on growth, strategy, pricing and marketing initiatives. Poor forecasting can harm customer satisfaction, competitiveness, profitability and business longevity.

### Methods of Demand Forecasting

1. Passive:
   - It is a simple and straightforward prediction of future values based purely on historical data. It assumes that future values depend on past values, with more emphasis placed on the latest sales. It is normally used by companies with long historical records and relatively stable sales.
2. Active:
   - It is a type of regular demand planning used by startups, growing companies and businesses in competitive industries.
3. Short-term:
   - This method forecasts over a small window of time to inform day-to-day decisions, e.g. inventory planning for a Black Friday promotion.
   - It is useful for managing a just-in-time (JIT) supply chain or a product lineup that changes frequently.
   - However, most businesses only use it in conjunction with longer-term projections.
4. Long-term:
   - It covers a longer horizon of trend, seasonality and growth analysis, which drives corporate-level planning on production capacity, capital investment and expansion strategies.
5. Macro and micro:
   - Macro-level forecasting looks at external forces, e.g. economic conditions, competition and customer preferences, to identify the opportunities and challenges facing an organization. Micro-level forecasting drills down to the specific market segments or regions that the products and services serve or that the company is interested in exploring.
6. Internal:
   - It covers internal capacity planning, including human resources and infrastructure, to meet long-term and short-term business demand.
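
The passive method's idea of placing more emphasis on the latest sales can be illustrated with simple exponential smoothing. This is only a minimal sketch under assumptions: the function name, the `alpha` weight and the sales figures are hypothetical and are not taken from the project dataset.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """One-step-ahead forecast that weights recent observations more heavily.

    alpha close to 1 leans on the latest sales; alpha close to 0 leans on
    the older history. Both the name and the default are illustrative.
    """
    if not sales:
        raise ValueError("sales must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = sales[0]  # start from the first observation
    for value in sales[1:]:
        # Blend the new observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up monthly sales, for illustration only.
monthly_sales = [100, 110, 105, 120]
forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
print(forecast)
```

With `alpha=1` the forecast is simply the last observed value, which is the most extreme form of emphasising the latest sales.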

### Importance of Demand Forecasting

1. Resource optimization: If a company can identify the expected demand for its products or services, it can make the necessary preparations, be it tightening belts, staying on course or expanding production. This steers the business towards optimal use of its resources (finances, people and time).
2. Efficient inventory management: Accurate demand forecasting helps companies keep stock at the right balance. Insufficient stock results in customer dissatisfaction and lost revenue, and may lead to loss of future business. Overstocking, on the other hand, incurs additional storage and logistics costs and can lead to stock obsolescence.
3. Operational efficiency: Demand forecasting helps align production with anticipated demand and reduces production downtime and idle resources. Efficient production planning also reduces waste and lowers production costs, contributing to higher margins.
4. SME business: Demand forecasting is particularly crucial for small and medium-sized enterprises (SMEs). Flawed demand planning may cost them the opportunity to fill big orders. Excessively ambitious forecasts, on the other hand, lead to scaling too rapidly for demand that may not materialize as expected. SMEs also lack the size of big organizations, which can cushion the impact of bad forecasting with success in other business areas.
5. Customer satisfaction: Demand forecasting enables businesses to meet customer demand consistently and on time, which is crucial for customer satisfaction. Accurate forecasts let businesses fulfil orders promptly, resulting in happy and loyal customers. By understanding customer preferences and trends, companies can tailor their products and services to better meet the evolving needs of their target audience.


### Challenges of Demand Forecasting

1. Data collection: Good historical data is needed to predict future trends. However, data collection is rarely smooth sailing, especially for companies with legacy systems and work processes.
2. Human errors and bias: Data collection and the forecasting process are subject to human error and bias. There is a tendency to focus on certain data based on individual perception. Stakeholders' conflicting interests can also get in the way of successful demand forecasting, as sales leaders may want to maximize sales while finance managers take a conservative approach to cost control.
3. Data quality: Data formats may not be ideal for forecasting if they lack consistency and the necessary detail. Gathering historical data at sufficient granularity can also be time-consuming, and it is even more challenging if the company has a high employee turnover rate.
4. **Seasonality**: Even with sufficient data of good quality, forecasting future demand can still be challenging, as most companies use rule-based forecasting, which does not predict seasonality.
5. **Complexity**: We cannot accurately predict customer preferences, competitors' strategies, new technology, government policies, unexpected pandemics and many other micro and macro factors that are outside our control. Even so, predicting trends and directions is beneficial for operational planning.
6. Lack of expertise: Some companies lack expertise in data analysis, statistics and the domain knowledge needed for demand forecasting. Technological capability may also be missing. Resources are needed to train existing employees to manage machine learning models and interpret their output.
7. Supply chain: Supply chains have become more complex with global sourcing and interdependency. Disruptions in the supply chain result in delays and shortages, which makes accurate demand forecasting harder for companies facing increasingly intricate supply networks.

## Problem Statement

I would like to explore machine learning algorithms to develop an effective **short-term demand forecasting model** that helps organizations plan for demand surges over the next 3 months, as current rule-based forecasting is not able to predict seasonality and cannot handle complex, nonlinear relationships. Demand surges happen around promotions or special events (Super Bowl, Amazon Prime Day, etc.), and anticipating them lets organizations act swiftly on inventory planning and logistics to minimize costs.

## Methodology

### Data Sources

https://www.kaggle.com/datasets/earthfromtop/amazon-sales-fy202021

### Information Sources

1. https://en.wikipedia.org/wiki/Demand_forecasting
2. https://medium.com/@futureanalytica/importance-of-demand-forecasting-86c1a4753dc0
3. https://www.netsuite.com/portal/resource/articles/inventory-management/demand-forecasting.shtml
4. https://www.thefulfillmentlab.com/blog/demand-forecasting
5. https://www.unitedstateszipcodes.org/
6. https://www.machinelearningplus.com/time-series/time-series-analysis-python/
7. https://machinelearningmastery.com/random-forest-for-time-series-forecasting/
8. https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/

### Workflow

![workflow.png](attachment:c2ad9374-58d8-4360-b589-b98f4f32ebb0.png)

## Model Selection

1. Time Series:
   - The classical methods ARIMA, ARIMAX, SARIMA and SARIMAX are used for modelling.
   - ARIMA and SARIMA are univariate time series forecasting models. ARIMA captures short-term patterns, and SARIMA adds seasonal terms to capture seasonality, even with a limited history of 12 months. The models rely on recent past observations and do not need the large amounts of historical data that other machine learning approaches often require.
   - ARIMAX and SARIMAX are extensions of ARIMA and SARIMA respectively. These extended models use both past values of the target and additional exogenous features, i.e. external factors that might be potential demand drivers, to predict future values. This allows me to use the features available in the dataset to model the relationships between those factors and demand.
2. Regression:
   - Random Forest is an ensemble of decision trees that can be used for both classification and regression predictive modelling.
   - It can capture complex relationships in the dataset and is robust for short-term forecasts.
   - Each tree considers a random subset of the input variables, so the prediction errors of the individual trees are more varied and less correlated with one another.
   - The final prediction is the average across all decision trees, and I believe this will result in better performance than other bagged decision tree models.
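
To use a regression model such as Random Forest on a time series, the series first has to be reframed as a supervised learning problem, where past values become the input features and the next value becomes the target. The sketch below shows one common way to do this with lagged values. The function name `make_lag_features`, the choice of three lags and the sample numbers are assumptions for illustration, not the exact preprocessing used later in this project.

```python
def make_lag_features(series, n_lags=3):
    """Reframe a univariate series as (X, y) rows for supervised learning.

    Each row of X holds the n_lags values immediately before the target,
    and y holds the value to predict. Names are illustrative only.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the n_lags previous values
        y.append(series[t])                   # the value to predict
    return X, y


# Made-up weekly order quantities, for illustration only.
weekly_orders = [12, 15, 14, 20, 35, 18]
X, y = make_lag_features(weekly_orders, n_lags=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting `X` and `y` could then be passed to a regressor, for example the `fit(X, y)` method of scikit-learn's `RandomForestRegressor`. Exogenous features could be appended to each row of `X` in the same way.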

## Risks

1. Overfitting: this risk is high when dealing with smaller datasets. Overfitting occurs when a model captures noise in the dataset rather than the underlying patterns. Continuous improvement through further data collection and feature engineering would help mitigate it.
2. Data quality: inaccurate and incomplete data degrade the quality of forecasting models. Data preprocessing and cleaning can mitigate the risk.
3. Feature selection: selecting irrelevant input features or omitting important ones can reduce the model's predictive capability.
4. Model complexity: complex models may not generalize well to new data and can lead to overfitting.
5. Interpretability: black-box models that are difficult to explain may be a concern for application in practice.
6. External factors: unforeseen circumstances e.g. pandemic or economic crises can be challenging to anticipate and incorporate into modelling.
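
One way to check for overfitting and poor generalization in a forecasting setting is to hold out the most recent observations and evaluate the model only on that unseen period. The sketch below keeps the data in time order instead of shuffling it. The helper name `chronological_split` and the 9/3 month split are assumptions chosen to mirror the 3-month horizon in the problem statement.

```python
def chronological_split(observations, n_test):
    """Split time-ordered data into an earlier training part and a later test part.

    No shuffling is done, so the test set always lies in the "future"
    relative to the training set. The helper name is illustrative only.
    """
    if not 0 < n_test < len(observations):
        raise ValueError("n_test must be between 1 and len(observations) - 1")
    return observations[:-n_test], observations[-n_test:]


# Twelve placeholder months (1..12), standing in for 12 months of history.
months = list(range(1, 13))
train, test = chronological_split(months, n_test=3)
print("train:", train)
print("test: ", test)
```

A large gap between the training error and the error on the held-out months would suggest that the model is fitting noise and not the underlying pattern.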

## Metrics

![RMSE.png](attachment:8cbb15cf-b193-4d05-8b7c-b1bea2324b82.png)

For optimisation and selection of models, **RMSE** (Root Mean Squared Error) is used for the following reasons: <br>
1. Simplicity: RMSE is a relatively simple and straightforward metric to compute, making it practical for evaluation of forecasting models.
2. Popularity: RMSE is a widely accepted metric in time series forecasting and machine learning, and is commonly used across the industry to compare the performance of different models.
3. Interpretability: RMSE is expressed in the same units as the target variable (order quantity), which makes it easy to interpret and stakeholders are able to grasp the magnitude of forecasting errors.
4. Sensitivity to errors: RMSE assigns more weight to larger errors (as errors are squared) compared to other metrics like MAE (mean absolute error) and MAPE (mean absolute percentage error).
   - In this case, the target variable (order quantity) has a range of peaks and lulls, so larger absolute errors are expected at the peaks.
   - Furthermore, we are more focused on capturing the demand spikes. Therefore RMSE is particularly suitable for highlighting larger forecasting errors that have significant financial impact in supply chain management.
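
The sensitivity of RMSE to large errors can be seen in a small worked example. RMSE is the square root of the mean of the squared differences between actual and predicted values, while MAE is the mean of the absolute differences. The order quantities below are made up for illustration: both forecasts have the same MAE, but the one that misses the demand spike has double the RMSE.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: squares each error before averaging."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: averages the absolute errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up order quantities with a demand spike in the last period.
actual = [100, 100, 100, 400]
even_miss = [90, 90, 90, 390]      # off by 10 in every period
spike_miss = [100, 100, 100, 360]  # perfect except 40 short at the spike

print("even miss :", mae(actual, even_miss), rmse(actual, even_miss))
print("spike miss:", mae(actual, spike_miss), rmse(actual, spike_miss))
```

Both forecasts have an MAE of 10, but the RMSE is 10 for the evenly spread errors and 20 for the forecast that misses the spike, which is why RMSE suits a use case focused on capturing demand surges.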