## Exercise 2 - Demand Prediction

Accurate demand forecasts let organizations anticipate demand and allocate the right amount of resources, minimizing stagnant inventory. In recent years, a variety of machine learning methods have been applied to the demand prediction problem.

Since machine learning tries to replicate what humans think and do, the first step after defining the problem is to interview the people who currently do the task manually, such as store managers, to better understand the problem and the features they care about when making decisions. The next step is to understand the data and gain insights through visualization. This helps the data scientist get a better sense of the existing features and of any new features that need to be built. Then data preparation is performed, which includes scaling, normalization, outlier detection, handling missing values, and feature engineering.
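
The data preparation steps above can be illustrated with a minimal, standard-library sketch. It assumes a single numeric feature stored as a list with `None` marking missing days; median imputation, Tukey-fence outlier clipping, and z-score standardization are just one common combination, and the function names are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def impute_missing(values):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def prepare(values):
    """Handle missing values first, then outliers, then scaling."""
    return standardize(clip_outliers(impute_missing(values)))


# Hypothetical daily unit sales with one missing day and one spike.
daily_units = [12, 15, None, 14, 13, 250, 16]
print(prepare(daily_units))
```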

Models of this type are data hungry, and any feature could be helpful. The data points I can think of fall into five datasets.

Item dataset
* id
* expiry date
* perishable
* main category
* category
* shipping cost
* unit sale
* price
* brand
* reviews on social media
* sentiment analysis of social media
* description


Promotion dataset
* start date
* end date
* item id
* before price
* after price


Transaction dataset
* status
* status at
* store id
* item id


Calendar dataset
* temperature
* days
* statutory holidays


Store dataset
* id
* location: city, province, country
* demographics of the local area, including population, education, income, ...
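
To show how the five datasets connect, here is a hedged sketch of joining one transaction with the item, store, promotion, and calendar data into a single feature row. The record types keep only a few of the fields listed above. It assumes that `status at` holds the transaction date, that the calendar is reduced to a set of holiday dates, and that an active promotion replaces the item price with its `after price`. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    id: int
    category: str
    perishable: bool
    price: float


@dataclass
class Store:
    id: int
    city: str


@dataclass
class Promotion:
    item_id: int
    start_date: date
    end_date: date
    before_price: float
    after_price: float


@dataclass
class Transaction:
    item_id: int
    store_id: int
    status: str
    status_at: date  # assumption: the date of the transaction


def active_promotion(promotions, item_id, day):
    """Return the promotion running for this item on `day`, or None."""
    for promo in promotions:
        if promo.item_id == item_id and promo.start_date <= day <= promo.end_date:
            return promo
    return None


def feature_row(tx, items, stores, promotions, holidays):
    """Join one transaction with item, store, promotion and calendar data."""
    item = items[tx.item_id]
    store = stores[tx.store_id]
    promo = active_promotion(promotions, tx.item_id, tx.status_at)
    return {
        "item_id": item.id,
        "category": item.category,
        "perishable": item.perishable,
        "city": store.city,
        "day_of_week": tx.status_at.weekday(),  # 0 = Monday
        "is_holiday": tx.status_at in holidays,
        "on_promotion": promo is not None,
        "price": promo.after_price if promo is not None else item.price,
    }


# Hypothetical example records.
items = {1: Item(1, "dairy", True, 4.99)}
stores = {10: Store(10, "Toronto")}
promotions = [Promotion(1, date(2024, 7, 1), date(2024, 7, 7), 4.99, 3.99)]
tx = Transaction(1, 10, "completed", date(2024, 7, 3))
row = feature_row(tx, items, stores, promotions, holidays={date(2024, 7, 1)})
print(row)
```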


Many demand prediction methodologies depend on time windows, so it is a good idea to compute features over time windows as well, using the last 1, 3, 7, 14, 30, and 60 days. This is the list of engineered features that could be added to the model:

* sales of each item in each time window
* mean sales of each item/store pair for each day of the week
* days since the item's last appearance in the store
* monthly unit sales for each item
* monthly unit sales for each store
* mean sales of the three items most associated with this item
* mean sales of each item across all stores in each time window
* mean number of zero-sales days over several time windows for each item/store pair
* mean sales over several recent periods for each item/store pair, for each day of the month
* mean sales over several recent periods for each item/store main category
* mean sales over several recent periods for each item/store category
* mean sales over several recent periods for each item/store city
* mean sales over several recent periods for each item/store province
* mean sales over several recent periods for each item/store type
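
Several of the window features above can be computed from a plain list of daily unit sales for one item/store pair. This sketch covers the per-window sales totals and means, the zero-sales day counts, and the days since the item last sold. It assumes one entry per calendar day (0 on days with no sales) and interprets "last appearance in store" as the last day with a nonzero sale. The category, city, province, and store-type averages would apply the same windowing to aggregated series.

```python
from statistics import mean

WINDOWS = (1, 3, 7, 14, 30, 60)  # days, as listed above


def window_features(daily_sales, windows=WINDOWS):
    """Window features for one item/store pair.

    daily_sales: unit sales per day, oldest first, one entry per calendar day
    (0 on days without sales); the last entry is the most recent day.
    Windows longer than the history simply use all available days.
    """
    if not daily_sales:
        raise ValueError("need at least one day of history")
    features = {}
    for w in windows:
        recent = daily_sales[-w:]
        features[f"sales_{w}d"] = sum(recent)
        features[f"mean_sales_{w}d"] = mean(recent)
        features[f"zero_days_{w}d"] = sum(1 for s in recent if s == 0)
    # Days since the most recent day with a nonzero sale (0 = sold on the last
    # day, None = never sold in the available history).
    features["days_since_last_sale"] = next(
        (age for age, s in enumerate(reversed(daily_sales)) if s > 0), None
    )
    return features


sales = [3, 0, 2, 5, 0, 0, 4, 1, 0, 2]  # hypothetical 10 days of history
print(window_features(sales))
```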

Finally, it is time to build a model. I believe that building this model requires a large amount of data covering several years. Also, each item should be treated differently. Several methodologies are commonly applied to demand prediction. In most cases, the data team starts by owning part of the decision process and gradually increases that share. For instance, at the beginning 70% of decisions might be made by machines and 30% by humans, until eventually the machines take over.

### Methodologies to try out:

#### Deep Neural Networks, Recurrent Neural Networks, and LSTMs
RNNs, including LSTMs, can be trained to predict the next values in a sequence of sales.
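
Before an RNN or LSTM can be trained, the sales history has to be framed as supervised sequences: a window of past days as input and the following day(s) as the target. A minimal sketch of that framing step is shown below. The window length and horizon are illustrative, and the network itself (for example an LSTM layer in a deep learning framework such as PyTorch or Keras) is out of scope here.

```python
def make_sequences(series, input_len, horizon=1):
    """Turn a sales series into (input window, target) pairs for sequence models.

    Each input holds `input_len` consecutive days; the target holds the
    following `horizon` days.
    """
    pairs = []
    for start in range(len(series) - input_len - horizon + 1):
        inputs = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        pairs.append((inputs, target))
    return pairs


print(make_sequences([5, 7, 6, 8, 9], input_len=3))
# [([5, 7, 6], [8]), ([7, 6, 8], [9])]
```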

#### Reinforcement Learning
Reinforcement learning is a domain of artificial intelligence in which models don't simply make predictions or classifications, but act on those predictions. The model is trained by rewarding it for good actions and punishing it for bad ones. In this case, we would typically punish the model for letting a particular inventory item run out of stock, and also for holding too much stock for too long. For rewards, we would primarily focus on ordering items within a safe window before the demand occurs.
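
The reward scheme described above can be written as a per-day reward function for a single item. This is only the reward, not a full agent: a reinforcement learning algorithm (for example Q-learning) would learn an ordering policy that maximizes the cumulative reward. All thresholds and weights (`max_stock`, `max_days_held`, `safe_window`, and the penalty and reward sizes) are hypothetical.

```python
def step_reward(demand, stock, days_stock_held, order_lead_days=None,
                max_stock=50, max_days_held=14, safe_window=(2, 5),
                stockout_penalty=10.0, overstock_penalty=1.0, order_reward=5.0):
    """Reward for one item on one day (all thresholds and weights are hypothetical).

    demand: units requested today
    stock: units on hand at the start of the day
    days_stock_held: how many days the current stock has been on hand
    order_lead_days: days between today's order and the demand it is meant
        to cover, or None if no order was placed today
    """
    reward = 0.0
    # Punishment for running out of stock: a cost per unit of unmet demand.
    unmet = max(demand - stock, 0)
    reward -= stockout_penalty * unmet
    # Punishment for holding too much stock for too long.
    leftover = max(stock - demand, 0)
    if leftover > max_stock and days_stock_held > max_days_held:
        reward -= overstock_penalty * (leftover - max_stock)
    # Reward for ordering within the safe window before the demand.
    low, high = safe_window
    if order_lead_days is not None and low <= order_lead_days <= high:
        reward += order_reward
    return reward


print(step_reward(demand=10, stock=7, days_stock_held=1))   # stock-out: -30.0
print(step_reward(demand=5, stock=80, days_stock_held=20))  # overstocked: -25.0
print(step_reward(demand=5, stock=10, days_stock_held=3, order_lead_days=3))  # timely order: 5.0
```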

#### Association Rules for Basket Analysis
This method shows which items are bought together by counting how often those items occur together in the same basket. It is therefore a useful source of red flags, and a way to restock related items at the same time.
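
The counting behind basket analysis can be sketched for item pairs using the standard support, confidence, and lift measures. This is a simplified illustration, assuming each basket is a set of item ids; full algorithms such as Apriori or FP-Growth also mine larger itemsets and prune by minimum support, which this sketch does only for pairs.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Support, confidence and lift for rules lhs -> rhs between item pairs.

    baskets: list of sets of item ids, one set per transaction.
    Returns (lhs, rhs, support, confidence, lift) tuples, highest lift first.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more than chance
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical baskets.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "milk"}, {"milk"}]
for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```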

#### Autoregressive Integrated Moving Average (ARIMA)
The moving average method is one of the oldest and most widely used methods of demand forecasting. In this method, the average sales over the previous 1, 3, 7, 14, 30, or 60 days are used as the predictor of the next day's sales. The prediction is then multiplied by a factor that accounts for differences in sales across the days of the week. The method is simple and gives good accuracy over a short-term horizon. However, it is unlikely to predict well over longer horizons, because it does not generalize the trend; it merely follows past behavior through its autoregressive components.
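
The moving-average baseline with a day-of-week factor can be sketched as follows. The factor here is each weekday's mean sales divided by the overall mean, which is one common way to build such a multiplier (an assumption, since the text does not fix the formula). Note that this is the simple moving-average method, not a fitted ARIMA model; fitting ARIMA would typically use a statistics library such as statsmodels.

```python
from statistics import mean


def day_of_week_factors(daily_sales, start_weekday):
    """Mean sales of each weekday divided by the overall mean (1.0 = an average day).

    start_weekday is the weekday of daily_sales[0] (0 = Monday ... 6 = Sunday).
    """
    overall = mean(daily_sales)
    by_day = {d: [] for d in range(7)}
    for offset, units in enumerate(daily_sales):
        by_day[(start_weekday + offset) % 7].append(units)
    return {
        d: mean(values) / overall if values and overall else 1.0
        for d, values in by_day.items()
    }


def moving_average_forecast(daily_sales, window, next_weekday, factors):
    """Average of the last `window` days, scaled by the weekday factor of the next day."""
    return mean(daily_sales[-window:]) * factors[next_weekday]


# Two hypothetical weeks starting on a Monday, with busier weekends.
history = [10, 10, 10, 10, 10, 20, 20] * 2
factors = day_of_week_factors(history, start_weekday=0)
for window in (1, 3, 7, 14):
    print(window, round(moving_average_forecast(history, window, next_weekday=0, factors=factors), 2))
```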

#### High-Performance Gradient Boosting (GBDT, GBRT, GBM, or MART) Framework Based on Decision Trees
This framework (LightGBM is a well-known example) is designed for large datasets, where it aims for maximum training speed while achieving accuracy comparable to other boosted decision tree implementations such as XGBoost and pGBRT. It grows trees leaf-wise, splitting the leaf with the largest loss reduction, rather than level-wise.
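
To make the boosting idea concrete, here is a deliberately tiny gradient boosting sketch for squared error, using depth-1 trees (stumps) and only the standard library. It is not how LightGBM or XGBoost are implemented: they use much faster split finding, regularization, and deeper trees (grown leaf-wise in LightGBM), but the core loop of fitting each new tree to the residuals of the current ensemble is the same. The feature rows in the usage example are hypothetical.

```python
from statistics import mean


def fit_stump(rows, residuals):
    """Best single-feature threshold split for the residuals under squared error."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_value, right_value = mean(left), mean(right)
            error = (sum((r - left_value) ** 2 for r in left)
                     + sum((r - right_value) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_value, right_value)
    if best is None:  # no feature can split the rows: predict the mean residual
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbm(rows, targets, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error with stumps as weak learners."""
    base = mean(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the loss 0.5 * (y - p) ** 2.
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps


def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical rows: [on_promotion, day_of_week] -> units sold.
X = [[0, 0], [0, 2], [0, 4], [1, 0], [1, 2], [1, 4]]
y = [10, 11, 10, 30, 31, 29]
model = fit_gbm(X, y, n_rounds=100)
print(round(predict_gbm(model, [1, 2]), 1))
```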

#### Factorization Machines 
Factorization machines work well with sparse data. They may be slower to train than simpler models, but they can uncover hidden relationships between items, or between different features.
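
A second-order factorization machine models every pairwise feature interaction as the dot product of two learned latent vectors, which is why it copes with sparse data: an interaction weight can be estimated even for feature pairs that rarely appear together. The sketch below shows only the prediction step with hand-picked, hypothetical parameters (training by SGD or ALS is omitted), and checks the usual O(kn) reformulation of the pairwise term against the naive double loop.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine prediction in O(k * n).

    x: feature values (length n); w0: global bias; w: linear weights (length n);
    v: one latent vector of length k per feature.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
       = 0.5 * sum_f [(sum_i v[i][f] * x[i])**2 - sum_i (v[i][f] * x[i])**2].
    """
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same model, summing over every feature pair explicitly (O(k * n^2))."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


# Hypothetical parameters for 4 features with k = 2 latent factors.
x = [1.0, 0.0, 1.0, 2.0]
w0, w = 0.5, [0.1, 0.2, -0.3, 0.4]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.5], [0.4, 0.0]]
print(fm_predict(x, w0, w, v), fm_predict_naive(x, w0, w, v))
```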

### What I would do:
The model could combine all of these approaches: first a moving average, then an RNN and a gradient boosting model that use that moving average at different lags as inputs. The choice of method depends on how fast and how accurate the model needs to be for the business. For instance, ARIMA is the basic model to build: it is very fast and simple, and should give a decent result over a short horizon but not over a long one. On the other hand, building a neural network requires lots of data and sometimes a long training time, but it can uncover nonlinear relationships very well.
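
The combined approach can be sketched at the feature level: the moving average is computed at several lags, and those values become input features for the RNN or gradient boosting model. The windows and lags below are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import mean


def lagged_moving_averages(daily_sales, windows=(7, 30), lags=(1, 7, 14)):
    """Moving averages over `window` days that end `lag` days before the forecast day.

    daily_sales is oldest first and its last entry is the day before the
    forecast day, so lag=1 means the window ends on the most recent day.
    Lags must be >= 1; combinations the history cannot cover return None.
    """
    n = len(daily_sales)
    features = {}
    for window in windows:
        for lag in lags:
            end = n - lag + 1  # exclusive end index of the window
            start = end - window
            key = f"ma{window}_lag{lag}"
            features[key] = mean(daily_sales[start:end]) if start >= 0 else None
    return features


# Hypothetical history: 30 days of steadily growing sales.
history = list(range(1, 31))
print(lagged_moving_averages(history, windows=(7, 30), lags=(1, 7)))
```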


### Evaluation: 

In the end, the model should be evaluated and deployed if successful, although several iterations are usually necessary to tune the parameters and improve the model. Choosing the evaluation metric is actually the most important part in real-world scenarios, because it has to align with the business goal. I posit that the real value of a model to a business is a composite of (1) predictive accuracy, (2) runtime, (3) scalability, and (4) ease of use. Validation could be based on time windows, such as the next 7, 14, and 30 days.
Forecasting models are usually evaluated with statistical measures such as the Normalized Weighted Mean Squared Logarithmic Error (NWMSLE), where each SKU is assigned a weight. Perishable items are given higher weights in the evaluation.
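
The metric can be written as a weighted mean of squared differences between log-transformed predictions and actuals, normalized by the sum of the weights; taking the square root gives the root variant (NWRMSLE). A minimal sketch, assuming non-negative actual sales and hypothetical weights of 1.25 for perishable SKUs and 1.0 for all others:

```python
from math import log1p


def nwmsle(y_true, y_pred, weights):
    """Normalized weighted mean squared logarithmic error.

    NWMSLE = sum_i w_i * (ln(1 + yhat_i) - ln(1 + y_i))**2 / sum_i w_i
    Actual sales are assumed to be non-negative; negative predictions are
    clipped to 0 before the log.
    """
    total = 0.0
    for actual, predicted, weight in zip(y_true, y_pred, weights):
        predicted = max(predicted, 0.0)
        total += weight * (log1p(predicted) - log1p(actual)) ** 2
    return total / sum(weights)


# Hypothetical weights: 1.25 for perishable SKUs, 1.0 otherwise.
perishable = [True, False, False]
weights = [1.25 if p else 1.0 for p in perishable]
print(nwmsle([4, 0, 10], [3, 1, 10], weights))
```
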
We can use the first year's data to build the engine with machine learning and the second year's data to fine-tune it, pretending that we don't know what happened that year. Having more years and more sources of data available is even better, because it lets us feed the engine more precisely.
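
The "pretend we don't know what happened" idea is a time-based holdout: train only on data up to a cut-off date and validate on the period that follows. A minimal sketch, assuming records are `(date, value)` pairs; using horizons of 7, 14, and 30 days gives the validation windows mentioned earlier, and setting the cut-off to the last day of the first year gives the year-based split. The function name is hypothetical.

```python
from datetime import date, timedelta


def time_split(records, train_end, horizon_days):
    """Split (date, value) records into a training set and the following validation window.

    Training uses only records up to and including train_end, so the model never
    sees the validation period ("pretending we don't know what happened").
    """
    valid_end = train_end + timedelta(days=horizon_days)
    train = [r for r in records if r[0] <= train_end]
    valid = [r for r in records if train_end < r[0] <= valid_end]
    return train, valid


# Hypothetical daily history: 60 days starting 2023-01-01.
records = [(date(2023, 1, 1) + timedelta(days=i), i) for i in range(60)]
for horizon in (7, 14, 30):
    train, valid = time_split(records, date(2023, 1, 31), horizon)
    print(horizon, len(train), len(valid))
```
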
A side-by-side test is also a good way to develop trust: compare the results of the analytics model with human forecasters' results in an area where historical data is available.
In any case, the automated system should keep learning from its experience and from human input, tweaking the algorithm and becoming more accurate over time.
