<a href="https://colab.research.google.com/github/acedesci/scanalytics/blob/master/EN/S08_09_Retail_Analytics/S9_Module1B_Retail_Demand_Predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 8: Retail analytics - Prediction from trained models

## Module 1B: Predicting demand from trained models for decision models

In Module 1A of Session 8, we designed our model on a predetermined subset of regressor variables and trained one model per UPC. In this notebook, we prepare the inputs for the optimization model by predicting demand at different price points.

In [None]:
import pandas
import sklearn  # the saved models are scikit-learn objects, so scikit-learn must be installed to unpickle them

# 1. Data input

We have prepared two input files containing the features for which demand will be predicted: a small dataset and a large dataset.

1.   **'InputFeatures_Prob1.csv'**. This is a small-scale problem. Its output will be used in the optimization models of Modules 2A (explicit model) and 2B (compact model).
2.   **'InputFeatures_Prob2.csv'**. This is a large-scale problem with many more variables and constraints, reflecting a real-life setting. Its output will be used in Module 2B.
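The prediction step in Section 3 selects the columns *UPC*, *PRICE*, *PRICE_p2*, *FEATURE*, *DISPLAY*, *TPR_ONLY* and *RELPRICE*, so whichever file you use must contain them. A minimal sketch of a sanity check, assuming exactly those column names; `check_input_columns` is a hypothetical helper, not part of the course code, and the sample frame is made up:

```python
import pandas

# columns assumed to be required: the UPC identifier plus the model features
REQUIRED_COLUMNS = ['UPC', 'PRICE', 'PRICE_p2', 'FEATURE', 'DISPLAY', 'TPR_ONLY', 'RELPRICE']

def check_input_columns(df, required=REQUIRED_COLUMNS):
    """Return the required columns missing from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]

# usage on a tiny hand-made frame (not real course data)
sample = pandas.DataFrame({'UPC': [1111, 1111], 'PRICE': [2.5, 3.0]})
print(check_input_columns(sample))
```

An empty list means the file has every column the prediction loop needs.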

We provide two options for reading the input. Please run either Option 1 or Option 2 (***not both***).

**Option 1: Download from the URLs**. As usual, you can read 'InputFeatures_Prob1.csv' directly from its URL into a DataFrame using the code below.


In [None]:
# small example
url = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/EN/S08_09_Retail_Analytics/InputFeatures_Prob1.csv'
# large example, uncomment the line below (and comment out the one above) if you want to try it
# url = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/EN/S08_09_Retail_Analytics/InputFeatures_Prob2.csv'

predDemand = pandas.read_csv(url)

# Dataset is now stored in a Pandas DataFrame predDemand
predDemand
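The two URLs share the same base and differ only in the problem number, so switching datasets amounts to changing one suffix. A small sketch of that idea; `input_url` is a hypothetical helper built from the URLs in the cell above:

```python
# base URL taken from the cell above
BASE_URL = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/EN/S08_09_Retail_Analytics/'

def input_url(prob):
    """Build the URL of 'InputFeatures_Prob<prob>.csv' (hypothetical helper)."""
    if prob not in (1, 2):
        raise ValueError('prob must be 1 (small example) or 2 (large example)')
    return BASE_URL + 'InputFeatures_Prob' + str(prob) + '.csv'

print(input_url(1))
```

Usage: `predDemand = pandas.read_csv(input_url(2))` would load the large example.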

**Option 2: Upload the file from your PC.** For this option, you must first download 'InputFeatures_Prob1.csv' or 'InputFeatures_Prob2.csv' from ZoneCours and save it to your PC. Then run the cell below and click "Choose Files" to upload it.

In [None]:
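from google.colab import files
import io
import pandas

# upload the input file from your PC (click "Choose Files" after running the cell)
uploaded = files.upload()

# read whichever file was uploaded ('InputFeatures_Prob1.csv' or 'InputFeatures_Prob2.csv')
fileName = next(iter(uploaded))
predDemand = pandas.read_csv(io.BytesIO(uploaded[fileName]))

# Dataset is now stored in a Pandas DataFrame predDemand
predDemand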
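`files.upload()` returns a dictionary that maps each uploaded file name to its contents as bytes, which is why the contents are wrapped in `io.BytesIO` before calling `pandas.read_csv`. The sketch below simulates that dictionary with made-up contents so it runs outside Colab, and takes the first uploaded name instead of a hard-coded one, so the same code works for either input file:

```python
import io
import pandas

# stand-in for the dict returned by google.colab's files.upload():
# {file name: file contents as bytes}; the contents here are made up
uploaded = {'InputFeatures_Prob1.csv': b'UPC,PRICE\n1111,2.5\n1111,3.0\n'}

# take the first (and only) uploaded file name instead of hard-coding it
fileName = next(iter(uploaded))
predDemand = pandas.read_csv(io.BytesIO(uploaded[fileName]))
print(fileName, predDemand.shape)
```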

# 2. Model retrieval

Next, we retrieve the best models that we trained and saved in Module 1A. They are loaded from a working directory (`cwd`) that you set using one of the two options below.

**Option 1: Local PC (Jupyter)**. If your models are saved on your PC, set `cwd` to the folder that contains them. Here we assume that they are in the same folder as the notebook.

In [None]:
cwd = './'

**Option 2: Google Drive (Colab)**. If your models are saved on Google Drive, you can access the folder by mounting the drive and authorizing access.

In [None]:
# we need to remount Google Drive in order to load the saved models from it
from google.colab import drive
drive.mount('/content/drive')
cwd = '/content/drive/My Drive/'
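Whichever option you choose, the loading cell below expects one file named `<UPC>_demand_model.sav` per product in `cwd`. A minimal sketch for listing the expected files that are missing before loading; `missing_model_files` is a hypothetical helper and the UPC values are made up. It uses `os.path.join`, which works whether or not `cwd` ends with a slash (the notebook's own `cwd + str(upc)` relies on the trailing slash):

```python
import os
import tempfile

def missing_model_files(cwd, upcs):
    """Return the expected model paths under cwd that do not exist (hypothetical helper)."""
    paths = [os.path.join(cwd, str(upc) + '_demand_model.sav') for upc in upcs]
    return [p for p in paths if not os.path.isfile(p)]

# usage with a temporary folder holding one of the two expected model files
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, '1111_demand_model.sav'), 'wb').close()
    missing = missing_model_files(folder, [1111, 2222])
    print([os.path.basename(p) for p in missing])
```

In the notebook, `missing_model_files(cwd, predDemand['UPC'].unique())` would return an empty list when every model is present.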

With `cwd` set by one of the options above, we can now load the model that we previously trained and saved for each UPC.

In [None]:
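import pickle

# list of distinct products (UPCs) in the input features
productList = predDemand['UPC'].unique()

# dictionary mapping each UPC to its trained regression model
regr = {}
for upc in productList:
    filename = cwd + str(upc) + '_demand_model.sav'
    # load the model from disk (the file is closed automatically)
    with open(filename, 'rb') as f:
        regr[upc] = pickle.load(f)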
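The `.sav` files are pickled scikit-learn estimators. Assuming Module 1A wrote them with `pickle.dump` (which the `pickle.load` call above implies), saving and loading form a simple round trip. The sketch below fits a toy `LinearRegression` on made-up data, pickles it to a temporary file under the same naming pattern, and loads it back:

```python
import os
import pickle
import tempfile

import numpy
from sklearn.linear_model import LinearRegression

# toy training data (made up): demand falls as price rises
prices = numpy.array([[1.0], [2.0], [3.0], [4.0]])
demand = numpy.array([10.0, 8.0, 6.0, 4.0])
model = LinearRegression().fit(prices, demand)

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, '1111_demand_model.sav')
    # save the fitted model to disk
    with open(filename, 'wb') as f:
        pickle.dump(model, f)
    # load it back, as in the cell above
    with open(filename, 'rb') as f:
        loaded = pickle.load(f)

print(loaded.predict(numpy.array([[2.5]])))
```

Two practical caveats: unpickling can execute arbitrary code, so only load `.sav` files you created yourself, and scikit-learn may warn if the version that loads a model differs from the one that saved it.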

# 3. Demand forecasting

In this cell, we again loop over each UPC. Each line of the **for** loop works as follows:

*   The first line in the **for** loop selects the explanatory variables (features) for the current UPC.
*   The second line retrieves the row indices of that UPC in *predDemand*, so that the predictions can be placed back in the correct rows.
*   The third line takes the model object for the current UPC (*regr[upc]*) and predicts the demand. We use the function *clip(0.0)* to make sure that the predicted demand is non-negative: since demand decreases with price and the regression function is unbounded, a high enough price can produce a negative prediction. The function *round(1)* rounds the predicted value to one decimal place.
*   The fourth line stores the predicted demand, indexed by its rows, so that it can be added to *predDemand* as a new column.

Once the loop terminates, we add a new column *'predictSales'* containing the predicted demand.
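Two details of this loop can be checked on toy numbers: `clip(0.0)` replaces negative predictions with zero, and because each series of predictions carries the row indices of its UPC, assigning the combined series to a new column puts every value on its own row even when the rows of different UPCs are interleaved. A sketch with a hypothetical demand curve `12 - 2 * price` and made-up UPCs and values:

```python
import numpy
import pandas

# hypothetical linear demand curve (made-up coefficients): demand = 12 - 2 * price
prices = numpy.array([2.0, 4.0, 6.0, 7.5])
raw = 12 - 2 * prices             # 8, 4, 0, -3
clean = raw.clip(0.0).round(1)    # the negative prediction becomes 0.0

# rows of two UPCs interleaved in the input (made-up values)
df = pandas.DataFrame({'UPC': [1111, 2222, 1111, 2222]})
parts = [
    pandas.Series([5.0, 6.0], index=df.loc[df['UPC'] == 1111].index),
    pandas.Series([7.0, 8.0], index=df.loc[df['UPC'] == 2222].index),
]
# column assignment aligns on the index, not on position
df['predictSales'] = pandas.concat(parts)
print(clean.tolist(), df['predictSales'].tolist())
```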

In [None]:
feature_list = ['PRICE', 'PRICE_p2', 'FEATURE', 'DISPLAY', 'TPR_ONLY', 'RELPRICE']

X = {}
y_pred = {}

# collect one Series of predictions per UPC; they are combined after the loop
predictedSeriesList = []

for upc in productList:
  # Line 1 of for loop: select the explanatory variables (features) for this UPC
  X[upc] = predDemand.loc[predDemand['UPC'] == upc, feature_list]

  # Line 2: retrieve the row indices of this UPC in predDemand
  upcIndex = X[upc].index

  # Line 3: predict the demand, clip negative values to 0 and round to one decimal place
  y_pred[upc] = regr[upc].predict(X[upc]).clip(0.0).round(1)

  # Line 4: store the predictions, indexed by the rows they belong to
  predictedSeriesList.append(pandas.Series(y_pred[upc], index=upcIndex))

# concatenate all predictions; the assignment aligns them with predDemand by index
predDemand['predictSales'] = pandas.concat(predictedSeriesList)
print(predDemand.to_string())
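An equivalent way to write this loop, sketched under the assumption that each stored model exposes a scikit-learn style `predict`, is to iterate over `predDemand.groupby('UPC')`, which yields each UPC together with its rows (original index preserved). The `LinearDemand` class, its coefficients and the data below are hypothetical stand-ins for the trained models, so the sketch runs on its own:

```python
import pandas

class LinearDemand:
    """Hypothetical stand-in for a trained model: demand = a - b * PRICE."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def predict(self, X):
        return self.a - self.b * X['PRICE'].to_numpy()

# made-up inputs and models for two UPCs
predDemand = pandas.DataFrame({'UPC': [1111, 2222, 1111, 2222],
                               'PRICE': [2.0, 1.0, 7.0, 3.0]})
regr = {1111: LinearDemand(12.0, 2.0), 2222: LinearDemand(5.0, 1.0)}
feature_list = ['PRICE']

parts = []
for upc, rows in predDemand.groupby('UPC'):
    # rows keeps the original index, so predictions can be aligned back
    pred = regr[upc].predict(rows[feature_list]).clip(0.0).round(1)
    parts.append(pandas.Series(pred, index=rows.index))

predDemand['predictSales'] = pandas.concat(parts)
print(predDemand['predictSales'].tolist())
```

The design choice is mostly stylistic: `groupby` avoids filtering the whole DataFrame twice per UPC, which may matter on the large-scale input.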

Finally, we save the predicted sales to a CSV file to be used in the optimization model later on.

In [None]:
# Please save it as 'predictedSales_Prob1.csv' if 'InputFeatures_Prob1.csv' is used
# Otherwise, please save it as 'predictedSales_Prob2.csv' if 'InputFeatures_Prob2.csv' is used
predDemand.to_csv(cwd +'predictedSales_Prob1.csv')
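Since the output name follows directly from the input name, it can be derived rather than edited by hand. A minimal sketch; `output_name` is a hypothetical helper. Note that `to_csv` writes the DataFrame index as the first column by default (`index=True`); whether Modules 2A and 2B rely on that column is not shown here, so the cell above keeps the default.

```python
def output_name(input_name):
    """Map 'InputFeatures_ProbN.csv' to 'predictedSales_ProbN.csv' (hypothetical helper)."""
    prefix = 'InputFeatures_'
    if not input_name.startswith(prefix):
        raise ValueError('unexpected input file name: ' + input_name)
    return 'predictedSales_' + input_name[len(prefix):]

print(output_name('InputFeatures_Prob2.csv'))
```

Usage: `predDemand.to_csv(cwd + output_name('InputFeatures_Prob2.csv'))` saves the large example under the expected name.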