
# Predicting demand for optimization models

In the first part of today's session, we built a model on a predetermined subset of regressor variables and trained it separately for each UPC. In this notebook, we prepare the inputs for the optimization model by predicting demand at different price points.

In [0]:
import pandas
import sklearn  # scikit-learn must be installed so that the pickled models can be loaded

# 1. Data input

We have prepared two input files containing the feature values for which demand is to be predicted. The first file is a small dataset and the second is a large one. The cell below loads the small one (`url1`).

In [0]:
# small example
url1 = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/S8_9_retail_analytics/predictionInput_Prob1.csv'

# large example
url2 = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/S8_9_retail_analytics/predictionInput_Prob2.csv'

predDemand = pandas.read_csv(url1)
print(predDemand)

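# explanatory variables (features) passed to the trained models
feature_list = ['PRICE', 'PRICE_p2', 'FEATURE', 'DISPLAY', 'TPR_ONLY', 'RELPRICE']
# one model was trained per product (UPC)
productList = predDemand['UPC'].unique()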
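
Before predicting, it can help to confirm that the input table actually contains every column in `feature_list`. The sketch below is only an illustration: the helper `check_prediction_input` is hypothetical, and the miniature table uses made-up UPCs and values rather than the course data.

```python
import pandas

# Hypothetical miniature of the prediction input; all values are invented for illustration.
sample = pandas.DataFrame({
    'UPC': [111, 111, 222],
    'PRICE': [2.5, 3.0, 4.0],
    'PRICE_p2': [6.25, 9.0, 16.0],
    'FEATURE': [0, 1, 0],
    'DISPLAY': [0, 0, 1],
    'TPR_ONLY': [0, 0, 0],
    'RELPRICE': [0.9, 1.1, 1.0],
})
feature_list = ['PRICE', 'PRICE_p2', 'FEATURE', 'DISPLAY', 'TPR_ONLY', 'RELPRICE']

def check_prediction_input(df, features):
    """Hypothetical helper: raise if a column is missing or a feature is NaN,
    otherwise return the number of rows per UPC."""
    missing = [c for c in ['UPC'] + features if c not in df.columns]
    if missing:
        raise ValueError(f'missing columns: {missing}')
    if df[features].isna().any().any():
        raise ValueError('feature columns contain missing values')
    return df.groupby('UPC').size()

rows_per_upc = check_prediction_input(sample, feature_list)
print(rows_per_upc)
```

With the real data, the same call would be `check_prediction_input(predDemand, feature_list)`.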


# 2. Model retrieval

In the next two cells, we retrieve the best model for each UPC, which we previously trained and saved on Google Drive.

In [0]:
# we need to remount Google Drive in order to load the saved models from it
import pickle

from google.colab import drive
drive.mount('/content/drive')
cwd = '/content/drive/My Drive/'

We load the model that we previously trained and saved for each UPC.

In [0]:
regr = {}
for upc in productList:
    filename = cwd + str(upc) + '_demand_model.sav'
    # load the saved model from disk; the with-block closes the file afterwards
    with open(filename, 'rb') as f:
        regr[upc] = pickle.load(f)
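
The save-then-load pattern behind this cell can be sketched in isolation. This is a minimal illustration and not the course's training code: the "models" are plain dictionaries standing in for trained regressors, the UPCs are made up, and a temporary folder replaces Google Drive. Only the `<UPC>_demand_model.sav` naming follows the cell above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for trained models: plain dicts keyed by made-up UPCs.
models = {111: {'intercept': 10.0, 'slope': -2.0},
          222: {'intercept': 8.0, 'slope': -1.5}}

with tempfile.TemporaryDirectory() as folder:
    # Save one file per UPC.
    for upc, model in models.items():
        path = os.path.join(folder, str(upc) + '_demand_model.sav')
        with open(path, 'wb') as f:
            pickle.dump(model, f)

    # Load the files back; each with-block closes its file handle after reading.
    loaded = {}
    for upc in models:
        path = os.path.join(folder, str(upc) + '_demand_model.sav')
        with open(path, 'rb') as f:
            loaded[upc] = pickle.load(f)

print(loaded == models)
```

Note that `pickle.load` can execute arbitrary code, so it should only be used on files you created yourself or otherwise trust.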

# 3. Demand forecasting

In this cell, we again loop over the UPCs. For each UPC, the loop first selects the explanatory variables (features) for that product, then retrieves the index of those rows, and then uses the previously trained and saved model for that UPC to predict its sales. Don't forget to retrieve the row index: it lets us record the predicted sales in the right order and later double-check for any index mismatch. After the loop, the predicted sales are added to the data table as a new column. We will see how this information comes in handy in Week 9. Don't forget to save this result.

In [0]:
X = {}
y_pred = {}
predictedParts = []

for upc in productList:
  upcRows = predDemand.loc[predDemand['UPC']==upc]
  X[upc] = upcRows[feature_list]
  upcIndex = upcRows.index
  # clip at zero because predicted sales cannot be negative, then round to one decimal
  y_pred[upc] = regr[upc].predict(X[upc]).clip(0.0).round(1)
  predictedParts.append(pandas.Series(y_pred[upc], index = upcIndex))

# Series.append was removed in pandas 2.0, so the per-UPC pieces are combined with concat
predDemand['predictSales'] = pandas.concat(predictedParts)
print(predDemand.to_string())
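
To see why the row index matters, here is a small self-contained sketch in which the rows of two products are interleaved. Everything in it is an assumption made for illustration: the UPCs and prices are invented, and `toy_predict` is a hypothetical linear demand curve standing in for `regr[upc].predict`.

```python
import numpy
import pandas

# Made-up rows, deliberately interleaved by UPC so that index alignment matters.
data = pandas.DataFrame({'UPC': [111, 222, 111, 222],
                         'PRICE': [2.0, 3.0, 6.0, 10.0]})

def toy_predict(upc, X):
    """Hypothetical linear demand curve standing in for regr[upc].predict."""
    intercept, slope = {111: (10.0, -2.0), 222: (8.0, -1.5)}[int(upc)]
    return intercept + slope * X['PRICE'].to_numpy()

pieces = []
for upc, rows in data.groupby('UPC'):
    # Negative predictions are clipped to zero, then rounded to one decimal.
    y = numpy.clip(toy_predict(upc, rows), 0.0, None).round(1)
    # Keeping the original row index lets pandas put each value back on its own row.
    pieces.append(pandas.Series(y, index=rows.index))

# Column assignment aligns on the index, not on the position in the concatenated Series.
data['predictSales'] = pandas.concat(pieces)
print(data)
```

The concatenated Series is ordered by UPC (rows 0, 2, 1, 3), yet each prediction still lands on the row it was computed from because the assignment matches index labels.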

Now we save the predicted sales to a CSV file to be used later in the optimization model.

In [0]:
predDemand.to_csv(cwd +'predictedSales_Prob1.csv')
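
By default, `to_csv` also writes the row index as the first column of the file. The sketch below shows one way to read such a file back with that column restored as the index. It uses an in-memory buffer and a made-up two-row table instead of the file on Google Drive.

```python
import io
import pandas

# Made-up table standing in for predDemand.
table = pandas.DataFrame({'UPC': [111, 222], 'predictSales': [6.0, 3.5]})

buffer = io.StringIO()
table.to_csv(buffer)   # default behaviour: the row index becomes the first column
buffer.seek(0)

# index_col=0 tells pandas to use that first column as the index again.
restored = pandas.read_csv(buffer, index_col=0)
print(restored)
```

Reading the file without `index_col=0` would instead produce an extra column named `Unnamed: 0`, which is worth keeping in mind when the optimization notebook loads the predictions.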