
In [0]:
import pandas
import sklearn  # scikit-learn must be available so that pickle can rebuild the saved models

# Block 1: Data input

In the first part of today's session, we built our model on a predetermined subset of regressor variables and trained one model per UPC. The 'new' data that we want predictions for is organized the same way: by UPC, with the same regressor variables.

In [7]:
# small example
url1 = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/S8_9_retail_analytics/predictionInput_Prob1.csv'

# large example
url2 = 'https://raw.githubusercontent.com/acedesci/scanalytics/master/S8_9_retail_analytics/predictionInput_Prob2.csv'

predDemand = pandas.read_csv(url1)
print(predDemand)

feature_list = ['PRICE', 'PRICE_p2', 'FEATURE', 'DISPLAY','TPR_ONLY','RELPRICE']
productList = predDemand['UPC'].unique()


    Unnamed: 0  avgPriceChoice         UPC  ...  DISPLAY  TPR_ONLY  RELPRICE
0            0             3.0  1600027528  ...        0         0  0.833333
1            1             3.0  1600027528  ...        0         0  1.000000
2            2             3.0  1600027528  ...        0         0  1.166667
3            3             3.0  1600027564  ...        0         0  0.833333
4            4             3.0  1600027564  ...        0         0  1.000000
5            5             3.0  1600027564  ...        0         0  1.166667
6            6             3.0  3000006340  ...        0         0  0.833333
7            7             3.0  3000006340  ...        0         0  1.000000
8            8             3.0  3000006340  ...        0         0  1.166667
9            9             3.0  3800031829  ...        0         0  0.833333
10          10             3.0  3800031829  ...        0         0  1.000000
11          11             3.0  3800031829  ...        0         0  1.166667
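
The values printed in this notebook suggest how the derived columns relate to the raw ones: `PRICE_p2` matches the square of `PRICE`, and `RELPRICE` matches `PRICE` divided by `avgPriceChoice`. The following is a minimal sketch of how such an input table could be built for one UPC, assuming those relationships hold (they are inferred from the printed values, not stated by the source). `build_prediction_input` is a hypothetical helper, not part of the course code.

```python
import pandas

def build_prediction_input(upc, avg_price, prices):
    """Hypothetical helper: one row per candidate price for a single UPC."""
    df = pandas.DataFrame({'UPC': upc, 'avgPriceChoice': avg_price, 'PRICE': prices})
    df['PRICE_p2'] = df['PRICE'] ** 2                    # assumed: squared price
    df['RELPRICE'] = df['PRICE'] / df['avgPriceChoice']  # assumed: price relative to the average
    # promotion flags are set to 0, as in the example input
    for flag in ['FEATURE', 'DISPLAY', 'TPR_ONLY']:
        df[flag] = 0
    return df

example = build_prediction_input(1600027528, 3.0, [2.5, 3.0, 3.5])
print(example)
```

With candidate prices 2.5, 3.0 and 3.5 around an average of 3.0, this reproduces the three rows per UPC seen in the small example.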

# Block 2: Model retrieval

In the next two cells, we retrieve the best model that we previously trained and saved on Google Drive.

In [3]:
# we need to remount Google Drive in order to load the data from it
import pickle

from google.colab import drive
drive.mount('/content/drive')
cwd = '/content/drive/My Drive/'

Mounted at /content/drive


We load the model that we previously trained and saved for each UPC.

In [0]:
regr = {}
for upc in productList:
    filename = cwd+str(upc)+'_demand_model.sav'
    # load the saved model from disk
    with open(filename, 'rb') as f:
        regr[upc] = pickle.load(f)
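
The loading loop stops with an error as soon as one model file is missing. The sketch below shows one possible way to make that step more forgiving by collecting the UPCs whose file cannot be found. `load_models` is a hypothetical helper, the folder is a temporary directory standing in for the Google Drive path, and a plain dict stands in for a fitted model. As a general precaution, only unpickle files that you created or trust.

```python
import os
import pickle
import tempfile

def load_models(product_list, folder):
    """Hypothetical helper: load one pickled model per UPC, noting missing files."""
    models, missing = {}, []
    for upc in product_list:
        filename = os.path.join(folder, str(upc) + '_demand_model.sav')
        if not os.path.exists(filename):
            missing.append(upc)
            continue
        with open(filename, 'rb') as f:
            models[upc] = pickle.load(f)
    return models, missing

# Demo in a temporary folder; a plain dict stands in for a fitted model (assumption).
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, '111_demand_model.sav'), 'wb') as f:
    pickle.dump({'note': 'placeholder for a fitted model'}, f)

models, missing = load_models([111, 222], demo_folder)
print(sorted(models), missing)
```

Returning the missing UPCs separately lets the forecasting step skip them or report them, instead of failing partway through the loop.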

# Block 3: Demand forecasting

In this cell, we again loop over the UPCs with a **for** loop. Inside the loop, we first select the explanatory variables (features) for the current UPC, then retrieve the index of those rows, and then use the previously trained and saved model for that UPC to predict its sales. Don't forget to retrieve the row index: it lets us record the predicted sales in the right order and later double-check for any index mismatch. Negative predictions are clipped to zero and all predictions are rounded to one decimal. After the loop, we add the predicted sales to the data table as a new column. We will see how this information comes in handy in Week 9. Don't forget to save this result.

In [8]:
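X = {}
y_pred = {}
predictedSeriesList = []  # one Series of predictions per UPC

for upc in productList:
  # select the rows for this UPC once and reuse the selection
  upcRows = predDemand.loc[predDemand['UPC']==upc]
  X[upc] = upcRows[feature_list]
  print(X[upc])
  upcIndex = upcRows.index
  # predict sales, clip negative values to zero, round to one decimal
  y_pred[upc] = regr[upc].predict(X[upc]).clip(0.0).round(1)
  print(regr[upc].coef_)
  predictedSeriesList.append(pandas.Series(y_pred[upc], index = upcIndex))

# Series.append was removed in pandas 2.0, so concatenate the pieces instead
predictedValueSeries = pandas.concat(predictedSeriesList)
predDemand['predictSales'] = predictedValueSeries
print(predDemand.to_string())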

   PRICE  PRICE_p2  FEATURE  DISPLAY  TPR_ONLY  RELPRICE
0    2.5      6.25        0        0         0  0.833333
1    3.0      9.00        0        0         0  1.000000
2    3.5     12.25        0        0         0  1.166667
[-129.46228233   14.43342214    9.52236578   -2.52536055    0.
  -16.82157611]
   PRICE  PRICE_p2  FEATURE  DISPLAY  TPR_ONLY  RELPRICE
3    2.5      6.25        0        0         0  0.833333
4    3.0      9.00        0        0         0  1.000000
5    3.5     12.25        0        0         0  1.166667
[ 16.04399076  -2.65376672  16.70332847  15.12123867  -0.82767342
 -13.12215795]
   PRICE  PRICE_p2  FEATURE  DISPLAY  TPR_ONLY  RELPRICE
6    2.5      6.25        0        0         0  0.833333
7    3.0      9.00        0        0         0  1.000000
8    3.5     12.25        0        0         0  1.166667
[-21.6503457    2.45638246   8.99183479   2.55943835   0.4011117
  11.19664927]
    PRICE  PRICE_p2  FEATURE  DISPLAY  TPR_ONLY  RELPRICE
9     2.5      6.2
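
Why the row index matters can be seen on a toy table in which the rows of two UPCs are interleaved. This is a minimal sketch with made-up numbers: the "predictions" are a stand-in rule (price times 10 or 100), not the output of a trained model. Because assigning a Series to a DataFrame column aligns on index labels rather than on position, each value lands on its own row even though the Series was built UPC by UPC.

```python
import pandas

# toy table with interleaved UPCs, so group order differs from row order
toy = pandas.DataFrame({'UPC': [1, 2, 1, 2], 'PRICE': [2.5, 2.5, 3.0, 3.0]})

pieces = []
for upc in toy['UPC'].unique():
    rows = toy.loc[toy['UPC'] == upc]
    # stand-in "prediction": a different made-up rule per UPC
    fake_pred = rows['PRICE'].to_numpy() * (10 if upc == 1 else 100)
    pieces.append(pandas.Series(fake_pred, index=rows.index))

combined = pandas.concat(pieces)   # index order is 0, 2, 1, 3
toy['predictSales'] = combined     # assignment aligns on the index labels
print(toy)
```

Had the Series been created without `index=rows.index`, every piece would carry labels starting at 0, and the values would no longer line up with the rows they belong to.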

In [0]:
predDemand.to_csv(cwd +'predictedSales_Prob1.csv')
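
By default, `to_csv` also writes the row index, which reappears as an `Unnamed: 0` column when the file is read back (the input table in Block 1 has such a column). The sketch below compares the default with `index=False` on a tiny made-up table, using an in-memory buffer instead of Google Drive. Whether to drop the index here depends on how the file is consumed later, which the source does not specify.

```python
import io
import pandas

table = pandas.DataFrame({'UPC': [1600027528], 'predictSales': [12.3]})  # made-up values

# default: the index is written as an unnamed first column
with_index = io.StringIO()
table.to_csv(with_index)
with_index.seek(0)
cols_default = pandas.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
table.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = pandas.read_csv(without_index).columns.tolist()

print(cols_default)
print(cols_no_index)
```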