# Program 3 SALES FORECASTS AND LATENT FEATURE IDENTITY

##### PURPOSE:  
This program implements a neural network with embedding layers in TensorFlow to produce a store-by-store
sales forecast.  It is not an inferential program, although it measures accuracy against a test set in mean 
absolute percentage error (MAPE).  The program tests how much sales forecast accuracy could improve if data beyond the base 
accounting data were developed.  The latent features and store scaling measure the impact of unknown
features that are available to store management but not captured in the accounting data.


##### INPUT: 
Features developed in the prior programs.

##### OUTPUT: 
Sales forecasts and accuracy on a test set of known data, plus identification of latent (or unknown) variables affecting the sales forecasts by impact strength, store id, and date of occurrence.

In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

Import standard python and sklearn libraries

In [2]:
import pandas as pd
import numpy as np
from datetime import datetime
from sklearn_pandas import DataFrameMapper
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import mean_absolute_error as mae
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import os

Import tensorflow and tensorflow libraries

In [3]:
import tensorflow as tf
from tensorflow.keras import layers,callbacks,losses,optimizers,initializers,models,regularizers
from tensorflow.keras.layers import Dense,Dropout,BatchNormalization,Embedding,Flatten,concatenate,Input
from tensorflow.keras.callbacks import CSVLogger,ReduceLROnPlateau,ModelCheckpoint,EarlyStopping
from tensorflow.keras.models import Model
from tensorflow.keras.losses import mean_squared_error, mean_absolute_error,mean_absolute_percentage_error
from tensorflow.keras.optimizers import SGD,RMSprop,Adam,Adamax
#from tensorflow.train import Adam
from tensorflow.keras.initializers import RandomNormal,RandomUniform,TruncatedNormal
from tensorflow.keras.metrics import mean_absolute_percentage_error

Set seed for initializers

In [4]:
tf.set_random_seed(73)

Check for gpu and expect this output:

[
  name: "/cpu:0"device_type: "CPU",
  name: "/gpu:0"device_type: "GPU"
]


In [5]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4964303642562140285
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 6087873103136677902
physical_device_desc: "device: XLA_CPU device"
]


Read data file

In [6]:
dt = pd.read_feather(os.getcwd() + '/agg_data/' + 'Features.feather')

Identify categorical, continuous, data and target columns

In [7]:
def label_vars(dt):
    cat_vars = ['STORENUMBER', 'WEEKDAY', 'HOLIDAY', 'YEAR', 'WEEKOFYR', 'MONOFYR', 'DAY', 'BEFOREHOLIDAY',
       'AFTERHOLIDAY', 'DAYSAFTEROPEN', 'DAYSBEFORECLOSE', 'LTS','LTAS']
    cont_vars = ['DAYSINSAMPLE']
    dep = ['SALES']
    date = ['DATE']
    dt = dt[cat_vars + cont_vars + dep + date].copy()
    dt.sort_values(by=['STORENUMBER','DATE'],inplace=True)
    dt.reset_index(drop=True,inplace=True)
    return dt,cat_vars,cont_vars

In [8]:
df,cat_vars,cont_vars = label_vars(dt)

For each categorical feature, count its distinct categories (the upper bound on its embedding size), store the counts in a dict, and pair the feature with a LabelEncoder in a mapping list.

In [9]:
def cat_data(df,cat_vars):
    cat_emb_max = [len(df[c].unique()) for c in cat_vars]
    cat_vars_dict = dict(zip(cat_vars,cat_emb_max))
    cat_map = [(c,LabelEncoder()) for c in cat_vars]
    return cat_vars_dict,cat_map

In [10]:
cat_vars_dict,cat_map = cat_data(df,cat_vars)
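As a small illustration of what the category counts and the LabelEncoder produce, here is a sketch on a hypothetical toy frame. The values and the names `toy`, `toy_cat_vars`, `toy_cardinality`, `store_codes` and `weekday_encoder` are assumptions made for this example and are not part of the program's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; column names mirror two of the categorical features.
toy = pd.DataFrame({
    "STORENUMBER": [101, 101, 205, 309, 309],
    "WEEKDAY": ["Mon", "Tue", "Mon", "Wed", "Tue"],
})
toy_cat_vars = ["STORENUMBER", "WEEKDAY"]

# Number of distinct values per categorical column, as counted in cat_data.
toy_cardinality = {c: toy[c].nunique() for c in toy_cat_vars}

# LabelEncoder maps each distinct value to an integer code in 0..n_classes-1,
# with codes assigned in sorted order of the distinct values.
store_codes = LabelEncoder().fit_transform(toy["STORENUMBER"])
weekday_encoder = LabelEncoder().fit(toy["WEEKDAY"])

print(toy_cardinality)
print(store_codes.tolist())
print(list(weekday_encoder.classes_))
```

The integer codes are what the embedding layers later receive as input.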

Create the continuous data map function with MinMaxScaler; the range defaults to (0, 1)

In [11]:
def cont_data(cont_vars,mn=0,mx=1):
    cont_map = [([c],MinMaxScaler(feature_range = (mn,mx),copy=False)) for c in cont_vars]
    return cont_map

In [12]:
cont_map = cont_data(cont_vars)
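The scaling that MinMaxScaler applies can be written out by hand. The sketch below uses NumPy only and hypothetical DAYSINSAMPLE values; the function name `min_max_scale` is an assumption for this example.

```python
import numpy as np

def min_max_scale(x, mn=0.0, mx=1.0):
    """Rescale x linearly so its minimum maps to mn and its maximum to mx."""
    x = np.asarray(x, dtype=np.float64)
    unit = (x - x.min()) / (x.max() - x.min())  # values in [0, 1]
    return unit * (mx - mn) + mn

# Hypothetical values for the continuous feature.
toy_days = np.array([10, 20, 30, 50])
toy_scaled = min_max_scale(toy_days)
print(toy_scaled)
```

With the default range, the smallest value becomes 0 and the largest becomes 1.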

Fit the map functions to the continuous and categorical features, but do not apply the transform until after the data is split into train, validate and test.  This fits the labels and the scaled range to the entire data set rather than to train, validate and test separately.  DataFrameMapper from sklearn-pandas transforms only the columns whose labels are included in the categorical and continuous feature lists, cat_map and cont_map.

In [13]:
def vars_mapped(cat_map,cont_map,dt):
    cat_mapper = DataFrameMapper(cat_map)
    cat_map_fit = cat_mapper.fit(dt)
    cont_mapper = DataFrameMapper(cont_map)
    cont_map_fit = cont_mapper.fit(dt)
    return cat_map_fit,cont_map_fit

##### Ignore Data conversion warning 
It is expected because the continuous feature is an int64 vector.

In [14]:
cat_map_fit,cont_map_fit = vars_mapped(cat_map,cont_map,dt)
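One practical reason to fit the encoders on the full data set is that a LabelEncoder raises an error when asked to transform a label it did not see during fitting. The sketch below shows this with hypothetical store numbers; all names in it are assumptions for this example.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical store ids: store 412 appears only in the test period.
all_stores = [101, 205, 309, 412]
train_stores = [101, 205, 309]
test_stores = [309, 412]

# Encoder fit on the full data can encode every split.
full_encoder = LabelEncoder().fit(all_stores)
test_codes = full_encoder.transform(test_stores)

# Encoder fit on the training split alone fails on the unseen store.
train_only_encoder = LabelEncoder().fit(train_stores)
try:
    train_only_encoder.transform(test_stores)
    unseen_label_failed = False
except ValueError:
    unseen_label_failed = True

print(test_codes.tolist(), unseen_label_failed)
```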



Set train, validate and test dates.  With the defaults (vstart=72, tstart=14), the test set is the last 14 dates and the validation set is the 58 dates immediately before them.

In [15]:
def split_dates(dt,vstart=72,tstart=14):
    dates = list(dt.DATE.unique())
    dates.sort()
    dates_validate = dates[-vstart:-tstart]
    dates_test = dates[-tstart:]
    dates_train = dates[:-vstart]
    return dates_train,dates_validate,dates_test,dates

In [16]:
dates_train,dates_validate,dates_test,dates = split_dates(dt)
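The negative slices in split_dates can be checked on a toy calendar. This sketch assumes a hypothetical run of 100 consecutive days and uses the same default offsets; the `toy_` names are assumptions for this example.

```python
from datetime import date, timedelta

# Hypothetical calendar of 100 consecutive, already sorted dates.
toy_dates = [date(2019, 1, 1) + timedelta(days=i) for i in range(100)]
vstart, tstart = 72, 14

toy_validate = toy_dates[-vstart:-tstart]  # from 72 days before the end up to the test window
toy_test = toy_dates[-tstart:]             # the last 14 dates
toy_train = toy_dates[:-vstart]            # everything before the validation window

print(len(toy_train), len(toy_validate), len(toy_test))
```

The validation window holds vstart - tstart dates, and the three windows do not overlap.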

Split dataset by train,validate,test

In [17]:
def split_data(dt,dates_train,dates_validate,dates_test):      
    data = dt.sort_values(by=['STORENUMBER','DATE'])
    data_train = data.loc[data.DATE.isin(dates_train)]
    data_validate = data.loc[data.DATE.isin(dates_validate)]
    data_test = data.loc[data.DATE.isin(dates_test)]
    return data_train,data_validate,data_test,data  

In [18]:
data_train,data_validate,data_test,data = split_data(dt,dates_train,dates_validate,dates_test)

Encode and scale data and reshape into array of vectors. 
___________________________________________________________________________________________________________
The input layer of the neural network is a horizontal concatenation in which each categorical variable has its own embedding input and each continuous variable has its own dense input.  The train, validate and test data therefore need to be reshaped into a list of vectors, one per feature.  To preserve the mixed dtypes (i.e., int and float), the input data is a list of arrays, with each element of the list being the column vector for one input feature.

In [19]:
def map_shape_data(data_train,data_validate,data_test,cat_map_fit,cont_map_fit):
    #set target variables
    y_tr = data_train.SALES.values.reshape(-1,1)
    y_val = data_validate.SALES.values.reshape(-1,1)
    y_ts = data_test.SALES.values.reshape(-1,1)
    #transform categorical data
    cat_train = cat_map_fit.transform(data_train).astype(np.int64)
    cat_validate = cat_map_fit.transform(data_validate).astype(np.int64)
    cat_test = cat_map_fit.transform(data_test).astype(np.int64)
    #transform continuous variables
    cont_train = cont_map_fit.transform(data_train).astype(np.float32)
    cont_validate = cont_map_fit.transform(data_validate).astype(np.float32)
    cont_test = cont_map_fit.transform(data_test).astype(np.float32)
    #combine categorical and continuous data into array of vectors
    data_tr = np.hsplit(cat_train,cat_train.shape[1])+np.hsplit(cont_train,cont_train.shape[1])
    data_val = np.hsplit(cat_validate,cat_validate.shape[1])+np.hsplit(cont_validate,cont_validate.shape[1])
    data_ts = np.hsplit(cat_test,cat_test.shape[1])+np.hsplit(cont_test,cont_test.shape[1])
    return y_tr,y_val,y_ts,data_tr,data_val,data_ts

In [20]:
y_tr,y_val,y_ts,data_tr,data_val,data_ts = map_shape_data(data_train,data_validate,data_test,cat_map_fit,cont_map_fit)
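The reshaping step can be seen on a small example. The sketch below splits a hypothetical encoded categorical block and a scaled continuous block into one column vector per feature, the same way map_shape_data does; the array values and names are assumptions for this example.

```python
import numpy as np

# Hypothetical encoded categorical block: 3 rows, 2 categorical features.
cat_block = np.array([[0, 3], [1, 5], [2, 4]], dtype=np.int64)
# Hypothetical scaled continuous block: 3 rows, 1 continuous feature.
cont_block = np.array([[0.0], [0.5], [1.0]], dtype=np.float32)

# np.hsplit returns a Python list of column arrays, so the two lists
# can be joined with + while each column keeps its own dtype.
toy_inputs = (np.hsplit(cat_block, cat_block.shape[1])
              + np.hsplit(cont_block, cont_block.shape[1]))

for col in toy_inputs:
    print(col.shape, col.dtype)
```

Each list element has shape (rows, 1), which matches an Input layer declared with shape=(1,).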

Function to create single input vector (input_shape = 1) for categorical input layer

In [21]:
def cat_input(feat,cat_vars_dict):
    name = feat[0]
    c1 = len(feat[1].classes_)
    c2 = cat_vars_dict[name]
    if c2 > 50:c2 = 50
    inp = Input(shape=(1,),dtype='int64',name=name + '_in')
    #no third dimension for a time distributed series, so the embedding output is flattened to one row per sample
    #embedding layer maps the number of classes (c1) to the number of embedded features (c2, capped at 50)
    u = Flatten(name=name+'_flt')(Embedding(c1,c2,input_length=1)(inp))
    return inp,u
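Conceptually, an embedding layer is a trainable lookup table with one row per class, and flattening removes the length-1 sequence axis. The NumPy sketch below imitates that lookup with a fixed random table; it is not the Keras layer itself, and the table values, sizes and names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes = 7                  # e.g. a weekday feature with 7 encoded classes
emb_dim = min(n_classes, 50)   # same cap as in cat_input

# Stand-in for the trainable weights: one row of emb_dim values per class.
table = rng.uniform(-0.05, 0.05, size=(n_classes, emb_dim))

# A batch of 3 samples, each a single integer code, shape (3, 1).
codes = np.array([[0], [6], [3]])

embedded = table[codes]                   # lookup, shape (3, 1, emb_dim)
flat = embedded.reshape(len(codes), -1)   # flatten to shape (3, emb_dim)

print(embedded.shape, flat.shape)
print(min(120, 50))  # a feature with 120 classes would get 50 embedded features
```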

Create list of Input,Flatten,and Embedding layers for the categorical features

In [22]:
embs = [cat_input(feat,cat_vars_dict) for feat in cat_map_fit.features]

Instructions for updating:
Colocations handled automatically by placer.


The deprecation warning is an incompatibility between keras and tensorflow.keras.  The error message is an outstanding bug in tensorflow and does not occur in keras.  Tensorflow has an open issue report regarding this error message.

Function to create Input and Dense layer for continuous features

In [23]:
def cont_input(feat):
    name = feat[0][0]
    inp = Input((1,), name=name+'_in')
    d = Dense(1, name = name + '_d')(inp)
    return inp,d

Create list of Input and Dense layers for continuous features

In [24]:
conts = [cont_input(feat) for feat in cont_map_fit.features]

Build a four layer model (three hidden dense layers and one output layer) using a shared input layer for the categorical and continuous variables.  The hidden layers have high node counts because the sample count in the input data is large. 

In [25]:
def embed_model(conts,embs):
    #concatenate the inputs and embedded layers with the inputs and continuous dense layers
    #referred to as 'shared layers' in tensorflow.keras documentation
    x = concatenate([emb for inp,emb in embs] + [d for inp,d in conts])
    #first hidden layer
    x = Dense(1000, activation='relu',kernel_initializer='uniform',bias_initializer='zeros')(x)
    #apply small dropout for regularization
    x = Dropout(rate=0.1)(x)
    #normalize activations across the batch (batch normalization, not L2 normalization)
    x = BatchNormalization()(x)
    x = Dense(500, activation='relu',kernel_initializer='uniform',bias_initializer='zeros')(x)
    #apply small dropout for regularization
    x = Dropout(rate=0.1)(x)
    #batch-normalize activations
    x = BatchNormalization()(x)
    x = Dense(250,activation='relu',kernel_initializer='uniform',bias_initializer='zeros')(x)
    x = Dropout(rate=0.1)(x)
    #batch-normalize activations
    x = BatchNormalization()(x)
    x = Dense(1, activation='relu',kernel_initializer='uniform',bias_initializer='zeros')(x)
    model = Model([inp for inp,emb in embs] + [inp for inp,d in conts], x)
    model.compile(optimizer='Adam',loss='mean_absolute_error',metrics=['mape'])
    return model
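The model is compiled with mean absolute error as the loss and mean absolute percentage error as the reported metric. A minimal NumPy sketch of both measures follows; the function names, the toy sales values, and the small `eps` guard against division by zero are assumptions for this example.

```python
import numpy as np

def mae_np(y_true, y_pred):
    """Mean absolute error: average size of the error in sales units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape_np(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error: average error as a percent of actual sales."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = np.maximum(np.abs(y_true), eps)  # avoid dividing by zero sales
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

# Hypothetical actual and predicted sales for three store-days.
toy_actual = [100.0, 200.0, 400.0]
toy_predicted = [110.0, 180.0, 400.0]

print(mae_np(toy_actual, toy_predicted))
print(mape_np(toy_actual, toy_predicted))
```

Because MAPE divides by actual sales, store-days with very small sales can dominate the metric.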

Implement the logger, reduce the learning rate when the validation loss stops improving, checkpoint the best model, and build the model

In [26]:
csv_logger = CSVLogger('SF_data/SF_Error.csv')
rlr = ReduceLROnPlateau(monitor='val_loss',factor=0.1,patience=3,min_lr=0.0001)
mc = ModelCheckpoint('SF_data/SFBestModel',save_best_only=True)
model = embed_model(conts,embs)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
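To get a feel for the ReduceLROnPlateau settings used above (factor=0.1, patience=3, min_lr=0.0001), the following pure Python sketch simulates a simplified version of the rule. It ignores the callback's cooldown and minimum-delta options, assumes Adam starts at the Keras default learning rate of 0.001, and uses a made-up sequence of validation losses; it is not the Keras implementation.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.1, patience=3, min_lr=0.0001):
    """Return the learning rate in effect during each epoch (simplified rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)          # rate used while training this epoch
        if loss < best:             # validation loss improved
            best = loss
            wait = 0
        else:                       # no improvement this epoch
            wait += 1
            if wait >= patience:    # plateau reached: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return history

# Hypothetical validation losses that stop improving after the second epoch.
toy_losses = [5.0, 4.0, 4.1, 4.2, 4.3, 4.3, 4.4, 4.5, 4.6]
toy_lr_history = simulate_reduce_lr(toy_losses)
print(toy_lr_history)
```

Under these assumptions a single reduction already reaches min_lr, so later plateaus leave the rate unchanged.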


This next process is cpu/gpu intensive.  This code should be run on a gpu.

In [27]:
model.fit(data_tr,y_tr,batch_size=64,epochs=20,verbose=1,validation_data = (data_val,y_val),callbacks=[csv_logger,rlr,mc])

Train on 28229 samples, validate on 2553 samples
Instructions for updating:
Use tf.cast instead.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fa8a4503c18>

Load the best weights saved by ModelCheckpoint (saved with 'save_best_only=True')

In [28]:
model.load_weights('SF_data/SFBestModel')

Function to make predictions

In [29]:
def prediction(model_data,model=model):
    pred = model.predict(model_data)
    return pred

Perform predictions on each data set for graphing purposes

In [30]:
pred_test = prediction(data_ts)
pred_val = prediction(data_val)

Function to convert an array of single-element arrays to a flat list of scalars

In [31]:
def array_to_list(arr):
    listed = [item for sublist in arr for item in sublist]
    return listed
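model.predict returns an array of shape (samples, 1), which is why the predictions are flattened before being stored. The sketch below applies the same list comprehension to a hypothetical prediction array and compares it with NumPy's ravel, shown here only as an equivalent alternative; the names are assumptions for this example.

```python
import numpy as np

# Hypothetical predictions with the (samples, 1) shape of a single-output model.
toy_pred = np.array([[1.5], [2.0], [3.25]], dtype=np.float32)

# Same comprehension as array_to_list: one scalar per row.
toy_listed = [item for sublist in toy_pred for item in sublist]

# Equivalent flattening with NumPy, returning plain Python floats.
toy_raveled = toy_pred.ravel().tolist()

print([float(v) for v in toy_listed], toy_raveled)
```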

Function to form scaled predicted and actual results by date for stores and to form latent variable dataframe

In [32]:
def results_to_dataframe(data_validate,data_test,pred_test,pred_val,data):
    pred_test = array_to_list(pred_test)
    pred_val = array_to_list(pred_val)
    preds = pred_val + pred_test
    dr = pd.concat([data_validate,data_test],axis=0)
    dr['DATE'] = data.DATE
    dr['STORENUMBER'] = data.STORENUMBER
    dr['SCALED_ACTUAL_SALES'] = dr.SALES
    dr['SCALED_PREDICTED_SALES'] = preds
    dr['LTS'] = data.LTS
    dr['LTAS'] = data.LTAS
    dl = dr.loc[:,['DATE','STORENUMBER','SCALED_ACTUAL_SALES','SCALED_PREDICTED_SALES','LTS','LTAS']]
    dr = dr.loc[:,['DATE','STORENUMBER','SCALED_ACTUAL_SALES','SCALED_PREDICTED_SALES']]
    dr.to_csv('agg_data/Scaled_Results.csv',index=False)
    dl.to_csv('agg_data/Scaled_Latents.csv',index=False)
    return
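The function above mixes two kinds of column assignment in pandas: a plain list is assigned by position, while a Series is aligned on the index. The sketch below shows both on hypothetical frames, which is why the predictions must be listed in the same validate-then-test order as the concatenated rows; all names and values are assumptions for this example.

```python
import pandas as pd

# Hypothetical validate and test rows that keep their original index labels.
toy_val = pd.DataFrame({"SALES": [10.0, 12.0]}, index=[5, 6])
toy_test = pd.DataFrame({"SALES": [11.0]}, index=[7])
toy_preds = [9.5, 12.5] + [11.2]   # validate predictions first, then test

toy_dr = pd.concat([toy_val, toy_test], axis=0)

# A list is assigned row by row in order, ignoring the index labels.
toy_dr["SCALED_PREDICTED_SALES"] = toy_preds

# A Series is matched on index labels; rows missing from toy_dr are dropped.
toy_full = pd.DataFrame({"STORENUMBER": [7, 8, 9, 10]}, index=[4, 5, 6, 7])
toy_dr["STORENUMBER"] = toy_full["STORENUMBER"]

print(toy_dr)
```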

Store dataframe of predictions using scaled sales

In [33]:
results_to_dataframe(data_validate,data_test,pred_test,pred_val,data)

##### End of code: Close this file using File 'Close and Halt' from dropdown menu