# <center> Categorical Embedding on Rossmann data </center>

___

Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales for up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. 

Provided Data :

- train.csv - historical data including Sales <br>
- store_states.csv - State where the store is located in Germany

## Objective : Forecast sales using store, promotion, and competitor data

___

### Load required libraries

In [1]:
import pandas as pd
import numpy as np

from keras.models import Sequential, Model
from keras.layers import Embedding, Input, Dense, Activation, concatenate, Flatten, Reshape, Concatenate

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


### Read data

In [2]:
train_data = pd.read_csv("train.csv")
state_data = pd.read_csv("store.csv")



### Understanding Data

In [3]:
print(f"\nTrain Shape : {train_data.shape}")
print(f"\nStore States Shape : {state_data.shape}")


Train Shape : (1017209, 9)

Store States Shape : (1115, 2)


In [4]:
print('\n', ' '*20, 'Train Data - Top 5 Records')
display(train_data.head())
print('\n', ' '*20, 'Stores States Data - Top 5 Records')
display(state_data.head())


                      Train Data - Top 5 Records


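| | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 2015-07-31 | 5263 | 555 | 1 | 1 | 0 | 1 |
| 1 | 2 | 5 | 2015-07-31 | 6064 | 625 | 1 | 1 | 0 | 1 |
| 2 | 3 | 5 | 2015-07-31 | 8314 | 821 | 1 | 1 | 0 | 1 |
| 3 | 4 | 5 | 2015-07-31 | 13995 | 1498 | 1 | 1 | 0 | 1 |
| 4 | 5 | 5 | 2015-07-31 | 4822 | 559 | 1 | 1 | 0 | 1 |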



                      Stores States Data - Top 5 Records


| | Store | State |
|---|---|---|
| 0 | 1 | HE |
| 1 | 2 | TH |
| 2 | 3 | NW |
| 3 | 4 | BE |
| 4 | 5 | SN |


#### `Store` is common to both datasets and can be used as the key to merge them.

In [5]:
np.unique(train_data['Store']).size == np.unique(state_data['Store']).size

True

In [6]:
# Join on the shared key explicitly rather than relying on column-name matching
train_data = pd.merge(train_data, state_data, on='Store')
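
A merge like the one above silently depends on `Store` appearing only once in the state table. As a minimal sketch on toy frames (the values are illustrative, not taken from the dataset), the standard `validate` and `how` parameters of `pd.merge` can make that assumption explicit:

```python
import pandas as pd

# Toy stand-ins for train_data and state_data (illustrative values only)
sales = pd.DataFrame({"Store": [1, 1, 2, 3], "Sales": [5263, 5020, 6064, 8314]})
states = pd.DataFrame({"Store": [1, 2, 3], "State": ["HE", "TH", "NW"]})

# validate="many_to_one" raises MergeError if a store has more than one state row
merged = pd.merge(sales, states, on="Store", how="left", validate="many_to_one")

# A left join keeps every sales row; stores without a state would show up as NaN
rows_preserved = len(merged) == len(sales)
n_missing_states = int(merged["State"].isna().sum())
print(rows_preserved, n_missing_states)
```

With a left join, the row count of the sales table is preserved, so comparing shapes before and after the merge is a cheap sanity check.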

In [7]:
train_data.head(5)

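| | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | State |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 2015-07-31 | 5263 | 555 | 1 | 1 | 0 | 1 | HE |
| 1 | 1 | 4 | 2015-07-30 | 5020 | 546 | 1 | 1 | 0 | 1 | HE |
| 2 | 1 | 3 | 2015-07-29 | 4782 | 523 | 1 | 1 | 0 | 1 | HE |
| 3 | 1 | 2 | 2015-07-28 | 5011 | 560 | 1 | 1 | 0 | 1 | HE |
| 4 | 1 | 1 | 2015-07-27 | 6102 | 612 | 1 | 1 | 0 | 1 | HE |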


In [8]:
train_data.shape

(1017209, 10)

#### Convert the Date attribute to an appropriate type

In [9]:
train_data['Date'] = pd.to_datetime(train_data['Date'], format='%Y-%m-%d')

# Extract year, month, and day from the Date attribute
train_data['Year'] = train_data['Date'].dt.year
train_data['Month'] = train_data['Date'].dt.month
train_data['Day'] = train_data['Date'].dt.day

In [10]:
col = ['Store', 'DayOfWeek', 'Promo', 'Year', 'Month', 'Day', 'State']

train_data[col] = train_data[col].astype('category')

In [11]:
train_data.head(5)

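| | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | State | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 2015-07-31 | 5263 | 555 | 1 | 1 | 0 | 1 | HE | 2015 | 7 | 31 |
| 1 | 1 | 4 | 2015-07-30 | 5020 | 546 | 1 | 1 | 0 | 1 | HE | 2015 | 7 | 30 |
| 2 | 1 | 3 | 2015-07-29 | 4782 | 523 | 1 | 1 | 0 | 1 | HE | 2015 | 7 | 29 |
| 3 | 1 | 2 | 2015-07-28 | 5011 | 560 | 1 | 1 | 0 | 1 | HE | 2015 | 7 | 28 |
| 4 | 1 | 1 | 2015-07-27 | 6102 | 612 | 1 | 1 | 0 | 1 | HE | 2015 | 7 | 27 |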


In [12]:
train_data.dtypes

Store                  category
DayOfWeek              category
Date             datetime64[ns]
Sales                     int64
Customers                 int64
Open                      int64
Promo                  category
StateHoliday             object
SchoolHoliday             int64
State                  category
Year                   category
Month                  category
Day                    category
dtype: object

#### Missing Data

    Counting missing values per column in the merged train and state data

In [13]:
train_data.isnull().sum()

Store            0
DayOfWeek        0
Date             0
Sales            0
Customers        0
Open             0
Promo            0
StateHoliday     0
SchoolHoliday    0
State            0
Year             0
Month            0
Day              0
dtype: int64

### Subsetting only required columns for model building

- We keep only the <u>'Store', 'DayOfWeek', 'Promo', 'Year', 'Month', 'Day', 'State'</u> attributes as predictors of the Sales attribute.

In [14]:
train_data_X = train_data[['Store', 'DayOfWeek', 'Promo', 'Year', 'Month', 'Day', 'State']]

In [15]:
train_data_y = train_data['Sales']

In [16]:
print(f"The shape of train_data_X is {train_data_X.shape}")
print(f"The shape of train_data_y is {train_data_y.shape}")

The shape of train_data_X is (1017209, 7)
The shape of train_data_y is (1017209,)


In [17]:
for i in ['Store', 'DayOfWeek', 'Promo', 'Year', 'Month', 'Day', 'State']:
    print("{} has : {} unique values".format(i, np.size(np.unique(train_data_X[i]))))

Store has : 1115 unique values
DayOfWeek has : 7 unique values
Promo has : 2 unique values
Year has : 3 unique values
Month has : 12 unique values
Day has : 31 unique values
State has : 12 unique values


#### Defining Custom Function for Preprocessing and to calculate Error Metrics (Mean Absolute Error)

    Because Sales in the dataset spans 4 orders of magnitude, we use log(Sales) and rescale it to the same range as the neural network output by computing log(Sales)/log(Sales_max).

In [18]:
max_log_y = np.max(np.log(train_data_y))
max_log_y



10.634676867382668

##### The cell below is for explanation only

In [19]:
temp = train_data_y[:1][0]
log_temp = np.log(temp)
tran_temp = log_temp/max_log_y
inv_tran_temp = tran_temp * max_log_y
org_temp = np.exp(inv_tran_temp)

print("Actual Sales values              :{}".format(temp))
print("Log of Actual Sales values       :{}".format(log_temp))
print("Transformed Sales values         :{}".format(tran_temp))
print("Inverse Transformed Sales values :{}".format(org_temp))

Actual Sales values              :5263
Log of Actual Sales values       :8.56845648535378
Transformed Sales values         :0.8057091524457939
Inverse Transformed Sales values :5263.000000000004


In [20]:
# Normalize sales: take the natural log (np.log uses base e) and divide by the maximum log-sales value.
def val_for_fit(val):
    return np.log(val)/max_log_y

# Denormalize predictions back to the original scale: multiply by the maximum log-sales value, then exponentiate.
def val_for_pred(val):
    return np.exp(val * max_log_y)
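
One caveat with this transform is that `np.log(0)` evaluates to `-inf`, so a target of 0 cannot be scaled this way. The sketch below assumes, hypothetically, that some rows have zero sales (for example on days a store is closed) and shows the round trip on strictly positive values only. The toy numbers are illustrative.

```python
import numpy as np

sales = np.array([0, 5263, 6064, 13995])  # illustrative values; 0 mimics a closed day

# Keep strictly positive targets only, so np.log never sees 0
positive = sales[sales > 0]
max_log = np.max(np.log(positive))

scaled = np.log(positive) / max_log   # same transform as val_for_fit
restored = np.exp(scaled * max_log)   # same transform as val_for_pred

print(scaled.max())
print(np.allclose(restored, positive))
```

After filtering, the largest scaled value is 1 and every scaled value is finite, which matches the range of a sigmoid output.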

#### Applying Label Encoder for all Categorical Attributes

In [21]:
train_data_X.head()

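| | Store | DayOfWeek | Promo | Year | Month | Day | State |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 1 | 2015 | 7 | 31 | HE |
| 1 | 1 | 4 | 1 | 2015 | 7 | 30 | HE |
| 2 | 1 | 3 | 1 | 2015 | 7 | 29 | HE |
| 3 | 1 | 2 | 1 | 2015 | 7 | 28 | HE |
| 4 | 1 | 1 | 1 | 2015 | 7 | 27 | HE |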


In [22]:
train_data_X_LE = train_data_X.apply(LabelEncoder().fit_transform)

In [23]:
train_data_X_LE.head()

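| | Store | DayOfWeek | Promo | Year | Month | Day | State |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 4 | 1 | 2 | 6 | 30 | 4 |
| 1 | 0 | 3 | 1 | 2 | 6 | 29 | 4 |
| 2 | 0 | 2 | 1 | 2 | 6 | 28 | 4 |
| 3 | 0 | 1 | 1 | 2 | 6 | 27 | 4 |
| 4 | 0 | 0 | 1 | 2 | 6 | 26 | 4 |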
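
`LabelEncoder` sorts the distinct values of a column and replaces each value with its position in that sorted list, which is why Store 1 becomes 0 and Day 31 becomes 30 above. The same mapping can be sketched with `np.unique`, used here only as an illustrative stand-in for the encoder:

```python
import numpy as np

states = np.array(["HE", "TH", "NW", "BE", "HE"])  # toy column, not the full dataset

# classes: sorted unique values; codes: position of each value within classes
classes, codes = np.unique(states, return_inverse=True)

# Decoding is a plain lookup back into the sorted class list
decoded = classes[codes]

print(classes)
print(codes)
```

The integer codes carry no ordering information of their own; they are only row indices for the one-hot and embedding steps that follow.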


#### Preprocessing the data for MLP

In [24]:
enc = OneHotEncoder(handle_unknown='ignore')

In [25]:
train_data_X_OHE = enc.fit_transform(train_data_X_LE)

In [26]:
# Transform the label-encoded frame the encoder was fitted on (not the raw train_data_X)
train_data_X_OHE = enc.transform(train_data_X_LE)

In [27]:
train_data_X_OHE.shape

(1017209, 1182)
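
The 1182 columns follow from one-hot encoding creating one column per distinct value, so the width is the sum of the per-attribute cardinalities printed earlier. A quick check:

```python
# Unique-value counts reported for each attribute earlier in the notebook
cardinalities = {"Store": 1115, "DayOfWeek": 7, "Promo": 2, "Year": 3,
                 "Month": 12, "Day": 31, "State": 12}

ohe_width = sum(cardinalities.values())
print(ohe_width)  # 1182
```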

#### Splitting the Dataset into train and validation

In [28]:
X_train_CE, X_val_CE, X_train_OHE, X_val_OHE, y_train, y_val = train_test_split(train_data_X_LE.values, 
                                                                                train_data_X_OHE, 
                                                                                train_data_y.values, 
                                                                                test_size=0.1, 
                                                                                random_state=123)

### MLP

In [29]:
model1 = Sequential()
model1.add(Dense(1000, input_dim=1182, activation='relu'))
model1.add(Dense(500, activation='relu'))
model1.add(Dense(1, activation='linear'))

model1.compile(loss='mean_absolute_error', optimizer='rmsprop')

#### Normalizing the sales (target variable)

In [30]:
model1.fit(X_train_OHE, val_for_fit(y_train), 
           validation_data=(X_val_OHE, val_for_fit(y_val)),
           epochs = 5, batch_size = 256)



Train on 915488 samples, validate on 101721 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x10db06e10>

In [31]:
y_pred_val = model1.predict(X_val_OHE).flatten()
y_pred_val 

array([ 2.2404268e+06, -2.4261233e+03, -1.1412385e+04, ...,
       -5.4243302e+08, -1.6050476e+04,  2.2357038e+06], dtype=float32)

In [32]:
y_pred_val = val_for_pred(y_pred_val)
y_pred_val

array([inf,  0.,  0., ...,  0.,  0., inf], dtype=float32)

In [33]:
model1_val_err = (np.sum(np.absolute((y_val - y_pred_val) / y_val))/len(y_val)) * 100
model1_val_err


nan
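
The expression above is a mean absolute percentage error (MAPE). It divides by `y_val`, so any zero among the actual values, or any `inf`/`nan` among the predictions, propagates into the result. A hedged sketch of a helper that skips zero targets follows; `safe_mape` is a hypothetical name, not part of the notebook, and the inputs are toy values.

```python
import numpy as np

def safe_mape(y_true, y_pred):
    """MAPE in percent, computed only over rows whose actual value is non-zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # avoid division by zero
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

actual = np.array([100.0, 0.0, 200.0])
predicted = np.array([110.0, 5.0, 180.0])

score = safe_mape(actual, predicted)
print(score)
```

Both remaining rows are off by 10 percent, so the score is about 10. The row with a zero actual value is ignored rather than turning the whole metric into `inf`.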

#### Preprocessing the data for MLP with Categorical embedding

#### Categorical Embeddings

    We map each categorical variable into a Euclidean space; the resulting vectors are the entity embeddings of that variable. The mapping is learned by the neural network during the standard supervised training process. Compared with one-hot encoding, entity embeddings reduce memory usage and speed up neural networks and, more importantly, by mapping similar values close to each other in the embedding space they reveal intrinsic properties of the categorical variables.
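
An embedding layer is a trainable lookup table: row i of its weight matrix is the vector for category i, which is mathematically the same as multiplying a one-hot vector by that matrix. The NumPy sketch below uses random weights as a stand-in for learned ones, and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, emb_dim = 7, 6  # e.g. day of week mapped to 6 dimensions
weights = rng.normal(size=(n_categories, emb_dim))  # stands in for learned weights

codes = np.array([4, 0, 4])  # label-encoded category indices

# Embedding lookup: select rows of the weight matrix
looked_up = weights[codes]

# Equivalent dense formulation: one-hot vectors times the same matrix
one_hot = np.eye(n_categories)[codes]
via_matmul = one_hot @ weights

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

The lookup avoids materialising the one-hot matrix, which is where the memory and speed advantage over one-hot input comes from.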

In [34]:
def preprocessing(X):
    X_list = split_features(X)
    return X_list

def split_features(X):

    X_list = []

    X_list.append(X[:, 0]) # store_index
    X_list.append(X[:, 1]) # day_of_week
    X_list.append(X[:, 2]) # promo
    X_list.append(X[:, 3]) # year
    X_list.append(X[:, 4]) # month
    X_list.append(X[:, 5]) # day
    X_list.append(X[:, 6]) # state

    return X_list

#### Adding specific Embedding layer to each Categorical Variable

As mentioned above, we map categorical variables to a Euclidean space, thereby placing similar values close to each other in the embedding space. <br>

This is achieved by adding a dedicated Embedding layer for each categorical attribute in the dataset. The entity embedding (EE) mapping used for each layer is inspired by a research paper on categorical embeddings: https://arxiv.org/pdf/1604.06737.pdf

In [35]:
input_store = Input(shape=(1,))
output_store = Embedding(1115, 10, name='store_embedding')(input_store)
output_store = Reshape(target_shape=(10,))(output_store)

input_dow = Input(shape=(1,))
output_dow = Embedding(7, 6, name='dow_embedding')(input_dow)
output_dow = Reshape(target_shape=(6,))(output_dow)

# Promo is binary (0/1), so it is fed to the network directly, without an embedding
input_promo = Input(shape=(1,))

input_year = Input(shape=(1,))
output_year = Embedding(3, 2, name='year_embedding')(input_year)
output_year = Reshape(target_shape=(2,))(output_year)

input_month = Input(shape=(1,))
output_month = Embedding(12, 6, name='month_embedding')(input_month)
output_month = Reshape(target_shape=(6,))(output_month)

input_day = Input(shape=(1,))
output_day = Embedding(31, 10, name='day_embedding')(input_day)
output_day = Reshape(target_shape=(10,))(output_day)

input_germanstate = Input(shape=(1,))
output_germanstate = Embedding(12, 6, name='state_embedding')(input_germanstate)
output_germanstate = Reshape(target_shape=(6,))(output_germanstate)

output_embeddings = [output_store, output_dow, input_promo,
                     output_year, output_month, output_day, output_germanstate]

output_model = Concatenate()(output_embeddings)
output_model = Dense(1000, activation='relu')(output_model)
output_model = Dense(500, activation='relu')(output_model)
output_model = Dense(1, activation='sigmoid')(output_model)

input_model = [input_store, input_dow, input_promo,
               input_year, input_month, input_day, input_germanstate]
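
As a rough size check for this architecture, the trainable parameter count can be worked out by hand. The sketch assumes the standard counts: an embedding has cardinality times dimension weights, and a dense layer has inputs times units weights plus one bias per unit.

```python
# (cardinality, embedding_dim) for each embedded attribute, as in the cell above
embeddings = {"store": (1115, 10), "dow": (7, 6), "year": (3, 2),
              "month": (12, 6), "day": (31, 10), "state": (12, 6)}

embedding_params = sum(n * d for n, d in embeddings.values())

# Promo is fed in as a single raw value, so it adds 1 to the concatenated width
concat_width = sum(d for _, d in embeddings.values()) + 1

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

dense_total = (dense_params(concat_width, 1000)
               + dense_params(1000, 500)
               + dense_params(500, 1))
total_params = embedding_params + dense_total

print(concat_width, embedding_params, total_params)  # 41 11652 554653
```

The first dense layer here sees 41 inputs instead of the 1182 one-hot columns used by the plain MLP.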

In [36]:
model2 = Model(inputs=input_model, outputs=output_model)
model2.compile(loss='mean_absolute_error', optimizer='adam')

In [37]:
model2.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
input_6 (I

#### Normalizing the Sales (target) variable

In [38]:
model2.fit(preprocessing(X_train_CE), val_for_fit(y_train), validation_data=(preprocessing(X_val_CE), val_for_fit(y_val)), 
           epochs = 5, batch_size = 1024)



Train on 915488 samples, validate on 101721 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1a32c14be0>

In [39]:
preproc_X_val = preprocessing(X_val_CE)

y_pred_val = model2.predict(preproc_X_val).flatten()

y_pred_val = val_for_pred(y_pred_val)

In [40]:
model2_val_err = (np.sum(np.absolute((y_val - y_pred_val) / y_val))/len(y_val))*100
model2_val_err


inf