# Entity-embedding-rossmann 

This is a PyTorch implementation with a scikit-learn-style model interface that most data scientists are familiar with (`model.fit(X, y)` and `model.predict(X)`).

This implementation reproduces the code used in the paper **"[Entity Embeddings of Categorical Variables](http://arxiv.org/abs/1604.06737)"** and extends its functionality to other Machine Learning problems. 

The original Keras code, used here as a benchmark, can be found at:
**Entity Embeddings of Categorical Variables [REPO](https://github.com/entron/entity-embedding-rossmann)**.

# Notes

This repo aims to provide an out-of-the-box Entity Embedding Neural Network model for regression and classification tasks.

So far, this repo implements:

- Regression (tested against the original implementation; see the results in this README).
- Binary classification (use `EntEmbNNBinary` instead of `EntEmbNNRegression`; tested on personal projects).
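
For readers new to the idea, the core of an entity embedding can be sketched without any deep learning framework: each categorical column gets its own lookup table, and the looked-up vectors are concatenated into one dense feature row. The snippet below is only an illustration in plain NumPy, not this repo's implementation. The cardinalities are hypothetical, and the tables are random rather than trained (in PyTorch, `torch.nn.Embedding` usually plays this role).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cardinalities and embedding sizes for two categorical columns.
cardinality = {"DayOfWeek": 7, "Promo": 2}
emb_dim = {"DayOfWeek": 6, "Promo": 1}

# One embedding table per column: row i is the vector for category code i.
# In a real network these tables are trainable weights; here they are random.
tables = {
    col: rng.normal(size=(cardinality[col], emb_dim[col]))
    for col in cardinality
}


def embed(codes):
    """Look up each column's vectors and concatenate them per row."""
    parts = [tables[col][codes[col]] for col in tables]
    return np.concatenate(parts, axis=1)


# Three rows of integer-encoded categories.
batch = {
    "DayOfWeek": np.array([0, 3, 6]),
    "Promo": np.array([1, 0, 1]),
}
features = embed(batch)
print(features.shape)  # (3, 7): 6 dims for DayOfWeek + 1 for Promo
```

The concatenated `features` matrix is what the dense layers of such a network would consume.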

In [1]:
import pandas as pd
import datasets
import eval_utils
import numpy as np

from EENNRegression import EntEmbNNRegression

X, y, X_test, y_test = datasets.get_X_train_test_data()

# This normalization comes from the original Entity Embedding code
y_max = max(y.max(), y_test.max())
y = np.log(y) / np.log(y_max)
y_test = np.log(y_test) / np.log(y_max)

for data in [X, X_test]:
    data.drop('Open', inplace=True, axis=1)
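
The target normalization in the cell above is `log(y) / log(y_max)`, so predictions made on that scale have to be mapped back with `exp(pred * log(y_max))` before they can be compared with raw sales. The sketch below shows the round trip on made-up values; `denormalize` is a hypothetical helper, not a function from this repo, and it assumes all targets are strictly positive.

```python
import numpy as np

# Hypothetical sales values standing in for y and y_test.
y = np.array([5263.0, 6064.0, 8314.0])
y_test = np.array([4822.0, 13995.0])

# Forward transform, as in the cell above.
y_max = max(y.max(), y_test.max())
y_norm = np.log(y) / np.log(y_max)


def denormalize(y_scaled, y_max):
    """Invert y_scaled = log(y) / log(y_max)."""
    return np.exp(y_scaled * np.log(y_max))


y_back = denormalize(y_norm, y_max)
print(np.allclose(y_back, y))  # True
```

Note that an error metric computed on the normalized scale is not directly comparable with one computed on raw sales.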

In [2]:
models = []
for _ in range(5):
    m = EntEmbNNRegression(
        cat_emb_dim={
            'Store': 10,
            'DayOfWeek': 6,
            'Promo': 1,
            'Year': 2,
            'Month': 6,
            'Day': 10,
            'State': 6},
        alpha=0,
        dense_layers=[1000, 500],
        drop_out_layers=[0., 0.],
        drop_out_emb=0.,
        loss_function='L1Loss',
        train_size=1., 
        verbose=True)

    m.fit(X, y)
    models.append(m)
    print('\n')

test_y_pred = np.array([model.predict(X_test) for model in models])
test_y_pred = test_y_pred.mean(axis=0)

print('MAPE: %s' % eval_utils.MAPE(
    y_true=y_test.values.flatten(),
    y_pred=test_y_pred))

	[1] Test: MSE:0.000197 MAE: 0.010257 gini: 0.901585 R2: 0.878147 MAPE: 0.012573
	[2] Test: MSE:0.000156 MAE: 0.00897 gini: 0.890218 R2: 0.903692 MAPE: 0.011022
	[3] Test: MSE:0.000146 MAE: 0.008785 gini: 0.916184 R2: 0.909749 MAPE: 0.010714
	[4] Test: MSE:0.000157 MAE: 0.00908 gini: 0.890544 R2: 0.902871 MAPE: 0.011217
	[5] Test: MSE:0.000132 MAE: 0.008313 gini: 0.918975 R2: 0.918285 MAPE: 0.010129
	[6] Test: MSE:0.000122 MAE: 0.007781 gini: 0.921821 R2: 0.924628 MAPE: 0.009557
	[7] Test: MSE:0.000157 MAE: 0.009234 gini: 0.907448 R2: 0.903 MAPE: 0.011397
	[8] Test: MSE:0.000115 MAE: 0.007533 gini: 0.93635 R2: 0.929313 MAPE: 0.009253
	[9] Test: MSE:0.000116 MAE: 0.007687 gini: 0.933982 R2: 0.928413 MAPE: 0.009381
	[10] Test: MSE:0.000107 MAE: 0.0073 gini: 0.950313 R2: 0.933775 MAPE: 0.008926


	[1] Test: MSE:0.000313 MAE: 0.013938 gini: 0.894436 R2: 0.807053 MAPE: 0.016888
	[2] Test: MSE:0.00016 MAE: 0.009192 gini: 0.937388 R2: 0.901161 MAPE: 0.011281
	[3] Test: MSE:0.000161 MAE: 0.009
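
The cell above averages the predictions of five independently trained models and scores the average with `eval_utils.MAPE`. As a rough sketch of those two steps, assuming MAPE is defined as the mean of `|y_true - y_pred| / y_true` (the actual `eval_utils` implementation may differ), with made-up predictions:

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.01 == 1%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))


# Hypothetical predictions from three models for four test rows.
preds = np.array([
    [110.0, 190.0, 310.0, 400.0],
    [95.0, 215.0, 290.0, 410.0],
    [100.0, 205.0, 295.0, 380.0],
])
y_true = np.array([100.0, 200.0, 300.0, 400.0])

# Average across models (axis 0), leaving one prediction per test row.
ensemble = preds.mean(axis=0)

individual_scores = [mape(y_true, p) for p in preds]
ensemble_score = mape(y_true, ensemble)
print(individual_scores, ensemble_score)
```

In this toy data the averaged prediction scores better than any single model because the individual errors partly cancel; that is not guaranteed in general.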

## **Original output from [REPO](https://github.com/entron/entity-embedding-rossmann) code**:
    
`
Using TensorFlow backend.
Number of samples used for training: 200000
Fitting NN_with_EntityEmbedding...
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
200000/200000 [==============================] - 13s 64us/step - loss: 0.0142 - val_loss: 0.0119
Epoch 2/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0096 - val_loss: 0.0109
Epoch 3/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0089 - val_loss: 0.0113
Epoch 4/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0082 - val_loss: 0.0101
Epoch 5/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0077 - val_loss: 0.0101
Epoch 6/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0074 - val_loss: 0.0100
Epoch 7/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0072 - val_loss: 0.0099
Epoch 8/10
200000/200000 [==============================] - 12s 59us/step - loss: 0.0071 - val_loss: 0.0096
Epoch 9/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0069 - val_loss: 0.0092
Epoch 10/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0068 - val_loss: 0.0095
Result on validation data:  0.10152226095724903
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
200000/200000 [==============================] - 13s 63us/step - loss: 0.0140 - val_loss: 0.0117
Epoch 2/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0093 - val_loss: 0.0107
Epoch 3/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0084 - val_loss: 0.0109
Epoch 4/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0079 - val_loss: 0.0096
Epoch 5/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0076 - val_loss: 0.0097
Epoch 6/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0074 - val_loss: 0.0097
Epoch 7/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0072 - val_loss: 0.0097
Epoch 8/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0070 - val_loss: 0.0093
Epoch 9/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0069 - val_loss: 0.0094
Epoch 10/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0068 - val_loss: 0.0093
Result on validation data:  0.10194501967184522
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
200000/200000 [==============================] - 13s 64us/step - loss: 0.0141 - val_loss: 0.0121
Epoch 2/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0093 - val_loss: 0.0100
Epoch 3/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0084 - val_loss: 0.0098
Epoch 4/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0079 - val_loss: 0.0095
Epoch 5/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0076 - val_loss: 0.0098
Epoch 6/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0074 - val_loss: 0.0097
Epoch 7/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0072 - val_loss: 0.0098
Epoch 8/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0071 - val_loss: 0.0092
Epoch 9/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0070 - val_loss: 0.0093
Epoch 10/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0069 - val_loss: 0.0097
Result on validation data:  0.10076855799458961
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0141 - val_loss: 0.0114
Epoch 2/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0093 - val_loss: 0.0105
Epoch 3/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0084 - val_loss: 0.0108
Epoch 4/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0079 - val_loss: 0.0099
Epoch 5/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0076 - val_loss: 0.0098
Epoch 6/10
200000/200000 [==============================] - 12s 60us/step - loss: 0.0074 - val_loss: 0.0099
Epoch 7/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0072 - val_loss: 0.0100
Epoch 8/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0071 - val_loss: 0.0095
Epoch 9/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0070 - val_loss: 0.0096
Epoch 10/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0068 - val_loss: 0.0099
Result on validation data:  0.10973501886112967
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
200000/200000 [==============================] - 13s 63us/step - loss: 0.0144 - val_loss: 0.0116
Epoch 2/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0094 - val_loss: 0.0109
Epoch 3/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0084 - val_loss: 0.0103
Epoch 4/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0079 - val_loss: 0.0099
Epoch 5/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0076 - val_loss: 0.0104
Epoch 6/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0074 - val_loss: 0.0099
Epoch 7/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0072 - val_loss: 0.0099
Epoch 8/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0070 - val_loss: 0.0099
Epoch 9/10
200000/200000 [==============================] - 12s 62us/step - loss: 0.0069 - val_loss: 0.0097
Epoch 10/10
200000/200000 [==============================] - 12s 61us/step - loss: 0.0068 - val_loss: 0.0100
Result on validation data:  0.10491748954856149
`

## **XGB performance**:

In [4]:
import xgboost as xgb

X_train, y_train, X_test, y_test = datasets.get_X_train_test_data()
dtrain = xgb.DMatrix(
    X_train.apply(lambda x: x.cat.codes),
    label=np.log(y_train))
evallist = [(dtrain, 'train')]
param = {'nthread': 12,
         'max_depth': 7,
         'eta': 0.02,
         'silent': 1,
         'objective': 'reg:linear',
         'colsample_bytree': 0.7,
         'subsample': 0.7}

num_round = 3000
bst = xgb.train(param, dtrain, num_round, evallist, verbose_eval=False)

xgb_test_y_pred = bst.predict(
    xgb.DMatrix(X_test.apply(lambda x: x.cat.codes))
)
xgb_test_y_pred = np.exp(xgb_test_y_pred)
print('MAPE: %s' % eval_utils.MAPE(
    y_true=y_test, 
    y_pred=xgb_test_y_pred))

MAPE: 0.14712066617861289
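
The XGBoost cell feeds the model `X.apply(lambda x: x.cat.codes)`, which replaces every pandas categorical column with its integer category codes. A small sketch of that conversion on a made-up frame (the column values are hypothetical, not taken from the Rossmann data):

```python
import pandas as pd

# Hypothetical frame with pandas categorical columns.
df = pd.DataFrame({
    "State": pd.Categorical(["BE", "HH", "BE", "NW"]),
    "Promo": pd.Categorical([0, 1, 1, 0]),
})

# Replace every column by its integer category codes (0..n_categories-1).
codes = df.apply(lambda col: col.cat.codes)
print(codes["State"].tolist())  # [0, 1, 0, 2]
```

The codes carry an arbitrary ordering (here the sorted category labels), and missing values are encoded as `-1`, which tree models will treat as just another number.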


## **Original output from [REPO](https://github.com/entron/entity-embedding-rossmann) code**:

`
.
.
.
[2987]  train-rmse:0.148847
[2988]  train-rmse:0.148845
[2989]  train-rmse:0.148842
[2990]  train-rmse:0.148839
[2991]  train-rmse:0.148834
[2992]  train-rmse:0.148819
[2993]  train-rmse:0.148768
[2994]  train-rmse:0.148762
[2995]  train-rmse:0.148741
[2996]  train-rmse:0.148705
[2997]  train-rmse:0.148667
[2998]  train-rmse:0.148622
[2999]  train-rmse:0.148584
Result on validation data:  0.14691216270195093
`