In this notebook, I will show how a simple AutoEncoder can recommend the next item for a given basket. The dataset used in this tutorial is available [here](https://www.kaggle.com/mittalvasu95/the-bread-basket). I chose this dataset because it is small enough to build our recommendation system quickly.

### Before we start
- First of all, many thanks to [@Aditya Mittal](https://www.kaggle.com/mittalvasu95) for providing this dataset.
- I apologize in advance for my English.

In [1]:
import time
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

In [2]:
SEED = 1234
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

In [3]:
df = pd.read_csv('./bread basket.csv')
print("The shape of df: ", df.shape)
df.head()

The shape of df:  (20507, 5)


|   | Transaction | Item | date_time | period_day | weekday_weekend |
|---|---|---|---|---|---|
| 0 | 1 | Bread | 30-10-2016 09:58 | morning | weekend |
| 1 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 2 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 3 | 3 | Hot chocolate | 30-10-2016 10:07 | morning | weekend |
| 4 | 3 | Jam | 30-10-2016 10:07 | morning | weekend |


## Split data into train, validation and test
- There are 9,465 transactions.
- For this kind of data, we shouldn't split randomly, because we want to predict "future" transactions given "past" ones.
- Let's use the last 1,000 transactions as test data and the 1,000 transactions before them as validation data.

In [4]:
print("The number of unique transactions: ", df['Transaction'].nunique())

The number of unique transactions:  9465


In [5]:
df['dataset'] = 'train'
df.loc[df['Transaction'].isin(df['Transaction'].unique()[-1000:]), 'dataset'] = 'test'
df.loc[df['Transaction'].isin(df['Transaction'].unique()[-2000:-1000]), 'dataset'] = 'valid'

print("The number of train transactions: ", df.loc[df['dataset'] == 'train', 'Transaction'].nunique())
print("The number of validation transactions: ", df.loc[df['dataset'] == 'valid', 'Transaction'].nunique())
print("The number of test transactions: ", df.loc[df['dataset'] == 'test', 'Transaction'].nunique())

The number of train transactions:  7465
The number of validation transactions:  1000
The number of test transactions:  1000
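The split above assumes that rows are ordered by time, because `unique()` returns transaction IDs in order of appearance. A minimal sketch of a sanity check, assuming the `date_time` format seen in `df.head()` (day-month-year hour:minute); `check_chronological_split` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd


def check_chronological_split(df):
    """Hypothetical helper: True if no train row is later than any valid row
    and no valid row is later than any test row."""
    # date_time values look like '30-10-2016 09:58' (day-month-year hour:minute)
    times = pd.to_datetime(df['date_time'], format='%d-%m-%Y %H:%M')
    first = times.groupby(df['dataset']).min()
    last = times.groupby(df['dataset']).max()
    return bool(last['train'] <= first['valid'] and last['valid'] <= first['test'])


# Toy data in the same layout as the bread basket CSV
toy = pd.DataFrame({
    'Transaction': [1, 2, 2, 3, 4],
    'date_time': ['30-10-2016 09:58', '30-10-2016 10:05', '30-10-2016 10:05',
                  '31-10-2016 08:00', '01-11-2016 12:00'],
})
toy['dataset'] = 'train'
toy.loc[toy['Transaction'].isin(toy['Transaction'].unique()[-1:]), 'dataset'] = 'test'
toy.loc[toy['Transaction'].isin(toy['Transaction'].unique()[-2:-1]), 'dataset'] = 'valid'
print(check_chronological_split(toy))  # True
```

On the real data, `check_chronological_split(df)` should return `True` if the CSV is sorted by time; `False` would mean the "last 1,000 transactions" are not the most recent ones.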


## Apply label encoding to `Item`
- There are many ways to implement label encoding; here I use `pandas.Categorical`.

In [6]:
label_encoder = pd.Categorical(df['Item'])
label_encoder = {k: v for v, k in enumerate(label_encoder.categories)}
df['Item_encoded'] = df['Item'].apply(lambda x: label_encoder[x])

- (optional) Printing `label_encoder` helps to understand the item-to-integer mapping.

~~~python
print(label_encoder)
~~~
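A `pd.Categorical` already stores the integer code of every row in its `codes` attribute, so the dictionary is mainly useful for decoding later. A small sketch on toy data showing that both routes give the same codes; note that the categories are sorted, so the codes follow alphabetical order of the item names:

```python
import pandas as pd

items = pd.Series(['Bread', 'Scandinavian', 'Scandinavian', 'Hot chocolate', 'Jam'])
categorical = pd.Categorical(items)

# Approach used in the notebook: an explicit {item: code} dictionary
label_encoder = {k: v for v, k in enumerate(categorical.categories)}
encoded_via_dict = items.apply(lambda x: label_encoder[x])

# Shortcut: the Categorical already holds the integer code of every row
encoded_via_codes = pd.Series(categorical.codes, index=items.index)

print(label_encoder)
# {'Bread': 0, 'Hot chocolate': 1, 'Jam': 2, 'Scandinavian': 3}
print(encoded_via_codes.tolist())  # [0, 3, 3, 1, 2]
```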

## Create a `torch.utils.data.Dataset`
- Honestly, a `torch.utils.data.Dataset` is not necessary for such a small dataset (but I'm sure it is worth using!).

In [7]:
class BasketDataset(Dataset):
    def __init__(self, df, dim_input, mode):
        super(BasketDataset, self).__init__()
        self.df = df
        self.dim_input = dim_input
        self.mode = mode
        self.indices = df['Transaction'].unique()
        
    def __len__(self):
        return len(self.indices)
        
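    def __getitem__(self, index):
        transaction_id = self.indices[index]
        # Look up items in the subset passed to the constructor (self.df), not the global df
        items = self.df.loc[self.df['Transaction'] == transaction_id, 'Item_encoded'].values
        
        # Multi-hot encoding: 1 for every item in the basket, 0 elsewhere
        X = torch.zeros(self.dim_input, dtype=torch.float32)
        y = torch.zeros(self.dim_input, dtype=torch.float32)
        X[items] = 1
        y[items] = 1  # an AutoEncoder reconstructs its own input
        
        return X, y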
        

- (optional) Print `X` and `y` generated by BasketDataset

~~~python
dataset = BasketDataset(df=df[df['dataset'] == 'train'],
                        dim_input=94,
                        mode='train')

X, y = dataset[0]
print('X: ', X)
print('y: ', y)
~~~
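`BasketDataset.__getitem__` filters the whole DataFrame on every call, which is fine here but scales poorly. A minimal sketch of an alternative, assuming the same `Transaction` and `Item_encoded` columns: the hypothetical `FastBasketDataset` groups the items of each transaction once in `__init__` and then only builds the multi-hot vector per call.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class FastBasketDataset(Dataset):
    """Hypothetical variant of BasketDataset: groups the items of each transaction once
    in __init__ instead of filtering the whole DataFrame in every __getitem__ call."""

    def __init__(self, df, dim_input):
        super().__init__()
        self.dim_input = dim_input
        # Transaction id -> list of encoded items; sort=False keeps first-appearance order
        grouped = df.groupby('Transaction', sort=False)['Item_encoded'].agg(list)
        self.indices = grouped.index.to_numpy()
        self.baskets = [torch.tensor([int(i) for i in b], dtype=torch.long) for b in grouped]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, index):
        X = torch.zeros(self.dim_input, dtype=torch.float32)
        X[self.baskets[index]] = 1  # multi-hot vector of the basket
        return X, X.clone()  # the target is the input itself


toy = pd.DataFrame({'Transaction': [1, 2, 2, 3, 3], 'Item_encoded': [0, 3, 3, 1, 2]})
dataset = FastBasketDataset(toy, dim_input=4)
print(len(dataset))   # 3
print(dataset[2][0])  # tensor([0., 1., 1., 0.])
```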

## AutoEncoder
- I use a very simple AutoEncoder with only one hidden layer.
- Dropout is applied only in the Encoder.
- Both the Encoder and the Decoder use sigmoid activations.
- It sounds like a weak model, but it can already produce sensible recommendations, as shown below!

In [8]:
class Encoder(nn.Module):
    def __init__(self, dim_input, dim_latent, dropout):
        super(Encoder, self).__init__()
        self.latent_layer = nn.Linear(dim_input, dim_latent)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        x = self.latent_layer(x)
        x = torch.sigmoid(x)
        x = self.dropout(x)
        return x
    
class Decoder(nn.Module):
    def __init__(self, dim_output, dim_latent):
        super(Decoder, self).__init__()
        self.output_layer = nn.Linear(dim_latent, dim_output)
        
    def forward(self, x):
        x = self.output_layer(x)
        x = torch.sigmoid(x)
        return x
    
class AutoEncoder(nn.Module):
    def __init__(self, dim_input, dim_latent, dropout):
        super(AutoEncoder, self).__init__()
        self.encoder = Encoder(dim_input, dim_latent, dropout)
        self.decoder = Decoder(dim_input, dim_latent)
    
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
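For the sizes used later in this notebook (`DIM_INPUT = 94`, `DIM_LATENT = 64`), each `nn.Linear(n_in, n_out)` layer has $n_\text{in} \times n_\text{out}$ weights plus $n_\text{out}$ biases, so the model is small:

$$
\underbrace{94 \times 64 + 64}_{\text{encoder}} \;+\; \underbrace{64 \times 94 + 94}_{\text{decoder}} \;=\; 6080 + 6110 \;=\; 12190 \text{ parameters}
$$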


- (optional) A common variant is the *tied* AutoEncoder (`Tied AutoEncoder`), in which the decoder reuses the transposed weights of the encoder instead of learning its own. Build a `TiedAutoEncoder` and compare its performance with the AutoEncoder above (honestly, I don't know how to implement `TiedAutoEncoder` T_T).
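A minimal sketch of one way to tie the weights in PyTorch, assuming the same design as `AutoEncoder` above (one hidden layer, sigmoid activations, dropout only after the encoder). `TiedAutoEncoder` and `decoder_bias` are hypothetical names: the decoder reuses the transpose of the encoder's weight matrix through `torch.nn.functional.linear` and only learns its own bias, so it has 94·64 + 64 + 94 = 6,174 parameters instead of 12,190.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedAutoEncoder(nn.Module):
    """Hypothetical tied-weight AutoEncoder: decoder weight = encoder weight transposed."""

    def __init__(self, dim_input, dim_latent, dropout):
        super().__init__()
        self.encoder = nn.Linear(dim_input, dim_latent)  # weight shape: (dim_latent, dim_input)
        self.dropout = nn.Dropout(dropout)
        # The decoder has its own bias; its weight is shared with the encoder
        self.decoder_bias = nn.Parameter(torch.zeros(dim_input))

    def forward(self, x):
        z = self.dropout(torch.sigmoid(self.encoder(x)))
        # F.linear(z, W) computes z @ W.T, so passing W.t() computes z @ W
        out = F.linear(z, self.encoder.weight.t(), self.decoder_bias)
        return torch.sigmoid(out)


model = TiedAutoEncoder(dim_input=94, dim_latent=64, dropout=0.1)
print(model(torch.rand(16, 94)).shape)  # torch.Size([16, 94])
```

Since its output has the same shape and range as `AutoEncoder`, it could be passed to `Fitter` in the same way, e.g. `Fitter(TiedAutoEncoder(DIM_INPUT, DIM_LATENT, DROPOUT), LR, N_EPOCHS)`.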

## Fitter
- Next, we will make a class that trains and evaluates a given model on given data loaders.
- First, let's make a helper class that stores and averages the losses.

In [9]:
class AverageMeter:
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.value = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
    
    def update(self, value, n):
        self.value = value
        self.sum += value * n
        self.count += n
        self.avg = self.sum / self.count

In [10]:
class Fitter:
    def __init__(self, model, lr, n_epochs):
        self.model = model
        self.lr = lr
        self.n_epochs = n_epochs
        
        
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model.to(self.device)
        
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=self.lr)
        self.criterion = torch.nn.BCELoss()
        
        self.best_summary_loss = 10 ** 5
        
    def fit(self, train_loader, valid_loader):
        for epoch in range(self.n_epochs):
            # Train
            t = time.time()
            summary_loss = self.train_one_epoch(train_loader)
            print(
                f'\rEpoch:{epoch + 1}/{self.n_epochs} | ' +
                f'Train loss: {summary_loss.avg:.7f} | ' +
                f'Elapsed time: {time.time() - t:.3f} |'
            )
            
            # Evaluation
            t = time.time()
            summary_loss = self.evaluate(valid_loader)
            print(
                f'\rEpoch:{epoch + 1}/{self.n_epochs} | ' +
                f'Validation loss: {summary_loss.avg:.7f} | ' +
                f'Elapsed time: {time.time() - t:.3f}'
            )
        # End for (n_epochs)
        
    def train_one_epoch(self, train_loader):
        self.model.train()
        summary_loss = AverageMeter()
        t = time.time()
        
        for step, (X, y) in enumerate(train_loader):
            print(
                f'Train step: {step + 1}/{len(train_loader)} | ' +
                f'Summary loss: {summary_loss.avg:.7f} | ' +
                f'Time: {time.time() - t:.3f} |', end='\r'
            )
            X = X.to(self.device)
            y = y.to(self.device)
            batch_size = X.shape[0]
            
            self.optimizer.zero_grad()
            output = self.model(X)
            loss = self.criterion(output, y)
            loss.backward()
            summary_loss.update(loss.detach().item(), batch_size)
            self.optimizer.step()
        # End for (one epoch)
        return summary_loss
        
    def evaluate(self, valid_loader):
        self.model.eval()
        summary_loss = AverageMeter()
        t = time.time()
        
        with torch.no_grad():
            for step, (X, y) in enumerate(valid_loader):
                print(
                    f'Valid step: {step + 1}/{len(valid_loader)} | ' +
                    f'Summary loss: {summary_loss.avg:.7f} | ' + 
                    f'Time: {time.time() - t:.3f} |', end='\r'
                )

                X = X.to(self.device)
                y = y.to(self.device)
                batch_size = X.shape[0]

                output = self.model(X)
                loss = self.criterion(output, y)
                summary_loss.update(loss.detach().item(), batch_size)
            # End for (One epoch)
        # End with (validation)
        return summary_loss

- (optional, but necessary) You will notice that our losses are very small. Why? Most entries of the target `y` are 0, so a model that predicts values close to 0 everywhere already gets a low average loss. We can alleviate this problem by giving different weights to the 0 and 1 targets, for example with a custom loss function like the following.

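~~~python
import torch

def weighted_binary_cross_entropy(output, target, weights=None, eps=1e-7):
    '''
    Adapted from https://discuss.pytorch.org/t/solved-class-weight-for-bceloss/3114/2
    output: sigmoid probabilities, target: 0/1 tensor of the same shape,
    weights: [weight for 0 targets, weight for 1 targets]
    '''
    # Clamp to avoid log(0) = -inf when the sigmoid saturates
    output = torch.clamp(output, eps, 1 - eps)

    if weights is not None:
        assert len(weights) == 2

        loss = weights[1] * (target * torch.log(output)) + \
               weights[0] * ((1 - target) * torch.log(1 - output))
    else:
        loss = target * torch.log(output) + (1 - target) * torch.log(1 - output)

    return torch.neg(torch.mean(loss))
~~~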
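PyTorch also has a built-in option: `torch.nn.BCEWithLogitsLoss` takes a `pos_weight` argument that scales the loss of positive targets. It expects raw logits, so the final `torch.sigmoid` in `Decoder` would have to be removed to use it. The sketch below, on random toy tensors and with an arbitrary example weight `w = 20.0`, checks numerically that it matches the weighted loss written out with weights `[1, w]`:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(16, 94)                   # toy pre-sigmoid decoder outputs
target = (torch.rand(16, 94) < 0.03).float()   # sparse multi-hot baskets
w = 20.0                                       # example weight for positive targets (assumption)

# Built-in: pos_weight scales positive targets; the sigmoid is folded into the loss
builtin = torch.nn.BCEWithLogitsLoss(pos_weight=torch.full((94,), w))(logits, target)

# Same quantity written out, as in weighted_binary_cross_entropy with weights=[1, w]
p = torch.sigmoid(logits)
manual = -(w * target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()

print(torch.allclose(builtin, manual, atol=1e-5))  # True
```

One common heuristic, not something tuned in this notebook, is to set `w` near the ratio of zeros to ones in the targets.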

## Let's train our model
- Define `BasketDataset`s and wrap them in `torch.utils.data.DataLoader`s
- Define our `AutoEncoder` model
- Pass the model to `Fitter` and train it with the data loaders

In [11]:
DIM_INPUT = 94 # The number of unique items (should equal len(label_encoder))
DIM_LATENT = 64 # The number of nodes of the latent layer
BATCH_SIZE = 16
DROPOUT = 0.1
LR = 0.001
N_EPOCHS = 10

In [12]:
train_dataset = BasketDataset(df=df[df['dataset'] == 'train'],
                              dim_input=DIM_INPUT,
                              mode='train')
valid_dataset = BasketDataset(df=df[df['dataset'] == 'valid'],
                              dim_input=DIM_INPUT,
                              mode='valid')

train_loader = DataLoader(train_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=True,
                          drop_last=True)

valid_loader = DataLoader(valid_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=False,
                          drop_last=False)

In [13]:
model = AutoEncoder(DIM_INPUT, DIM_LATENT, DROPOUT)

In [14]:
fitter = Fitter(model, LR, N_EPOCHS)

In [15]:
fitter.fit(train_loader, valid_loader)

Epoch:1/10 | Train loss: 0.1284798 | Elapsed time: 3.735 |
Epoch:1/10 | Validation loss: 0.0714454 | Elapsed time: 0.380
Epoch:2/10 | Train loss: 0.0666606 | Elapsed time: 3.408 |
Epoch:2/10 | Validation loss: 0.0658097 | Elapsed time: 0.414
Epoch:3/10 | Train loss: 0.0605446 | Elapsed time: 3.435 |
Epoch:3/10 | Validation loss: 0.0582660 | Elapsed time: 0.385
Epoch:4/10 | Train loss: 0.0511943 | Elapsed time: 3.407 |
Epoch:4/10 | Validation loss: 0.0480558 | Elapsed time: 0.397
Epoch:5/10 | Train loss: 0.0414551 | Elapsed time: 3.319 |
Epoch:5/10 | Validation loss: 0.0391147 | Elapsed time: 0.408
Epoch:6/10 | Train loss: 0.0333258 | Elapsed time: 3.456 |
Epoch:6/10 | Validation loss: 0.0321758 | Elapsed time: 0.386
Epoch:7/10 | Train loss: 0.0268864 | Elapsed time: 3.486 |
Epoch:7/10 | Validation loss: 0.0265547 | Elapsed time: 0.399
Epoch:8/10 | Train loss: 0.0219704 | Elapsed time: 3.425 |
Epoch:8/10 | Validation loss: 0.0221803 | Elapsed time: 0.400
Epoch:9/

## Recommend items for given basket
- Below I show a few examples where the model works well.
- Note, however, that the model is not very good overall, for the following reasons:
    - Most customers buy only 1~2 items, so the model sees too few item co-occurrences to learn a good latent space.
    - It tends to recommend popular items such as `Bread` or `Coffee` (due to data imbalance).
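Both caveats can be checked directly on the data. A minimal sketch, assuming the `Transaction` and `Item` columns of the original CSV; `basket_stats` is a hypothetical helper that counts distinct items per basket (duplicates such as the two `Scandinavian` rows count once, matching the multi-hot encoding) and lists the most frequent items:

```python
import pandas as pd


def basket_stats(df, top_n=5):
    """Hypothetical helper: basket-size distribution and most frequent items."""
    # Distinct items per transaction, then how many baskets have each size
    sizes = df.groupby('Transaction')['Item'].nunique().value_counts().sort_index()
    # Most frequent items, i.e. the ones a model trained on imbalanced data may favour
    popular = df['Item'].value_counts().head(top_n)
    return sizes, popular


toy = pd.DataFrame({
    'Transaction': [1, 2, 2, 3, 3, 3, 4],
    'Item': ['Bread', 'Coffee', 'Bread', 'Coffee', 'Cake', 'Bread', 'Coffee'],
})
sizes, popular = basket_stats(toy, top_n=2)
print(sizes)
print(popular)
```

On the real data, `basket_stats(df)` shows how many baskets contain 1, 2, 3, ... distinct items and which items dominate.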

In [16]:
label_decoder = {v: k for k, v in label_encoder.items()}

In [17]:
test_dataset = BasketDataset(df=df[df['dataset'] == 'test'],
                             dim_input=DIM_INPUT,
                             mode='test')

- Let's take a look at the transaction at index 707 of the test data.

In [18]:
sample_id = 707

X, y = test_dataset[sample_id]

print([label_decoder[item.item()] for item in torch.where(X == 1)[0]])

['Bread', 'Cake', 'Coffee', 'Extra Salami or Feta', 'Juice', 'Salad', 'Spanish Brunch']


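- Let's assume the customer has picked only the items from `Bread` to `Salad`. Can our model then recommend `Spanish Brunch`?
- To check this, create an `X_denoised` basket without `Spanish Brunch` by zeroing the item with the largest encoded index (the label codes follow alphabetical order, so this is `Spanish Brunch`).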

In [19]:
X_denoised = X.clone()
X_denoised[torch.where(X == 1)[0][-1]] = 0

basket = [label_decoder[item.item()] for item in torch.where(X_denoised == 1)[0]]
print(basket)

['Bread', 'Cake', 'Coffee', 'Extra Salami or Feta', 'Juice', 'Salad']


- Among the model's outputs, `Spanish Brunch` has the highest score (a sigmoid probability, not a logit), apart from the items that are already in the basket.
- That is, our model recommends `Spanish Brunch` to the customer.

In [20]:
device = torch.device('cpu')
model.to(device)
model.eval()

output = model(X_denoised).detach().numpy()
TopK = np.argsort(-output)[:10]

print([label_decoder[item] for item in TopK if label_decoder[item] not in basket])

['Spanish Brunch', 'Jammie Dodgers', 'Frittata', 'Scone', 'Sandwich', 'Muffin']
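The prediction cells repeat the same steps, and taking the top 10 before removing basket items can return fewer than 10 recommendations (6 in the output above). A minimal sketch of a hypothetical `recommend` helper that excludes basket items before ranking and runs under `torch.no_grad()`; `FixedScores` is a stand-in model used only to illustrate the call:

```python
import numpy as np
import torch


def recommend(model, basket_vector, label_decoder, k=5):
    """Hypothetical helper: top-k items that are not already in the basket.

    basket_vector: 1-D multi-hot float tensor (1 = item is in the basket).
    """
    model.eval()
    with torch.no_grad():
        # copy() so that masking below never modifies the model output in place
        scores = model(basket_vector.unsqueeze(0)).squeeze(0).cpu().numpy().copy()
    scores[basket_vector.cpu().numpy() == 1] = -np.inf  # exclude items already picked
    top_k = np.argsort(-scores)[:k]
    return [label_decoder[int(i)] for i in top_k]


class FixedScores(torch.nn.Module):
    """Stand-in model for illustration: ignores the input and returns fixed scores."""

    def __init__(self, scores):
        super().__init__()
        self.scores = torch.tensor(scores)

    def forward(self, x):
        return self.scores.expand(x.shape[0], -1)


decoder = {0: 'Bread', 1: 'Coffee', 2: 'Cake', 3: 'Juice'}
basket = torch.tensor([1.0, 1.0, 0.0, 0.0])  # Bread and Coffee already picked
print(recommend(FixedScores([0.9, 0.8, 0.1, 0.5]), basket, decoder, k=2))  # ['Juice', 'Cake']
```

With the trained model, the call would look like `recommend(model, X_denoised, label_decoder, k=10)`.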


- The transaction at index 700 of the test data

In [21]:
sample_id = 700

X, y = test_dataset[sample_id]
print([label_decoder[item.item()] for item in torch.where(X == 1)[0]])

['Coffee', 'Drinking chocolate spoons ', 'Juice', 'Mineral water', 'Salad', 'Sandwich']


In [22]:
X_denoised = X.clone()
X_denoised[torch.where(X == 1)[0][-1]] = 0

basket = [label_decoder[item.item()] for item in torch.where(X_denoised == 1)[0]]
print(basket)

['Coffee', 'Drinking chocolate spoons ', 'Juice', 'Mineral water', 'Salad']


In [23]:
device = torch.device('cpu')
model.to(device)
model.eval()

output = model(X_denoised).detach().numpy()
TopK = np.argsort(-output)[:10]

print([label_decoder[item] for item in TopK if label_decoder[item] not in basket])

['Sandwich', 'Spanish Brunch', 'Scone', 'Soup', 'Alfajores', 'Chicken Stew', 'Hearty & Seasonal']


## To do
- Visualize the latent space of items. Are similar items clustered together?
- Try a Denoising AutoEncoder, which masks some items of the input but still has to reconstruct the original input. For example, assume an input $X$ has the items `['Coffee', 'Drinking chocolate spoons ', 'Juice', 'Mineral water', 'Salad', 'Sandwich']`. A plain AutoEncoder maps $X$ to $X$ itself, whereas a Denoising AutoEncoder has to reconstruct $X$ from a corrupted input $\tilde{X}$ in which some items are masked, for example $\tilde{X}$ = `['Coffee', 'Juice', 'Salad', 'Sandwich']`. Denoising AutoEncoders tend to learn more robust representations.
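The corruption step of a Denoising AutoEncoder can be sketched as random masking of the multi-hot basket. This is a minimal sketch assuming independent per-item masking with a hypothetical `drop_prob`; `mask_items` is not part of the original notebook. During training, one would feed `mask_items(X)` to the model while keeping the uncorrupted `y` as the target.

```python
import torch


def mask_items(X, drop_prob=0.3, generator=None):
    """Hypothetical corruption step for a Denoising AutoEncoder.

    Each item of the multi-hot basket X is removed independently with probability
    drop_prob; the uncorrupted X stays the training target.
    """
    keep = torch.rand(X.shape, generator=generator) >= drop_prob
    return X * keep.float()


X = torch.tensor([0., 1., 1., 0., 1., 1., 1.])
X_tilde = mask_items(X, drop_prob=0.5, generator=torch.Generator().manual_seed(0))
# Training pair for the Denoising AutoEncoder: input X_tilde, target X
print(X_tilde)
```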