# Neural Approaches to Recommendation Systems

Recommender Systems are one of the most popular applications of Machine Learning. Because of their widespread success, they are quickly becoming ubiquitous across businesses. Traditionally, these problems were solved with collaborative filtering and matrix factorization techniques.

In the last couple of years, this trend has been changing. Now that deep neural nets can be trained effectively, new approaches have been developed that leverage the tools and modeling flexibility of the Deep Learning ecosystem.

This hack session gives a primer on these concepts using neural network architectures.

For an intuitive explanation of collaborative filtering and embeddings, please refer to this brilliant **fast.ai lesson by Jeremy and Rachel: http://course.fast.ai/lessons/lesson4.html**

# Recommendation Engines


## Common Applications
- Feeds (News Feed on Facebook, Feed on Twitter, Explore on Instagram, Home Page of Amazon, etc.)
- Rule of thumb: recommender systems make their mark when
    - the item inventory is large, and
    - discoverability is therefore a problem.
- Traditional methods in recommender systems
    - User-user similarity
    - Item-item similarity
    - Hybrid models and collaborative filtering
    - Matrix factorization
- Entry of neural approaches
    - Latent factors in earlier approaches are analogous to embeddings in the deep learning ecosystem.
    - GPU training and better optimizers (Adam, etc.).
    - Flexibility: it is easy to add layers, incorporate additional metadata, and train everything jointly.
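To make the latent-factor / embedding analogy concrete, here is a minimal NumPy sketch with made-up toy sizes: an embedding lookup simply selects a row of a weight matrix, which is the same as multiplying a one-hot vector by that matrix. That matrix plays the role of the user (or item) factor matrix in matrix factorization.

```python
import numpy as np

# Toy sizes (hypothetical): 4 users, 3 latent factors.
n_users, n_factors = 4, 3
rng = np.random.RandomState(0)
W = rng.normal(size=(n_users, n_factors))  # embedding table = latent-factor matrix

user_idx = 2

# Embedding lookup: select one row of the table.
lookup = W[user_idx]

# Equivalent view: a one-hot vector multiplied by the matrix.
one_hot = np.zeros(n_users)
one_hot[user_idx] = 1.0
via_matmul = one_hot @ W

print(np.allclose(lookup, via_matmul))  # True
```

The lookup form is what makes embeddings cheap: the one-hot product is never materialized.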


### Let's dive!

Table of Contents:

0. Installations
1. Import the necessary libraries (and print their versions)
2. Read the necessary datasets
3. Create the interactions frame
4. Split the frame into train and validation sets
5. Create the Keras network (after creating the necessary embeddings)
6. Train the network and monitor RMSE on the validation set
7. Make the network deeper by adding dense layers and re-train it

# 0. Installations
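- `wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh && bash Anaconda3-5.0.1-Linux-x86_64.sh` (Anaconda installer, Python 3.6)
- `conda install -c conda-forge keras` (installs Keras)

The code below uses the Keras 1 API (`merge`, `nb_epoch`, `W_regularizer`) and was run with Keras 1.1.0 on the Theano backend, as the version printout in step 1 shows. These names were deprecated in Keras 2 (in favour of the `Dot`/`Add`/`Concatenate` layers, `epochs` and `embeddings_regularizer`) and are not available in current Keras, so a 1.x release is needed to run the code unchanged.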

# 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

# Setting seed before importing keras to ensure reproducibility
np.random.seed(2017)
import keras  # not aliased as K, since K is used for keras.backend later

print("Pandas version: ", pd.__version__)
print("Numpy version: ", np.__version__)
print("Keras version: ", keras.__version__)


Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


('Pandas version: ', u'0.18.1')
('Numpy version: ', '1.11.1')
('Keras version: ', '1.1.0')


# 2. Read the necessary datasets

In [2]:
# Reading in the datasets
train = pd.read_csv("../input/train.csv"); print(train.shape)
test = pd.read_csv("../input/test.csv"); print(test.shape)

print("# Users: {} | # Articles: {}".format(train.User_ID.nunique(), train.Article_ID.nunique()))

diff = np.setdiff1d(train.User_ID.unique(), test.User_ID.unique())
train = train[~train.User_ID.isin(diff)].reset_index(drop=True) # Drop train-only users.

(679051, 4)
(291023, 3)
# Users: 73489 | # Articles: 214027
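A toy NumPy sketch of the filtering step above, with made-up IDs standing in for `train.User_ID` and `test.User_ID`: `np.setdiff1d` returns the sorted IDs that occur in train but never in test, and `isin` then drops every train row that belongs to one of those users.

```python
import numpy as np

# Hypothetical user IDs, standing in for train.User_ID and test.User_ID.
train_users = np.array([10, 11, 12, 10, 13])
test_users = np.array([11, 12, 14])

# Users present in train but never in test (sorted, unique).
diff = np.setdiff1d(np.unique(train_users), np.unique(test_users))
print(diff)  # [10 13]

# Keep only the train rows whose user also appears in test.
keep = ~np.isin(train_users, diff)
print(train_users[keep])  # [11 12]
```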


In [3]:
train.head()

|   | User_ID | Article_ID | Rating | ID |
|---|---|---|---|---|
| 0 | 20080828074 | 1219102233 | 0 | 20080828074_1219102233 |
| 1 | 20080820760 | 1219151095 | 0 | 20080820760_1219151095 |
| 2 | 20080824760 | 1219295837 | 5 | 20080824760_1219295837 |
| 3 | 20080820470 | 1219098705 | 0 | 20080820470_1219098705 |
| 4 | 20080821438 | 1219144384 | 0 | 20080821438_1219144384 |


# 3. Create the interactions frame

In [4]:
# Creating one dataframe of the interactions
ratings = pd.concat([train, test])

users = ratings.User_ID.unique() # unique users
articles = ratings.Article_ID.unique()

# Create userid & itemid to index mappings
userid2idx = {o:i for i,o in enumerate(users)}
articlesid2idx = {o:i for i,o in enumerate(articles)}

ratings.Article_ID = ratings.Article_ID.apply(lambda x: articlesid2idx[x])
ratings.User_ID = ratings.User_ID.apply(lambda x: userid2idx[x])

n_users = ratings.User_ID.nunique()
n_articles = ratings.Article_ID.nunique()
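Embedding layers expect contiguous integer indices in `[0, n)`, which is what the dictionaries above produce; building them from the concatenated train and test frames ensures that every test ID also gets an index. As an aside, `pd.factorize` yields the same first-appearance codes in one call. A toy sketch with made-up IDs:

```python
import pandas as pd

# Hypothetical raw IDs, in order of appearance.
raw_ids = pd.Series([2008082, 2008090, 2008082, 2008111])

# Dictionary approach used in the notebook.
id2idx = {o: i for i, o in enumerate(raw_ids.unique())}
codes_dict = raw_ids.map(id2idx).values

# pd.factorize assigns codes in order of first appearance, so the codes match.
codes_factorize, uniques = pd.factorize(raw_ids)

print(codes_dict)       # [0 1 0 2]
print(codes_factorize)  # [0 1 0 2]
```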



# 4. Split the frame into train and validation sets

In [6]:
X_train = ratings[0:len(train)]; print(X_train.shape)
X_test = ratings[len(train):len(ratings)]; print(X_test.shape)

# Split the data into train and validation sets
np.random.seed(2017)
msk = np.random.rand(len(X_train)) < 0.8 # ~80% train / ~20% validation
trn = X_train[msk]
val = X_train[~msk]

(618437, 4)
(291023, 4)
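The split above draws one uniform random number per row and sends rows below 0.8 to training. A small NumPy sketch, using the row count printed above, checks that the resulting train fraction is close to 80%. Because the mask is drawn per row, a user's interactions are spread across both sets, so this validation set mostly measures predictions for users the model has already seen.

```python
import numpy as np

np.random.seed(2017)
n_rows = 618437  # len(X_train) as printed above
msk = np.random.rand(n_rows) < 0.8  # True -> train, False -> validation

train_frac = msk.mean()
print(round(train_frac, 3))  # close to 0.8
```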


In [8]:
ratings.Article_ID.nunique()

253933

# 5. Create the Keras network

The network is built on top of an embedding for each User_ID and each Article_ID.

In [9]:
n_factors = 50
import keras.backend as K

def rmse(y_true, y_pred):
    score = K.sqrt(K.mean(K.pow(y_true - y_pred, 2)))
    return score

from keras.layers import Input, Embedding, Dense, Dropout, merge, Flatten
from keras.regularizers import l2
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import Callback, TensorBoard

def embedding_input(name, n_in, n_out, reg):
    inp = Input(shape=(1,), dtype='int64', name=name)
    return inp, Embedding(n_in, n_out, input_length=1, W_regularizer=l2(reg))(inp)

def create_bias(inp, n_in):
    x = Embedding(n_in, 1, input_length=1)(inp)
    return Flatten()(x)

user_in, u = embedding_input('user_in', n_users, n_factors, 1e-3)
article_in, a = embedding_input('article_in', n_articles, n_factors, 1e-3)

ub = create_bias(user_in, n_users)
ab = create_bias(article_in, n_articles)

x = merge([u, a], mode='dot')
x = Flatten()(x)
x = merge([x, ub], mode='sum')
x = merge([x, ab], mode='sum')

model = Model([user_in, article_in], x)
model.compile(Adam(5e-3), loss='mse', metrics=[rmse])
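For each (user, article) pair, the graph above computes the dot product of the two embeddings plus a learned user bias and a learned article bias, i.e. r̂ = p_u · q_a + b_u + b_a (there is no global offset term). A minimal NumPy sketch of that forward pass, with toy sizes and random weights standing in for the trained ones:

```python
import numpy as np

rng = np.random.RandomState(0)
n_users, n_articles, n_factors = 5, 7, 4  # toy sizes (hypothetical)

P = rng.normal(scale=0.1, size=(n_users, n_factors))     # user embeddings
Q = rng.normal(scale=0.1, size=(n_articles, n_factors))  # article embeddings
b_user = rng.normal(scale=0.1, size=n_users)             # per-user bias
b_article = rng.normal(scale=0.1, size=n_articles)       # per-article bias

def predict(user_ids, article_ids):
    """Dot product of the two embeddings plus both biases, for a batch of pairs."""
    dot = np.sum(P[user_ids] * Q[article_ids], axis=1)
    return dot + b_user[user_ids] + b_article[article_ids]

users = np.array([0, 3, 4])
articles = np.array([6, 1, 1])
preds = predict(users, articles)
print(preds.shape)  # (3,)

# Same result for a single pair, written out explicitly.
manual = P[3] @ Q[1] + b_user[3] + b_article[1]
print(np.isclose(preds[1], manual))  # True
```

The Keras model returns the same quantity with shape `(batch, 1)` instead of a flat vector.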

In [10]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
user_in (InputLayer)             (None, 1)             0                                            
____________________________________________________________________________________________________
article_in (InputLayer)          (None, 1)             0                                            
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 1, 50)         2296950     user_in[0][0]                    
____________________________________________________________________________________________________
embedding_2 (Embedding)          (None, 1, 50)         12696650    article_in[0][0]                 
___________________________________________________________________________________________

# 6. Train the network and monitor RMSE on the validation set

In [16]:
model.fit([trn.User_ID, trn.Article_ID], trn.Rating,
          nb_epoch=10, batch_size=8192,
          validation_data=([val.User_ID, val.Article_ID], val.Rating))

Train on 494544 samples, validate on 123893 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f81f358ce90>
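Keras reports metrics such as the custom `rmse` as a batch-size-weighted average of per-batch values, so the figure printed during training is not exactly the RMSE over the whole validation set: the average of per-batch square roots is at most the square root of the overall mean squared error. A NumPy sketch comparing the two on toy data; in the notebook, the exact value could be computed from `model.predict` output, e.g. `rmse(val.Rating.values, preds.ravel())` with this NumPy `rmse`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over the whole array."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.RandomState(1)
y_true = rng.randint(0, 6, size=10000).astype(float)  # toy ratings 0..5
y_pred = y_true + rng.normal(scale=1.0, size=10000)   # toy noisy predictions

exact = rmse(y_true, y_pred)

# Batch-size-weighted average of per-batch RMSE values.
batch_size = 1024
batch_scores, batch_weights = [], []
for start in range(0, len(y_true), batch_size):
    end = start + batch_size
    batch_scores.append(rmse(y_true[start:end], y_pred[start:end]))
    batch_weights.append(len(y_true[start:end]))
batched = np.average(batch_scores, weights=batch_weights)

print(exact, batched)  # close, but batched <= exact
```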

In [17]:
model.predict([val.User_ID, val.Article_ID])

array([[ 0.07008214],
       [ 1.28627229],
       [-0.33980882],
       ..., 
       [-0.1143198 ],
       [ 1.66368413],
       [ 0.91138458]], dtype=float32)

In [18]:
model.get_weights()[0].shape

(45939, 50)

# 7. Make the network deeper by adding dense layers and re-train the network

Instead of taking the dot product of the user and article embeddings, we can concatenate them and use the result as a feature vector for any differentiable downstream Machine Learning model, trained jointly with the embeddings. A logistic regression or a neural network with a differentiable loss function is a natural fit; here, a small fully connected network is trained with an MSE loss.
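Compared with the dot-product model, the only change is how the two embeddings are combined. A NumPy sketch of one forward pass through the concatenate, `Dense(500, activation='relu')`, `Dense(1)` stack, with random toy weights and toy embeddings; in Keras, the dense weights and the embeddings are all updated by backpropagation.

```python
import numpy as np

rng = np.random.RandomState(0)
n_factors, n_hidden = 50, 500  # as in the first dense model

u = rng.normal(size=n_factors)  # one user embedding (toy values)
a = rng.normal(size=n_factors)  # one article embedding (toy values)

x = np.concatenate([u, a])  # shape (100,): the learned feature vector

W1 = rng.normal(scale=0.01, size=(2 * n_factors, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, 1))
b2 = np.zeros(1)

h = np.maximum(0.0, x @ W1 + b1)  # Dense(500, activation='relu')
y_hat = h @ W2 + b2               # Dense(1): linear output for regression

print(x.shape, h.shape, y_hat.shape)  # (100,) (500,) (1,)
```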

In [8]:
user_in, u = embedding_input('user_in', n_users, n_factors, 1e-3)
article_in, a = embedding_input('article_in', n_articles, n_factors, 1e-3)

x = merge([u, a], mode='concat')
x = Flatten()(x)

# Dense connections
# x = Dropout(0.5)(x)
x = Dense(500, activation='relu')(x)
# x = Dropout(0.75)(x)
x = Dense(1)(x)

model = Model([user_in, article_in], x)
model.compile(Adam(5e-3), loss='mse', metrics=[rmse])

model.fit([trn.User_ID, trn.Article_ID], trn.Rating,
          nb_epoch=10, batch_size=2048,
          validation_data=([val.User_ID, val.Article_ID], val.Rating))

Train on 494544 samples, validate on 123893 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f3095bb08d0>

In [10]:
user_in, u = embedding_input('user_in', n_users, n_factors, 1e-3)
article_in, a = embedding_input('article_in', n_articles, n_factors, 1e-3)

x = merge([u, a], mode='concat')
x = Flatten()(x)

# Dense connections
# x = Dropout(0.5)(x)
x = Dense(1000, activation='relu')(x)
# x = Dropout(0.75)(x)
x = Dense(1)(x)

model = Model([user_in, article_in], x)
model.compile(Adam(5e-3), loss='mse', metrics=[rmse])

model.fit([trn.User_ID, trn.Article_ID], trn.Rating,
          nb_epoch=10, batch_size=2048,
          validation_data=([val.User_ID, val.Article_ID], val.Rating))

Train on 494544 samples, validate on 123893 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f3073281350>
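Doubling the hidden layer from 500 to 1000 units barely changes the model size, because nearly all parameters sit in the embedding tables. A quick calculation, taking the user and article counts from the embedding shapes printed in this notebook and assuming Keras `Dense` layers include a bias (the default):

```python
n_users, n_articles, n_factors = 45939, 253933, 50  # from the printed weight shapes

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def concat_model_params(n_hidden):
    """(embedding parameters, dense parameters) of the concatenation model."""
    embeddings = (n_users + n_articles) * n_factors
    hidden = dense_params(2 * n_factors, n_hidden)
    output = dense_params(n_hidden, 1)
    return embeddings, hidden + output

for n_hidden in (500, 1000):
    emb, dense = concat_model_params(n_hidden)
    print(n_hidden, emb, dense, emb + dense)
# 500 14993600 51001 15044601
# 1000 14993600 102001 15095601
```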

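# Check summary and embeddings!

The weight shapes below belong to the dot-product model from step 5: 50-dimensional embeddings for the 45,939 users and 253,933 articles, followed by one bias per user and one per article.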

In [13]:
[x.shape for x in model.get_weights()]

[(45939, 50), (253933, 50), (45939, 1), (253933, 1)]
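The learned embedding matrices can be inspected directly; in the list above, index 1 is the article embedding matrix (253,933 × 50). One quick sanity check is to look up the nearest neighbours of an article by cosine similarity, since articles that the same users rate similarly tend to end up close together. A minimal NumPy sketch, with a tiny hand-made matrix standing in for `model.get_weights()[1]`; `most_similar` is a hypothetical helper name:

```python
import numpy as np

def most_similar(embeddings, idx, k=5):
    """Indices of the k rows most cosine-similar to row `idx`, excluding itself."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # never return the query itself
    return np.argsort(-sims)[:k]

# Toy stand-in for the article embedding matrix.
emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [-1.0, 0.0]])
print(most_similar(emb, 0, k=2))  # [1 2]
```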

# Using side information

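Often, along with the user-item interaction data, other information such as user metadata and item metadata is also available. With the networks above, it is straightforward to add this metadata to the model: each extra feature becomes another input whose representation is concatenated with the user and article embeddings. Let's see how.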

In [14]:
# 2. Read the necessary datasets
user = pd.read_csv("../input/user.csv")
article = pd.read_csv("../input/article.csv")
train = pd.read_csv("../input/train.csv"); print(train.shape)
test = pd.read_csv("../input/test.csv"); print(test.shape)

diff = np.setdiff1d(train.User_ID.unique(), test.User_ID.unique())
train = train[~train.User_ID.isin(diff)].reset_index(drop=True) # Drop train-only users.

train = train.merge(user, how='left'); print(train.shape)
train = train.merge(article, how='left'); print(train.shape)
test = test.merge(user, how='left'); print(test.shape)
test = test.merge(article, how='left'); print(test.shape)



# For simplicity, impute missing values with 0.
# Ideally, use mean / median imputation for numeric variables and mode imputation for categorical ones.
train = train.fillna(0)
test = test.fillna(0)


# 3. Create the interactions frame
ratings = pd.concat([train, test])

# Scaling numeric columns
from sklearn.preprocessing import scale
ratings.VintageMonths = scale(ratings.VintageMonths)

users = ratings.User_ID.unique()
articles = ratings.Article_ID.unique()
age = ratings.Age.unique()
var1 = ratings.Var1.unique()

# Create userid & itemid to index mappings
userid2idx = {o:i for i,o in enumerate(users)}
articlesid2idx = {o:i for i,o in enumerate(articles)}
age2idx = {o:i for i,o in enumerate(age)}
var12idx = {o:i for i,o in enumerate(var1)}

ratings.Article_ID = ratings.Article_ID.apply(lambda x: articlesid2idx[x])
ratings.User_ID = ratings.User_ID.apply(lambda x: userid2idx[x])
ratings.Age = ratings.Age.apply(lambda x: age2idx[x])
ratings.Var1 = ratings.Var1.apply(lambda x: var12idx[x])

n_users = ratings.User_ID.nunique()
n_articles = ratings.Article_ID.nunique()
n_age = ratings.Age.nunique()
n_var1 = ratings.Var1.nunique()


# 4. Split the frame into train and validation sets
X_train = ratings[0:len(train)]; print(X_train.shape)
X_test = ratings[len(train):len(ratings)]; print(X_test.shape)

# Split the data into train and validation sets
np.random.seed(2017)
msk = np.random.rand(len(X_train)) < 0.8
trn = X_train[msk]
val = X_train[~msk]

(679051, 4)
(291023, 3)
(618437, 6)
(618437, 9)
(291023, 5)
(291023, 8)
(618437, 9)
(291023, 9)
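`sklearn.preprocessing.scale` standardizes a column to zero mean and unit variance (using the population standard deviation). A NumPy sketch of the same transformation, applied to the five `VintageMonths` values shown in `train.head()` below; the two count columns (`NumberOfArticlesBySameAuthor`, `NumberOfArticlesinSameCategory`) are left unscaled in the notebook and could be standardized the same way. The helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(values):
    """Zero-mean, unit-variance scaling (population std), like sklearn's `scale`."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    return (values - values.mean()) / (std if std > 0 else 1.0)

# VintageMonths values from the first five rows of train.head()
vintage = np.array([25.0, 23.0, 9.0, 19.0, 17.0])
scaled = standardize(vintage)
print(scaled.mean(), scaled.std())
```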


In [15]:
train.head()

|   | User_ID | Article_ID | Rating | ID | Var1 | Age | VintageMonths | NumberOfArticlesBySameAuthor | NumberOfArticlesinSameCategory |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20080828074 | 1219102233 | 0 | 20080828074_1219102233 | A | 30-40 | 25.0 | 88 | 289 |
| 1 | 20080820760 | 1219151095 | 0 | 20080820760_1219151095 | A | 30-40 | 23.0 | 156 | 187 |
| 2 | 20080824760 | 1219295837 | 5 | 20080824760_1219295837 | A | 20-30 | 9.0 | 3 | 159 |
| 3 | 20080820470 | 1219098705 | 0 | 20080820470_1219098705 | A | 0 | 19.0 | 43 | 503 |
| 4 | 20080821438 | 1219144384 | 0 | 20080821438_1219144384 | A | 30-40 | 17.0 | 39 | 264 |


## Add age, Var1 and the numeric metadata variables

In [12]:
# Strip stray '\r' characters (from Windows line endings in the CSVs) from column names,
# so that columns can be selected by their plain names.
trn = trn.rename(columns=lambda c: c.strip())
val = val.rename(columns=lambda c: c.strip())

user_in, u = embedding_input('user_in', n_users, n_factors, 1e-3)
article_in, a = embedding_input('article_in', n_articles, n_factors, 1e-3)

# Categorical metadata (index-encoded in step 3) gets small embeddings.
# The embedding size of 5 is an arbitrary illustrative choice.
age_in, age_emb = embedding_input('age_in', n_age, 5, 1e-3)       # Age
var1_in, var1_emb = embedding_input('var1_in', n_var1, 5, 1e-3)   # Var1

# Numeric metadata enters as single-value inputs.
# Only VintageMonths was scaled above; the two counts are fed in raw.
author_in = Input(shape=(1,), name='author_in')       # NumberOfArticlesBySameAuthor
category_in = Input(shape=(1,), name='category_in')   # NumberOfArticlesinSameCategory
vintage_in = Input(shape=(1,), name='vintage_in')     # VintageMonths

x = merge([u, a, age_emb, var1_emb], mode='concat')  # (None, 1, 110)
x = Flatten()(x)
x = merge([x, author_in, category_in, vintage_in], mode='concat')  # (None, 113)

# Dense connections
x = Dropout(0.5)(x)
x = Dense(1000, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(1)(x)

inputs = [user_in, article_in, age_in, var1_in, author_in, category_in, vintage_in]
model = Model(inputs, x)
model.compile(Adam(5e-4), loss='mse', metrics=[rmse])

def features(df):
    # Same order as `inputs` above
    return [df.User_ID, df.Article_ID, df.Age, df.Var1,
            df.NumberOfArticlesBySameAuthor, df.NumberOfArticlesinSameCategory, df.VintageMonths]

model.fit(features(trn), trn.Rating,
          nb_epoch=10, batch_size=2048,
          validation_data=(features(val), val.Rating))

Train on 494544 samples, validate on 123893 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f308b2a63d0>

# But what if you don't have ratings?

## Do you need ratings? What if all you have is logs, API hits, or app clickstream data?
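When only logs are available (implicit feedback), one widely used option is to treat each logged event as a positive example and to sample unobserved user-item pairs as negatives, then train the same embedding networks with a sigmoid output and a binary cross-entropy loss instead of MSE on ratings; pairwise losses such as BPR or triplet loss are a common alternative. A minimal NumPy sketch of building such a training set, assuming events have already been mapped to integer indices as in step 3; the function name and the uniform sampling scheme are illustrative choices, not part of the original session.

```python
import numpy as np

def implicit_training_pairs(user_ids, item_ids, n_items, n_neg=1, seed=0):
    """Turn observed (user, item) events into labelled pairs with sampled negatives.

    Every logged interaction becomes a positive (label 1). For each positive, `n_neg`
    items are drawn uniformly at random as negatives (label 0). Sampled items the user
    actually interacted with are not filtered out, a common simplification.
    """
    rng = np.random.RandomState(seed)
    user_ids = np.asarray(user_ids)
    item_ids = np.asarray(item_ids)
    neg_users = np.repeat(user_ids, n_neg)
    neg_items = rng.randint(0, n_items, size=len(neg_users))
    users = np.concatenate([user_ids, neg_users])
    items = np.concatenate([item_ids, neg_items])
    labels = np.concatenate([np.ones(len(user_ids)), np.zeros(len(neg_users))])
    return users, items, labels

# Hypothetical click log: user 0 clicked items 2 and 5, user 1 clicked item 3.
u, i, y = implicit_training_pairs([0, 0, 1], [2, 5, 3], n_items=10, n_neg=2)
print(len(u), int(y.sum()))  # 9 3
```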

## Credit where it's due
- A brilliant [fast.ai](http://course.fast.ai) course by Jeremy and Rachel. See Lesson 4 for the collaborative filtering lecture.
- A deep learning [class](https://m2dsupsdlclass.github.io/lectures-labs/) from a scikit-learn core member.
- Reference: the Keras 1.x documentation on the [Merge layer](https://faroit.github.io/keras-docs/1.0.4/getting-started/sequential-model-guide/).

## For those who'd like to get *deeper*
- Deep Recommender models using PyTorch - [Spotlight](https://github.com/maciejkula/spotlight). The [Keras](https://github.com/maciejkula/triplet_recommendations_keras) implementation.
- [YouTube Recommendation Engine](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf) (a combination of techniques)
- [Google Play Apps Recommendations Engine](https://arxiv.org/pdf/1606.07792.pdf) (Wide & Deep Learning)
- RecSys conference 2017 had a lot of [talks](https://towardsdatascience.com/recsys-2017-2d0879351097) where deep learning was the primary theme. Official reviews [here](https://medium.com/@ACMRecSys/recsys2017-summaries-and-reviews-f2bea3f0e519).