In [2]:
import pandas as pd 
import numpy as np
import scipy 
from lightfm.cross_validation import random_train_test_split
import lightfm
from lightfm.evaluation import auc_score
from lightfm.evaluation import precision_at_k
import warnings
warnings.filterwarnings("ignore")

## Model Building 
From our data exploration and cleaning we have three sparse matrices: one for implicit user interactions, based on whether an anime appears in a user's list; a second for explicit feedback, based on the ratings given by users; and a third with features specific to each anime. The anime are filtered so that only those with more than 25 user interactions are modeled and predicted. 

### Explicit vs. Implicit
Ranking models for recommendation have split into two families: those built on "explicit feedback" and those built on "implicit feedback". Explicit feedback, as the name indicates, is feedback the user deliberately provides about an item they interact with. In our case this is the score a user gives to an anime entry, but in other settings it may be likes, stars, etc. 

Conversely, implicit feedback is inferred from non-rating information, such as whether a user viewed an item, clickthrough rates, or the percentage of a video watched. The data sets constructed during our data cleaning use implicit information as the user interactions matrix (whether an anime appears on a user's list) and include explicit information (the user's standardized scores) as the user features. Finally, the item features matrix is built from the information provided about each anime: studio, airing status, etc. 
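
To make the distinction concrete, here is a minimal sketch on made-up data. The `scores` array and the convention that 0 means "not on the list" are assumptions for illustration only; the real matrices come from the data cleaning notebook and are stored as sparse matrices.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are anime.
# Assumption for this sketch: 0 means the anime is not on the user's list.
scores = np.array([
    [9, 0, 7, 0],
    [0, 5, 0, 0],
    [8, 8, 0, 6],
], dtype=float)

# Implicit feedback: 1 wherever the anime appears on the user's list.
implicit = (scores > 0).astype(np.int8)

# Explicit feedback: each user's scores standardized over the anime they rated.
explicit = np.zeros_like(scores)
for u in range(scores.shape[0]):
    rated = scores[u] > 0
    mean, std = scores[u, rated].mean(), scores[u, rated].std()
    # A user with a single score (or identical scores) has std 0; use 0.0 there.
    explicit[u, rated] = (scores[u, rated] - mean) / std if std > 0 else 0.0
```

The implicit matrix only records presence, so it treats a 9 and a 5 identically; the explicit matrix keeps the relative preference within each user.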

### LightFM 
LightFM is a Python package that includes a number of useful models and tools for building recommendation systems. The underlying model is a hybrid matrix factorization model: it learns an embedding for every user feature and every item feature, represents each user and item as the sum of its features' embeddings, and scores a user-item pair by the dot product of the two representations plus bias terms. When no item features or user features are supplied, the model reduces to a simple collaborative filtering model. We will exploit this property in our model building. 
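
The reduction to collaborative filtering can be illustrated with a small NumPy sketch of a LightFM-style scoring rule: user and item representations are sums of feature embeddings, and the score is their dot product plus biases. This is not LightFM's internal code; `predict_scores` and the random embeddings are hypothetical and stand in for learned parameters.

```python
import numpy as np

def predict_scores(user_features, item_features, user_emb, item_emb, user_bias, item_bias):
    """Score every (user, item) pair from feature-level embeddings and biases."""
    user_repr = user_features @ user_emb   # (n_users, dim): sum of each user's feature embeddings
    item_repr = item_features @ item_emb   # (n_items, dim): sum of each item's feature embeddings
    return (user_repr @ item_repr.T
            + (user_features @ user_bias)[:, None]
            + (item_features @ item_bias)[None, :])

rng = np.random.default_rng(0)
n_users, n_items, dim = 3, 4, 5
user_emb = rng.normal(size=(n_users, dim))
item_emb = rng.normal(size=(n_items, dim))
user_bias = rng.normal(size=n_users)
item_bias = rng.normal(size=n_items)

# Identity feature matrices: every user and every item is its own single feature.
hybrid = predict_scores(np.eye(n_users), np.eye(n_items),
                        user_emb, item_emb, user_bias, item_bias)

# Plain matrix factorization with one embedding and one bias per user and item.
plain_mf = user_emb @ item_emb.T + user_bias[:, None] + item_bias[None, :]
reduces_to_mf = np.allclose(hybrid, plain_mf)
```

With identity features the two score matrices coincide, which is why fitting on the interactions matrix alone behaves like ordinary collaborative filtering.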


### Warp, Logistic, and BPR
LightFM includes a variety of loss functions for model building. The three used in this notebook are WARP, BPR, and logistic. Logistic is the standard cross-entropy loss from neural networks, of the form $-\frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\log(p_{ij})$, where $y_{ij}$ is the observed label for user $i$ and item $j$ and $p_{ij}$ is the predicted probability. The problem with logistic loss for recommendation systems is that we are not strictly interested in the probability of a user liking any given item; we are interested in the ranking of items for a given user. BPR and WARP instead learn model parameters from the gap between the scores of a positive item and of negative items, which yields a ranked list of items for each user. BPR optimizes for AUC (the conventional AUC from binary classification) and WARP for precision@k, where AUC evaluates the entire ranked list and precision@k only the top k entries. For more information on these evaluation metrics and loss functions, see this wonderful [blog post](https://www.ethanrosenthal.com/2016/11/07/implicit-mf-part-2/) by Ethan Rosenthal.
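
As a rough illustration of the two metrics for a single user, the sketch below computes both from a vector of predicted scores. The helper names `user_precision_at_k` and `user_auc` are hypothetical (chosen so they do not shadow LightFM's own functions), ties between scores are ignored, and the toy scores are invented.

```python
import numpy as np

def user_precision_at_k(scores, positives, k=10):
    """Fraction of the k highest-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return np.isin(top_k, positives).mean()

def user_auc(scores, positives):
    """Probability that a randomly chosen positive outscores a randomly chosen negative."""
    is_pos = np.zeros(scores.shape[0], dtype=bool)
    is_pos[positives] = True
    pos, neg = scores[is_pos], scores[~is_pos]
    # Compare every positive score with every negative score.
    return (pos[:, None] > neg[None, :]).mean()

# Toy example: five items, items 0 and 3 are the user's positives.
scores = np.array([0.9, 0.1, 0.8, 0.4, 0.3])
positives = np.array([0, 3])

p_at_2 = user_precision_at_k(scores, positives, k=2)  # top two are items 0 and 2
auc = user_auc(scores, positives)                      # 5 of 6 positive/negative pairs ordered correctly
```

Here precision@2 only sees the top of the list, while AUC also rewards item 3 for outranking two of the three negatives further down.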


In [3]:
user_interactions = scipy.sparse.load_npz("user_interactions.npz")
user_features = scipy.sparse.load_npz("user_features.npz")
item_features = scipy.sparse.load_npz("item_features.npz")


train, test = random_train_test_split(user_interactions)    

### Collaborative Filtering 
We'll start our model building and evaluation with basic collaborative filtering. Above we load the matrices from our data cleaning and split the interactions with a conventional random train-test split (LightFM's `random_train_test_split`, which holds out 20% of the interactions by default). As noted above, we can use LightFM for collaborative filtering by passing only the interactions matrix to the fit method. We'll fit three models: one with logistic loss, one with BPR loss, and one with WARP loss. We'll evaluate precision@k and AUC on all three and also make note of training time. 

The results shown below indicate that WARP loss performs best on our data set (test AUC 0.88, test precision@10 0.15), though all three loss functions give similar results. In terms of time, BPR tends to take the least training time. We can expand our model by adding item features; we will do this in the next section. 
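
Since the cells below repeat the same fit-and-evaluate steps for each loss, one way to collect the metrics and the training time in a single pass is sketched here. `timed` and `compare_losses` are hypothetical helpers, not part of LightFM; they only wrap the same `fit`, `auc_score`, and `precision_at_k` calls used in this notebook.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def compare_losses(train, test, losses=("logistic", "bpr", "warp"), k=10, **features):
    """Fit one LightFM model per loss and collect timing, AUC, and precision@k.

    `features` may hold user_features= and/or item_features= matrices.
    """
    # Imported lazily so the timing helper above works without LightFM installed.
    from lightfm import LightFM
    from lightfm.evaluation import auc_score, precision_at_k

    results = {}
    for loss in losses:
        model = LightFM(loss=loss)
        _, seconds = timed(model.fit, interactions=train, **features)
        results[loss] = {
            "fit_seconds": seconds,
            "train_auc": auc_score(model, train, **features).mean(),
            "test_auc": auc_score(model, test, **features).mean(),
            "train_precision": precision_at_k(model, train, k=k, **features).mean(),
            "test_precision": precision_at_k(model, test, k=k, **features).mean(),
        }
    return results
```

With the matrices loaded above, `compare_losses(train, test)` would cover the collaborative filtering case, and `compare_losses(train, test, item_features=item_features)` the hybrid one. Timings from a single run are noisy, so they are best read as rough indications.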

In [15]:
model = lightfm.LightFM(loss = "logistic")
model.fit(interactions=train) 

# Pure collaborative filtering: no user or item features are passed.
train_auc = auc_score(model, train).mean()
print('Training set AUC: %s' % train_auc)
test_auc = auc_score(model, test).mean()
print('Test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Training set AUC: 0.8865798
Test set AUC: 0.86981326
Precision: train 0.58, test 0.14.


In [31]:
model = lightfm.LightFM(loss = "bpr")
model.fit(interactions=train) 

# Pure collaborative filtering: no user or item features are passed.
train_auc = auc_score(model, train).mean()
print('Training set AUC: %s' % train_auc)
test_auc = auc_score(model, test).mean()
print('Test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Training set AUC: 0.8366562
Test set AUC: 0.82246995
Precision: train 0.59, test 0.14.


In [17]:
model = lightfm.LightFM(loss = "warp")
model.fit(interactions=train) 

# Pure collaborative filtering: no user or item features are passed.
train_auc = auc_score(model, train).mean()
print('Training set AUC: %s' % train_auc)
test_auc = auc_score(model, test).mean()
print('Test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Training set AUC: 0.89768
Test set AUC: 0.8786266
Precision: train 0.61, test 0.15.


### Hybrid Model
LightFM is a hybrid recommender: the term hybrid refers to combining collaborative filtering with content-based methods (recommendations based on the content of the items). LightFM does this by learning an embedding for each item feature and user feature and representing every item and user as the sum of its features' embeddings; a prediction is the dot product of the two representations plus bias terms. Our model evaluation is shown below. We test the same metrics as above and find worse results with just item features: the best test AUC is 0.68 (WARP), against 0.88 for pure collaborative filtering. Let's try to improve our model by re-introducing explicit feedback, user scores, in the form of user features. 
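
One possible reason for the drop, offered as an assumption rather than something verified on this data: when a feature matrix is passed explicitly, an item's representation is built only from the features in that matrix, so items with identical feature rows become indistinguishable unless per-item identity columns are included. The NumPy sketch below illustrates the effect with made-up features (`toy_item_features`) and random embeddings in place of learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3  # embedding size, chosen arbitrarily for the illustration

def item_representations(features):
    """Sum randomly initialised feature embeddings for each item row."""
    feature_emb = rng.normal(size=(features.shape[1], dim))
    return features @ feature_emb

# Hypothetical item features: 4 items, 2 binary features.
toy_item_features = np.array([
    [1, 0],
    [1, 0],  # same features as item 0
    [0, 1],
    [1, 1],
], dtype=float)

# Features only: items 0 and 1 receive exactly the same representation.
features_only = item_representations(toy_item_features)
same_without_identity = np.allclose(features_only[0], features_only[1])

# Identity columns prepended: each item also gets an embedding of its own.
augmented = np.hstack([np.eye(len(toy_item_features)), toy_item_features])
with_identity = item_representations(augmented)
same_with_identity = np.allclose(with_identity[0], with_identity[1])
```

For the sparse matrices in this notebook, an equivalent (untested here) would be `scipy.sparse.hstack([scipy.sparse.identity(item_features.shape[0]), item_features], format="csr")`, passed as `item_features` to both `fit` and the evaluation functions.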

In [18]:
model = lightfm.LightFM(loss = "logistic")
model.fit(interactions=train, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, item_features= item_features, k=10).mean()
test_precision = precision_at_k(model, test, item_features = item_features, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.542206
Hybrid test set AUC: 0.5415658
Precision: train 0.00, test 0.00.


In [21]:
model = lightfm.LightFM(loss = "warp")
model.fit(interactions=train, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, item_features= item_features, k=10).mean()
test_precision = precision_at_k(model, test, item_features = item_features, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.6943574
Hybrid test set AUC: 0.6773119
Precision: train 0.20, test 0.05.


In [22]:
model = lightfm.LightFM(loss = "bpr")
model.fit(interactions=train, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, item_features= item_features, k=10).mean()
test_precision = precision_at_k(model, test, item_features = item_features, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.6644745
Hybrid test set AUC: 0.64306194
Precision: train 0.13, test 0.03.


### Adding User Features 


In [28]:
model = lightfm.LightFM(loss = "logistic")
model.fit(interactions=train, user_features = user_features, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features,
                      user_features=user_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features, 
                     user_features = user_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, user_features = user_features, item_features= item_features, k=10).mean()
test_precision = precision_at_k(model, test, user_features = user_features, item_features = item_features, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.5475582
Hybrid test set AUC: 0.54597723
Precision: train 0.00, test 0.00.


In [29]:
model = lightfm.LightFM(loss = "bpr")
model.fit(interactions=train, user_features = user_features, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features,
                      user_features=user_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features, 
                     user_features = user_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, user_features = user_features, item_features= item_features, k=10).mean()
test_precision = precision_at_k(model, test, user_features = user_features, item_features = item_features, k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.66184837
Hybrid test set AUC: 0.6535506
Precision: train 0.14, test 0.03.


In [6]:
model = lightfm.LightFM(loss = "warp")
model.fit(interactions=train, user_features = user_features, item_features = item_features) 

train_auc = auc_score(model,
                      train,
                      item_features=item_features,
                      user_features=user_features
                      ).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
                      test,
                     item_features=item_features, 
                     user_features = user_features
                        ).mean()
print('Hybrid test set AUC: %s' % test_auc)

train_precision = precision_at_k(model, train, user_features = user_features, item_features = item_features, k=10).mean()
test_precision = precision_at_k(model, test, user_features = user_features, item_features=item_features,  k=10).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))

Hybrid training set AUC: 0.60125935
Hybrid test set AUC: 0.59559494
Precision: train 0.07, test 0.02.
