# Siamese Neural Network Recommendation for Friends (for Website)

This notebook presents the final code used by the Movinder [website](https://movinder.herokuapp.com/) when the user selects `Get recommendation with SiameseNN!`.

In [1]:
import pandas as pd
import json
import datetime, time
from sklearn.model_selection import train_test_split
import itertools
import os
import zipfile
import random
import numpy as np

import requests
import matplotlib.pyplot as plt

import scipy.sparse as sp
from sklearn.metrics import roc_auc_score

---

## (1) Read data

In [2]:
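# Load the movie metadata, the existing groups of friends, and their ratings
with open('movies.json') as f:
    movies = json.load(f)
with open('friends.json') as f:
    friends = json.load(f)
with open('ratings.json') as f:
    ratings = json.load(f)
# Vectorized movie keywords, cast members, and countries, converted to a dense array
soup_movie_features = sp.load_npz('soup_movie_features_11.npz').toarray()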

## (1.2) Simulate new friend's input

A new group of friends must provide some information, which is later used to train the model and to predict the ratings the group would give to other movies. The group gets a new id, `new_friend_id`. Each rating is a dictionary with the following keys: `movie_id_ml` (id of the rated movie), `rating` (rating of that movie on a scale from 1 to 5), and `friend_id` (the group's id, i.e. `new_friend_id`). In addition to the ratings, the group provides its average age `friends_age` and gender `friends_gender`.
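As an illustration of how the group-level values could be derived, here is a minimal sketch. It assumes that each member's gender is encoded as 0 or 1 and that `friends_gender` is the mean of those values; the notebook does not state the encoding, and the helper `summarize_group` and the member records are hypothetical.

```python
def summarize_group(friend_id, members):
    """Aggregate per-member data into one group-level record.

    Assumption: gender is encoded as 0/1 and averaged over the group.
    """
    ages = [m['age'] for m in members]
    genders = [m['gender'] for m in members]
    return {
        'friend_id': friend_id,
        'friends_age': sum(ages) / len(ages),
        'friends_gender': sum(genders) / len(genders),
    }

# Hypothetical group of eight members
members = [
    {'age': 24, 'gender': 0}, {'age': 25, 'gender': 1},
    {'age': 26, 'gender': 0}, {'age': 27, 'gender': 1},
    {'age': 25, 'gender': 0}, {'age': 26, 'gender': 1},
    {'age': 24, 'gender': 0}, {'age': 27, 'gender': 0},
]
new_group = summarize_group(191, members)
print(new_group)
# {'friend_id': 191, 'friends_age': 25.5, 'friends_gender': 0.375}
```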

In [3]:
new_friend_id = len(friends)

In [4]:
new_ratings = [{'movie_id_ml': 302.0, 'rating': 4.0, 'friend_id': new_friend_id},
              {'movie_id_ml': 304.0, 'rating': 4.0, 'friend_id': new_friend_id},
              {'movie_id_ml': 307.0, 'rating': 4.0, 'friend_id': new_friend_id}]
new_ratings

[{'movie_id_ml': 302.0, 'rating': 4.0, 'friend_id': 191},
 {'movie_id_ml': 304.0, 'rating': 4.0, 'friend_id': 191},
 {'movie_id_ml': 307.0, 'rating': 4.0, 'friend_id': 191}]

In [5]:
new_friend = {'friend_id': new_friend_id, 'friends_age': 25.5, 'friends_gender': 0.375}
new_friend

{'friend_id': 191, 'friends_age': 25.5, 'friends_gender': 0.375}

In [6]:
# extend the existing data with this new information
friends.append(new_friend)
ratings.extend(new_ratings)

---

## (2) Train the LightFM Model

We use the [LightFM](http://lyst.github.io/lightfm/docs/index.html) implementation of SiameseNN to train our model on the user and item (i.e. movie) features. First, we create `scipy.sparse` matrices from the raw data; these are then used to fit the LightFM model.

In [7]:
from lightfm.data import Dataset
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import auc_score

## (2.1) Build ID mappings

We create a mapping between the user and item ids in our input data and the indices that the model uses internally. This is needed because LightFM works with user and item ids that are consecutive non-negative integers. Calling `dataset.fit` assigns an internal numerical id to every user and item we pass in.
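The idea behind such a mapping can be sketched in plain Python. This is only an illustration of the concept, assuming ids receive indices in first-seen order; it is not LightFM's own code, and `fit_id_mapping` is a hypothetical helper.

```python
def fit_id_mapping(raw_ids):
    """Assign consecutive indices 0, 1, 2, ... to ids in first-seen order."""
    mapping = {}
    for raw_id in raw_ids:
        # Only ids that have not been seen yet receive a new index
        mapping.setdefault(raw_id, len(mapping))
    return mapping

# External movie ids may be unsorted and may contain duplicates
item_id_map = fit_id_mapping([302, 7, 48, 7])
print(item_id_map)
# {302: 0, 7: 1, 48: 2}
```

Note that the internal index of an id depends on the order in which ids are seen, not on the numeric value of the id.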

In [8]:
dataset = Dataset()

item_str_for_eval = "x['title'],x['release'], x['unknown'], x['action'], x['adventure'],x['animation'], x['childrens'], x['comedy'], x['crime'], x['documentary'], x['drama'],  x['fantasy'], x['noir'], x['horror'], x['musical'],x['mystery'], x['romance'], x['scifi'], x['thriller'], x['war'], x['western'], *soup_movie_features[x['soup_id']]"
friend_str_for_eval = "x['friends_age'], x['friends_gender']"



In [9]:
dataset.fit(users=(int(x['friend_id']) for x in friends),
            items=(int(x['movie_id_ml']) for x in movies),
            item_features=(eval("("+item_str_for_eval+")") for x in movies),
            user_features=((eval(friend_str_for_eval)) for x in friends))

num_friends, num_items = dataset.interactions_shape()
print(f'Mappings - Num friends: {num_friends}, num_items {num_items}.')

Mappings - Num friends: 192, num_items 1251.


## (2.2) Build the interactions and feature matrices

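The `interactions` matrix contains the interactions between `friend_id` and `movie_id_ml`. An entry is 1 if the friends `friend_id` rated the movie `movie_id_ml`, and 0 otherwise. Only the (friend, movie) pairs are passed in, so the rating values themselves are not used here.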
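To make the structure concrete, the following toy sketch builds a dense version of such a matrix from internal (user index, item index) pairs, with an optional third element as weight. It is a simplified stand-in: LightFM returns sparse matrices, and `build_toy_interactions` is a hypothetical helper.

```python
def build_toy_interactions(records, n_users, n_items):
    """Build dense interaction and weight matrices from index records.

    Each record is (user_index, item_index) or (user_index, item_index, weight).
    A missing weight defaults to 1.0.
    """
    interactions = [[0] * n_items for _ in range(n_users)]
    weights = [[0.0] * n_items for _ in range(n_users)]
    for user_index, item_index, *rest in records:
        interactions[user_index][item_index] = 1
        weights[user_index][item_index] = rest[0] if rest else 1.0
    return interactions, weights

pairs = [(0, 1), (0, 2), (1, 0)]
toy_interactions, toy_weights = build_toy_interactions(pairs, n_users=2, n_items=3)
print(toy_interactions)
# [[0, 1, 1], [1, 0, 0]]
```

When only pairs are supplied, as in this notebook, every observed rating contributes the same weight regardless of its value.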

In [10]:
(interactions, weights) = dataset.build_interactions(((int(x['friend_id']), int(x['movie_id_ml']))
                                                      for x in ratings))

print(repr(interactions))

<192x1251 sparse matrix of type '<class 'numpy.int32'>'
	with 59123 stored elements in COOrdinate format>


The `item_features` matrix is also sparse and maps each movie id to its features. The item features include the movie title, when it was released, all genres it belongs to, and a vectorized representation of the movie's keywords, cast members, and countries it was released in.
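In the cells of this notebook, all values of a movie are packed into a single tuple that acts as one feature. An alternative, shown here only as a hedged sketch and not as the method used for the website, is to give each attribute its own named feature with a weight, since `build_item_features` also accepts `(item id, {feature name: feature weight})` entries. The helper `movie_to_feature_weights`, the feature naming scheme, the genre subset, and the example record are all hypothetical.

```python
GENRE_KEYS = ['action', 'comedy', 'drama']  # hypothetical subset of the genre columns

def movie_to_feature_weights(movie, genre_keys=GENRE_KEYS):
    """Turn one movie record into a {feature name: weight} dictionary."""
    features = {f"release:{movie['release']}": 1.0}
    for genre in genre_keys:
        # Keep only the genres the movie actually belongs to
        if movie.get(genre):
            features[f"genre:{genre}"] = 1.0
    return features

# Hypothetical record; the real entries in movies.json may look different
example_movie = {'movie_id_ml': 1, 'title': 'toy story', 'release': 1995,
                 'action': 0, 'comedy': 1, 'drama': 0}
example_features = movie_to_feature_weights(example_movie)
print(example_features)
# {'release:1995': 1.0, 'genre:comedy': 1.0}
```

With named features, two movies that share a genre also share a feature column, which is what lets a feature-based model generalize across items.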

In [11]:
item_features = dataset.build_item_features(((x['movie_id_ml'], 
                                              [eval("("+item_str_for_eval+")")]) for x in movies) )
print(repr(item_features))

<1251x2487 sparse matrix of type '<class 'numpy.float32'>'
	with 2502 stored elements in Compressed Sparse Row format>


The `user_features` matrix is also sparse and maps each friend id to its features. The user features include the group's age and gender.

In [12]:
user_features = dataset.build_user_features(((x['friend_id'], 
                                              [eval(friend_str_for_eval)]) for x in friends) )
print(repr(user_features))

<192x342 sparse matrix of type '<class 'numpy.float32'>'
	with 384 stored elements in Compressed Sparse Row format>


## (2.3) Building a model

After some hyperparameter tuning, we obtained the best model performance with the following values:

- Epochs = 150
- Learning rate = 0.015
- Max sampled = 11
- Loss type = WARP

References:
- The WARP (Weighted Approximate-Rank Pairwise) loss for implicit-feedback learning to rank, originally introduced in the [WSABIE paper](http://www.thespermwhale.com/jaseweston/papers/wsabie-ijcai.pdf).
- An extension to recommendation settings, the k-OS WARP loss, from the 2013 k-order statistic loss [paper](http://www.ee.columbia.edu/~ronw/pubs/recsys2013-kaos.pdf), which is also implemented in LightFM.
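To give an intuition for WARP and for the role of `max_sampled`, here is a simplified sketch of the sampling step for a single positive item. It is not LightFM's implementation: the margin of 1.0, the logarithmic rank weight, and the helper `warp_sample` are illustrative assumptions, and a real implementation would also skip the user's other positive items when drawing negatives.

```python
import math
import random

def warp_sample(scores, positive, max_sampled, rng):
    """Draw negatives until one violates the margin against the positive item.

    Returns (negative_item, weight), or None if no violating negative is
    found within max_sampled draws (no update would be made in that case).
    """
    n_items = len(scores)
    negatives = [i for i in range(n_items) if i != positive]
    for trials in range(1, max_sampled + 1):
        negative = rng.choice(negatives)
        # Violation: the negative scores within the margin of the positive
        if scores[negative] > scores[positive] - 1.0:
            # Few draws needed => the positive is probably ranked low => larger weight
            estimated_rank = (n_items - 1) // trials
            weight = math.log(max(1, estimated_rank))
            return negative, weight
    return None

rng = random.Random(0)
# The positive item clearly outscores every negative: nothing to update
well_ranked = warp_sample([5.0, 0.0, 0.0, 0.0], positive=0, max_sampled=11, rng=rng)
# Every negative outscores the positive: the first draw already violates the margin
badly_ranked = warp_sample([0.0, 1.0, 1.0, 1.0, 1.0], positive=0, max_sampled=11, rng=rng)
```

In this sketch, `max_sampled` caps how many negatives are drawn before giving up on a positive item, which bounds the cost per training example.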

In [13]:
epochs = 150
lr = 0.015
max_sampled = 11

loss_type = "warp"  # alternative: "bpr"


model = LightFM(learning_rate=lr, loss=loss_type, max_sampled=max_sampled)

model.fit_partial(interactions, epochs=epochs, user_features=user_features, item_features=item_features)
train_precision = precision_at_k(model, interactions, k=10, user_features=user_features, item_features=item_features).mean()

train_auc = auc_score(model, interactions, user_features=user_features, item_features=item_features).mean()

print(f'Precision: {train_precision}, AUC: {train_auc}')


Precision: 0.9588541984558105, AUC: 0.9013209342956543


In [14]:
def predict_top_k_movies(model, friends_id, k, user_features=None, item_features=None):
    """Return the internal indices of the k highest-scoring movies for `friends_id`."""
    n_users, n_movies = interactions.shape
    movie_ids = np.arange(n_movies)
    # Pass the same feature matrices that were used to fit the model
    prediction = model.predict(friends_id, movie_ids,
                               user_features=user_features, item_features=item_features)
    return movie_ids[np.argsort(-prediction)][:k]

In [15]:
dfm = pd.DataFrame(movies)
dfm = dfm.sort_values(by="movie_id_ml")

In [16]:
k = 10
friends_id = new_friend_id
movie_ids = np.array(dfm.movie_id_ml.unique())
print(movie_ids.shape)

n_users, n_items = interactions.shape

scores = model.predict(friends_id, np.arange(n_items), user_features=user_features, item_features=item_features)
# scores = model.predict(friends_id, np.arange(n_items))

known_positives = movie_ids[interactions.tocsr()[friends_id].indices]
top_items = movie_ids[np.argsort(-scores)]

print(f"Friends {friends_id}")
print("     Known positives:")

for x in known_positives[:k]:
    print(f"        {x} | {dfm[dfm.movie_id_ml==x]['title'].iloc[0]}" )
    
print("     Recommended:")
for x in top_items[:k]:
    print(f"        {x} | {dfm[dfm.movie_id_ml==x]['title'].iloc[0]}" )

(1251,)
Friends 191
     Known positives:
        301 | in & out
        302 | l.a. confidential
        307 | the devil's advocate
     Recommended:
        48 | hoop dreams
        292 | rosewood
        255 | my best friend's wedding
        286 | the english patient
        284 | tin cup
        299 | hoodlum
        125 | phenomenon
        1 | toy story
        315 | apt pupil
        7 | twelve monkeys


This is an example of the recommended-movies output that the website uses to give users movie recommendations based on the information they supplied to the model.
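The lookup above translates internal item indices into movie ids through the sorted array of `movie_id_ml` values. The listed known positives (301, 302, 307) differ from the ids that were rated (302, 304, 307), which suggests that the sorted order and the model's internal order may not line up. A more explicit option is to invert the external-to-internal item mapping, which LightFM exposes through `dataset.mapping()`. The sketch below shows the idea with a plain dictionary standing in for that mapping; `top_k_external_ids` and the toy values are hypothetical.

```python
def top_k_external_ids(scores, item_id_map, k):
    """Return the external ids of the k highest-scoring items.

    `item_id_map` maps external id -> internal index, and `scores[i]`
    is the predicted score of the item with internal index i.
    """
    internal_to_external = {internal: external
                            for external, internal in item_id_map.items()}
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [internal_to_external[i] for i in ranked[:k]]

# Toy mapping in first-seen order, not sorted by id
toy_item_id_map = {302: 0, 7: 1, 48: 2}
toy_scores = [0.1, 0.9, 0.5]
top_two = top_k_external_ids(toy_scores, toy_item_id_map, k=2)
print(top_two)
# [7, 48]
```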

Movinder website: [https://movinder.herokuapp.com/](https://movinder.herokuapp.com/).