# Recommenders 1

Here we explore simple matrix factorization models for recommendation 

## Collaborative Filtering

On the internet, people rate items (or simply click on them, showing preference). All these ratings can be seen as a matrix coding for all ratings of size (n_user,n_item).It's worth noting that this matrix is highly sparse: No one rates everything, people only rate a subset of items.

Recommendation (more specifically **collaborative filtering**) goal can be seen as predicting how someone will rate one item. This is akin to filling this matrix, by predicting all missing ratings. The most used algorithms for this task are matrix factorization ones: they take advantage from the fact that a matrix can be decomposed into two sub-matrices: one coding for the users and one coding for the items.

## Why predicting ratings and not directly items:

Theoritically, the goal of a recommender system is to cherry pick interesting items for one user within a huge collection. Rating prediction is a surrogate problem of item prediction. To get what item to recommend you can, for exemple, simply sort best rated items. It has been shown that improving rating prediction actually improve item prediction.



## Data: [smallest movie-lens dataset](https://grouplens.org/datasets/movielens/)

In this practical we use a small dataset of user ratings on movies. Specifically, we treat the dataset as list of $(user,item,rating)$ triplets.


## Goals of this practical:

- Visualize different learnt latent item space
- Implement different matrix factorization algorithms for explicit collaborative filtering
- Understand and Implement both prediction and ranking evaluation metrics

----
## Warmup (read & run): Visualize learnt item embeddings with PCA & T-SNE

### Load data

We need read each line from the CSV to extract $(user,item,rating)$ triplets. Also, each user and item id's has to be remapped to be contiguous.

In [1]:
import numpy as np

# Rating matrices are highly sparse (there are many missing ratings).
# Therefore, the use of sparse matrices is recommended.

from scipy.sparse import dok_matrix

# Data file is a CSV
with open("dataset/ml-latest-small/ratings.csv") as ratings:
    rating_data = ratings.readlines()

# We separate the header from actual data
header = rating_data[0]
print(header)

# Data is CSV: userID,itemID,rating,timestamp
rating_data = rating_data[1:] #contains list(tuple(uid,iid,rating,timestamp))


# Helper to transform one "str" line to a tuple
def line2tuple(l):
    uid,iid,rating, timestamp = l.strip().split(",")
    return (int(uid),int(iid),float(rating), int(timestamp))

# generator for one line. (enables direct unpacking)
def tl(l):
    for x in l:
        yield(line2tuple(x))

# We first remap users and item to ids between (0,len(user)) and (0,len(item))
u_dic = {}
i_dic = {}
        
all_data = []
    
    
for uid,iid,rating,ts in tl(rating_data):  # iterating on all data
    uk = u_dic.setdefault(uid,len(u_dic))
    ik = i_dic.setdefault(iid,len(i_dic))
    all_data.append((uk,ik,float(rating)))

num_user = len(u_dic)
num_item = len(i_dic)

print("there are "+str((num_user,num_item)) +" users and items")


userId,movieId,rating,timestamp

there are (610, 9724) users and items


## Visualizing item latent space
Factorizing a matrices into two submatrices yields embeddings of users and items. These embeddings encode the similarity that can exists between users or between items. They can be visualized with algorithms such as PCA and T-SNE. In this first part, we aim at visualizing obtained item embeddings following a factorization on the entire rating matrix.


- To do so, we propose to use the following function `save_embeddings` and [the tensorflow projector](http://projector.tensorflow.org/) to visualize obtained latent features with pca and t-sne.




In [3]:
# This function saves embeddings (a numpy array) and associated labels into tsv files.

def save_embeddings(embs,dict_label,path):
    """
    embs is Numpy.array(N,size)
    dict_label is {str(word)->int(idx)} or {int(idx)->str(word)}
    """
    def int_first(k,v):
        if type(k) == int:
            return (k,v)
        else:
            return (v,k)

    np.savetxt(f"{path}_vectors.tsv", embs, delimiter="\t")

    #labels 
    if dict_label:
        sorted_labs = np.array([lab for idx,lab in sorted([int_first(k,v) for k,v in dict_label.items()])])
        print(sorted_labs)
        with open(f"{path}_metadata.tsv","w") as metadata_file:
            for x in sorted_labs: #hack for space
                if len(x.strip()) == 0:
                    x = f"space-{len(x)}"
                    
                metadata_file.write(f"{x}\n")
                

### Factorization on full rating matrix: decompose rating matrix with NMF and visualize items
Three things to do:
    - Create sparse matrix of ratings (from $(user,item,rating)$ tuples)
    - Factorize it with NMF into U (user sub-matrix) and I (item sub-matrix)
    - Extract item labels (movie titles) and save embeddings/labels with preceding function 

In [None]:
from sklearn.decomposition import NMF

num_user = len(u_dic)
num_item = len(i_dic)

# (1) Create sparse matrix from all ratings
Full = dok_matrix((num_user, num_item), dtype=np.float32)

for uid,iid,rating in all_data:
    Full[uid,iid] = float(rating)
    
    
# (2) Factorizing matrix

model = NMF(n_components=25, init='random', random_state=0, max_iter=350)
U = model.fit_transform(Full) #users
I = model.components_      #items

I = I.transpose()
I.shape

# (3) Loading labels and saving embeddings + vectors


# data is csv with header
with open("dataset/ml-latest-small/movies.csv") as movies:
    movie_data = movies.readlines()

# print header
print(movie_data[0])

# id-> title dictionnary
movie_names = {}

for x in movie_data[1:]:
    iid, title, *rest =  x.split(',')
    
    iid = float(iid)
    

    if iid in i_dic:
        movie_names[i_dic[iid]] = title


####################################
 
# Saving into "item_50_vectors.tsv" and "item_50_metadata.tsv".
save_embeddings(I,movie_names,"item_50")



### Visualize on TensorFlow Projector:
[Saved vectors/label can be visualized in tensorflow projector](http://projector.tensorflow.org/)

> Files are saved by default as "item_50_vectors.tsv" and "item_50_metadata.tsv" in root jupyter directory

-------------------------


# (Explicit) Collaborative Filtering  by predicting ratings:

The Netflix competition introduced this predictive framework: the goal is to predict missing ratings using existing ones.

##  Vanilla NMF AND SVD for rating prediction

First, we test two simple models:
- Non Negative Matrix Factorization
- SVD

Given a real rating $r_{ui}$ and a predicted rating $\hat{r_{ui}}$ on an **observed** couple $(u,i)$ 

Rating prediction's goal is to minimize mean the prediction error over each $N$ **observed** ratings
## $$\min\limits_{U,I} \frac{1}{N}\sum\limits_{(u,i)} (r_{ui} -  \hat{r_{ui}})^2 $$


### ** Building Train/Test set **

To train/test our models, we first split our data into two parts

In [None]:
# We take 10% of the train set as test data
train_mat = dok_matrix((num_user, num_item), dtype=np.float32)
test = []
train = []
    
for i,(uid,iid,rating) in enumerate(all_data):
    if i%10 == 0: #one out of 10 is for test
        test.append((uid,iid,rating))
    else:
        train.append((uid,iid,rating))
        train_mat[uid,iid] = rating
    
print("Number of train examples: ", train_mat.nnz)
print("Number of test examples: ", len(test))


## ** Evaluating error **:

The goal is to minimize the mean difference between the $n$ predicted and true ratings (respectively $\hat{r_{ui}}, r_{ui}$). 

Two common metrics are the Mean Squared Error (MSE) $$\frac{1}{n}\sum^1_n (r_{ui}-\hat{r_{ui}})^2$$ and the Mean Absolute Error (MAE) $$\frac{1}{n}\sum^1_n abs(r_{ui}-\hat{r_{ui}})$$.

## 1 ) Complete error functions (TODO)

In [6]:
# take as input two lists of ratings

def MSE_err(truth,pred):
    """
    computes MSE from real-pred difference
    """
    return np.mean(np.power(np.array(truth-pred),2))

def MAE_err(truth,pred):
    """
    computes MAE from real-pred difference
    """
    return np.mean(np.abs(np.array(truth-pred)))
        




Here, we propose have the simple following **prediction rule**: the predicted rating is simply the dot product of the considered user and item embeddings

## $$\hat{r_{ui}} = <U_u^\top.I_i> $$

This is akin to decomposing the training matrix into two sub-matrices using SVD.

## Pointers on what to do: 

1. complete prediction function
- fit nmf model and get User and Item profiles to make a prediction
- fit svd model and get User and Item profiles to make a prediction

In [7]:
from sklearn.decomposition import NMF, TruncatedSVD


print("----------------------NMF---------------------------")



## NMF model
model = NMF(n_components=100, solver='cd' ,random_state=0, max_iter=100,alpha=5,l1_ratio=0.5)

#get submatrices
U_nmf = model.fit_transform(train_mat)
I_nmf = model.components_.transpose()

print(U_nmf.shape)
print(I_nmf.shape)

def pred_func_nmf(uid,iid):
    
    Uu = U_nmf[uid]
    Ii = I_nmf[iid]
    
    return np.dot(Uu,Ii)  


## Getting the truth values
truth_tr = np.array([rating for (uid,iid),rating in train_mat.items()])
truth_te = np.array([rating for uid,iid,rating in test])

prediction_tr = np.array([pred_func_nmf(u,i) for (u,i),rating in train_mat.items()])
prediction_te = np.array([pred_func_nmf(u,i) for u,i,rating in test])


print("Training Error:")
print("MSE:",  MSE_err(prediction_tr,truth_tr))
print("MAE:",  MAE_err(prediction_tr,truth_tr))
    
print("Test Error:")
print("MSE:",  MSE_err(prediction_te,truth_te))
print("MAE:",  MAE_err(prediction_te,truth_te))




----------------------NMF---------------------------
(610, 100)
(9724, 100)
Training Error:
MSE: 3.6961360604156157
MAE: 1.313830754659812
Test Error:
MSE: 10.425430823030808
MAE: 3.0319268899620258


In [8]:
print("----------------------SVD---------------------------")

## SVD Model

model = TruncatedSVD(n_components=150)

#get submatrices
U_svd = model.fit_transform(train_mat)
I_svd = model.components_.transpose()


def pred_func_svd(uid,iid):
    
    Uu = U_svd[uid]
    Ii = I_svd[iid]
    
    return np.dot(Uu,Ii)  

    
prediction_tr = np.array([pred_func_svd(u,i) for (u,i),rating in train_mat.items()])
prediction_te = np.array([pred_func_svd(u,i) for u,i,rating in test])


print("Training Error:")
print("MSE:",  MSE_err(prediction_tr,truth_tr))
print("MAE:",  MAE_err(prediction_tr,truth_tr))
    
print("Test Error:")
print("MSE:",  MSE_err(prediction_te,truth_te))
print("MAE:",  MAE_err(prediction_te,truth_te))




----------------------SVD---------------------------
Training Error:
MSE: 1.6994271
MAE: 0.7944529
Test Error:
MSE: 11.761561385214991
MAE: 3.2461926790194213


### Save and visualize nmf & svd embeddings:

In [9]:
save_embeddings(I_svd,movie_names,"svd")
save_embeddings(I_nmf,movie_names,"nmf")

['Toy Story (1995)' 'Grumpier Old Men (1995)' 'Heat (1995)' ...
 'Hazard (2005)' 'Blair Witch (2016)' '31 (2016)']
['Toy Story (1995)' 'Grumpier Old Men (1995)' 'Heat (1995)' ...
 'Hazard (2005)' 'Blair Witch (2016)' '31 (2016)']


----------------

# Normalization and Regularization


Trying to decompose a sparse matrix this way is highly prone to overfitting: The train error is low, but the test error is much higher. Traditionnally, models are highly regularized to achieve better performances.

$$
\min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  I_i^\top U_u)^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_u||^2)}_\text{regularization}
$$

Here we were trying to factorize rating patterns within latent profiles. However, it is common to mean normalize each rating to model "mean deviation" instead. In the next part of this practical session, we will explore two mean normalization techniques for rating prediction. 

**Mean normalization** is a common way to improve performance in rating prediction. In fact, the global average is one of the simplest baseline to try when doing rating prediction


## (TODO) Mean baseline:


To get the mean baseline:

- (1) compute training set mean
- (2) complete prediction function

The prediction rule simply is:
## $$\hat{r_{ui}} = \mu  $$

In [10]:
print("----------------------MEAN ONLY---------------------------")


# compute mean of list(train_mat.values())
mean = np.mean([r for _,r in train_mat.items()])


def pred_func_mean(uid,iid):    
    return mean

print("mean rating is ", mean)


prediction_tr = np.array([pred_func_mean(u,i) for (u,i),rating in train_mat.items()])
prediction_te = np.array([pred_func_mean(u,i) for u,i,rating in test])

print("Training Error:")
print("MSE:",  MSE_err(prediction_tr,truth_tr))
print("MAE:",  MAE_err(prediction_tr,truth_tr))
    
print("Test Error:")
print("MSE:",  MSE_err(prediction_te,truth_te))
print("MAE:",  MAE_err(prediction_te,truth_te))

----------------------MEAN ONLY---------------------------
mean rating is  3.5020936
Training Error:
MSE: 1.0865998
MAE: 0.8272643
Test Error:
MSE: 1.0891692158238202
MAE: 0.8258433086632831


Indeed, using only the mean ratings to predict future ratings has much MUCH better performances than our two previous models...

### (TODO) Taking the mean rating into account:

Most work on rating prediction model mean ratings. The goal is to only factor deviation from the mean instead of modeling the full rating pattern.

To take into account the mean, the easiest way is to substract it by doing the following:

- (1) compute training set mean
- (2) remove mean from training rating matrix
- (3) factorize the normalized matrix
- (4) when predicting a rating, simply add the mean

The prediction rule becomes:
## $$\hat{r_{ui}} = \mu  + <U_u^\top.I_i> $$

In [11]:
from sklearn.decomposition import NMF, TruncatedSVD


# (1) compute mean of list(train_mat.values())
mean = mean




# (2) remove mean from training matrix
tmn = dok_matrix((num_user, num_item), dtype=np.float32)

for (uid,iid), rating in train_mat.items():
    tmn[uid,iid] = rating - mean

# (3) factorize matrix
model_norm = TruncatedSVD(n_components=150)

#get submatrices
U_msvd = model.fit_transform(tmn)
I_msvd = model.components_.transpose()

def pred_func_msvd(uid,iid): 
    
    Uu = U_msvd[uid]
    Ii = I_msvd[iid]
    
    return np.dot(Uu,Ii) + mean


prediction_tr = np.array([pred_func_msvd(u,i) for (u,i),rating in train_mat.items()])
prediction_te = np.array([pred_func_msvd(u,i) for u,i,rating in test])


print("Training Error:")
print("MSE:",  MSE_err(prediction_tr,truth_tr))
print("MAE:",  MAE_err(prediction_tr,truth_tr))
    
print("Test Error:")
print("MSE:",  MSE_err(prediction_te,truth_te))
print("MAE:",  MAE_err(prediction_te,truth_te))
    
    


Training Error:
MSE: 0.16967031
MAE: 0.22270362
Test Error:
MSE: 1.0510823654476742
MAE: 0.8103558682865777


## (Seminal) Koren 2009 model:

the following model propose to even more refine the bias. In addition to the mean bias, it adds a bias for each user and for each item. The goal is to only factorize the mean rating deviation. This model also enables a more finely grained baseline estimate:


> Suppose that you want
a first-order estimate for user Joe’s rating of the movie
Titanic. Now, say that the average rating over all movies, µ,
is 3.7 stars. Furthermore, Titanic is better than an average
movie, so it tends to be rated 0.5 stars above the average.
On the other hand, Joe is a critical user, who tends to rate
0.3 stars lower than the average. Thus, the estimate for
Titanic’s rating by Joe would be 3.9 stars (3.7 + 0.5 - 0.3)

The prediction rule is the following:

### $$r_{ui} = \mu + \mu_i + \mu_u + U_u^\top .I_i $$



In [16]:
from sklearn.decomposition import NMF, TruncatedSVD


def group_by_user(tuple_list):
    r_dic = {}
    for uid,iid,rating in tuple_list:
        list_rev = r_dic.get(uid,[])
        list_rev.append(rating)
    
        r_dic[uid] =list_rev
    return r_dic


def group_by_item(tuple_list):
    r_dic = {}
    for uid,iid,rating in tuple_list:
        list_rev = r_dic.get(iid,[])
        list_rev.append(rating)
    
        r_dic[iid] =list_rev
    return r_dic





# (1) compute mean of list(train_mat.values())
mean = mean


u_means = {u:(np.mean(ratings - mean)) for u,ratings in group_by_user(train).items()}
i_means = {i:(np.mean(ratings) - mean) for i,ratings in group_by_item(train).items()}




# (2) remove means from training matrix
tmn_k = dok_matrix((num_user, num_item), dtype=np.float32)

for (uid,iid), rating in train_mat.items():
    tmn_k[uid,iid] = rating - mean - u_means.get(uid,0) - i_means.get(iid,0)

# (3) factorize matrix
model_kor = TruncatedSVD(n_components=150)


U_ksvd = model.fit_transform(tmn_k)
I_ksvd = model.components_.transpose()

def pred_func_ksvd(uid,iid):
    Uu = U_ksvd[uid]
    Ii = I_ksvd[iid]
    
    return np.dot(Uu,Ii) + mean + u_means.get(uid,0) + i_means.get(iid,0)


prediction_tr = np.array([pred_func_ksvd(u,i) for (u,i),rating in train_mat.items()])
prediction_te = np.array([pred_func_ksvd(u,i) for u,i,rating in test])


print("Training Error:")
print("MSE:",  MSE_err(prediction_tr,truth_tr))
print("MAE:",  MAE_err(prediction_tr,truth_tr))
    
print("Test Error:")
print("MSE:",  MSE_err(prediction_te,truth_te))
print("MAE:",  MAE_err(prediction_te,truth_te))
    
    


Training Error:
MSE: 0.11655318919197581
MAE: 0.1822708474618395
Test Error:
MSE: 0.8050965983705777
MAE: 0.6806878934807243


### Save and visualize nmf & svd embeddings:

In [12]:
save_embeddings(###,movie_names,"svd")
save_embeddings(###,movie_names,"nmf")

SyntaxError: unexpected EOF while parsing (<ipython-input-12-8f64c2d03bdb>, line 2)

This is, by far, the best model (test mse $\approx$ 0.804) .


### Takeways:

In general, when predicting ratings, it is critical to model rating bias. Its also worth noting that other biases exists. Notably, people don't rate items at random. Taking into account **what** people rate is at least as important as taking into account **how** people rate. See [SVD++](http://www.cs.rochester.edu/twiki/pub/Main/HarpSeminar/Factorization_Meets_the_Neighborhood-_a_Multifaceted_Collaborative_Filtering_Model.pdf), a model which models, in addition to rating biais, implicit information.

------------

# Collaborative Filtering and Ranking Metrics

Most of the time recommender systems present $k$ items to the users. Therefore, recommenders are often evaluated by how many relevant items are in their $k$ set (using ranking metrics).

Different ranking methods exists such as the [Mean Reciprocal Rank](http://en.wikipedia.org/wiki/Mean_reciprocal_rank) or the [normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)

Here we focus on the latter as it can take into account rating scores.

## (TODO) Implementing nDCG

In [17]:


# The dcg@k is the sum of the relevance, penalized gradually
def dcg_at_k(r, k):
    """Score is discounted cumulative gain (dcg)
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
        
    """
    r = np.asfarray(r)[:k]
    if r.size:
        return np.sum(r / np.log2(np.arange(2, r.size + 2)))
        
    return 0.

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# dcg_at_k(r, 1) => 3.0
# dcg_at_k(r, 2) => 4.2618595071429155
    

# And it's normalized version
def ndcg_at_k(r, k):
    """
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
    """
    dcg_max =  dcg_at_k(sorted(r)[::-1],k)  # TO COMPLETE 
    if not dcg_max:
        return 0.
    return dcg_at_k(r, k) / dcg_max

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# ndcg_at_k(r, 1) => 1.0
# ndcg_at_k(r, 4) => 0.794285
    
r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]    
ndcg_at_k(r, 4)   

0.7942854176010882

## (TODO) compute nDCG on test lists.

One first thing to do is to compute the nDCG on our tests list. This enables us to answer the following question:

- Can our model properly sort what the user has seen ? (But hasn't been train on)

To compare ourselves, we first build a random baseline which simply shuffles test ratings.


=> What is nDCG for every model we tried ? ?


### Pointers:

Here, we first compute average ndcg on **test set only**. The goal is simply to see how well our model sorts what has been seen. Here are the steps to follow:

1. We group each test set $(user,item,rating)$ by user -> this gives you a list of items per user
2. Then, using previous models prediction functions, you predict each items' rating
3. Using those provided ratings, you can sort the items
4. Using these sorted items, you can have the sorted list of real ratings to compute ndcg.


In [62]:
from random import shuffle

#1) Group (uid,iid,rating) per uid
def group_by_user(tuple_list):
    r_dic = {}
    for uid,iid,rating in tuple_list:
        list_rev = r_dic.get(uid,[])
        list_rev.append((uid,iid,rating))
    
        r_dic[uid] =list_rev
    return r_dic #returns {uid:[(uid,iid,rating),...],...}



userg_train = group_by_user(train)  #returns {uid:[(uid,iid,rating),...],...}
userg_test = group_by_user(test)


# Function to compute a random shuffle ndcg
def random_ndcg(uid_group_tuples,k=10):
    mean_ndcg = 0
    num_users = 0
    
    #for each test set
    for _,list_rating in uid_group_tuples.items():

        #shuffle real ratings.
        real_ratings = [rating for uid,iid,rating in list_rating]
        shuffle(real_ratings)
        pred_objects = real_ratings

        mean_ndcg += ndcg_at_k(pred_objects,k)
        num_users += 1

    return  mean_ndcg/num_users



#Function to compute ndcg on test set
def mean_ndcg_UI(U,I,pred_function,uid_group_tuples,k=10):
    mean_ndcg = 0
    num_users = 0
    
    #for each test set
    for _,list_rating in uid_group_tuples.items():
        
        #2)compute predictions
        pred_ratings = [(pred_function(uid,iid)) for uid,iid,rating in list_rating]
        
        #3)to sort real ratings
        real_ratings = [rating for uid,iid,rating in list_rating]
        pred_objects = [ real_ratings[rid] for rid in np.argsort(pred_ratings)[::-1]]
        
        #4)and compute ndcg
        mean_ndcg += ndcg_at_k(pred_objects,k)
        num_users += 1

    return  mean_ndcg/num_users
    
    

print("train") 
print("-"*10)
print("mean == random", random_ndcg(userg_train))
print("ndcg nmf", mean_ndcg_UI(U_nmf,I_nmf,pred_func_nmf,userg_train)) 
print("ndcg svd", mean_ndcg_UI(U_svd,I_svd,pred_func_svd,userg_train))
print("ndcg svd + mean", mean_ndcg_UI(U_msvd,I_msvd,pred_func_nmf,userg_train))
print("ndcg svd koren", mean_ndcg_UI(U_ksvd,I_ksvd,pred_func_ksvd,userg_train))

print(" ")

print("test")    
print("-"*10) 
print("mean == random", random_ndcg(userg_test))
print("ndcg nmf", mean_ndcg_UI(U_nmf,I_nmf,pred_func_nmf,userg_test)) 
print("ndcg svd", mean_ndcg_UI(U_svd,I_svd,pred_func_svd,userg_test))
print("ndcg svd + mean", mean_ndcg_UI(U_msvd,I_msvd,pred_func_nmf,userg_test))
print("ndcg svd koren", mean_ndcg_UI(U_ksvd,I_ksvd,pred_func_ksvd,userg_test))






train
----------
mean == random 0.7619539212066418
ndcg nmf 0.8908062109038278
ndcg svd 0.9271868672687147
ndcg svd + mean 0.8908062109038278
ndcg svd koren 0.9736244669125971
 
test
----------
mean == random 0.8923979008671923
ndcg nmf 0.9277563420114506
ndcg svd 0.9140889738153642
ndcg svd + mean 0.9277563420114506
ndcg svd koren 0.9319512282311233


## nDCG on full item set
When computing the nDCG on the test list, you assume that you already know what the user will rate. In a real setting, this information is unknown. All you know is what the user already rated and it's up to you to find both  **what** and **how** people will rate.

One obvious baseline here is the **item popularity**, where we simply rank items by their number of ratings. Unseen items in test set are considered not interesting and have a score of 0

Let's see how our model perform in terms of nDCG on the full (unseen) item set:

### Pointers:

Here, we compute average ndcg on **unseen item set**. The goal is simply to see how well our model sorts what has not been seen. Here are the steps to follow:

1. Using previous models prediction functions, you predict each unseen items' rating
3. Using those provided ratings, you can sort the items
4. Using these sorted items, you can have the sorted list of real ratings to compute ndcg.

In [71]:
from collections import Counter

counts = Counter(iid for _,iid,_ in train)

#the popularity predictor
def pop_pred(uid,iid):
    return counts[iid]

#Random ndcg, return a shuffled lists of all possible ratings
def random_ndcg_full(k=10,default=0):
    mean_ndcg = 0
    num_users = 0
    
    for _,list_rating in userg_test.items():

        #all possible ratings
        real_ratings = [rating for uid,iid,rating in list_rating] + [default]*(num_item - len(list_rating) - len(userg_train[uid]))
        shuffle(real_ratings)
        pred_objects = real_ratings
        
        mean_ndcg += ndcg_at_k(pred_objects,k)
        num_users += 1

    return  mean_ndcg/len(userg_test)


def mean_ndcg_UI_FULL(U,I,pred_function,k=10,default=0):
    mean_ndcg = 0
    
    test_users = set(uid for uid,_,_ in test)
    
    for user in test_users:
        u_train_set = set(iid for _,iid,_ in userg_train[uid])
        u_test_dic =  {iid:rating for uid,iid,rating in userg_test[uid]}
        
        pred_ratings = []
        real_ratings = []
        
        
        for item in range(num_item):
            
            if item in u_train_set:
                continue
            else:
                p_rating = pred_function(user,item)
            
            pred_ratings.append(p_rating)
            real_ratings.append(u_test_dic.get(item,default))
            
        pred_objects = [ real_ratings[rid] for rid in np.argsort(pred_ratings)[::-1]]
        
        
        mean_ndcg += ndcg_at_k(pred_objects,k)

    return  mean_ndcg/len(test_users)

print("mean == random", random_ndcg_full())
print("ndcg pop", mean_ndcg_UI_FULL(U_nmf,I_nmf,pop_pred)) 
print("ndcg nmf", mean_ndcg_UI_FULL(U_nmf,I_nmf,pred_func_nmf)) 
print("ndcg svd", mean_ndcg_UI_FULL(U_svd,I_svd,pred_func_svd))
print("ndcg svd + mean", mean_ndcg_UI_FULL(U_msvd,I_msvd,pred_func_nmf))
print("ndcg svd koren", mean_ndcg_UI_FULL(U_msvd,I_msvd,pred_func_ksvd))


mean == random 0.0010350278796436149
ndcg pop 0.42951812056426664
ndcg nmf 0.33706435442735555
ndcg svd 0.25726680207566305
ndcg svd + mean 0.33706435442735555
ndcg svd koren 0.023264815183591633


Obviously the random baseline is way out. However, more surprisingly, our best model for rating prediction has the worst score when computing such nDCG, while it had the best one one unseen data. 

## Takeways:

When doing recommendation you have two tasks: 

- predicting how someone will like something (this is the rating prediction)
- predicting what someone will see

Matrix factorization algorithms trained on rating prediction **without** taking into account what has been rated do not perform well on predicting what will be seen. Different algorithms exists to solve the task of predicting **what** will be seen. The most famous one is [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/abs/1205.2618). 



## (Bonus) w2v as a recommender:

Word2vec can be used to encode item co-occurences. Then, using the cosine similarity you can predict which items are similar to seen training items. This is a simple yet effective way to provide item recommendations


### 1) First, we train a word2vec model where the text is replaced by items id.

In [72]:
import gensim
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

#words are in fact item ids
text = [ [str(iid) for _,iid,rating in l if rating >= 3] for u,l in userg_train.items()]

# the following configuration is the default configuration
w2v = gensim.models.word2vec.Word2Vec(sentences=text,
                                size=100, window=2,               ### here we train a cbow model 
                                min_count=3,                      
                                sample=0.001, workers=3,
                                sg=1, hs=0, negative=15,        ### set sg to 1 to train a sg model
                                cbow_mean=0,
                                iter=5)

2018-12-07 12:22:46,374 : INFO : collecting all words and their counts
2018-12-07 12:22:46,375 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-12-07 12:22:46,386 : INFO : collected 6037 word types from a corpus of 43730 raw words and 610 sentences
2018-12-07 12:22:46,387 : INFO : Loading a fresh vocabulary
2018-12-07 12:22:46,402 : INFO : min_count=3 retains 2648 unique words (43% of original 6037, drops 3389)
2018-12-07 12:22:46,402 : INFO : min_count=3 leaves 39443 word corpus (90% of original 43730, drops 4287)
2018-12-07 12:22:46,410 : INFO : deleting the raw counts dictionary of 6037 items
2018-12-07 12:22:46,411 : INFO : sample=0.001 downsamples 33 most-common words
2018-12-07 12:22:46,411 : INFO : downsampling leaves estimated 38502 word corpus (97.6% of prior 39443)
2018-12-07 12:22:46,412 : INFO : estimated required memory for 2648 words and 100 dimensions: 3442400 bytes
2018-12-07 12:22:46,418 : INFO : resetting layer weights
2018-12-07 12:22:4

### 2) The similarity can then be used to find similar items

In [73]:

def get_similar_ids(iid,num=5):
    if str(iid) in w2v.wv.vocab:
        return [int(iid) for iid,_ in w2v.most_similar(str(iid),topn=num)] 
    else:
        return []

get_similar_ids(1)

2018-12-07 12:22:55,473 : INFO : precomputing L2-norms of word weight vectors


[484, 502, 294, 259, 2663]

In [75]:

def mean_ndcg_w2v_FULL(n=2,k=10,default=0):
    mean_ndcg = 0
    
    test_users = set(uid for uid,_,_ in test)
    
    for user in test_users:
        u_train_set = set(iid for _,iid,rating in userg_train[uid] if rating >= 4)
        u_test_dic =  {iid:rating for uid,iid,rating in userg_test[uid]}
        
        pred_items = set(iid for ids in u_train_set for iid in get_similar_ids(ids,n) if iid not in u_train_set)
        not_pred = set(iid for x in range(num_item) if (x not in u_train_set and x not in pred_items))
        pred_items = list(pred_items) 

        pred_objects = [ u_test_dic.get(rid,default) for rid in list(pred_items) + list(not_pred)]
        
        
        mean_ndcg += ndcg_at_k(pred_objects,k)

    return  mean_ndcg/len(test_users)

print("ndcg w2v simple", mean_ndcg_w2v_FULL())
    

ndcg w2v simple 0.13632320547928628


### Takeaways 
This works ok, but still not better than popularity baseline. Our dataset is small and our method to get items from similarity is far too simple.


### Finally:
Try and visualize learnt embeddings (is there something to see ?)

In [None]:
###save_embeddings(,movie_names,"w2v")