# Implementing PageRank
During this practical session, you will implement PageRank for recommendation.

The session is broken down into several steps:
- Create a toy dataset
- Build the item graph
- Implement PageRank
- Run PageRank on the dataset to validate its behavior
- Implement a recommender based on PageRank
- Adapt the MovieLens validation to use Recall as a validation metric
- Compare it to other approaches

## Building a toy dataset
In MovieLens, we know which movies each user has seen, when they saw them, and how they rated them. This raises several questions about how to build the graph:
- directed or undirected? We have the timestamp of each viewing, so we could create directed edges based on time.
- threshold or not? Even if a user disliked a movie, they still watched it, which means it matched their interests. Do we want to show movies the user could find interesting, or movies they would like?
- time window? Should we create links between all the movies a user has seen, or only between some of them?

But, for now, we are only interested in the technical function of PageRank. Create a small dataset on which you will be able to iterate fast.

In [13]:
import numpy as np


# User item matrix of size nu x ni
# For this exercise, the item graph will be undirected (its adjacency matrix is symmetric).
# You can change nu and ni if you want
nu = 5
ni = 10
X = np.zeros((nu, ni), dtype=int)

assert(X.shape == (nu, ni))

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
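One possible way to fill the toy matrix is to draw random binary interactions. The sketch below is only an illustration: the helper name `make_toy_interactions`, the `density` parameter and the fixed seed are assumptions, not part of the exercise. Every user gets at least two items so that each one contributes at least one edge to the item graph.

```python
import numpy as np


def make_toy_interactions(n_users=5, n_items=10, density=0.3, seed=0):
    """Hypothetical helper: random binary user-item matrix (1 = seen)."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n_users, n_items)) < density).astype(int)
    for u in range(n_users):
        # Make sure every user has seen at least two items,
        # otherwise that user creates no edge between items.
        if X[u].sum() < 2:
            X[u, rng.choice(n_items, size=2, replace=False)] = 1
    return X


X_toy = make_toy_interactions()
```

A hand-written matrix works just as well and can be easier to reason about when debugging.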

## Building the adjacency graph
We want to build the adjacency matrix of the item graph corresponding to this user-item matrix. It will have shape ni x ni.

In [14]:
adjacency_matrix = np.zeros((ni, ni), dtype=int)

assert(adjacency_matrix.shape == (ni, ni))
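As a minimal sketch, assuming two items are linked whenever at least one user has seen both, the adjacency matrix can be obtained from the co-occurrence product `X.T @ X`. The function name `build_item_adjacency` is hypothetical, and the choice of weighting edges by the number of common users is one option among others (no threshold, no time window, undirected).

```python
import numpy as np


def build_item_adjacency(X):
    """Hypothetical helper: weighted, undirected item co-occurrence graph."""
    X = (np.asarray(X) > 0).astype(int)
    # A[i, j] = number of users who have seen both item i and item j
    A = X.T @ X
    # Remove self-loops: the diagonal only counts how often an item was seen
    np.fill_diagonal(A, 0)
    return A


X_example = np.array([[1, 1, 0],
                      [0, 1, 1]])
A_example = build_item_adjacency(X_example)
```

Here items 0 and 1 share one user, items 1 and 2 share one user, and items 0 and 2 are never seen together, so they are not linked.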

## Implement PageRank
Reminder: PageRank takes an adjacency matrix as input and outputs a rank for each node of the graph. You can implement the stopping criterion as you wish. An interface is proposed below, but feel free to customize it.

In [17]:
def page_rank(adjacency_matrix, g=0.85, n_iter=100, eps=None):
    return np.zeros((ni,), dtype=int)

my_rank = page_rank(adjacency_matrix, n_iter=1)
assert(my_rank.shape == (ni,))
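For reference, here is one possible power-iteration sketch following the proposed interface. It is not the only valid implementation: the name `page_rank_sketch`, the uniform handling of nodes without links, and the L1 stopping criterion are assumptions made for the example.

```python
import numpy as np


def page_rank_sketch(adjacency_matrix, g=0.85, n_iter=100, eps=1e-8):
    """Power iteration; g is the damping factor, eps an optional L1 tolerance."""
    A = np.asarray(adjacency_matrix, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Row-stochastic transition matrix. Assumption: a node without
    # outgoing links jumps to any node with uniform probability.
    P = np.full((n, n), 1.0 / n)
    has_links = out_degree > 0
    P[has_links] = A[has_links] / out_degree[has_links, None]

    rank = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # Follow a link with probability g, teleport uniformly otherwise
        new_rank = g * (rank @ P) + (1.0 - g) / n
        converged = eps is not None and np.abs(new_rank - rank).sum() < eps
        rank = new_rank
        if converged:
            break
    return rank


# Star graph: node 0 is linked to nodes 1, 2 and 3
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
star_rank = page_rank_sketch(star, n_iter=200)
```

On the star graph the central node receives the highest rank and the three leaves share the same one, which gives a quick sanity check.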

## Validate PageRank
Now validate your implementation. The best way is to have a reference PageRank result against which you can compare yours.
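One way to obtain such a reference on a small graph is to compute the stationary distribution directly as the leading left eigenvector of the damped transition matrix. The sketch below assumes the same conventions as a standard PageRank (damping factor `g`, uniform teleportation, uniform jump from nodes without links); `reference_page_rank` is a hypothetical name.

```python
import numpy as np


def reference_page_rank(adjacency_matrix, g=0.85):
    """Hypothetical reference: exact PageRank via eigendecomposition."""
    A = np.asarray(adjacency_matrix, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    P = np.full((n, n), 1.0 / n)
    has_links = out_degree > 0
    P[has_links] = A[has_links] / out_degree[has_links, None]
    # Damped ("Google") matrix: row-stochastic and strictly positive
    G = g * P + (1.0 - g) / n
    # Left eigenvectors of G are the eigenvectors of G.T
    eigenvalues, eigenvectors = np.linalg.eig(G.T)
    leading = np.argmax(eigenvalues.real)  # eigenvalue 1
    v = np.abs(eigenvectors[:, leading].real)
    return v / v.sum()


star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
reference = reference_page_rank(star)
```

Once your own `page_rank` is implemented, a check such as `np.allclose(page_rank(star), reference, atol=1e-6)` should pass, provided both use the same conventions. This approach only scales to small graphs, which is all that is needed for validation.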

## Turn your PageRank into a recommender
What we want here is simple: now that we have a Markov chain extracted from the adjacency matrix, start from a given product, perform a random walk on the graph, and output the required number of products. Supporting restarts is a plus.

In [20]:
def recommend(adjacency_matrix, my_rank, n_products, restart=False):
    return np.arange(n_products, dtype=int)

recommendation = recommend(adjacency_matrix, my_rank, 10)
assert(np.unique(recommendation).shape == (10,))
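The sketch below illustrates one way to perform a random walk with restart. It deliberately differs from the proposed interface: it takes an explicit `start` item and does not use `my_rank`, and the names `recommend_by_random_walk`, `restart_prob` and `max_steps` are assumptions. Items are returned in the order in which the walk first visits them.

```python
import numpy as np


def recommend_by_random_walk(adjacency_matrix, start, n_products,
                             restart_prob=0.15, max_steps=10_000, seed=0):
    """Hypothetical recommender: distinct items visited by a random walk."""
    rng = np.random.default_rng(seed)
    A = np.asarray(adjacency_matrix, dtype=float)
    n = A.shape[0]
    seen = {start}       # the start item itself is never recommended
    visited = []         # recommended items, in order of first visit
    current = start
    for _ in range(max_steps):
        if len(visited) >= n_products:
            break
        weights = A[current]
        total = weights.sum()
        if total == 0:
            if current == start:
                break            # isolated start item: nothing to recommend
            current = start      # dead end: go back to the start
            continue
        if rng.random() < restart_prob:
            current = start      # restart
            continue
        # Move to a neighbour with probability proportional to edge weight
        current = int(rng.choice(n, p=weights / total))
        if current not in seen:
            seen.add(current)
            visited.append(current)
    return np.array(visited, dtype=int)


# Complete graph on 5 items: every item is linked to every other item
complete = np.ones((5, 5), dtype=int) - np.eye(5, dtype=int)
walk_recommendation = recommend_by_random_walk(complete, start=0, n_products=3)
```

The walk may return fewer than `n_products` items when the start item belongs to a small connected component. Another option is to rank items by a personalized PageRank whose teleportation always returns to the start item.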

Optional: Create a scikit-learn estimator based on your algorithm

## Code the recall metric
Reminder:
    $$\text{recall} = \frac{\text{number of relevant items predicted}}{\text{number of relevant items for the user}}$$

In [None]:
def recall(Y_predicted, Y_true):
    return 0.
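A minimal sketch of this metric for a single user, assuming both arguments are collections of item identifiers. The name `recall_sketch` and the convention of returning 0 when the user has no relevant item are assumptions.

```python
def recall_sketch(y_predicted, y_true):
    """Fraction of the user's relevant items that appear in the prediction."""
    relevant = set(y_true)
    if not relevant:
        # Assumption: recall is defined as 0 when nothing is relevant
        return 0.0
    hits = relevant.intersection(y_predicted)
    return len(hits) / len(relevant)


# Items 2 and 4 are found, item 6 is missed
example_recall = recall_sketch([1, 2, 3, 4], [2, 4, 6])
```

Here two of the three relevant items are predicted, so the recall is 2/3.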

## MovieLens evaluation
In the first session, we coded an evaluation framework based on MovieLens. The goal was to infer movie ratings. The setting of the problem has now changed: the goal is to predict 10 products that a user is likely to watch.
Adapt the framework so that it can evaluate recall on this problem.
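Since the first-session framework is not reproduced here, the sketch below only shows the general shape such an evaluation could take. It assumes a hypothetical `held_out` dictionary mapping each user to the movies hidden from training, and a hypothetical `recommend_fn(user, k)` callable returning `k` item identifiers. Neither comes from the original framework.

```python
import numpy as np


def mean_recall_at_k(recommend_fn, held_out, k=10):
    """Average recall@k over the users that have held-out items."""
    scores = []
    for user, true_items in held_out.items():
        relevant = set(true_items)
        if not relevant:
            continue  # users without held-out items are skipped
        predicted = list(recommend_fn(user, k))[:k]
        scores.append(len(relevant.intersection(predicted)) / len(relevant))
    return float(np.mean(scores)) if scores else 0.0


# Toy check with a recommender that ignores the user
toy_held_out = {0: [1, 2], 1: [3]}
toy_score = mean_recall_at_k(lambda user, k: [1, 3, 5][:k], toy_held_out)
```

In the toy check, user 0 gets a recall of 0.5 and user 1 a recall of 1.0, so the mean is 0.75.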