# Recommender systems lab

## Goals of this lab:

* Hands-on experience implementing simple content-based and collaborative filtering algorithms on a publicly available dataset (MovieLens)
* Comparison of different similarity metrics in the context of content-based recommendation
* Implementation of a collaborative filtering recommender system using the SVD
* Hands-on experience using numpy and pandas


---


In [None]:
import numpy as np
import pandas as pd

## Downloading the data

The dataset we will use is the small MovieLens dataset, available at

https://grouplens.org/datasets/movielens/

under the name

Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.

    README.html
    ml-latest-small.zip (size: 1 MB)

---

# Content-based Filtering

The goal of this part of the lab is to create a content-based filtering recommender system that takes into account

* the genres
* the years

of the movies as features.

A key ingredient of any content-based filtering recommender system is a measure of similarity between feature vectors. We will build such systems using

* the cosine similarity
* the Jaccard similarity
* the Euclidean similarity

separately, and use each one to recommend movies similar to a given movie.

## Data importing and cleaning

We need to import the data and create the dataframe containing the features.
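
One possible way to build the feature dataframe is sketched below. It assumes the archive was unpacked so that `ml-latest-small/movies.csv` exists with columns `movieId`, `title`, and `genres`; to stay self-contained, the sketch uses a tiny made-up stand-in for that file. Scaling the year to $[0, 1]$ is an assumption of this sketch, not a requirement of the lab.

```python
import pandas as pd

# Made-up stand-in for ml-latest-small/movies.csv (same three columns).
# With the real file: movies = pd.read_csv("ml-latest-small/movies.csv")
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Example A (1988)", "Example B (2005)", "Example C (1995)"],
    "genres": ["Action|Comedy|Crime", "Action|Comedy|Crime", "Drama|Romance"],
})

# One binary column per genre; genres are separated by "|".
features = movies["genres"].str.get_dummies(sep="|")

# The release year is the four-digit number in parentheses ending the title.
# Titles without a year would give NaN here and need separate handling.
year = movies["title"].str.extract(r"\((\d{4})\)\s*$")[0].astype(float)

# Assumption: min-max scale the year so it is comparable to the 0/1 genres.
features["year"] = (year - year.min()) / (year.max() - year.min())

# Index the rows by movieId so movies can be looked up directly.
features.index = movies["movieId"]
```

How strongly the year should weigh against the genres is a design choice worth experimenting with.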

---

## Cosine similarity

The cosine similarity is given by the cosine of the angle between two feature vectors.
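
Written out for two feature vectors $v_i$ and $v_j$ (the same notation used for the other metrics in this lab), the standard definition is

$$\cos(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\|_2\,\|v_j\|_2}.$$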

### Compute the cosine similarity from the features
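
A minimal sketch of one way to do this with numpy: normalize each row to unit length, then take all pairwise dot products. The matrix `X` is a made-up example (three genre-like columns plus a scaled year), not the real feature matrix.

```python
import numpy as np

# Made-up feature matrix: one row per movie, one column per feature.
X = np.array([
    [1.0, 0.0, 1.0, 0.2],
    [1.0, 0.0, 1.0, 0.9],
    [0.0, 1.0, 0.0, 0.2],
])

# Euclidean norm of each row; replace zero norms by 1 to avoid dividing by 0.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / np.where(norms == 0, 1.0, norms)

# Dot products of unit vectors are the cosines of the angles between them.
cosine_sim = X_unit @ X_unit.T
```

The result is a symmetric matrix with ones on the diagonal, since every nonzero vector has angle zero with itself.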

### Make some recommendations using the cosine similarity

The movies Rent-A-Cop (1988) [movieId 3667] and Man, The (2005) [movieId 37477] come from different eras but have the same genres. What do you notice about the recommendations between these two movies?
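
Looking up recommendations from a similarity matrix could be sketched as follows. The helper name `recommend` and the small matrix `sim` are hypothetical; the sketch assumes the similarity matrix is stored as a DataFrame indexed by `movieId` on both axes.

```python
import pandas as pd

def recommend(sim, movie_id, n=10):
    """Return the n movies most similar to movie_id, excluding the movie itself."""
    scores = sim.loc[movie_id].drop(movie_id)
    return scores.sort_values(ascending=False).head(n)

# Made-up 3x3 similarity matrix indexed by movieId on rows and columns.
ids = [10, 20, 30]
sim = pd.DataFrame(
    [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.3],
     [0.1, 0.3, 1.0]],
    index=ids, columns=ids,
)

top = recommend(sim, 10, n=2)
```

Because the helper only needs a square DataFrame of scores, the same function works unchanged with any similarity metric.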

---

## Jaccard similarity

Recall from the lecture that the Jaccard similarity between two vectors $v_i$ and $v_j$ with binary components is the ratio

$$\frac{|v_i\cap v_j|}{|v_i\cup v_j|}.$$

We want to use the Jaccard similarity computed between the features of two movies (in this case, genres and year) to construct our similarity matrix. Remember that the Jaccard similarity is only defined for binary feature vectors, so any non-binary feature (here, the year) has to be encoded as binary indicators first.

### Compute the jaccard similarity from the features
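
For binary vectors, $|v_i\cap v_j|$ counts the components where both vectors are 1 and $|v_i\cup v_j|$ counts those where at least one is 1. A sketch of computing all pairs at once with matrix products follows; the matrix `B` is made up, and encoding the year as one-hot decade columns is an assumption of this sketch.

```python
import numpy as np

# Made-up binary features: three genre indicators, then two one-hot
# decade indicators (assumed encoding of the year).
B = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

# Entry (i, j) counts the features that movies i and j both have.
intersection = B @ B.T

# Number of features each movie has; union follows by inclusion-exclusion.
counts = B.sum(axis=1)
union = counts[:, None] + counts[None, :] - intersection

# Leave the similarity at 0 where the union is empty (avoids 0/0).
jaccard_sim = np.divide(
    intersection, union,
    out=np.zeros(union.shape, dtype=float),
    where=union > 0,
)
```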

### Make some recommendations using the jaccard similarity

As before, contrast the recommendations given for some movies. Note that the Jaccard similarity matrix does not need to be normalized (why?).

---

## Euclidean similarity

Finally, we will compute the Euclidean distance between two feature vectors $v_i$ and $v_j$,

$$d(i,j) = \|v_i-v_j\|_2,$$

and use this value to construct a Euclidean similarity matrix,

$$s(i,j) = \frac{1}{1+d(i,j)}.$$

### Compute the euclidean distance between the features
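
One way to get all pairwise distances without a Python loop uses the identity $\|v_i-v_j\|_2^2 = \|v_i\|_2^2 + \|v_j\|_2^2 - 2\,v_i\cdot v_j$. The sketch below applies it to a made-up matrix `X`; it avoids building an intermediate array of size movies × movies × features, which can be too large for the full dataset.

```python
import numpy as np

# Made-up feature matrix: one row per movie.
X = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
])

# Squared norm of every row.
sq_norms = (X ** 2).sum(axis=1)

# Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)

# Rounding can leave tiny negative values; clip them before the square root.
dist = np.sqrt(np.clip(sq_dist, 0.0, None))
```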

### Compute the similarity matrix from the distances
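
Applying $s(i,j) = 1/(1+d(i,j))$ elementwise is a one-liner in numpy. The distance matrix below is made up for illustration.

```python
import numpy as np

# Made-up symmetric distance matrix with zeros on the diagonal.
dist = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 4.0],
    [3.0, 4.0, 0.0],
])

# Distance 0 maps to similarity 1; larger distances decay towards 0.
euclid_sim = 1.0 / (1.0 + dist)
```

Since distances are non-negative, every similarity lies in $(0, 1]$ and the diagonal is exactly 1.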

### Make some recommendations using the euclidean similarity

---

# Collaborative filtering using SVD

The goal of this part of the lab is to create a collaborative filtering system using 

* the user ratings 

and matrix factorization, in particular

* the top k singular value decomposition (SVD).

Afterwards, we will generate 10 movie recommendations for a given user and compare them with the user's 10 favorite movies among those they have seen. Be careful to exclude movies the user has already seen from the recommendations (assume the user has rated every movie they have seen).

## Data import and cleaning
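
A possible sketch of turning the ratings into a user-movie matrix. It assumes `ml-latest-small/ratings.csv` has columns `userId`, `movieId`, and `rating` (plus a timestamp that is not needed here) and uses a tiny made-up stand-in. Subtracting each user's mean rating before filling the gaps with 0 is one common choice, not something the lab prescribes.

```python
import pandas as pd

# Made-up stand-in for ml-latest-small/ratings.csv.
# With the real file: ratings = pd.read_csv("ml-latest-small/ratings.csv")
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 5.0, 3.0, 2.0, 1.0],
})

# Rows are users, columns are movies; unrated pairs become NaN.
R = ratings.pivot(index="userId", columns="movieId", values="rating")

# Assumption: center each user's ratings on their mean, then fill gaps with 0
# so that "unrated" is treated as "average for this user".
user_means = R.mean(axis=1)
R_centered = R.sub(user_means, axis=0).fillna(0.0)
```

Keeping `R` (with its NaN entries) around is useful later for telling rated and unrated movies apart.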

## Top k SVD

To construct the matrix of predicted ratings, we are going to take the SVD of the user-movie rating matrix and keep only the top $k$ components to make a rank-$k$ approximation. We choose $k=50$ for now, but experiment with different values and observe the effect on the recommendations.
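
The truncation step can be sketched with `numpy.linalg.svd`, which returns the singular values in descending order. The matrix here is random and $k=2$ only because the example is tiny; both are placeholders for the real rating matrix and $k=50$.

```python
import numpy as np

# Random stand-in for the (filled-in) user-movie rating matrix.
rng = np.random.default_rng(0)
R = rng.random((6, 5))

k = 2  # number of singular components to keep

# Thin SVD: R = U @ diag(s) @ Vt, with s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep the k largest singular values and their singular vectors.
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

If the ratings were mean-centered before the decomposition, the user means have to be added back to `R_k` to obtain predicted ratings on the original scale.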

## Making recommendations

Using the SVD you can form a matrix of predicted ratings. We will use it to find the movies with the highest predicted ratings among those the user has not yet rated, and recommend those to the user. We also want to compare these movies to the user's 10 favorite movies.
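
For a single user, the selection could look like the sketch below. Both Series are made up: `actual` stands for the user's row of the rating matrix (NaN for unseen movies) and `predicted` for the same user's row of the rank-$k$ approximation.

```python
import pandas as pd

movie_ids = [10, 20, 30, 40]

# Made-up data for one user; NaN marks movies the user has not rated.
actual = pd.Series([5.0, float("nan"), 3.0, float("nan")], index=movie_ids)
predicted = pd.Series([4.8, 4.1, 2.9, 3.5], index=movie_ids)

n = 10  # number of movies to recommend / favorites to show

# Recommend only unseen movies, ranked by predicted rating.
unseen = actual.isna()
recommendations = predicted[unseen].sort_values(ascending=False).head(n)

# The user's favorites are their highest actual ratings.
favorites = actual.dropna().sort_values(ascending=False).head(n)
```

Joining both results with the movie titles and genres makes the comparison between recommendations and favorites easier to read.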