# `AA Workshop 12` — Coding Challenge

Complete the tasks below to practice collaborative filtering techniques from `W12_Recommender_Systems.ipynb`.

Guidelines:
- Work in order. Run each cell after editing with Shift+Enter.
- Keep answers short; focus on making things work.
- If a step fails, read the error and fix it.

By the end you will have exercised:
- implementing item- and user-based approaches to predict ratings
- generating recommendations for a specific user

## Task 1 - Predict a specific rating

Let's apply what we learned about collaborative filtering. We will use the same datasets as in the workshop notebook, i.e. `ratings.csv` and `movies.csv` from https://grouplens.org/datasets/movielens/. Again, we only want to consider movies with five or more ratings. The user with `userId = 15` has not yet rated the movie named _Beauty and the Beast (1991)_. First, check out some movies the user has rated with the highest score (5). Then, apply and compare item-item and user-user approaches using Pearson correlation and Cosine similarity as similarity measures to predict whether the user will likely enjoy or dislike this movie _Beauty and the Beast (1991)_. Given the users most and least favorite movies, did you expect the predicted rating for _Beauty and the Beast (1991)_?

In [1]:
# your code here
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.sparse as sp
import seaborn as sns

# load MovieLens data
df = pd.read_csv("../data/ratings.csv")
df_mov = pd.read_csv("../data/movies.csv", index_col="movieId")
# convert to matrix with user IDs as rows and movie IDs as columns
X = np.asarray(sp.coo_matrix((df["rating"], (df["userId"]-1, df["movieId"]-1))).todense())
print(X.shape)

valid_movies = (X!=0).sum(axis=0) >= 5
movie_to_title = dict(zip(range(len(valid_movies)), df_mov.loc[np.where(valid_movies)[0]+1]["title"]))
X = X[:,valid_movies]
print(X.shape)

(671, 163949)
(671, 3496)


In [3]:
user_15_ratings = df[df["userId"] == 15]
top_rated = user_15_ratings[user_15_ratings["rating"] == 5.0]
print("\n".join([df_mov.loc[movie_id]["title"] for movie_id in top_rated["movieId"]]))

Seven (a.k.a. Se7en) (1995)
Usual Suspects, The (1995)
Antonia's Line (Antonia) (1995)
Taxi Driver (1976)
Amateur (1994)
Hoop Dreams (1994)
Star Wars: Episode IV - A New Hope (1977)
Léon: The Professional (a.k.a. The Professional) (Léon) (1994)
Pulp Fiction (1994)
Three Colors: Red (Trois couleurs: Rouge) (1994)
Fugitive, The (1993)
Blade Runner (1982)
Thirty-Two Short Films About Glenn Gould (1993)
Silence of the Lambs, The (1991)
Fargo (1996)
Lone Star (1996)
Godfather, The (1972)
Singin' in the Rain (1952)
Vertigo (1958)
Rear Window (1954)
North by Northwest (1959)
Casablanca (1942)
Citizen Kane (1941)
2001: A Space Odyssey (1968)
Secrets & Lies (1996)
One Flew Over the Cuckoo's Nest (1975)
Star Wars: Episode V - The Empire Strikes Back (1980)
Clockwork Orange, A (1971)
Apocalypse Now (1979)
Star Wars: Episode VI - Return of the Jedi (1983)
Godfather: Part II, The (1974)
Manhattan (1979)
Graduate, The (1967)
Chinatown (1974)
Manchurian Candidate, The (1962)
Back to the Future (1985)

In [10]:
def predict_user_user(X, W, user_means, i, j):
    """ Return prediction of X_(ij). """
    return user_means[i] + (np.sum((X[:,j] - user_means) * (X[:,j] != 0) * W[i,:]) / 
                            np.sum((X[:,j] != 0) * np.abs(W[i,:])))

In [None]:
W = np.ones((X.shape[0], X.shape[0]))
# Find Beauty and the beast id
id = df_mov[df_mov["title"] == "Beauty and the Beast (1991)"].index[0] - 1
print(id)
print(movie_to_title[])
np.where(valid_movies)[0].tolist().index(id)
user_means = np.array([X[i,X[i,:]!=0].mean() for i in range(X.shape[0])])
movie_means = np.array([X[X[:,i]!=0,i].mean() for i in range(X.shape[1])])


594
Beauty and the Beast (1991)


In [11]:
print("User 15, ", movie_to_title[np.where(valid_movies)[0].tolist().index(594)], predict_user_user(X, W, user_means, 14, 594))

User 15,  Beauty and the Beast (1991) 2.669126302568572


## Task 2 - Recommend five movies

Task 1 should have told you that _Beauty and the Beast (1991)_ is likely not the best recommendation to give to the user with `userId = 15`. Again, apply and compare item-item and user-user approaches using Pearson correlation and Cosine similarity as similarity measures to recommend the five movies with the highest predicted rating.

In [None]:
# your code here


