<a href="https://colab.research.google.com/github/CREVIOS/SSI_2020/blob/master/Lecture_6_1_Recommender_Systems_%5BSOLUTIONS%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Codealong - Recommender Systems

In this codealong, we will be writing a simplified version of the predict function of a recommender system!

## 0. Setup

Run the following code blocks to set up our imports and data set. Each block is explained in the text just above it.

Here, we are doing a lot of the imports that you all are already familiar with.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Here, we install a new library called `surprise`! Don't worry about this library too much, we're only using it today for some help with recommender systems.

*Note: You might get a prompt asking you to confirm you want to download our dataset. Simply type Y to confirm if this comes up.*

In [None]:
! pip install surprise
from surprise.prediction_algorithms.matrix_factorization import NMF
from surprise import Dataset
from surprise.model_selection import cross_validate

Now we load the dataset from `surprise`, convert it to a DataFrame, remove the timestamp column (which we don't need), and display it.

In [None]:
data = Dataset.load_builtin('ml-100k')
df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
df = df.drop(columns='timestamp')  # drop the column we don't need
display(df)

## 1. Create and fit an NMF Recommender

Once again, don't worry too much about this part. Since it's incredibly hard to code the predict behavior from scratch, we are using the NMF (non-negative matrix factorization) algorithm from `surprise` to make our lives easier.

In [None]:
trainset = data.build_full_trainset()

algo = NMF()
algo.fit(trainset)
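
To give a rough sense of what a fitted matrix-factorization model does when it predicts, here is a minimal NumPy sketch. It is not `surprise`'s actual implementation. It assumes the model stores one non-negative factor vector per user and per item and estimates a rating as their dot product. The sizes, the random factor values, and the names `user_factors`, `item_factors`, and `predict_one` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, n_factors = 4, 5, 3  # toy sizes, chosen arbitrarily

# Hypothetical "learned" factors; NMF keeps every entry non-negative.
user_factors = rng.random((n_users, n_factors))
item_factors = rng.random((n_items, n_factors))

def predict_one(u, i):
    """Estimate the rating of user row u for item row i as a dot product."""
    return float(user_factors[u] @ item_factors[i])

# All user/item estimates at once: a single matrix product.
all_estimates = user_factors @ item_factors.T

print(predict_one(2, 3))
print(all_estimates.shape)
```

A real library also has to learn the factors from data and handle users or items it has never seen, which is the hard part we are leaving to `surprise`.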

This block of code demonstrates getting a prediction from the NMF class for a user_id and an item_id. The `pred` returned is a named tuple with the fields `(uid, iid, r_ui, est, details)`, so we index into it at index 3 (the `est` field) to get the predicted rating that we want.

In [None]:
uid = str(112)  # raw user id (as in the ratings file). They are **strings**!
iid = str(13)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid)
print(pred)
print(pred[3])
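
If the indexing feels opaque, the following sketch may help. It builds a stand-in named tuple with the same field order as the prediction object, so that it runs without `surprise` installed. The values are made up, and the names `by_index` and `by_name` are only for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as surprise's Prediction tuple.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

# Made-up values, only to show the two access patterns.
pred = Prediction(uid="112", iid="13", r_ui=None, est=3.5,
                  details={"was_impossible": False})

by_index = pred[3]   # positional access, as in the cell above
by_name = pred.est   # the same field, accessed by name
print(by_index, by_name)
```

Accessing the field by name tends to be easier to read than remembering that the estimate sits at position 3.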

## 2. Predicting with a recommender system!

Now, it's finally time for us to write some code! In the following cells, write code to fill out an array with the estimated rating for every (user, item) pair. This array will be of size (number of users) x (number of items), so first write code to calculate those two counts!

In [None]:
NUM_USERS = len(np.unique(df["user_id"])) # TODO: Write code to figure out how many users we have
NUM_ITEMS = len(np.unique(df["item_id"])) # TODO: Write code to calculate how many items we have
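
As a quick sanity check of the counting step, here is a small sketch on a made-up ratings table (the `toy` DataFrame and its values are hypothetical). It compares the `np.unique` approach with pandas' `nunique()`, which should give the same counts.

```python
import numpy as np
import pandas as pd

# Toy ratings table; ids are strings, as in the real data.
toy = pd.DataFrame({
    "user_id": ["1", "1", "2", "3", "3"],
    "item_id": ["10", "20", "10", "30", "20"],
    "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
})

n_users_np = len(np.unique(toy["user_id"]))  # approach used in the cell above
n_users_pd = toy["user_id"].nunique()        # pandas equivalent
n_items_pd = toy["item_id"].nunique()

print(n_users_np, n_users_pd, n_items_pd)
```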

In [None]:
# TODO: Write code to fill out an array of ratings, and display that array
ratings = np.zeros((NUM_USERS, NUM_ITEMS))
for uid in range(NUM_USERS):
  if (uid % 100 == 0):
    print("On user_ID {} out of {}".format(uid + 1, NUM_USERS))
  for iid in range(NUM_ITEMS):
    # Raw ids in ml-100k start at 1, so row uid / column iid hold raw ids uid + 1 / iid + 1
    ratings[uid, iid] = algo.predict(str(uid + 1), str(iid + 1), verbose=False)[3]

display(ratings)
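
One detail worth keeping in mind when filling an array like this: array positions are 0-based, while the raw ids are strings taken from the ratings file. The sketch below loops over the raw ids directly and uses `enumerate` for the array positions. The id lists are toy data, and `toy_predict` is a hypothetical stand-in for the fitted model that returns a fake rating.

```python
import numpy as np

# Toy raw ids; in ml-100k they are strings of consecutive integers starting at 1.
raw_user_ids = ["1", "2", "3"]
raw_item_ids = ["1", "2", "3", "4"]

def toy_predict(uid, iid):
    """Hypothetical stand-in for the model's estimate; returns a fake rating."""
    return 1.0 + (int(uid) + int(iid)) % 5

ratings = np.zeros((len(raw_user_ids), len(raw_item_ids)))
for row, uid in enumerate(raw_user_ids):
    for col, iid in enumerate(raw_item_ids):
        # row/col are 0-based positions; the raw ids are passed through unchanged.
        ratings[row, col] = toy_predict(uid, iid)

print(ratings)
```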