# DSCI 563: Unsupervised Learning 

## Lecture 8: Recommender Systems Part 2

UBC Master of Data Science program, 2020-21

Instructor: Varada Kolhatkar

### Imports 

In [1]:
import os
import random
import sys
import time

import numpy as np
import pandas as pd

sys.path.append("code/.")
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate, train_test_split

pd.set_option("display.max_colwidth", 0)

<br><br>

## Table of contents

- [Learning outcomes](#lo)
- [Video 1: Recap and motivation](#1)
- [Video 2: Content-based filtering](#2)
- [Video 3: Hybrid approaches](#3)
- [Video 4: Sparse utility matrix](#4)
- [Video 5: Final comments and summary](#5)
- [Questions for class discussion](#questions)

<br><br>

## Learning outcomes <a name="lo"></a>

From this lecture, students are expected to be able to:
- Formulate the rating prediction problem as a supervised machine learning problem. 
- Create a content-based filter given ratings data and item features to predict missing ratings in the utility matrix.  
- Discuss differences, advantages and disadvantages between content-based filtering and collaborative filtering.
- Explain the idea of hybrid approaches at a high level. 
- Create and work with a sparse utility matrix for datasets with a large number of items and users.   
- Name different kinds of data which occur in the context of recommendation systems. 
- Discuss important considerations in recommendation systems beyond error rate. 

## Video 1: Recap and Motivation <a name="1"></a>

### Recap: Recommender systems problem 

- We are usually given ratings data. 
- We use this data to create a **utility matrix**, which encodes interactions between users and items. 
- The utility matrix has many missing entries. 
- We defined the recommendation systems problem as a **matrix completion problem**. 

### Collaborative filtering approach  

- We use a PCA-like approach to learn the latent features of users and items.
- The key idea is that the loss function only includes the available ratings. 
- Regular PCA and `TruncatedSVD` need the missing entries filled in (e.g., with zeros) and then fit those filled-in values as if they were real ratings. So instead, we used the SVD implementation in the `surprise` package, which only considers available ratings in the loss function. 
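
The sketch below makes the "only available ratings" idea concrete as a small NumPy loss. The toy matrix and the random factors are made up for illustration, and `masked_squared_error` is a hypothetical helper. It is not the exact objective of `surprise`'s `SVD`, which also uses bias terms and regularization.

```python
import numpy as np

# Toy utility matrix: rows are users, columns are items, NaN marks a missing rating.
Y_toy = np.array([
    [5.0, np.nan, 3.0],
    [np.nan, 4.0, 1.0],
])

rng = np.random.default_rng(0)
k = 2                                        # number of latent features
Z_toy = rng.normal(size=(Y_toy.shape[0], k))  # one latent vector per user (rows)
W_toy = rng.normal(size=(k, Y_toy.shape[1]))  # one latent vector per item (columns)


def masked_squared_error(Y, Z, W):
    """Sum of squared errors over the observed (non-NaN) entries of Y only."""
    observed = ~np.isnan(Y)
    residuals = (Z @ W - Y)[observed]  # missing cells never enter the loss
    return np.sum(residuals ** 2)


print(masked_squared_error(Y_toy, Z_toy, W_toy))
```

The mask drops the missing cells before squaring, so the loss ignores whatever the model predicts there. Those predictions are exactly what we later use to fill in the matrix.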

### Collaborative filtering

- "Unsupervised" learning 
    - We only have labels $Y$ (ratings $y_{ij}$ for user $i$ and item $j$) but no features.
- What if a new item shows up? 
    - You won't have any ratings information for that item.     

### Content-based filtering

- Supervised machine learning approach 
- Content-based filtering is suitable to predict ratings for new items.

<br><br><br><br>

## Video 2: Content-based filtering <a name="2"></a>

 

### Item and user features

- In collaborative filtering we assumed that we only have ratings data. 
- Usually there is some information on items and users available. 
- Examples
    - Netflix can describe movies as action, romance, comedy, documentaries. 
    - Amazon could describe books according to topics: math, languages, history. 
    - Tinder could describe people according to age, location, employment.
- Can we use this information to predict ratings in the utility matrix?   
    - Yes!

### Toy example: Movie recommendation

- Let's consider a movie recommendation problem with the following toy data. (Toy example inspired by Trevor 🙂!)

### Ratings data

In [2]:
toy_ratings = pd.read_csv("data/toy_ratings.csv")
toy_ratings

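| | user_id | movie_id | rating |
|---|---|---|---|
| 0 | Sam | Lion King | 4 |
| 1 | Sam | Jerry Maguire | 4 |
| 2 | Sam | Roman Holidays | 5 |
| 3 | Ryan | Titanic | 2 |
| 4 | Ryan | Inception | 4 |
| 5 | Ryan | The Social Dilemma | 5 |
| 6 | Pat | Titanic | 3 |
| 7 | Pat | Lion King | 4 |
| 8 | Pat | Bambi | 4 |
| 9 | Pat | Cast Away | 3 |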


In [3]:
N = len(np.unique(toy_ratings["user_id"]))
M = len(np.unique(toy_ratings["movie_id"]))
print("Number of users (N)                : %d" % N)
print("Number of movies (M)               : %d" % M)

Number of users (N)                : 4
Number of movies (M)               : 10


In [4]:
user_key = "user_id"
item_key = "movie_id"

In [5]:
user_mapper = dict(zip(np.unique(toy_ratings[user_key]), list(range(N))))
item_mapper = dict(zip(np.unique(toy_ratings[item_key]), list(range(M))))
user_inverse_mapper = dict(zip(list(range(N)), np.unique(toy_ratings[user_key])))
item_inverse_mapper = dict(zip(list(range(M)), np.unique(toy_ratings[item_key])))

### Utility matrix

Let's create a utility matrix for our toy dataset. 

In [7]:
def create_Y_from_ratings(data, N, M):
    Y = np.zeros((N, M))
    Y.fill(np.nan)
    for index, val in data.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        Y[n, m] = val["rating"]

    return Y


In [8]:
Y = create_Y_from_ratings(toy_ratings, N, M)
utility_mat = pd.DataFrame(Y, columns=item_mapper.keys(), index=user_mapper.keys())
utility_mat

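| | A Beautiful Mind | Bambi | Cast Away | Inception | Jerry Maguire | Lion King | Malcolm x | Roman Holidays | The Social Dilemma | Titanic |
|---|---|---|---|---|---|---|---|---|---|---|
| Jim | NaN | NaN | NaN | NaN | NaN | 4.0 | 4.0 | NaN | 5.0 | 2.0 |
| Pat | 3.0 | 4.0 | 3.0 | NaN | 5.0 | 4.0 | NaN | NaN | NaN | 3.0 |
| Ryan | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | 5.0 | 2.0 |
| Sam | NaN | NaN | NaN | NaN | 4.0 | 4.0 | NaN | 5.0 | NaN | NaN |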
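
`create_Y_from_ratings` fills the matrix one cell at a time. For small datasets, pandas can build an equivalent user-by-item table in a single call with `DataFrame.pivot`. The sketch below uses a few made-up rows; call `.to_numpy()` on the result if you need a NumPy array.

```python
import pandas as pd

# A few made-up ratings in the same long format as toy_ratings.
ratings_small = pd.DataFrame({
    "user_id": ["Sam", "Sam", "Ryan", "Pat"],
    "movie_id": ["Lion King", "Roman Holidays", "Titanic", "Lion King"],
    "rating": [4, 5, 2, 4],
})

# One row per user, one column per movie; unrated cells become NaN.
# pivot raises a ValueError if a (user, movie) pair appears twice,
# which doubles as a check for duplicate ratings.
utility = ratings_small.pivot(index="user_id", columns="movie_id", values="rating")
print(utility)
```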


In [24]:
avg = np.nanmean(Y)

In [30]:
avg_n = np.nanmean(Y, axis=1)
avg_n[np.isnan(avg_n)] = avg
avg_m = np.nanmean(Y, axis=0)
avg_m[np.isnan(avg_m)] = avg

In [37]:
def reconstruct_svd(U, V, avg_n, avg_m):
    return U @ V + 0.5 * avg_n[:, None] + 0.5 * avg_m[None]

**Goal**: Predict missing entries in the utility matrix. 

In [38]:
from sklearn.decomposition import PCA, TruncatedSVD

data = utility_mat.to_numpy()
data_svd = np.nan_to_num(data)
model = TruncatedSVD(n_components=4)

In [39]:
model.fit(data_svd)

TruncatedSVD(n_components=4)

In [40]:
U = model.transform(data_svd)

In [41]:
pd.DataFrame(U, index=user_mapper.keys())

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Jim | 5.196491 | 5.00506 | 1.072715 | -2.791978 |
| Pat | 7.908308 | -3.118076 | -3.425183 | -0.066244 |
| Ryan | 2.477745 | 5.233368 | -0.858441 | 3.276541 |
| Sam | 5.411752 | -2.645537 | 4.36828 | 1.277578 |


In [42]:
V = model.components_

In [43]:
pd.DataFrame(V, columns=item_mapper.keys())

| | A Beautiful Mind | Bambi | Cast Away | Inception | Jerry Maguire | Lion King | Malcolm x | Roman Holidays | The Social Dilemma | Titanic |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.189843 | 0.253124 | 0.189843 | 0.079306 | 0.489621 | 0.592666 | 0.166326 | 0.21652 | 0.30704 | 0.312659 |
| 1 | -0.135255 | -0.18034 | -0.135255 | 0.302682 | -0.378434 | -0.043872 | 0.289477 | -0.191262 | 0.740198 | 0.160825 |
| 2 | -0.314224 | -0.418965 | -0.314224 | -0.105004 | 0.010617 | 0.246572 | 0.131213 | 0.667904 | 0.032762 | -0.301119 |
| 3 | -0.009854 | -0.013139 | -0.009854 | 0.649867 | 0.23697 | -0.313504 | -0.553759 | 0.316742 | 0.120135 | 0.0382 |


In [46]:
data_recon = reconstruct_svd(U, V, avg_n, avg_m)
pd.DataFrame(data_recon, index=user_mapper.keys())

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Jim | 3.375 | 3.875 | 3.375 | 3.875 | 4.125 | 7.875 | 7.875 | 4.375 | 9.375 | 5.041667 |
| Pat | 6.333333 | 7.833333 | 6.333333 | 3.833333 | 9.083333 | 7.833333 | 3.833333 | 4.333333 | 4.333333 | 6.0 |
| Ryan | 3.333333 | 3.833333 | 3.333333 | 7.833333 | 4.083333 | 3.833333 | 3.833333 | 4.333333 | 9.333333 | 5.0 |
| Sam | 3.666667 | 4.166667 | 3.666667 | 4.166667 | 8.416667 | 8.166667 | 4.166667 | 9.666667 | 4.666667 | 3.333333 |


In [22]:
U[0, :] @ V[:, 0]  # reconstructed value for Jim and "A Beautiful Mind" (0 in the zero-filled matrix)

1.8145207558717402e-15
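
Several reconstructed entries above (for example 9.375) fall outside the 1 to 5 rating scale. With `n_components=4` and only four users, `U @ V` reproduces the zero-filled matrix almost exactly, which is why the value just computed is about 0. `reconstruct_svd` then adds the averages on top of the original ratings.

A common alternative is to subtract a baseline (here, each user's average) before the SVD, fill the missing entries with 0, and add the baseline back afterwards. The sketch below does this on a made-up matrix, and `svd_complete` is a hypothetical helper. This is still not the `surprise` approach, because the filled-in zeros take part in the fit.

```python
import numpy as np


def svd_complete(Y, k):
    """Fill in a utility matrix using a rank-k SVD of user-centered ratings.

    Y: 2-D array (users x items) with NaN for missing ratings.
    Assumes every user has rated at least one item.
    """
    user_avg = np.nanmean(Y, axis=1)                  # baseline: each user's average rating
    centered = np.nan_to_num(Y - user_avg[:, None])   # missing cells -> 0, i.e. "the user's average"
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k, :]         # keep only the top-k components
    return low_rank + user_avg[:, None]               # add the baseline back


# Made-up utility matrix (users x items).
Y_toy = np.array([
    [np.nan, 4.0, 4.0, 5.0],
    [3.0, 4.0, np.nan, 2.0],
    [np.nan, 2.0, 5.0, 5.0],
])
print(np.round(svd_complete(Y_toy, k=1), 2))
```

With `k` equal to the full rank, the observed ratings are reproduced exactly and the missing cells simply get each user's average back. A smaller `k` is what forces the model to share structure across users and items.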

### Movie features

- Suppose we also have movie features. 

In [8]:
movie_feats_df = pd.read_csv("data/toy_movie_feats.csv", index_col=0)
movie_feats_df

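| | Action | Romance | Drama | Comedy | Children | Documentary |
|---|---|---|---|---|---|---|
| A Beautiful Mind | 0 | 1 | 1 | 0 | 0 | 0 |
| Bambi | 0 | 0 | 1 | 0 | 1 | 0 |
| Cast Away | 0 | 1 | 1 | 0 | 0 | 0 |
| Inception | 1 | 0 | 1 | 0 | 0 | 0 |
| Jerry Maguire | 0 | 1 | 1 | 1 | 0 | 0 |
| Lion King | 0 | 0 | 1 | 0 | 1 | 0 |
| Malcolm x | 0 | 0 | 0 | 0 | 0 | 1 |
| Roman Holidays | 0 | 1 | 1 | 1 | 0 | 0 |
| The Social Dilemma | 0 | 0 | 0 | 0 | 0 | 1 |
| Titanic | 0 | 1 | 1 | 0 | 0 | 0 |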


In [9]:
Z = movie_feats_df.to_numpy()
Z.shape

(10, 6)

- How can we use these features to predict missing ratings? 

### Overall idea

- Using the ratings data and movie features, we'll build **profiles for different users**. 
- Let's consider an example user **Pat**. 

### Pat's ratings

- We don't know anything about Pat, but we know how she rated movies. 

In [10]:
utility_mat.loc["Pat"]

A Beautiful Mind      3.0
Bambi                 4.0
Cast Away             3.0
Inception            NaN 
Jerry Maguire         5.0
Lion King             4.0
Malcolm x            NaN 
Roman Holidays       NaN 
The Social Dilemma   NaN 
Titanic               3.0
Name: Pat, dtype: float64

- We also know about movies and their features. 
- If Pat gave a high rating to _Lion King_, it means that she liked the features of the movie. 

In [11]:
movie_feats_df.loc["Lion King"]

Action         0
Romance        0
Drama          1
Comedy         0
Children       1
Documentary    0
Name: Lion King, dtype: int64

- Can we build a profile for users based on what features they like and what features they don't like and predict ratings for movies using this information? 

### Supervised approach to rating prediction 

- We treat the rating prediction problem as a set of regression problems, one per user. 
- Given movie information, we create a user profile for each user.
- We build a regression model for each user and learn its regression weights. 


For each user $i$ create a user profile as follows. 

- Consider all movies rated by $i$ and create `X` and `y` for the user: 
    - Each row in `X` contains the movie features of movie $j$ rated by $i$. 
    - Each value in `y` is the corresponding rating given to the movie $j$ by user $i$. 
- Fit a regression model using `X` and `y`. 
- Apply the model to predict ratings for new items! 
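
For a single user, the steps above might look like the sketch below. The movies, genres, and ratings are made up, and `Ridge` is just one reasonable choice of regressor.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up binary item features (rows are movies, columns are genres).
item_feats = pd.DataFrame(
    {"Romance": [1, 0, 1, 0], "Comedy": [0, 0, 1, 0], "Children": [0, 1, 0, 0]},
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
)
# Made-up ratings one user gave to the movies they have seen.
user_ratings = pd.Series({"Movie A": 2.0, "Movie B": 4.0, "Movie C": 5.0})

# X: one row of features per rated movie; y: the corresponding ratings.
X_toy = item_feats.loc[user_ratings.index].to_numpy()
y_toy = user_ratings.to_numpy()

toy_model = Ridge(alpha=1.0).fit(X_toy, y_toy)

# Predict ratings for the movies this user has not rated yet.
unseen = item_feats.index.difference(user_ratings.index)
preds_toy = toy_model.predict(item_feats.loc[unseen].to_numpy())
print(dict(zip(unseen, preds_toy.round(2))))
```

Movie D has none of the three genres, so its predicted rating is simply the model's intercept.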

### Let's build user profiles 

- Build `X` and `y` for all users. 

In [12]:
from collections import defaultdict


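def get_lr_data_per_user(ratings_df, d):
    # For every user index n, collect the features of the movies they rated
    # (rows of the global movie feature matrix Z), the ratings they gave, and
    # the indices of those movies. Note: `d` is not used in the body.
    lr_y = defaultdict(list)
    lr_X = defaultdict(list)
    lr_items = defaultdict(list)

    for index, val in ratings_df.iterrows():
        n = user_mapper[val[user_key]]
        m = item_mapper[val[item_key]]
        lr_X[n].append(Z[m])
        lr_y[n].append(val["rating"])
        lr_items[n].append(m)

    # Stack each user's lists into NumPy arrays.
    for n in lr_X:
        lr_X[n] = np.array(lr_X[n])
        lr_y[n] = np.array(lr_y[n])

    return lr_X, lr_y, lr_items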

In [13]:
d = movie_feats_df.shape[1]
X_train_usr, y_train_usr, rated_items = get_lr_data_per_user(toy_ratings, d)

- What's going to be the shape of each `X` and `y`?

### Examine user profiles 

- Let's examine some user profiles. 

In [14]:
def get_user_profile(user_name):
    X = X_train_usr[user_mapper[user_name]]
    y = y_train_usr[user_mapper[user_name]]
    items = rated_items[user_mapper[user_name]]
    movie_names = [item_inverse_mapper[item] for item in items]
    print("Profile for user: ", user_name)
    profile_df = pd.DataFrame(X, index=movie_names, columns=movie_feats_df.columns)
    profile_df["ratings"] = y
    return profile_df

### Pat's profile

In [15]:
get_user_profile("Pat")

Profile for user:  Pat


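| | Action | Romance | Drama | Comedy | Children | Documentary | ratings |
|---|---|---|---|---|---|---|---|
| Titanic | 0 | 1 | 1 | 0 | 0 | 0 | 3 |
| Lion King | 0 | 0 | 1 | 0 | 1 | 0 | 4 |
| Bambi | 0 | 0 | 1 | 0 | 1 | 0 | 4 |
| Cast Away | 0 | 1 | 1 | 0 | 0 | 0 | 3 |
| Jerry Maguire | 0 | 1 | 1 | 1 | 0 | 0 | 5 |
| A Beautiful Mind | 0 | 1 | 1 | 0 | 0 | 0 | 3 |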


- Pat seems to like children's movies and comedies. 
- It seems like she's not so much into romantic movies.  


### Ryan's profile

In [16]:
get_user_profile("Ryan")

Profile for user:  Ryan


| | Action | Romance | Drama | Comedy | Children | Documentary | ratings |
|---|---|---|---|---|---|---|---|
| Titanic | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
| Inception | 1 | 0 | 1 | 0 | 0 | 0 | 4 |
| The Social Dilemma | 0 | 0 | 0 | 0 | 0 | 1 | 5 |


- Ryan hasn't rated many movies, so his profile has only a few rows. 
- Ryan seems to like documentaries and action movies. 
- Seems like he's not so much into romantic movies.  

### Regression models for users

In [17]:
from sklearn.linear_model import Ridge


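def train_for_usr(user_name, model=None):
    # Create a new Ridge model on every call. A default of `model=Ridge()` is
    # evaluated only once and shared, so every user's "model" would be the same
    # object, refit on whichever user was trained last.
    if model is None:
        model = Ridge()
    X = X_train_usr[user_mapper[user_name]]
    y = y_train_usr[user_mapper[user_name]]
    model.fit(X, y)
    return model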


def predict_for_usr(model, movie_names):
    feat_vecs = movie_feats_df.loc[movie_names].values
    preds = model.predict(feat_vecs)
    return preds

### Regression model for Pat 

- What are the regression weights learned for Pat? 

In [18]:
user_name = "Pat"
pat_model = train_for_usr(user_name)
col = "Coefficients for %s" % user_name
pd.DataFrame(pat_model.coef_, index=movie_feats_df.columns, columns=[col])

| | Coefficients for Pat |
|---|---|
| Action | 0.0 |
| Romance | -0.25641 |
| Drama | 0.0 |
| Comedy | 0.820513 |
| Children | 0.25641 |
| Documentary | 0.0 |


### Predictions for Pat
- How would Pat rate some movies she hasn't seen? 

In [19]:
movies_to_pred = ["Roman Holidays", "Malcolm x"]
pred_df = movie_feats_df.loc[movies_to_pred]
pred_df

| | Action | Romance | Drama | Comedy | Children | Documentary |
|---|---|---|---|---|---|---|
| Roman Holidays | 0 | 1 | 1 | 1 | 0 | 0 |
| Malcolm x | 0 | 0 | 0 | 0 | 0 | 1 |


In [20]:
user_name = "Pat"
preds = predict_for_usr(pat_model, movies_to_pred)
pred_df[user_name + "'s predicted ratings"] = preds
pred_df

| | Action | Romance | Drama | Comedy | Children | Documentary | Pat's predicted ratings |
|---|---|---|---|---|---|---|---|
| Roman Holidays | 0 | 1 | 1 | 1 | 0 | 0 | 4.179487 |
| Malcolm x | 0 | 0 | 0 | 0 | 0 | 1 | 3.615385 |


### Regression model for Ryan 

- What are the regression weights learned for Ryan? 

In [21]:
user_name = "Ryan"
ryan_model = train_for_usr(user_name)
col = "Coefficients for %s" % user_name
pd.DataFrame(ryan_model.coef_, index=movie_feats_df.columns, columns=[col])

| | Coefficients for Ryan |
|---|---|
| Action | 0.25 |
| Romance | -0.75 |
| Drama | -0.5 |
| Comedy | 0.0 |
| Children | 0.0 |
| Documentary | 0.5 |


### Predictions for Ryan

- What are the predicted ratings for Ryan for a list of movies?

In [22]:
user_name = "Ryan"
preds = predict_for_usr(ryan_model, movies_to_pred)
pred_df[user_name + "'s predicted ratings"] = preds
pred_df

| | Action | Romance | Drama | Comedy | Children | Documentary | Pat's predicted ratings | Ryan's predicted ratings |
|---|---|---|---|---|---|---|---|---|
| Roman Holidays | 0 | 1 | 1 | 1 | 0 | 0 | 4.179487 | 2.75 |
| Malcolm x | 0 | 0 | 0 | 0 | 0 | 1 | 3.615385 | 4.5 |
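
The movie features are 0/1 indicators, so each prediction is the intercept $b$ plus the coefficients of the features the movie has. The intercepts are not printed above. We can recover them from the *Malcolm x* predictions, because that movie only has the Documentary feature:

$$
\begin{aligned}
\text{Pat:}\quad & b = 3.615 - w_{\text{Documentary}} = 3.615 - 0 = 3.615, \\
& \hat{y}_{\text{Roman Holidays}} = b + w_{\text{Romance}} + w_{\text{Drama}} + w_{\text{Comedy}} = 3.615 - 0.256 + 0 + 0.821 \approx 4.18, \\
\text{Ryan:}\quad & b = 4.5 - w_{\text{Documentary}} = 4.5 - 0.5 = 4.0, \\
& \hat{y}_{\text{Roman Holidays}} = 4.0 - 0.75 - 0.5 + 0 = 2.75.
\end{aligned}
$$

This is what makes content-based recommendations easy to explain. Pat's predicted rating for *Roman Holidays* sits above her intercept mainly because of its Comedy feature.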


### Content-based vs. collaborative filtering 

- Latent-factor approach to collaborative filtering, where we reconstruct the rating for user $i$ and item $j$ as: 
$$\hat{y}_{ij} = w_j^T z_{i}$$
    - $w_j$ are the "hidden" (latent) features of item $j$
    - $z_i$ are the "hidden" (latent) features of user $i$

- A linear model approach to content-based filtering, where we reconstruct the rating for user $i$ and item $j$ as:  
$$\hat{y}_{ij} = w^T x_{ij}$$
    - $x_{ij}$ is a feature vector for user $i$ and item $j$ (in our toy example, just the genre features of movie $j$)
    - $w$ is the vector of regression weights learned for user $i$ (since we fit one model per user, it is really a per-user $w_i$)
    - This is our usual supervised learning setup for linear regression.  



### More comments on content-based filtering

- Using predictions per user, we can fill in missing entries in the utility matrix. 
- The feature matrix for movies can contain different types of features.
    - Example: Plot of the movie (text features), actors (categorical features), year of the movie, budget and revenue of the movie (numerical features). 
    - You'll apply our usual preprocessing techniques to these features. 
- If you have enough data, you could also carry out hyperparameter tuning with cross-validation for each model.  
- Finally, although we have been talking about linear models above, you can use any regression model of your choice. 
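
A single user's model can combine text, categorical, and numeric movie features in one scikit-learn pipeline. The column names (`plot`, `lead_actor`, `budget`) and all values below are hypothetical. With real data you would also want far more than four rated movies per user.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical features of the movies one user has rated.
movies = pd.DataFrame({
    "plot": ["a lion cub grows up", "a ship sinks at sea",
             "a dream within a dream", "a sports agent loses his job"],
    "lead_actor": ["Actor A", "Actor B", "Actor B", "Actor C"],
    "budget": [45.0, 200.0, 160.0, 50.0],   # made-up budgets
})
user_ratings_toy = [4.0, 3.0, 5.0, 4.0]     # this user's ratings of those movies

preprocess = ColumnTransformer([
    # A single column name (not a list) passes 1-D text, as TfidfVectorizer expects.
    ("text", TfidfVectorizer(), "plot"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["lead_actor"]),
    ("num", StandardScaler(), ["budget"]),
])
user_model = make_pipeline(preprocess, Ridge(alpha=1.0))
user_model.fit(movies, user_ratings_toy)

new_movie = pd.DataFrame({
    "plot": ["a lion and a ship"],
    "lead_actor": ["Actor D"],   # unseen actor -> all-zero one-hot columns
    "budget": [80.0],
})
pred_new = user_model.predict(new_movie)
print(pred_new)
```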

### Advantages of content-based filtering 

- We don't need many users to provide ratings for an item. 
- Each user is modeled separately, so you might be able to capture uniqueness of taste. 
- Since you can obtain the features of the items, you can immediately recommend new items. 
    - This would not have been possible with collaborative filtering. 
- Recommendations are interpretable.
    - You can explain to the user why you are recommending an item, because the learned weights are attached to known, human-readable features. 
    

### Disadvantages of content-based filtering 

- Feature acquisition and feature engineering
    - What features should we use to explain the difference in ratings? 
    - Obtaining those features for each item might be very expensive. 
- Less diversity: it rarely recommends an item outside the user's profile. 
- Cold start: When a new user shows up, you don't have any information about them.

<br><br><br><br>

## Video 3: Hybrid approaches <a name="3"></a>
<hr>

### General idea 
- Both collaborative filtering and content-based filtering suffer from shortcomings. 
- Collaborative filtering does not predict well for new movies/users.
    - New movies don't yet have ratings, and new users haven't rated anything.
- Content-based approaches are less diverse. 
- We are not exploiting information about similarity between users. 
- Most of the recommenders used in practice are hybrid recommenders which combine the best of the two worlds.  

### Netflix hybrid approach 

- MORE LIKE THIS: Netflix employs content-based techniques when it shows you similar movies to a movie you're watching.
- Top Picks for XXX: It employs collaborative filtering to identify similar users and recommend movies they have liked. 

### Hybrid approach 
Hybrid approaches combine content-based and collaborative filtering. One example is [SVDFeature](https://www.jmlr.org/papers/v13/chen12a.html), which won the KDD Cup in 2011 and 2012.

$$\hat{y}_{ij} = \beta + \beta_i + \beta_j + w^T x_{ij} + w_j^T z_i$$ 

- $\beta$ is the average rating across all users and movies
- $\beta_i$ is a bias for user $i$: how much user $i$'s ratings tend to be above or below average
- $\beta_j$ is a bias for movie $j$: how much movie $j$'s ratings tend to be above or below average
- $w^T x_{ij}$ is our linear model based on user/movie features $x_{ij}$ 
- $w_j^T z_i$ is the term learned by our collaborative filtering model (latent features $w_j$ of movie $j$ and $z_i$ of user $i$)
- Note that $x_{ij}$ is a feature vector for user $i$ and movie $j$. 
- Also, $w$ and $w_j$ are different parameters. 
- Supervised learning can predict for new movies! 


In my quick search, I didn't find an easy-to-use package for this. 
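
The prediction rule itself is easy to write down without a package. The sketch below evaluates the equation above in NumPy for one user. `predict_hybrid` is a hypothetical helper, and all parameter values are made up; in practice they would be learned jointly from the ratings.

For a brand-new movie there are no learned $\beta_j$ or $w_j$ yet. As an assumption, the sketch sets them to zero, so the prediction relies on $\beta + \beta_i + w^T x_{ij}$.

```python
import numpy as np


def predict_hybrid(beta, beta_i, beta_j, w, x_ij, w_j, z_i):
    """beta + beta_i + beta_j + w^T x_ij + w_j^T z_i, with all parameters given."""
    return beta + beta_i + beta_j + w @ x_ij + w_j @ z_i


# Made-up parameters for one user i (d = 3 features, k = 2 latent factors).
beta, beta_i = 3.6, 0.2            # global average and user bias
w = np.array([0.5, -0.3, 0.1])     # weights on user/movie features
z_i = np.array([0.4, -0.1])        # latent features of user i

# An existing movie: it has its own bias and latent features.
x_old = np.array([1.0, 0.0, 1.0])
y_old = predict_hybrid(beta, beta_i, 0.1, w, x_old, np.array([0.3, 0.2]), z_i)

# A brand-new movie: no ratings yet, so beta_j = 0 and w_j = 0 (our assumption),
# but its features x_ij are still available.
x_new = np.array([0.0, 1.0, 1.0])
y_new = predict_hybrid(beta, beta_i, 0.0, w, x_new, np.zeros(2), z_i)

print(y_old, y_new)
```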

<br><br><br><br>

## Video 4: Sparse utility matrix <a name="4"></a>

- Recommender systems work best when there is a large amount of data. 
- So far we've been working with small datasets. 

Let's use the [Amazon product data set](http://jmcauley.ucsd.edu/data/amazon/). The authors of the data set have asked for the following citations:

> Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering.
> R. He, J. McAuley.
> WWW, 2016.
> 
> Image-based recommendations on styles and substitutes.
> J. McAuley, C. Targett, J. Shi, A. van den Hengel.
> SIGIR, 2015.

We will focus on the Patio, Lawn, and Garden section. You can download the [ratings here](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/ratings_Patio_Lawn_and_Garden.csv). 

Let's load the data. 

In [23]:
filename = "ratings_Patio_Lawn_and_Garden.csv"

with open(os.path.join("data", filename), "rb") as f:
    ratings = pd.read_csv(f, names=("user", "item", "rating", "timestamp"))
ratings.head()

| | user | item | rating | timestamp |
|---|---|---|---|---|
| 0 | A2VNYWOPJ13AFP | 981850006 | 5.0 | 1259798400 |
| 1 | A20DWVV8HML3AW | 981850006 | 5.0 | 1371081600 |
| 2 | A3RVP3YBYYOPRH | 981850006 | 5.0 | 1257984000 |
| 3 | A28XY55TP3Q90O | 981850006 | 5.0 | 1314144000 |
| 4 | A3VZW1BGUQO0V3 | 981850006 | 5.0 | 1308268800 |


In [24]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 993490 entries, 0 to 993489
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   user       993490 non-null  object 
 1   item       993490 non-null  object 
 2   rating     993490 non-null  float64
 3   timestamp  993490 non-null  int64  
dtypes: float64(1), int64(1), object(2)
memory usage: 30.3+ MB


We'd like to construct the utility matrix `Y`. However, let's see how big it would be:

In [25]:
def get_stats(ratings, item_key="item", user_key="user"):
    print("Number of ratings:", len(ratings))
    print("The average rating:", np.mean(ratings["rating"]))
    N = len(set(ratings[user_key]))
    M = len(set(ratings[item_key]))
    print("Number of users:", N)
    print("Number of items:", M)
    print("Fraction nonzero:", len(ratings) / (N * M))
    print("Size of full Y matrix (GB):", (N * M) * 8 / 1e9)
    return N, M


N, M = get_stats(ratings)

Number of ratings: 993490
The average rating: 4.006400668350965
Number of users: 714791
Number of items: 105984
Fraction nonzero: 1.3114269915944552e-05
Size of full Y matrix (GB): 606.051274752


606 GB! That is way too big. We don't want to create that matrix! On the other hand, we see that we only have about 1 million ratings, which would be 8 MB or so ($10^6$ numbers $\times$ 8 bytes per number). Much more manageable!
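
Here is the same back-of-envelope arithmetic in code. The dense figure matches `get_stats`. The CSR figure is our own estimate. It assumes 8-byte ratings plus 4-byte (int32) column indices and row pointers, which is what SciPy typically uses when the dimensions fit in int32.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Memory for a full n_rows x n_cols matrix of 8-byte floats."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Approximate CSR memory: values + column indices + row pointers (assumes int32 indices)."""
    return nnz * value_bytes + nnz * index_bytes + (n_rows + 1) * index_bytes


n_users, n_items, n_ratings = 714791, 105984, 993490  # as reported by get_stats above
print(f"Dense Y: {dense_bytes(n_users, n_items) / 1e9:.1f} GB")  # 606.1 GB
print(f"CSR Y:   {csr_bytes(n_users, n_ratings) / 1e6:.1f} MB")  # 14.8 MB
```

A complete CSR representation therefore needs roughly 15 MB. That is the ~8 MB of ratings plus the index arrays, which is still tiny next to 606 GB.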

Let's create a sparse representation of our utility matrix $Y$. 

In [26]:
from scipy.sparse import csr_matrix as sparse_matrix

user_key = "user"
item_key = "item"
user_mapper = dict(zip(np.unique(ratings[user_key]), list(range(N))))
item_mapper = dict(zip(np.unique(ratings[item_key]), list(range(M))))

user_inverse_mapper = dict(zip(list(range(N)), np.unique(ratings[user_key])))
item_inverse_mapper = dict(zip(list(range(M)), np.unique(ratings[item_key])))

In [27]:
def create_Y(ratings, N, M, user_key="user", item_key="item"):
    """
    Creates a sparse matrix using scipy.csr_matrix and mappers to relate indexes to items' id.

    Parameters:
    -----------
    ratings: pd.DataFrame
        the ratings to be stored in the matrix;
    N: int
        the number of users
    M: int
        the number of items
    user_key: string
        the column in ratings that contains the users id
    item_key: string
        the column in ratings that contains the items id

    Returns:
    --------
    Y: scipy.sparse.csr_matrix
        the sparse matrix containing the ratings (users as rows, items as columns).
    """
    user_ind = [user_mapper[i] for i in ratings[user_key]]
    item_ind = [item_mapper[i] for i in ratings[item_key]]
    Y = sparse_matrix((ratings["rating"], (user_ind, item_ind)), shape=(N, M))
    return Y
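
To see what `sparse_matrix((ratings, (rows, cols)), shape=...)` stores, the sketch below applies the same constructor to a tiny made-up example. Note the last part: when the same (user, item) pair appears twice, SciPy sums the two values while building the CSR matrix. It is therefore worth checking the ratings for duplicates (for example with `ratings.duplicated(["user", "item"]).sum()`) before calling `create_Y`.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up ratings as (user index, item index, rating).
user_ind = [0, 0, 1, 2]
item_ind = [1, 3, 0, 3]
vals = [5.0, 3.0, 4.0, 1.0]

Y_small = csr_matrix((vals, (user_ind, item_ind)), shape=(3, 4))
print(Y_small.toarray())
print(Y_small.data)     # the stored ratings
print(Y_small.indices)  # column (item) index of each stored rating
print(Y_small.indptr)   # row i's ratings live in data[indptr[i]:indptr[i + 1]]

# A duplicated (user, item) pair is summed, not overwritten: 5 + 3 = 8.
Y_dup = csr_matrix(([5.0, 3.0], ([0, 0], [1, 1])), shape=(1, 2))
print(Y_dup.toarray())  # [[0. 8.]]
```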

In [28]:
Y = create_Y(ratings, N, M)
Y

<714791x105984 sparse matrix of type '<class 'numpy.float64'>'
	with 993490 stored elements in Compressed Sparse Row format>

Note the shape of `Y`: our rows are the users, and the columns are products.

In [29]:
# sanity check
print(Y.shape)  # should be number of users by number of items
print(Y.nnz)  # number of nonzero elements -- should equal number of ratings
# Y.data holds only the rating values; the index arrays Y.indices and Y.indptr take extra memory
print(f"Using sparse matrix data structure, the size of Y is: {Y.data.nbytes/1e6}MB")

(714791, 105984)
993490
Using sparse matrix data structure, the size of Y is: 7.94792MB


### Let's try the `surprise` package on this

In [30]:
ratings = ratings.drop(columns=["timestamp"])

In [31]:
import surprise
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

reader = Reader()
data = Dataset.load_from_df(ratings, reader)  # Load the data
k = 10
algo = SVD(n_factors=k, random_state=42)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2928  1.2903  1.2925  1.2891  1.2905  1.2910  0.0014  
MAE (testset)     1.0204  1.0191  1.0221  1.0187  1.0206  1.0202  0.0012  
Fit time          11.81   11.91   11.80   11.95   12.07   11.91   0.10    
Test time         1.29    1.15    1.19    1.18    0.91    1.14    0.12    


{'test_rmse': array([1.29275852, 1.29026753, 1.29248648, 1.28913228, 1.29054284]),
 'test_mae': array([1.02038521, 1.01914696, 1.02207026, 1.01868075, 1.02059237]),
 'fit_time': (11.81141996383667,
  11.911198139190674,
  11.800252199172974,
  11.94942593574524,
  12.068403959274292),
 'test_time': (1.285412073135376,
  1.1535170078277588,
  1.1907069683074951,
  1.175961971282959,
  0.9121360778808594)}

### Displaying product urls 

Let's consider the following product. 
- [Mr Grill - 18" Luxury Oak Barbecue Spatula / Turner](https://www.amazon.com/dp/B00IJB5MCS). 


In [32]:
from IPython.display import HTML, display

url_amazon = "https://www.amazon.com/dp/%s"


def disp_url(item_id):
    url = url_amazon % item_id
    display(HTML('<a href="%s">%s</a>' % (url, url)))

In [33]:
grill_spatula = "B00IJB5MCS"
grill_spatula_ind = item_mapper[grill_spatula]
# Items are the columns of Y, so the spatula's ratings are in column grill_spatula_ind.
grill_spatula_vec = Y[:, grill_spatula_ind]
disp_url(grill_spatula)
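
Items are the columns of `Y`, so finding items similar to the spatula means comparing columns, that is, rows of `Y.T`. The toy sketch below shows this pattern with scikit-learn's `NearestNeighbors` and cosine distance on a made-up 4-user by 3-item matrix. Applying it to the Amazon data, and comparing against Euclidean distance, is left to the practice exercise at the end.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Made-up utility matrix: rows are users, columns are items.
Y_toy = csr_matrix(np.array([
    [5.0, 5.0, 0.0],
    [4.0, 4.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 4.0],
]))

item_vectors = Y_toy.T.tocsr()  # one row per item, one column per user
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(item_vectors)

# The closest item to item 0 is item 0 itself, so the second neighbour is the useful one.
distances, indices = nn.kneighbors(item_vectors[0])
print(indices)  # [[0 1]]: item 1 is the most similar to item 0
```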

<br><br><br><br>

## Video 5: Final comments and summary <a name="5"></a>

### What did we cover? 

- There is a big world of recommendation systems out there. We talked about some basic approaches to recommender systems. 
    - collaborative filtering 
    - content-based filtering 
    - hybrid approaches 

### Beyond error rate in recommendation systems 

- If a system gives the best RMSE it doesn't necessarily mean that it's going to give best recommendations. 
- In recommendation systems we usually do not have ground truth for the items a user has not rated.
- Just training your model and evaluating it offline is not ideal. 
- Other aspects such as simplicity, interpretability, and code maintainability are as important as (if not more important than) the best validation error. 
- The winning system of the Netflix Prize was never adopted.
    - It was a big mess of ensembles that was not really maintainable.
- There are other considerations. 

### Other issues important in recommender systems

- **Diversity**: how different are the recommendations?
    - Even if you really really like Star Wars, you might want non-Star-Wars suggestions.    
- **Freshness**: people tend to get more excited about new/surprising things.    
- **Trust**: tell user why you made a recommendation.
    - Quora gives explanations for recommendations.
- **Persistence**: how long should recommendations last?
    - If you keep not clicking on a recommendation, should it remain a recommendation?
- **Social recommendation**: what did your friends watch?


- Many recommenders are now connected to social networks.
- "Log in using your Facebook account".
- Often, people like similar movies to their friends.
- If we get a new user, then recommendations can be based on their friends' preferences. 

### Types of data 

- Explicit data: ratings, thumbs up, etc. 
- Implicit data: collected from the users' behaviour (e.g., mouse clicks, purchases, time spent doing something)
- Trust implicit data that costs the user something, like time or even money. 
    - This makes it harder to fake.

### Reminder

- Recommendation systems can have terrible consequences. 
- Ask hard and uncomfortable questions to yourself (and to your employer if possible) before implementing and deploying a recommendation system.  

## Course roadmap

- Week 1 ✅
    - Clustering    
- Week 2 ✅
    - Dimensionality reduction
- Week 3 ✅
    - Word embeddings
- Week 4 ✅
    - Recommendation systems     
    
That's all for this course, folks! It was fun teaching you this material. Thanks for your support, feedback, and great questions ❤️!     

I would love to hear your thoughts on this course. When you get a chance, it would be great if you could fill in the evaluation survey for this course on [Canvas](https://canvas.ubc.ca/courses/30777/external_tools/6073). 

The evaluation closing date is: **March 24, 2021**

<br><br><br><br>

### Questions for class discussion <a name="questions"></a>

### True/False questions

- In content-based filtering we leverage available item features in addition to similarity between users.
> False. We do not incorporate similarity between users. Each user has a separate regression model.  

- In content-based filtering you represent each user in terms of **known** features of items whereas in collaborative filtering each user is represented with **latent** features of items. 
> True. 

- In the set up of content-based filtering we discussed, if you have a new movie, you would have problems predicting ratings for that movie. 
> False. A new movie comes with a set of features, and given the learned regression weights and the movie's feature vector, we can immediately predict the rating for the movie. 

- Interpretation of recommendations might be easier with content-based filtering compared to collaborative filtering. 
> True. In collaborative filtering we do not know the actual features. 

<br><br><br><br>

### Questions to discuss in the breakout rooms 

- We have been ignoring the timestamp column in ratings datasets. How might you use this information when making recommendations? 

- Discuss similarities and differences between the Word2Vec style recommender we saw last week and the recommender systems we learned this week. 

<br><br><br><br>

### True/False questions

- `SVDFeature` hybrid model creates separate rating prediction models with content-based and collaborative filtering.  
> V's answer: False

- If you have a large number of users and items, it would be a problem to evaluate the reconstructed matrix against train and validation sets.  
> V's answer: False

- A user views 1 minute of a 10 minute YouTube video. What kind of input is this? 
    - Implicit (V's answer)
    - Explicit
    - Utility matrix 
    - indication that the user dislikes the video. 
    

### Questions to discuss in the breakout room
- Discuss memory-related problems that may arise when dealing with a large number of users and items. 

### (Optional) Practice exercises for you

Use scikit-learn's [NearestNeighbors](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html) object (which uses Euclidean distance by default) to find the 10 most similar items to [Mr Grill - 18" Luxury Oak Barbecue Spatula / Turner](https://www.amazon.com/dp/B00IJB5MCS) using Euclidean distance and cosine distance. Which distance metric is giving you better recommendations? 

> Try it out on your own or with your friends. I might not get a chance to post solutions for these questions. 

<br><br>