```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
```

## Lighthouse Labs
### W10D2 Recommendation Engines II

**Agenda:**
- Content-based Recommender Systems (review)
- Collaborative Filtering
    - Item-to-item
    - User-to-user
    - Latent factors
    
- Pros and Cons of different approaches to Recommenders
- Demo

# Recommendation Systems

- Two entities: **users** and **items**; 
    - items can be anything: movies, books, music, people, etc.;



- Utility matrix: a matrix that captures the interaction between users and items; 
    - ratings, clicks, reviews, purchases;

Ratings | Notting Hill | Jurassic Park | Rocky Balboa IV | Bird Box | Inglourious Basterds
--------|--------------|---------------|-----------------|----------|----------------------
User 1  |      ?       |      ?        |       2         |    ?     |    3 
User 2  |      3       |      ?        |       ?         |    ?     |    ?
User 3  |      ?       |      4        |       5         |    ?     |    5
User 4  |      ?       |      ?        |       ?         |    ?     |    ?
User 5  |      ?       |      ?        |       ?         |    5     |    ? 

## Our main problem: _Sparsity_

- The thing is: we cannot expect users to interact with most items;
    - Nobody has watched the whole (or even the vast majority of the) Netflix catalogue;
        - Hopefully!
    - Customers don't buy most of the items on Amazon;
    - Users don't listen to all the music on Spotify;
    

- Hence, the utility matrix is usually very sparse: most of its entries are missing;
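To make sparsity concrete, here is a minimal sketch that encodes the toy utility matrix above as a NumPy array, using `np.nan` for each `?` (this encoding is our assumption for illustration, not something the notes prescribe), and measures the fraction of missing entries:

```python
import numpy as np

# Toy utility matrix from the table above; np.nan marks a missing rating ("?").
# Rows are User 1..5; columns follow the table's movie order.
R = np.array([
    [np.nan, np.nan, 2,      np.nan, 3],
    [3,      np.nan, np.nan, np.nan, np.nan],
    [np.nan, 4,      5,      np.nan, 5],
    [np.nan, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 5,      np.nan],
], dtype=float)

observed = ~np.isnan(R)                # True where a rating exists
n_observed = int(observed.sum())       # number of known ratings
sparsity = 1 - n_observed / R.size     # fraction of missing entries

print(f"{n_observed} of {R.size} ratings observed, sparsity = {sparsity:.0%}")
```

Even this tiny example is 72% empty; real catalogues are typically much sparser.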

## What do we want?

- Do we want to predict the missing ratings?

- Or do we want to find items that the users will enjoy?

 - Aren't these problems equivalent? 

 - We must carefully consider what we want, so we can properly train our model!
     - but yes, our job for now is to try to fill **some** of the missing data in this matrix;

- There are different approaches to do so (which you learned yesterday): 
    - Content-based recommendations
    - Collaborative filtering

## Content-based recommendation (Review)

- Use knowledge of each item to recommend a similar one (item-based recommendation)

- Based on the features of the items and users. For example:
    - movies: drama, comedy, horror, actors, director
    - books: math, languages, author, year, etc...
    - destinations: temperature, coast, mountains, price, distance, etc...
    - users: age, location, etc...

* Suppose you read certain articles.
* Each article is represented as a vector, and the vectors of these articles will be similar to the vectors of some other articles.
* A measure like cosine similarity is used to find those similar articles.
* The similar articles you haven't read yet are then recommended to you, to improve your engagement with the website or medium.
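The steps above can be sketched in a few lines of NumPy. The article names and feature vectors (topic weights) are hypothetical, and scoring each unread article by its *maximum* similarity to anything already read is just one simple aggregation choice among several:

```python
import numpy as np

# Hypothetical article feature vectors: [politics, sports, tech] weights.
articles = {
    "a1": np.array([0.9, 0.1, 0.0]),
    "a2": np.array([0.8, 0.2, 0.1]),
    "a3": np.array([0.0, 0.9, 0.3]),
    "a4": np.array([0.1, 0.0, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(read, k=1):
    """Rank unread articles by their highest cosine similarity to any article already read."""
    scores = {
        name: max(cosine(vec, articles[r]) for r in read)
        for name, vec in articles.items()
        if name not in read
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(["a1"], k=2))
```

A reader of `a1` (mostly politics) gets `a2` ranked first, since its vector points in almost the same direction.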

![img](img/CBF.png)

#### Advantages

- You don't need a lot of users to train your model.


- Each user is modeled separately, so you might be able to capture uniqueness of taste.


- Since you can obtain the features of the items, you can immediately recommend new items.


- You can explain to the users why you are recommending an item.

#### Disadvantages

- Feature acquisition:
    - What features should you use to explain the different ratings?
    - Obtaining those features for each item might be very expensive. 
       <br>  <br>
- Low diversity: the system rarely recommends an item outside the user's profile.
    - What if a user has an eclectic taste?
 <br>  <br>
- Cold start: you don't have any information about new users. What do you recommend to them?

## Collaborative Filtering
- Memory-based
    - User-user
    - Item-item
- Model-based
    - Latent factors

Memory-based collaborative filtering computes similarities between users or items and predicts a new rating for an item by taking the weighted average of ratings from the similar group. 
<br><br>

- For user-based filtering: use a user's past selections to find similar users, then recommend what those users chose (user-based recommendation).
<br><br>
- For item-based filtering: “users who preferred a certain item also liked…”
<br><br>
- The recommendations will be based on the utility matrix.
<br><br>
- The idea is to find similar items or similar users to make your recommendations; the users' ratings "collaborate", hence the name.

Ratings | Notting Hill | Jurassic Park | Rocky Balboa IV | Bird Box | Inglourious Basterds
--------|--------------|---------------|-----------------|----------|----------------------
User 1  |      ?       |      ?        |       2         |    ?     |    3 
User 2  |      3       |      ?        |       ?         |    ?     |    ?
User 3  |      ?       |      4        |       5         |    ?     |    5
User 4  |      ?       |      ?        |       ?         |    ?     |    ?
User 5  |      ?       |      ?        |       ?         |    5     |    ?

### User-to-user

User-based filtering first selects a user and finds users who have similar rating patterns. The recommender system then can suggest items that those similar users liked.

* The process is to calculate the similarities between target user i and all other users, select the top X similar users, and take the weighted average of ratings from these X users with similarities as weights.
<br><br>
* Different people have different baselines when giving ratings: some tend to give high scores in general, while others are strict even when they are satisfied with an item. To remove this bias, we can subtract each user's average rating from their ratings before computing the weighted average, and then add the target user's average back.


- Suppose you want to predict the rating that Sophie would give to Shrek.

- Here is one approach:
    1. Calculate the similarity of Sophie with each one of the other users.
    2. Next, select the $k$ users that are most similar to Sophie and also rated Shrek.
    3. Aggregate their ratings of Shrek (e.g., with a similarity-weighted average).
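Here is a minimal sketch of these three steps, including the mean-centering trick described above. The ratings, the other users' names and the extra movies are hypothetical, and similarity is computed as cosine over mean-centered ratings on co-rated movies (one common choice; other similarity measures work too):

```python
import numpy as np

# Hypothetical ratings (np.nan = not rated). Rows: users, columns: movies.
users = ["Sophie", "Ana", "Ben", "Carl"]
movies = ["Shrek", "Toy Story", "Up", "Alien"]
R = np.array([
    [np.nan, 5, 4, 1],  # Sophie has not rated Shrek
    [5,      5, 4, 2],  # Ana
    [4,      4, 5, 1],  # Ben
    [1,      1, 2, 5],  # Carl
], dtype=float)

# Remove each user's rating baseline (NaN entries stay NaN).
user_means = np.nanmean(R, axis=1)
C = R - user_means[:, None]

def user_sim(u, v):
    """Cosine similarity of mean-centered ratings over the movies both users rated."""
    both = ~np.isnan(C[u]) & ~np.isnan(C[v])
    a, b = C[u, both], C[v, both]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_user_based(u, i, k=2):
    """Predict R[u, i] from the k most similar users who rated movie i."""
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    top = sorted(raters, key=lambda v: user_sim(u, v), reverse=True)[:k]
    num = sum(user_sim(u, v) * C[v, i] for v in top)
    den = sum(abs(user_sim(u, v)) for v in top)
    # Add the target user's own baseline back.
    return user_means[u] + (num / den if den > 0 else 0.0)

sophie, shrek = users.index("Sophie"), movies.index("Shrek")
print(round(predict_user_based(sophie, shrek), 2))
```

With this toy data the two nearest neighbours are Ana and Ben (Carl's tastes are the opposite of Sophie's), and the predicted rating is about 4.09.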

### Item-to-item

- Item-based filtering takes an item first and finds users who liked the particular item, then searches other items that those users also liked.

- Item-based collaborative filtering was introduced by Amazon in 1998.

- Item-based filtering looks at the similarity between different items, and does this by taking note of how many users who chose item X also chose item Y.

- If the correlation is high enough, the two items can be assumed to be similar to one another.

- For example, suppose you want to predict the rating that a user (Sophie) would give to a movie (Shrek).


- Here is one approach:
    1. Calculate the similarity between Shrek and each one of the movies in our matrix.
    2. Next, select the $k$ movies that are most similar to Shrek and that were also rated by Sophie.
    3. Aggregate these ratings.
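The item-based steps can be sketched the same way, now comparing movie columns instead of user rows. The ratings are hypothetical, and plain cosine on raw ratings is used for simplicity (an "adjusted" cosine that first subtracts each user's mean is a common refinement):

```python
import numpy as np

# Hypothetical ratings (np.nan = not rated). Rows: users, columns: movies.
users = ["Sophie", "Ana", "Ben", "Carl"]
movies = ["Shrek", "Toy Story", "Up", "Alien"]
R = np.array([
    [np.nan, 5, 4, 1],  # Sophie has not rated Shrek
    [5,      5, 4, 2],  # Ana
    [4,      4, 5, 1],  # Ben
    [1,      1, 2, 5],  # Carl
], dtype=float)

def item_sim(i, j):
    """Cosine similarity between movie columns i and j over users who rated both."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a, b = R[both, i], R[both, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_item_based(u, i, k=2):
    """Similarity-weighted average of user u's ratings of the k movies most similar to i."""
    rated = [j for j in range(R.shape[1]) if j != i and not np.isnan(R[u, j])]
    top = sorted(rated, key=lambda j: item_sim(i, j), reverse=True)[:k]
    num = sum(item_sim(i, j) * R[u, j] for j in top)
    den = sum(abs(item_sim(i, j)) for j in top)
    return num / den if den > 0 else np.nan

sophie, shrek = users.index("Sophie"), movies.index("Shrek")
print(round(predict_item_based(sophie, shrek), 2))
```

Here the movies closest to Shrek are Toy Story and Up, which Sophie rated 5 and 4, so the prediction lands at about 4.51.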

## Evaluating your recommender system (discussion)

- Assessment of a recommender system can be very tricky;


- Well, we can use the classical measures: mean squared error, mean absolute error;


- But these error measures might be misleading;


- What we actually want to measure is how interested our users are in the recommended items;
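Computing the classical error measures is straightforward; the sketch below uses hypothetical held-out ratings and predictions just to show the arithmetic (RMSE, the square root of MSE, is added because it is expressed in rating units). Note that these numbers only score ratings users already gave, not whether the recommended items interest them:

```python
import numpy as np

# Hypothetical held-out ratings and the model's predictions for the same (user, item) pairs.
y_true = np.array([4.0, 3.0, 5.0, 2.0])
y_pred = np.array([3.5, 3.0, 4.0, 3.0])

mse = float(np.mean((y_true - y_pred) ** 2))   # mean squared error
rmse = float(np.sqrt(mse))                     # same units as the ratings
mae = float(np.mean(np.abs(y_true - y_pred)))  # mean absolute error

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}")
```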


### Remember that:

- Just training your model and evaluating it offline is not ideal.
<br><br>


- We don't have a ground truth! 


- Although we want to recommend only items the user is interested in, the recommendations themselves can shape (or skew) the user's interests;
    - we can't measure this offline;

- Because of this, one can argue that the best way of testing a recommender system is actually testing it for real (A/B test). 




### Besides...

- Just because a user liked a movie doesn't mean they want to watch a similar movie next;
    - also hard to capture this offline;


- Suppose a user likes action and sci-fi movies. 


- After just watching an action movie, should we recommend similar action movies?

- **Diversity**: Our user is a Harry Potter fan; should we recommend HP1, HP2, HP3, ...? We might want some diversity! To measure diversity you could use, for example,
$$
1 - \bar{S}
$$
where $\bar{S}$ is the average pairwise similarity among your recommended items;
    - Careful though, you need a balance. Just going for diversity is pointless;
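As a sketch of the $1 - \bar{S}$ measure, assuming cosine similarity between hypothetical genre vectors of the recommended items:

```python
import numpy as np
from itertools import combinations

def diversity(item_vectors):
    """Return 1 - S_bar, where S_bar is the mean pairwise cosine similarity of the list."""
    sims = [
        # Clip to [-1, 1] so floating-point noise cannot push a similarity above 1.
        float(np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0))
        for a, b in combinations(item_vectors, 2)
    ]
    return 1 - float(np.mean(sims))

# Hypothetical genre vectors: [fantasy, action, romance].
hp_series = [np.array([1.0, 0.2, 0.0])] * 3   # HP1, HP2, HP3: near-identical profiles
mixed = [
    np.array([1.0, 0.2, 0.0]),   # a fantasy film
    np.array([0.0, 1.0, 0.0]),   # an action film
    np.array([0.0, 0.0, 1.0]),   # a romance
]

print(round(diversity(hp_series), 3), round(diversity(mixed), 3))
```

The Harry Potter marathon scores (almost exactly) 0, while the mixed list scores above 0.9.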

- **Novelty:** Do you want to recommend new items to your users, or the most popular ones? We need a balance here:
    - Just going for popular items won't surprise your users;
    - Just going for new/unknown items affects the trust in your recommendation;
        - Besides, popular items are popular for a reason;

- **Responsiveness:** how fast does your system change as new user/items interactions arrive? 
    - In other words, how frequently should you update your utility matrix?
    
    
- **Persistence:** How long do you want to keep an item in your recommended list?

## Non-algorithmic recommendation bias

- There might be other reasons for you to recommend items;


- For example, Netflix might want to stimulate original productions;


- A company might want to favor a product with high profit margin;

#### Or you might not want to recommend some things

- [Is Target the new pregnancy test?](https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#4d7f597f6668)
<br><br>
- [Walmart links Martin Luther King Jr. to “Planet of the Apes”](http://www.nbcnews.com/id/10730202/ns/technology_and_science-tech_and_gadgets/t/wal-mart-blames-human-error-offensive-link/#.XEamRc2IaUk)
<br><br>
- BE CAREFUL WITH SENSITIVE MATTERS!

### Types of data

- Explicit data: ratings, thumbs up, etc...


- Implicit data: collected from your behaviour (e.g., mouse clicks, purchases, time spent doing something)

- Users don't like to give explicit data,
    - so companies use other strategies (tag your photo, the 10-year challenge?);
        - it is nice for you, so you do it;
        - but it is also nice for them;
        - is it ethical to use these things without you knowing?
        


- Companies tend to trust implicit data, like time or money spent, more than explicit data;

## Model Based Collaborative Filtering

Our task: reduce the dimension of the utility matrix $A$ by extracting its latent factors.



### Latent factors
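- Latent factors are hidden characteristics that describe the items (for example, the genre of a song or of a movie) and how much each user cares about them. They are learned from the data, so they do not always have an obvious interpretation.

Example of matrix factorization using PCA: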

![img](img/0_pca.png)

We are going to do this using matrix factorization, decomposing the `Movie Ratings Matrix` into a `Users` matrix and a `Movies` matrix.

These matrices are dense, lower-dimensional representations of the (sparse) Movie Ratings Matrix.


![img](img/SVD01.png)
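For a *fully observed* ratings matrix, one way to get such a pair of matrices is a truncated SVD. The sketch below uses hypothetical data and NumPy's `np.linalg.svd`, keeps `k` latent factors, and splits the singular values evenly between the two factors (one convention among several). The exact SVD needs every entry to be known, which is why sparse data calls for an iterative method such as gradient descent:

```python
import numpy as np

# Hypothetical, fully observed ratings: 4 users x 5 movies.
R = np.array([
    [5, 5, 4, 1, 1],
    [4, 5, 5, 1, 2],
    [1, 2, 1, 5, 4],
    [2, 1, 1, 4, 5],
], dtype=float)

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Split the singular values between the two factors so that R is approx. P @ Q.
P = U[:, :k] * np.sqrt(s[:k])          # users x k   ("Users" matrix)
Q = np.sqrt(s[:k])[:, None] * Vt[:k]   # k x movies  ("Movies" matrix)

R_hat = P @ Q                          # best rank-k approximation of R
print(P.shape, Q.shape)
print(np.round(R_hat, 1))
```

The reconstruction error equals the size of the discarded singular values, so keeping more factors trades compactness for accuracy.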


### What does our data look like?

![img](img/SVD03.png)

- We've seen that if we had the user profiles, we could learn the movie profiles $\hat{X}$

- Or, if we had the movie profiles, we could learn the user profiles $\hat{\beta}$

- Our problem is similar, but instead of minimizing over either $X$ or $\beta$, we need to minimize over both!
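One common way to minimize over both is alternating least squares (ALS), a technique we are naming here rather than one the notes spell out: fix the movie profiles $X$ and solve for the user profiles $\beta$ (exactly the "if we had the movie profiles" case), then fix $\beta$ and solve for $X$, and repeat. A minimal sketch, assuming the usual L2-regularized objective over the observed ratings,

$$
\min_{X,\beta} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - \beta_u^\top x_i\right)^2 + \lambda\left(\sum_u \lVert \beta_u \rVert^2 + \sum_i \lVert x_i \rVert^2\right),
$$

where $r_{ui}$ is user $u$'s rating of movie $i$ and $\lambda$ (`lam`) is a hypothetical regularization strength; the ratings are made up:

```python
import numpy as np

def objective(R, beta, X, lam):
    """Regularized squared error over the observed entries of R (np.nan = missing)."""
    mask = ~np.isnan(R)
    err = (R - beta @ X.T)[mask]
    return float(err @ err + lam * (np.sum(beta ** 2) + np.sum(X ** 2)))

def als(R, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternate: fix X and solve for beta, then fix beta and solve for X."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    mask = ~np.isnan(R)
    beta = rng.normal(scale=0.1, size=(n_users, k))  # user profiles
    X = rng.normal(scale=0.1, size=(n_items, k))     # movie profiles
    ridge = lam * np.eye(k)
    for _ in range(n_iters):
        for u in range(n_users):   # movie profiles fixed: one ridge regression per user
            obs = mask[u]
            if obs.any():
                Xo = X[obs]
                beta[u] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ R[u, obs])
        for i in range(n_items):   # user profiles fixed: one ridge regression per movie
            obs = mask[:, i]
            if obs.any():
                Bo = beta[obs]
                X[i] = np.linalg.solve(Bo.T @ Bo + ridge, Bo.T @ R[obs, i])
    return beta, X

# Hypothetical ratings with missing entries.
R = np.array([
    [5,      4,      np.nan, 1],
    [4,      np.nan, 4,      1],
    [1,      1,      np.nan, 5],
    [np.nan, 2,      1,      4],
], dtype=float)

beta, X = als(R)
print(np.round(beta @ X.T, 1))   # every cell now has a predicted rating
```

Each half-step is an ordinary ridge regression with a closed-form solution, so the objective never increases from one sweep to the next.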

### Visual SVD

Each movie has a score for each genre, such as comedy and action, calculated in advance.

Each user also has a preference for these genres, like "I like comedy" or "I dislike it". Only the scores of the genres the user likes are added into the rating.

![img](img/SVD04.png)

We get a final rating matrix like so.

In other words, the rating matrix on the right can be factorized into the 2 matrices on the left.

![img](img/SVD05.png)
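The "only the liked genres are added" rule is just a matrix product. A minimal sketch, assuming hypothetical genre scores and 0/1 preferences (1 = likes the genre, 0 = dislikes it):

```python
import numpy as np

# Hypothetical genre scores per movie: [comedy, action].
movie_genres = np.array([
    [3.0, 1.0],   # movie A: mostly comedy
    [0.5, 4.0],   # movie B: mostly action
])
# Hypothetical user preferences: 1 = likes the genre, 0 = dislikes it.
user_prefs = np.array([
    [1, 0],   # likes comedy only
    [0, 1],   # likes action only
    [1, 1],   # likes both
])

# Each rating is the sum of the movie's scores for the genres the user likes.
ratings = user_prefs @ movie_genres.T   # users x movies
print(ratings)
```

Read backwards, this is the factorization problem: given only `ratings`, find matrices playing the roles of `user_prefs` and `movie_genres`.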


Now we face the reverse problem: given the target matrix on the right, we need to find 2 matrices which, when multiplied, give us the target matrix.
How do we do this?

![img](img/SVD06.png)


We start with random matrices and use gradient descent to find the values whose product gives the correct target matrix.

![img](img/SVD07.png)

As you can see, we get 1.44 instead of 3.
This needs to be corrected.
![img](img/SVD08.png)

Gradient descent updates the values in the two factor matrices (the target matrix stays fixed).
With enough iterations, their product gets close to the target values.

![img](img/SVD09.png)

![img](img/SVD10.png)

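The same technique can be used to factorize target matrices with empty (missing) entries: the error is computed only on the known ratings.
Thus, starting from random matrices, we get factor matrices that, when multiplied, give ratings for all users, including the missing ones.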
![img](img/SVD11.png)
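Here is a minimal sketch of this procedure on the toy utility matrix from the start of the notes: random starting matrices, full-batch gradient descent on the squared error of the observed entries only, and a product that fills in every cell. The learning rate, number of epochs and `k` are illustrative assumptions. This gradient-descent factorization is what recommender libraries often label "SVD" (after Simon Funk's Netflix Prize approach), even though it is not the exact linear-algebra SVD:

```python
import numpy as np

def observed_mse(R, P, Q):
    """Mean squared error over the observed (non-NaN) entries only."""
    mask = ~np.isnan(R)
    return float(np.mean((R - P @ Q)[mask] ** 2))

def factorize_gd(R, k=2, lr=0.01, n_epochs=5000, seed=0):
    """Fit R approx. P @ Q with full-batch gradient descent, ignoring missing entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    mask = ~np.isnan(R)
    target = np.where(mask, R, 0.0)            # placeholder zeros are masked out below
    P = rng.uniform(0, 1, size=(n_users, k))   # random users matrix
    Q = rng.uniform(0, 1, size=(k, n_items))   # random movies matrix
    for _ in range(n_epochs):
        E = mask * (target - P @ Q)            # error on observed ratings only
        grad_P = -2 * E @ Q.T                  # gradient of sum(E**2) w.r.t. P
        grad_Q = -2 * P.T @ E                  # gradient of sum(E**2) w.r.t. Q
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q

# Toy utility matrix from the first table (np.nan = "?").
R = np.array([
    [np.nan, np.nan, 2,      np.nan, 3],
    [3,      np.nan, np.nan, np.nan, np.nan],
    [np.nan, 4,      5,      np.nan, 5],
    [np.nan, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 5,      np.nan],
], dtype=float)

P, Q = factorize_gd(R)
print(round(observed_mse(R, P, Q), 4))
print(np.round(P @ Q, 1))   # predicted ratings for every user and movie
```

Note that User 4 has no ratings, so their row of `P` never moves from its random start and their predictions are meaningless: the cold-start problem again.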