# Lecture 16 Recommendation Systems
__Math 3280: Data Mining__

__Outline__
1. Content-based Recommendation System
    * User Profile
    * Recommendation
2. Collaborative-Filtering based Recommendation System
    * Duality of Similarity
3. Netflix Challenge

__Reading__ 
* Leskovec, Chapter 9
-----

### User Profiles
Having built an item profile (a matrix describing the features of each item), we can now build a profile that describes each user in terms of those same features. For example, the utility matrix records the rating each user gave to each movie (the item), and the item profile records information about each movie, such as its actors or average rating. Combining the two gives a __user profile__, which matches each user with the features in the item profile.

#### User Profile with like/dislike ratings
In this example, we use the movie features (actors and average ratings) to build a matrix that scores how strongly each user favors each actor, based on the movies that user liked.
* Example: If 20\% of the movies that a user likes have Julia Roberts as one of the actors, then the user profile for that user will be 0.2 in the component for Julia Roberts.

How to find this:
* Let $\vec{A}$ be the binary vector marking the movies actor $A$ is in
* Let $\vec{u}$ be the binary vector marking the movies liked by user $u$
* $\vec{A}\cdot\vec{u}$ gives the number of movies with the given actor that the given user liked
* $\vec{u}\cdot\vec{u}$ gives the squared norm of the user vector, which for a binary vector equals the number of movies liked by that user
* The quotient of these values gives the fraction of the user's liked movies that feature that actor
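
Written as a formula, the steps above amount to the following (the symbol $w_{u,A}$ is introduced here only for illustration and is not notation from the reading):

$$
w_{u,A} = \frac{\vec{A}\cdot\vec{u}}{\vec{u}\cdot\vec{u}}
$$

For instance, a user who liked 10 movies, 2 of which feature Julia Roberts, gets $w_{u,A} = 2/10 = 0.2$.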

In [10]:
## User Profile - binary ratings
## Compare with Example 9.3 : Movies given only likes or dislikes
dotproduct = np.dot(movie_casts['Julia Roberts'], user_likes.loc['User A'])
norm = np.dot(user_likes.loc['User A'], user_likes.loc['User A'])
print(f"Dot Product of User Likes and Julia Roberts' Movies : {dotproduct}")
print(f"Norm of User Likes (number of movies liked)         : {norm}")
print(f"User weight to movies with Julia Roberts            : {dotproduct / norm}\n")


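# Note: the loop below uses the star ratings in user_ratings (not the binary
# user_likes), so each entry is a rating-weighted fraction, e.g. 6/34 = 0.176
# for User A and Julia Roberts, rather than the 0.2 computed above.
user_profile_likes = pd.DataFrame(columns=actors_list)

for actor_id in actors_list:
    for user_id in user_ratings.index:
        user_profile_likes.loc[user_id,actor_id] = np.dot(movie_casts[actor_id], user_ratings.loc[user_id]) / user_ratings.loc[user_id].sum()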
        
user_profile_likes
#user_profile_likes.drop('Movie Rating', axis=1)

Dot Product of User Likes and Julia Roberts' Movies : 2
Norm of User Likes (number of movies liked)         : 10
User weight to movies with Julia Roberts            : 0.2



| | Julia Roberts | Robin Williams | Clint Eastwood | Ian McKellen | Movie Rating |
|---|---|---|---|---|---|
| User A | 0.176471 | 0.411765 | 0.5 | 0.529412 | 3.176471 |
| User B | 0.48 | 0.16 | 0.32 | 0.68 | 2.8 |
| User C | 0.307692 | 0.769231 | 0.230769 | 0.230769 | 3.538462 |


#### User Profile with scaled ratings
Let's look again at our utility matrix (a 0 means the user has not rated that movie):

In [11]:
user_ratings

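| | M01 | M02 | M03 | M04 | M05 | M06 | M07 | M08 | M09 | M10 | M11 | M12 | M13 | M14 | M15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| User A | 0 | 1 | 2 | 1 | 0 | 4 | 4 | 0 | 3 | 0 | 0 | 5 | 5 | 4 | 5 |
| User B | 3 | 4 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 3 | 0 | 5 | 0 | 0 | 4 |
| User C | 0 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 3 |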


There is a problem we have to deal with:
* Users may only rate movies they like (or only movies that they don't like)

So, we think of it this way: with a 1-5 star rating, a rating above the user's average rating is a strong recommendation, while a rating below the user's average rating is a weak recommendation. So, for each actor, we want to
* normalize the user's ratings by subtracting that user's average rating, then
* find the average of these normalized ratings over the rated movies with that actor
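
As a formula, the two steps can be written as follows. The symbols are introduced here for illustration only: $M_{u,A}$ is the set of movies with actor $A$ that user $u$ has rated, $r_{u,m}$ is the rating user $u$ gave movie $m$, and $\bar{r}_u$ is the average of all ratings given by user $u$.

$$
s_{u,A} = \frac{1}{|M_{u,A}|}\sum_{m \in M_{u,A}} \left(r_{u,m} - \bar{r}_u\right)
$$

A positive score suggests the user rates movies with that actor above their own average; a negative score suggests the opposite.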

In [12]:
## Average Rating given by user over all videos - Compare with Example 9.4 : Movies given scaled ratings

# Replace all 0's with NaN so they don't influence the average
print(user_ratings.replace(0,np.nan).loc['User A'].mean())

user_avg_rating = user_ratings.replace(0,np.nan).loc['User A'].mean()
print(f"Average rating by this user = {user_avg_rating}")

3.4
Average rating by this user = 3.4


In [13]:
### Next find the movies with a given actor and the ratings the user has given it
actor_ratings_from_user = movie_casts['Julia Roberts'] * user_ratings.loc['User A']
actor_ratings_from_user

M01    0
M02    0
M03    2
M04    0
M05    0
M06    0
M07    4
M08    0
M09    0
M10    0
M11    0
M12    0
M13    0
M14    0
M15    0
dtype: int64

In [14]:
### Normalize the ratings by subtracting the average rating
actor_ratings_from_user = actor_ratings_from_user.apply(lambda x: 0 if x==0 else x-user_avg_rating)
actor_ratings_from_user

M01    0.0
M02    0.0
M03   -1.4
M04    0.0
M05    0.0
M06    0.0
M07    0.6
M08    0.0
M09    0.0
M10    0.0
M11    0.0
M12    0.0
M13    0.0
M14    0.0
M15    0.0
dtype: float64

In [15]:
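### The average is the score
### Caveat: replace(0, np.nan) also discards any rated movie whose rating equals the user's average exactly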
actor_ratings_from_user.replace(0,np.nan).mean()

-0.3999999999999999

In [16]:
a = 'Julia Roberts'
u = 'User A'

avg_rating = user_ratings.replace(0,np.nan).loc[u].mean()
(movie_casts[a] * user_ratings.loc[u]).apply(lambda x: 0 if x==0 else x-avg_rating).replace(0,np.nan).mean()

-0.3999999999999999
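
The same calculation can be sketched in plain Python without the 0-as-missing convention. This is a minimal sketch, not the notebook's code: unrated movies are simply left out of the dictionary, so a rating that happens to equal the user's average is not mistaken for a missing value. The function name `actor_score` is hypothetical, and the Julia Roberts set is read off the M01-M10 rows of `movie_casts` displayed in this section.

```python
# Ratings from the utility matrix row for User A; unrated movies are absent.
user_a = {"M02": 1, "M03": 2, "M04": 1, "M06": 4, "M07": 4,
          "M09": 3, "M12": 5, "M13": 5, "M14": 4, "M15": 5}

# Movies featuring Julia Roberts among the displayed rows M01-M10 of movie_casts.
julia_roberts = {"M01", "M03", "M07", "M08", "M10"}


def actor_score(ratings, actor_movies):
    """Mean of (rating - user average) over rated movies featuring the actor."""
    user_avg = sum(ratings.values()) / len(ratings)
    deviations = [r - user_avg for m, r in ratings.items() if m in actor_movies]
    if not deviations:
        return None  # the user has rated no movie with this actor
    return sum(deviations) / len(deviations)


score = actor_score(user_a, julia_roberts)
print(round(score, 6))  # -0.4, matching the notebook output above
```

Returning `None` when the user has rated no movie with the actor keeps "no information" distinct from a genuine score of 0.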

In [17]:
## User Profile - Apply for all users and actors
user_profile_ratings = pd.DataFrame(columns=actors_list)

for actor_id in actors_list:
    for user_id in user_ratings.index:
        avg_rating = user_ratings.replace(0,np.nan).loc[user_id].mean()  # Average rating given by user
        tmp = movie_casts[actor_id] * user_ratings.loc[user_id]          # Array of ratings given by user involving the given actor 
        user_profile_ratings.loc[user_id, actor_id] = tmp.apply(lambda x: 0 if x==0 else x-avg_rating).replace(0,np.nan).mean() # Subtract avg rating from ratings given, then take the mean
        

user_profile_ratings.drop('Movie Rating', axis=1, inplace=True)
user_profile_ratings

| | Julia Roberts | Robin Williams | Clint Eastwood | Ian McKellen |
|---|---|---|---|---|
| User A | -0.4 | 0.1 | 0.0 | 1.1 |
| User B | -0.571429 | -1.571429 | 0.428571 | 0.678571 |
| User C | 0.75 | 0.083333 | -0.25 | -0.25 |


### Recommendations
At this point, we have an item profile (relating movies to features) and a user profile (relating users to the same features). Now, we can make a recommendation by calculating the distance (using your distance measure of choice) between a user's profile and the different movies in the item profile. We are going to use a cosine distance in this example.

#### Recommendations using binary (like/dislike) system

In [18]:
movie_casts

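| | Julia Roberts | Robin Williams | Clint Eastwood | Ian McKellen | Movie Rating |
|---|---|---|---|---|---|
| M01 | 1 | 0 | 0 | 1 | 3 |
| M02 | 0 | 0 | 1 | 0 | 5 |
| M03 | 1 | 0 | 0 | 0 | 4 |
| M04 | 0 | 1 | 0 | 0 | 2 |
| M05 | 0 | 1 | 0 | 0 | 4 |
| M06 | 0 | 0 | 1 | 0 | 4 |
| M07 | 1 | 1 | 0 | 0 | 3 |
| M08 | 1 | 0 | 0 | 1 | 1 |
| M09 | 0 | 0 | 1 | 1 | 5 |
| M10 | 1 | 1 | 0 | 0 | 5 |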


In [19]:
user_profile_likes

| | Julia Roberts | Robin Williams | Clint Eastwood | Ian McKellen | Movie Rating |
|---|---|---|---|---|---|
| User A | 0.176471 | 0.411765 | 0.5 | 0.529412 | 3.176471 |
| User B | 0.48 | 0.16 | 0.32 | 0.68 | 2.8 |
| User C | 0.307692 | 0.769231 | 0.230769 | 0.230769 | 3.538462 |


In [20]:
user = "User A"
movie = "M10"

cosine_distance(movie_casts.drop('Movie Rating', axis=1).loc[movie],
                user_profile_likes.drop('Movie Rating', axis=1).loc[user])

0.48650425541051995
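
The `cosine_distance` helper is defined outside this section, so the following is only a check on how to read its output. The printed value agrees with the cosine of the angle between the two vectors (the dot product divided by the product of the norms), which would mean that larger values indicate a closer match. The function name `cosine_similarity` below is an assumption for illustration, not the notebook's helper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Actor columns only: Julia Roberts, Robin Williams, Clint Eastwood, Ian McKellen
m10 = [1, 1, 0, 0]                             # row M10 of movie_casts
user_a = [0.176471, 0.411765, 0.5, 0.529412]   # row User A of user_profile_likes

cos = cosine_similarity(m10, user_a)
angle_degrees = math.degrees(math.acos(cos))
print(round(cos, 4), round(angle_degrees, 1))  # 0.4865 60.9
```

An angle of about 61 degrees is a moderate match: User A's profile puts most of its weight on Clint Eastwood and Ian McKellen, neither of whom is in M10.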

#### Recommendations using scaled (stars) system

In [21]:
user_profile_ratings.head()

| | Julia Roberts | Robin Williams | Clint Eastwood | Ian McKellen |
|---|---|---|---|---|
| User A | -0.4 | 0.1 | 0.0 | 1.1 |
| User B | -0.571429 | -1.571429 | 0.428571 | 0.678571 |
| User C | 0.75 | 0.083333 | -0.25 | -0.25 |


In [22]:
cosine_distance(movie_casts.drop('Movie Rating', axis=1).loc['M06'],
                user_profile_ratings.loc['User A'])

7.560676802487164e-17
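
The value above is zero up to floating-point error: M06 features only Clint Eastwood, and User A's profile has 0.0 in that component. To turn single comparisons like this into a recommendation, one option is to score every movie the user has not rated yet and sort the results. The sketch below does this in plain Python for User A, using its own cosine similarity (higher means closer) and only the M01-M10 rows shown above; the variable names are illustrative assumptions.

```python
import math

# Actor columns (Julia Roberts, Robin Williams, Clint Eastwood, Ian McKellen)
# of the movie_casts rows shown above.
movie_casts = {
    "M01": [1, 0, 0, 1], "M02": [0, 0, 1, 0], "M03": [1, 0, 0, 0],
    "M04": [0, 1, 0, 0], "M05": [0, 1, 0, 0], "M06": [0, 0, 1, 0],
    "M07": [1, 1, 0, 0], "M08": [1, 0, 0, 1], "M09": [0, 0, 1, 1],
    "M10": [1, 1, 0, 0],
}
user_a_profile = [-0.4, 0.1, 0.0, 1.1]   # row User A of user_profile_ratings
# Movies among M01-M10 with a nonzero rating from User A in the utility matrix.
already_rated = {"M02", "M03", "M04", "M06", "M07", "M09"}


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Score only the movies User A has not rated, then sort best match first
# (ties are broken by movie id).
candidates = {m: cosine(v, user_a_profile)
              for m, v in movie_casts.items() if m not in already_rated}
ranking = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
for movie, score in ranking:
    print(movie, round(score, 3))
```

With these numbers M01 and M08 tie for the top spot (both feature Ian McKellen, User A's strongest component), while M10 comes last with a negative score.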

__Advantages of the content-based approach__
* No need for data from other users
* Able to cater to unique tastes
* Can include new and unpopular items in recommendations
* Can include explanations for recommendations (Because you liked 'A', you might like 'B')

__Disadvantages of the content-based approach__
* Difficult to find the appropriate features
  * Images, music, etc.
* Overspecialization (sticks to the user's profile and doesn't go outside of it)
* Can't take advantage of experience (quality judgments) from other users
* How do you make recommendations to a new user who doesn't have a profile?

-----

## Collaborative-filtering approach
The basic idea here is to find other users in the Utility Matrix with similar ratings using a distance measure. For example, there is a section on each Amazon page labelled, "Customers who bought this item also bought." Instead of creating an Item Profile, we just use that item's column in the utility matrix, and instead of using a User Profile, we just use that user's row in the utility matrix. We then look for other columns/rows similar to the first. 

Common distance measures and data treatments, with advantages and disadvantages:
* Jaccard Distance
    + Advantages: 
        * Ignores values
        * works with sparse matrices
    - Disadvantages: 
        * Loses details from the utility matrix (such as ratings)
        * Two users with opposite ratings may be labelled similar just because they have ratings
* Cosine Distance
    + Advantages:
        * Easy to calculate
        * They don't have to be exactly the same to have a low distance - just similar enough to have a small angle
    - Disadvantages: 
        * Have to fill blank values with a 0 which acts more like a negative rating
* Rounding the Data
    * Replace high values with a 1 and low values with a 0 or a blank
    + Advantages: 
        * Simplifies a scaled rating down to a binary rating
        * Easier calculation
        * Allows Jaccard Distance calculation while maintaining some of the details of the Utility Matrix
* Normalizing Ratings
    * Subtract the average rating given by the user from all of that user's ratings, so low ratings become negative and high ratings become positive
    + Advantages:
        * Works very well with the Cosine Distance
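
To make the comparison concrete, the sketch below applies each option to the rows for User A and User B from the utility matrix. It is plain Python written for illustration: the function names are hypothetical, `cosine_dist` is defined here as 1 minus the cosine similarity (it is not the notebook's helper), and the rounding cutoff of 3 stars is an assumed threshold.

```python
import math

# Rows of the utility matrix for User A and User B; unrated movies are omitted.
user_a = {"M02": 1, "M03": 2, "M04": 1, "M06": 4, "M07": 4,
          "M09": 3, "M12": 5, "M13": 5, "M14": 4, "M15": 5}
user_b = {"M01": 3, "M02": 4, "M07": 1, "M08": 5, "M10": 3, "M12": 5, "M15": 4}


def jaccard_distance(x, y):
    """1 - |intersection| / |union| of the sets of rated movies."""
    x, y = set(x), set(y)
    return 1 - len(x & y) / len(x | y)


def cosine_dist(x, y):
    """1 - cosine similarity; a movie missing from a row counts as 0."""
    dot = sum(r * y[m] for m, r in x.items() if m in y)
    norm_x = math.sqrt(sum(r * r for r in x.values()))
    norm_y = math.sqrt(sum(r * r for r in y.values()))
    return 1 - dot / (norm_x * norm_y)


def round_ratings(row, threshold=3):
    """Keep only movies rated at or above the (assumed) threshold."""
    return {m for m, r in row.items() if r >= threshold}


def normalize(row):
    """Subtract the user's average rating from each rated movie."""
    avg = sum(row.values()) / len(row)
    return {m: r - avg for m, r in row.items()}


print("Jaccard:            ", round(jaccard_distance(user_a, user_b), 3))
print("Jaccard (rounded):  ", round(jaccard_distance(round_ratings(user_a), round_ratings(user_b)), 3))
print("Cosine (raw):       ", round(cosine_dist(user_a, user_b), 3))
print("Cosine (normalized):", round(cosine_dist(normalize(user_a), normalize(user_b)), 3))
```

On these two rows the normalized cosine distance comes out larger than the raw one. Users A and B agree on M12 and M15 but give opposite ratings to M02 and M07 (A: 1 and 4, B: 4 and 1), and only the normalized version lets those disagreements count against the similarity.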

### Duality of Similarity
A utility matrix can both relate items to users and relate users to items. However, the two relationships are not always symmetric. Two examples of how symmetry is broken:
1. Even though we can find the similarity between different items, we need to take an additional step in order to make recommendations to users.
    * One method may require more calculations than the other
    * For example, we need to normalize each user's ratings to estimate true ratings in cases where users tend to always give either high or low ratings
2. A user's behavior and an item's features may not always be compatible. 
    * For example, a piece of music typically belongs to a single genre, whereas users can like multiple genres
    * If you have two people who both like 80's music, that doesn't necessarily mean they have the same tastes. One might like 80's music and classical music, while another might like 80's music and jazz. So, using one user's preferences to make recommendations to another may not always work.

The duality of the utility matrix can be seen in the following scenarios:
1. We can compare user A's row to the other rows to find the $n$ users most similar to A, then average those users' ratings of an item I to predict whether user A will like I or not
2. We can compare item I's column to the other columns to find the $m$ items most similar to I, then average the ratings user A gave to those similar items to predict whether user A will like I or not

For either method, we need to fill in (predict) values for most items in a user's row in order to make predictions. Which method we use (using similar users or similar items to predict the value) doesn't matter so much. However, there are tradeoffs:
* Using similar users, we only have to do the process once to get enough information to fill in the row for user A
* Using similar items, we can get more reliable information, but at the expense of doing more calculations
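
Both scenarios can be sketched with one function, since the item-based version is the user-based version applied to the transposed matrix. The code below is a minimal plain-Python illustration on the three-user utility matrix from this lecture. It assumes mean-centered cosine similarity and a simple unweighted average of the neighbors' ratings; the function names are hypothetical, and with so few users the predictions are only a demonstration of the mechanics.

```python
import math

# Utility matrix from this lecture; unrated entries (the 0s) are omitted.
utility = {
    "User A": {"M02": 1, "M03": 2, "M04": 1, "M06": 4, "M07": 4,
               "M09": 3, "M12": 5, "M13": 5, "M14": 4, "M15": 5},
    "User B": {"M01": 3, "M02": 4, "M07": 1, "M08": 5, "M10": 3, "M12": 5, "M15": 4},
    "User C": {"M04": 2, "M05": 4, "M10": 4, "M15": 3},
}


def transpose(matrix):
    """Swap rows and columns: users-by-movies becomes movies-by-users."""
    out = {}
    for row, cols in matrix.items():
        for col, value in cols.items():
            out.setdefault(col, {})[row] = value
    return out


def centered(row):
    """Subtract the row's average from each of its entries."""
    avg = sum(row.values()) / len(row)
    return {k: v - avg for k, v in row.items()}


def similarity(x, y):
    """Cosine similarity of two mean-centered rows (0.0 if undefined)."""
    cx, cy = centered(x), centered(y)
    dot = sum(v * cy[k] for k, v in cx.items() if k in cy)
    norm = math.sqrt(sum(v * v for v in cx.values()) * sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0


def predict(matrix, row, col, n):
    """Average the entries in `col` over the n rows most similar to `row`."""
    neighbors = [r for r in matrix if r != row and col in matrix[r]]
    neighbors.sort(key=lambda r: similarity(matrix[row], matrix[r]), reverse=True)
    top = neighbors[:n]
    return sum(matrix[r][col] for r in top) / len(top) if top else None


# Scenario 1: the user most similar to User A who rated M10.
user_based = predict(utility, "User A", "M10", n=1)
# Scenario 2: the two movies most similar to M10 that User A rated
# (same function, transposed matrix).
item_based = predict(transpose(utility), "M10", "User A", n=2)
print(user_based, item_based)  # 4.0 2.5
```

The two approaches give different predictions here (4.0 from the most similar user, 2.5 from the most similar items), which illustrates that the choice of direction can matter even though the procedure is the same.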

-----

## The Netflix Challenge
Netflix wanted a better algorithm to recommend movies to its users. They offered a \$1,000,000 prize to the first person or team who could beat their algorithm (CineMatch) by 10\%. It was discovered early that CineMatch was not a very good algorithm. There were many great entries over the next three years:

#### Basic Algorithm
A simple baseline algorithm predicts the rating by user $u$ on movie $m$ from two values: the average rating given by $u$ on all rated movies and the average of the ratings for movie $m$ by all users who rated that movie (in other words, it takes the average of the user's row and the average of the movie's column, then averages those two values together).

The basic algorithm was only 3\% worse than CineMatch.
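
As an illustration, the baseline can be sketched in a few lines of plain Python and applied to this lecture's small utility matrix (not the Netflix data). The function name `baseline` is hypothetical, and the sketch assumes the user has rated at least one movie and the movie has at least one rating.

```python
# Utility matrix from this lecture; unrated entries (the 0s) are omitted.
utility = {
    "User A": {"M02": 1, "M03": 2, "M04": 1, "M06": 4, "M07": 4,
               "M09": 3, "M12": 5, "M13": 5, "M14": 4, "M15": 5},
    "User B": {"M01": 3, "M02": 4, "M07": 1, "M08": 5, "M10": 3, "M12": 5, "M15": 4},
    "User C": {"M04": 2, "M05": 4, "M10": 4, "M15": 3},
}


def baseline(matrix, user, movie):
    """Average of the user's mean rating and the movie's mean rating."""
    user_row = matrix[user]
    user_avg = sum(user_row.values()) / len(user_row)
    movie_ratings = [row[movie] for row in matrix.values() if movie in row]
    movie_avg = sum(movie_ratings) / len(movie_ratings)
    return (user_avg + movie_avg) / 2


prediction = baseline(utility, "User A", "M10")
print(round(prediction, 2))  # 3.45: user average 3.4, movie average 3.5
```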

#### Singular Value Decomposition
A team of three students used a technique called __UV-decomposition__ (an instance of the more general Singular Value Decomposition) to give a 7\% improvement over CineMatch.

SVD will be the next topic we address in this class.

#### The winner
The winner actually used a combination of several different algorithms that had been developed independently.

A second team also used a combination of several different algorithms, but lost the competition by mere minutes.

The time at which the rating was given turned out to be an important factor.

-----
* Exercise 9.2.1 a-d
* Exercise 9.2.2 a
  * Calculate and interpret the cosine distances between each computer
* Exercise 9.2.3 a-b
* Exercise 9.3.1 a-f