# Hybrid Recommender System

In [1]:
import pandas as pd
import numpy as np

# Exercise 1

In this exercise, we try out different methods for combining rankings from multiple models.

Below is a toy dataframe with scores estimating how likely a user is to like 5 different items, as predicted by 2 different models:
- Model 1: rating predictions from a collaborative filtering model
- Model 2: cosine similarities from a content-based model

In [36]:
d = {'item_id': ['I1', 'I2', 'I3', 'I4', 'I5'],
     'model1_score': [1.2, 2.8, 3.0, 4.5, 5.0],
     'model2_score': [0.8, 0.5, 0.2, 0.9, 0.4]}
df = pd.DataFrame(data=d)

## 1.1

Rank the 5 items according to the scores from model 1 and 2 respectively (higher score is better in both models).

In [41]:
df['model1_rank'] = df['model1_score'].rank(method='first', ascending=False).astype(int)
df[['item_id','model1_rank']].sort_values(by="model1_rank", ascending=True)


Unnamed: 0,item_id,model1_rank
4,I5,1
3,I4,2
2,I3,3
1,I2,4
0,I1,5


In [42]:
df['model2_rank'] = df['model2_score'].rank(method='first', ascending=False).astype(int)
df[['item_id','model2_rank']].sort_values(by="model2_rank", ascending=True)


Unnamed: 0,item_id,model2_rank
3,I4,1
0,I1,2
1,I2,3
4,I5,4
2,I3,5


## 1.2

Normalize the scores from the 2 models (by subtracting the mean and dividing by the standard deviation) and compute a combined rank using the **Weighted Sum** method with $\alpha=\beta=1$. Round the results to 3 decimal places.
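
For reference, the normalization and combination described above can be written as follows. The symbols $s_{m,i}$ (raw score of item $i$ under model $m$), $\mu_m$, $\sigma_m$ (mean and standard deviation of model $m$'s scores), and $z_{m,i}$ are notation introduced here for illustration; they are not part of the exercise text.

$$
z_{m,i} = \frac{s_{m,i} - \mu_m}{\sigma_m}, \qquad \text{weighted\_sum}_i = \alpha\, z_{1,i} + \beta\, z_{2,i}
$$

Note that pandas' `Series.std()` computes the sample standard deviation (`ddof=1`) by default. Since both models score the same number of items, using the population standard deviation instead would rescale both normalized scores by the same factor, which changes the combined values but not their order.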

In [67]:
alpha, beta = 1, 1

df['model1_score_norm'] = (df['model1_score'] - df['model1_score'].mean())/df['model1_score'].std()
df['model2_score_norm'] = (df['model2_score'] - df['model2_score'].mean())/df['model2_score'].std()

df['weighted_sum'] = alpha * df['model1_score_norm'] + beta * df['model2_score_norm']

df['combined_score_rank'] = df['weighted_sum'].rank(method='min', ascending=False).astype(int)

df[['item_id', 'weighted_sum', 'combined_score_rank']].sort_values(by="combined_score_rank", ascending=True)

Unnamed: 0,item_id,weighted_sum,combined_score_rank
3,I4,1.976625,1
4,I5,0.572962,2
1,I2,-0.540125,3
0,I1,-0.560767,4
2,I3,-1.448695,5


## 1.3

Merge the ranking from the 2 models using the **Borda Fuse** method.
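
One common formulation of Borda Fuse, and the one that matches the point totals shown in this exercise, awards an item $n - r$ points for being ranked $r$-th among $n$ items and sums the points across models. The notation $r_m(i)$ (rank of item $i$ under model $m$) is introduced here for illustration. Other variants award $n - r + 1$ points, which shifts every total by a constant and leaves the order unchanged.

$$
\text{points}(i) = \sum_{m=1}^{M} \bigl(n - r_m(i)\bigr)
$$

For example, I4 is ranked 2nd by model 1 and 1st by model 2, so with $n = 5$ it receives $(5-2) + (5-1) = 7$ points.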

In [70]:
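# Borda points: an item ranked r out of n items earns n - r points from each model
n_items = len(df)
df['borda_fuse_points'] = (n_items - df[['model1_rank', 'model2_rank']]).sum(axis=1)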

In [None]:
# Use "min" rather than "first" so that items with the same points share the same rank
df['borda_fuse_rank'] = df['borda_fuse_points'].rank(method='min', ascending=False).astype(int)

In [71]:
df

Unnamed: 0,item_id,model1_score,model2_score,model1_rank,model2_rank,model1_score_norm,model2_score_norm,weighted_sum,combined_score_rank,borda_fuse_points,borda_fuse_rank
0,I1,1.2,0.8,5,2,-1.393819,0.833052,-0.560767,4,3,3
1,I2,2.8,0.5,4,3,-0.331862,-0.208263,-0.540125,3,3,3
2,I3,3.0,0.2,3,5,-0.199117,-1.249578,-1.448695,5,2,5
3,I4,4.5,0.9,2,1,0.796468,1.180157,1.976625,1,7,1
4,I5,5.0,0.4,1,4,1.12833,-0.555368,0.572962,2,5,2

In [73]:
df[['item_id','borda_fuse_rank']].sort_values(by="borda_fuse_rank", ascending=True)

Unnamed: 0,item_id,borda_fuse_rank
3,I4,1
4,I5,2
0,I1,3
1,I2,3
2,I3,5


## 1.4

Merge the ranking from the 2 models using the **Reciprocal Rank Fusion** method with $k=0$.
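
Reciprocal Rank Fusion is usually written as a sum of reciprocal ranks, damped by a constant $k$. The notation $r_m(i)$ (rank of item $i$ under model $m$, with 1 as the best rank) is introduced here for illustration.

$$
\text{RRF}(i) = \sum_{m=1}^{M} \frac{1}{k + r_m(i)}
$$

With $k=0$, I4 (ranks 2 and 1) scores $1/2 + 1/1 = 1.5$, and I5 (ranks 1 and 4) scores $1/1 + 1/4 = 1.25$.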


In [78]:
df[['model1_rank','model2_rank']]

Unnamed: 0,model1_rank,model2_rank
0,5,2
1,4,3
2,3,5
3,2,1
4,1,4


In [84]:
df[['model1_rank','model2_rank']].apply(lambda x: print(x), axis=1)

model1_rank    5
model2_rank    2
Name: 0, dtype: int64
model1_rank    4
model2_rank    3
Name: 1, dtype: int64
model1_rank    3
model2_rank    5
Name: 2, dtype: int64
model1_rank    2
model2_rank    1
Name: 3, dtype: int64
model1_rank    1
model2_rank    4
Name: 4, dtype: int64


0    None
1    None
2    None
3    None
4    None
dtype: object

In [87]:
# Vectorized RRF: sum of 1 / (k + rank) over both models, with k = 0 as specified above
rrf_k = 0
df['reciprocal_rank_fusion_score'] = 1 / (rrf_k + df['model1_rank']) + 1 / (rrf_k + df['model2_rank'])

In [88]:
df['reciprocal_rank_fusion_rank'] = df['reciprocal_rank_fusion_score'].rank(method='min', ascending=False).astype(int)
df[['item_id','reciprocal_rank_fusion_rank']].sort_values(by="reciprocal_rank_fusion_rank", ascending=True)

Unnamed: 0,item_id,reciprocal_rank_fusion_rank
3,I4,1
4,I5,2
0,I1,3
1,I2,4
2,I3,5
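
To see how the constant $k$ affects the fusion, the following sketch wraps the computation in a small function and compares $k=0$ with $k=60$, a value often used in practice. The helper name `rrf_scores` and the standalone `ranks` dataframe are hypothetical and introduced only for illustration; the ranks are copied from the toy example.

```python
import pandas as pd

# Ranks copied from the toy example above (1 = best).
ranks = pd.DataFrame(
    {"model1_rank": [5, 4, 3, 2, 1], "model2_rank": [2, 3, 5, 1, 4]},
    index=["I1", "I2", "I3", "I4", "I5"],
)


def rrf_scores(rank_df: pd.DataFrame, k: float = 0.0) -> pd.Series:
    """Sum 1 / (k + rank) over all rank columns (hypothetical helper)."""
    return (1.0 / (k + rank_df)).sum(axis=1)


for k in (0, 60):
    # Sort items from highest to lowest fused score.
    order = rrf_scores(ranks, k=k).sort_values(ascending=False).index.tolist()
    print(f"k={k}: {order}")
```

On this toy data both settings produce the same order. A larger $k$ compresses the score differences between top and lower ranks, so on other data it can change the order.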


# Exercise 2

In this exercise, we are going to predict the rating of a single user-item pair using a hybrid method, where we use the user profiles from a content-based method as input to a collaborative filtering (neighborhood-based) method.

Download and load the provided dataframe containing content-based user profiles of the user with reviewerID `A25C2M3QF9G7OQ` and all users that have rated the item with asin `B00EYZY6LQ`.

In [89]:
user_profiles = pd.read_pickle("user_profiles.pkl")

## 2.1 

Compute the cosine similarities between user `A25C2M3QF9G7OQ` and the other users based on their user profiles.
What are the similarities, and what ratings did these users give item `B00EYZY6LQ`?
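
As a warm-up, here is a minimal sketch of the cosine similarity computation on made-up data. The layout of `toy_profiles` (one row per user, one column per content feature), the user IDs, and the helper `cosine_to_target` are all assumptions for illustration; the actual structure of `user_profiles.pkl` may differ. For 2-D inputs, `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity.

```python
import numpy as np
import pandas as pd

# Made-up profiles: rows are users, columns are content features.
toy_profiles = pd.DataFrame(
    [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]],
    index=["target", "u1", "u2", "u3"],
    columns=["f1", "f2", "f3"],
)


def cosine_to_target(profiles: pd.DataFrame, target_id: str) -> pd.Series:
    """Cosine similarity between one user's profile and every other profile."""
    target = profiles.loc[target_id].to_numpy()
    others = profiles.drop(index=target_id)
    dots = others.to_numpy() @ target
    norms = np.linalg.norm(others.to_numpy(), axis=1) * np.linalg.norm(target)
    return pd.Series(dots / norms, index=others.index)


toy_sims = cosine_to_target(toy_profiles, "target")
print(toy_sims.sort_values(ascending=False))
```

Here `u1` points in the same direction as the target (similarity 1), while `u2` shares no features with it (similarity 0).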

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Load data generated in Session 1 or the provided data splits (see Absalon, W7 Lab)
df_train = pd.read_pickle("train_dataframe.pkl")
df_test = pd.read_pickle("test_dataframe.pkl")

user_item_matrix = df_train.pivot_table(index='reviewerID', columns='asin', values='overall')
user_item_matrix = user_item_matrix.fillna(0)
input_users = user_item_matrix[user_item_matrix['B00EYZY6LQ']>0]

#<YOUR CODE HERE>

## 2.2

Predict the rating for user `A25C2M3QF9G7OQ` on item `B00EYZY6LQ` based on the ratings from the $3$ most similar users, using a weighted (by similarity) average. What is the prediction (round it to 2 decimal places)?
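
The similarity-weighted average can be sketched on made-up numbers as follows. The dataframe `toy`, its column names, and the helper `predict_weighted` are hypothetical, and the sketch assumes all selected similarities are positive (with negative similarities, the denominator is often taken over absolute values instead).

```python
import pandas as pd

# Made-up similarities to the target user and ratings on the target item.
toy = pd.DataFrame(
    {"similarity": [0.9, 0.1, 0.6, 0.5], "rating": [5.0, 1.0, 4.0, 2.0]},
    index=["u1", "u2", "u3", "u4"],
)


def predict_weighted(neighbors: pd.DataFrame, k: int) -> float:
    """Similarity-weighted mean rating of the k most similar users."""
    top = neighbors.nlargest(k, "similarity")
    weighted_sum = (top["similarity"] * top["rating"]).sum()
    return float(weighted_sum / top["similarity"].sum())


toy_prediction = predict_weighted(toy, k=3)
print(round(toy_prediction, 2))
```

With $k=3$ the neighbors are `u1`, `u3`, and `u4`, giving $(0.9 \cdot 5 + 0.6 \cdot 4 + 0.5 \cdot 2) / (0.9 + 0.6 + 0.5) = 7.9 / 2.0 = 3.95$.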

In [None]:
k = 3
#<YOUR CODE HERE>
prediction_hybrid = #<YOUR CODE HERE>

print(f'Predicted rating: {round(prediction_hybrid,2)}')