# Group Recommender Systems - Tutorial 2 (Lab 2)

Now, let's focus on Group Recommendations. For this, we need a group! For simplicity, we will create a random group by sampling 5 users from our dataset, restricted to users for whom we have more than 200 ratings.

In [8]:
preprocessed_dataset_folder = "../preprocessed_dataset"

import pandas as pd
ratings_df = pd.read_csv(preprocessed_dataset_folder+"/ratings.csv") 
movies_df = pd.read_csv(preprocessed_dataset_folder+"/movies.csv", index_col="item")


In [9]:
users_ratings = ratings_df.groupby(['user']).count()
selected = users_ratings['rating'] > 200
selected_users = users_ratings.loc[selected]
random_selected = selected_users.sample(n=5) # sample() returns n random rows from the dataframe; the result is a dataframe with five rows
select_column_df = random_selected.reset_index()['user'] # reset_index() creates a new index and turns the user id into a regular column, which we can then select by name
group_users = list(select_column_df) # convert the selected column into a plain Python list of user ids
print(group_users)

[480, 469, 132, 597, 42]
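
Since `sample()` draws at random, the group above changes on every run. The following is a minimal sketch of the same selection on a toy table, assuming you want a repeatable draw; `toy_ratings`, `active_users`, and `toy_group` are hypothetical names, and the threshold is lowered to 2 to fit the toy data. It relies on the `random_state` parameter of pandas' `sample()`.

```python
import pandas as pd

# Toy ratings table (hypothetical data, not the tutorial's dataset)
toy_ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "item":   [10, 11, 12, 10, 11, 10, 11, 12, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 4.0, 3.0, 1.0],
})

# Number of ratings given by each user
ratings_per_user = toy_ratings.groupby("user")["rating"].count()

# Keep users with more than 2 ratings (the tutorial uses 200)
active_users = ratings_per_user[ratings_per_user > 2]

# random_state fixes the seed, so the same users are drawn on every run
toy_group = active_users.sample(n=2, random_state=42).index.tolist()
print(toy_group)
```

With a fixed seed, the outputs shown in later cells stay comparable between runs.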


Let's assume we want to recommend to this group a list of 10 movies that nobody in the group has seen yet. We first need to determine the list of possible candidates. For simplicity, we will only consider movies for which we have more than 10 ratings.

In [10]:
group_ratings = ratings_df.loc[ratings_df['user'].isin(group_users)]
all_movies = set(movies_df.index.tolist())
num_ratings_df = ratings_df.groupby(['item']).count()
considered_movies = set(num_ratings_df.loc[num_ratings_df['user'] > 10].reset_index()['item'])

group_seen_movies = set(group_ratings['item'].tolist())
group_unseen_movies = considered_movies - group_seen_movies

print(len(all_movies))
print(len(considered_movies))
print(len(group_seen_movies))
print(len(group_unseen_movies))

4787
1450
1043
630
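
The candidate filter above is a set difference: sufficiently rated movies minus everything any group member has already rated. A minimal sketch on toy data follows; all names and values are hypothetical, and the popularity threshold is lowered to 1 to fit the toy table.

```python
import pandas as pd

# Toy ratings (hypothetical data): item 13 has a single rating
toy_ratings = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3, 3, 3, 3],
    "item":   [10, 11, 10, 12, 10, 11, 12, 13],
    "rating": [4.0, 3.0, 5.0, 2.5, 4.5, 3.5, 4.0, 2.0],
})
toy_group = [1]  # hypothetical one-member group

# Items with more than 1 rating (the tutorial uses 10)
ratings_per_item = toy_ratings.groupby("item")["rating"].count()
popular_items = set(ratings_per_item[ratings_per_item > 1].index)

# Items already rated by at least one group member
seen_items = set(toy_ratings.loc[toy_ratings["user"].isin(toy_group), "item"])

# Candidates: popular items that nobody in the group has rated
candidate_items = popular_items - seen_items
print(candidate_items)
```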


Now, we need to estimate each individual's preferences for the unseen movies. To do so, we use the Lenskit library, with the same CF recommender used in the previous example. To generate the Dataframe of user-item pairs to pass as input to the *predict* function, we use the [product](https://docs.python.org/3/library/itertools.html#itertools.product) function of the itertools library, which takes two lists as input and returns every possible pair made of one element from each list (their Cartesian product). This is passed to the Dataframe constructor, which generates a Dataframe containing one pair per row.
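
To see what `itertools.product` produces before applying it to the real group, here is a small sketch with two toy lists (`toy_users` and `toy_items` are hypothetical). Two users and three items yield six user-item rows.

```python
import itertools
import pandas as pd

toy_users = [1, 2]
toy_items = [10, 11, 12]

# Every (user, item) combination, one pair per row
pairs_df = pd.DataFrame(
    list(itertools.product(toy_users, toy_items)),
    columns=["user", "item"],
)
print(pairs_df)
```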

In [13]:
import itertools
from lenskit.algorithms import Recommender
from lenskit.algorithms.user_knn import UserUser

user_user = UserUser(15, min_nbrs=3)  # Minimum (3) and maximum (15) number of neighbors to consider
recsys = Recommender.adapt(user_user)
recsys.fit(ratings_df)
group_unseen_df = pd.DataFrame(list(itertools.product(group_users, group_unseen_movies)), columns=['user', 'item'])
group_unseen_df['predicted_rating'] = recsys.predict(group_unseen_df)
display(group_unseen_df)

Unnamed: 0,user,item,predicted_rating
0,480,122882,4.209213
1,480,2052,2.652377
2,480,5,2.505063
3,480,2053,2.148116
4,480,9,2.424205
...,...,...,...
3145,42,112623,3.889903
3146,42,55282,3.402300
3147,42,55290,3.825411
3148,42,143355,3.762144


We now have our predicted ratings.
We can apply an aggregation strategy to them to generate the group recommendations.

#### 1. Aggregation Strategies

Let's implement some of the aggregation strategies seen in the lecture today.

##### Additive strategy

The Additive strategy takes as group rating the sum of all the individual ratings. The recommended items are then the ones with the highest group rating. We can easily implement it by grouping our *group_unseen_df* Dataframe by *item* and then computing the *sum*.

In [17]:
# Additive strategy

additive_df = group_unseen_df.groupby('item').sum()
additive_df = additive_df.join(movies_df['title'], on='item')
additive_df = additive_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(additive_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,23.213572
1,3030,yojimbo,22.793834
2,1411,hamlet,22.689449
3,951,his girl friday,22.055984
4,3836,kelly's heroes,21.677632
5,3435,double indemnity,21.442027
6,905,it happened one night,21.437941
7,1283,high noon,21.404303
8,57669,in bruges,21.240705
9,168252,logan,21.196714


##### Least Misery strategy

The Least Misery strategy takes as group rating the minimum of all the individual ratings. The recommended items are then the ones with the highest group rating. As we did before, we can implement it by grouping our *group_unseen_df* Dataframe by *item* and then computing the *min*.

In [19]:
# least misery

least_misery_df = group_unseen_df.groupby('item').min()
least_misery_df = least_misery_df.join(movies_df['title'], on='item')
least_misery_df = least_misery_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(least_misery_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,4.284263
1,1411,hamlet,4.070693
2,3030,yojimbo,4.049809
3,951,his girl friday,3.967801
4,112175,how to train your dragon 2,3.822641
5,3435,double indemnity,3.782478
6,168252,logan,3.749123
7,905,it happened one night,3.747862
8,69757,(500) days of summer,3.744409
9,94959,moonrise kingdom,3.737647


##### Most Pleasure strategy

The Most Pleasure strategy takes as group rating the maximum of all the individual ratings. The recommended items are then the ones with the highest group rating. Again, we can easily implement it by grouping our *group_unseen_df* Dataframe by *item* and then computing the *max*.

In [20]:
# most pleasure

most_pleasure_df = group_unseen_df.groupby('item').max()
most_pleasure_df = most_pleasure_df.join(movies_df['title'], on='item')
most_pleasure_df = most_pleasure_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(most_pleasure_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,5.149966
1,1411,hamlet,4.976942
2,28,persuasion,4.922663
3,3030,yojimbo,4.921011
4,1283,high noon,4.909004
5,951,his girl friday,4.903882
6,3836,kelly's heroes,4.856575
7,1041,secrets & lies,4.80605
8,3307,city lights,4.802022
9,176371,blade runner 2049,4.791873
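
The three strategies above differ only in the aggregation function, so they can be compared side by side. A minimal sketch on toy predictions follows; the data and the column names `additive`, `least_misery`, and `most_pleasure` are hypothetical. Here one member strongly dislikes item 10, so it wins only under Most Pleasure.

```python
import pandas as pd

# Toy predicted ratings (hypothetical): user 2 strongly dislikes item 10
toy_predictions = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3],
    "item": [10, 11, 10, 11, 10, 11],
    "predicted_rating": [5.0, 4.0, 1.0, 3.5, 4.5, 3.5],
})

# One column per strategy: sum, min and max of the individual ratings
toy_scores = (
    toy_predictions.groupby("item")["predicted_rating"]
    .agg(["sum", "min", "max"])
    .rename(columns={"sum": "additive", "min": "least_misery", "max": "most_pleasure"})
)
print(toy_scores)

# Best item under each strategy
print(toy_scores.idxmax())
```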


##### Fairness strategy

For the Fairness strategy, we fix an ordering of the group members, and at each round one group member chooses their best item. Hence, we can compute the preference list of each group member separately. Then we iterate over the group members in turn: at each iteration we take the top item from the current member's list, remove it from the other members' lists so that it cannot be chosen twice, and add it to the result list. Finally, we create a dataframe and enrich it with the titles of the selected movies.

In [32]:
# Fairness

import pandas as pd

def generate_preference_list(user):
    individual_df = group_unseen_df.loc[group_unseen_df['user']==user]
    return list(individual_df.sort_values(by="predicted_rating", ascending=False).reset_index()['item'])

individual_preference_lists = dict()
for member in group_users:
    individual_preference_lists[member] = generate_preference_list(member)
    
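result = list()
for i in range(10):
    # Members take turns in a fixed order (round-robin)
    user = group_users[i % len(group_users)]
    # The current member picks the top item of their remaining list
    user_best = individual_preference_lists[user].pop(0)
    # Remove the pick from the other members' lists so it is not chosen twice
    for member in group_users:
        if user_best in individual_preference_lists[member]:
            individual_preference_lists[member].remove(user_best)
    result.append(user_best)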
    
fairness_df = pd.DataFrame(result, columns=['item']).join(movies_df['title'], on='item')
display(fairness_df)

Unnamed: 0,item,title
0,3030,yojimbo
1,3451,guess who's coming to dinner
2,1411,hamlet
3,28,persuasion
4,905,it happened one night
5,951,his girl friday
6,3836,kelly's heroes
7,69757,(500) days of summer
8,1283,high noon
9,3435,double indemnity
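
The turn-taking logic can also be wrapped in a reusable function. The following is one possible sketch, not the tutorial's own implementation: `fairness_ranking` and the toy data are hypothetical, and the function additionally records which member made each pick, which is handy when explaining the recommendation later.

```python
import pandas as pd

def fairness_ranking(predictions, members, k):
    """Round-robin selection: members take turns picking their best remaining item."""
    # Preference list of each member, best predicted rating first
    preference_lists = {
        member: predictions.loc[predictions["user"] == member]
        .sort_values(by="predicted_rating", ascending=False)["item"]
        .tolist()
        for member in members
    }
    picks = []     # (member, item) pairs in the order they were chosen
    taken = set()  # items already selected by someone
    for turn in range(k):
        member = members[turn % len(members)]
        # First item in the member's list that nobody has picked yet
        best = next((item for item in preference_lists[member] if item not in taken), None)
        if best is None:
            break  # this member has no items left
        taken.add(best)
        picks.append((member, best))
    return picks

# Toy predictions (hypothetical data)
toy_predictions = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 2],
    "item": [10, 11, 12, 10, 11, 12],
    "predicted_rating": [5.0, 4.0, 1.0, 4.5, 2.0, 3.0],
})
toy_picks = fairness_ranking(toy_predictions, members=[1, 2], k=3)
print(toy_picks)
```

User 2 also ranks item 10 first, but it is already taken when their turn comes, so they fall back to their second choice.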


In [48]:
# To check individual evaluations on a specific item
group_unseen_df.loc[group_unseen_df['item']==3451]

Unnamed: 0,user,item,predicted_rating
448,480,3451,4.361034
1078,469,3451,4.762369
1708,132,3451,4.284263
2338,597,3451,5.149966
2968,42,3451,4.655939


##### 1.1 EXERCISE

Implement the Approval Voting strategy for group recommendations.

#### 2. Explanations for Group Recommenders

Let's now look at some simple ways to generate basic explanations for the group recommendation strategies implemented before. For the Additive, Least Misery and Most Pleasure strategies, we will use social-choice based explanations as defined in [Barile et al., 2021](http://ceur-ws.org/Vol-2955/paper11.pdf). For the Fairness strategy, we will use a generic formulation:

- Additive: "i_k has been recommended to the group since it achieves the highest total rating."
- Least Misery: "i_k has been recommended to the group since no group members has a real problem with it."
- Most Pleasure: "i_k has been recommended to the group since it achieves the highest of all individual group members."
- Fairness: "i_k has been recommended to the group since it is the favourite for u_j, and it's his/her turn to choose."

In [50]:
explanations = {
    "ADD" : "<item> has been recommended to the group since it achieves the highest total rating.\n",
    "LMS" : "<item> has been recommended to the group since no group members has a real problem with it.\n",
    "MPL" : "<item> has been recommended to the group since it achieves the highest of all individual group members.\n",
    "FAI" : "<item> has been recommended to the group since it is the favourite for <user>, and it's his/her turn to choose.\n"
}

# Present explanations for the first item of each strategy
movie_title = additive_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["ADD"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = least_misery_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["LMS"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = most_pleasure_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["MPL"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = fairness_df['title'].iloc[0]
user = group_users[0]
print("Recommendation: " + movie_title.title())
print(explanations["FAI"]
      .replace("<item>", "The movie \"" + movie_title.title() + "\"")
      .replace("<user>", "the user with id " + str(user)))

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it achieves the highest total rating.

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since no group members has a real problem with it.

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it achieves the highest of all individual group members.

Recommendation: Yojimbo
The movie "Yojimbo" has been recommended to the group since it is the favourite for the user with id 480, and it's his/her turn to choose.
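
Note that `str.title()` capitalizes the letter after an apostrophe, which is why the output above reads "Who'S". A minimal sketch of a template-filling helper that avoids this with the standard library's `string.capwords` follows; `fill_explanation` and `fai_template` are hypothetical names, and the template text is copied from the `explanations` dictionary above.

```python
import string

# Template copied from the "FAI" entry of the explanations dictionary
fai_template = ("<item> has been recommended to the group since it is the "
                "favourite for <user>, and it's his/her turn to choose.\n")

def fill_explanation(template, title, user=None):
    """Fill the <item> (and optionally <user>) placeholders of a template."""
    # capwords splits on whitespace only, so "who's" becomes "Who's", not "Who'S"
    text = template.replace("<item>", 'The movie "' + string.capwords(title) + '"')
    if user is not None:
        text = text.replace("<user>", "the user with id " + str(user))
    return text

print(fill_explanation(fai_template, "guess who's coming to dinner", user=480))
```

Templates without a `<user>` placeholder, such as the Additive one, can be filled by leaving `user` at its default.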



##### 2.1 EXERCISE

Implement the explanation for the Approval Voting strategy for group recommendations, and print the corresponding explanation for the best movie for the group.