# Group Recommender Systems - Tutorial 2 (Lab 2)

In this tutorial, we will focus on Group Recommender Systems. After completing this tutorial, you will be able to: 
- Implement the basic aggregation strategies for group recommendations.
- Generate a simple textual explanation for such strategies.

#### Summary

1. Selection of a random group in our dataset
2. Aggregation Strategies for Group Recommenders
3. Explanations for Group Recommenders


#### 1. Selection of a random group in our dataset

First, we need a group! So, we will select a random set of five users from our dataset. For simplicity, we will focus on users having more than 200 ratings.

In [1]:
preprocessed_dataset_folder = "../preprocessed_dataset"

import pandas as pd
ratings_df = pd.read_csv(preprocessed_dataset_folder+"/ratings.csv") 
movies_df = pd.read_csv(preprocessed_dataset_folder+"/movies.csv", index_col="item")


In [2]:
users_ratings = ratings_df.groupby(['user']).count()
selected = users_ratings['rating'] > 200
selected_users = users_ratings.loc[selected]
random_selected = selected_users.sample(n=5)  # sample() returns n random rows; here, a dataframe with five rows
select_column_df = random_selected.reset_index()['user']  # reset_index() creates a new index and turns the user id into a column, which we then select by name
group_users = list(select_column_df)  # convert the selected column into a plain list of user ids
print(group_users)

[226, 305, 567, 45, 599]
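
Since `sample()` draws a different group on every run, the numbers shown in the rest of this tutorial will change when the notebook is re-executed. As a minimal sketch on made-up data (the names `toy_ratings`, `active_users` and `toy_group` are hypothetical), passing `random_state` to `sample()` fixes the seed and makes the selection repeatable:

```python
import pandas as pd

# Toy ratings table; the values are made up for illustration.
toy_ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item":   [10, 11, 12, 10, 11, 10, 11, 12, 13],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 4.0, 3.0, 5.0],
})

# Number of ratings per user (a Series indexed by user id).
ratings_per_user = toy_ratings.groupby("user")["rating"].count()

# Keep users with more than 2 ratings (the tutorial uses 200).
active_users = ratings_per_user[ratings_per_user > 2]

# random_state fixes the seed, so the same users are drawn on every run.
toy_group = active_users.sample(n=2, random_state=0).index.tolist()
print(toy_group)
```

Here only users 1 and 3 pass the filter, so both end up in the group; with a larger pool, the seed determines which users are picked.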


Let us assume we want to recommend to this group a list of 10 movies that nobody in the group has seen yet. We first need to determine the list of possible candidates. For simplicity, we will only consider movies for which we have more than 10 ratings.

In [3]:
group_ratings = ratings_df.loc[ratings_df['user'].isin(group_users)]
all_movies = set(movies_df.index.tolist())
num_ratings_df = ratings_df.groupby(['item']).count()
considered_movies = set(num_ratings_df.loc[num_ratings_df['user'] > 10].reset_index()['item'])

group_seen_movies = set(group_ratings['item'].tolist())
group_unseen_movies = considered_movies - group_seen_movies

print(len(all_movies))
print(len(considered_movies))
print(len(group_seen_movies))
print(len(group_unseen_movies))

4633
1421
1879
308


Now, we need to estimate each individual's preferences for the unseen movies. To do so, we use the Lenskit library, with the same CF recommender used in the previous example. To generate the Dataframe of user-item pairs to pass as input to the *predict* function, we use the [product](https://docs.python.org/3/library/itertools.html#itertools.product) function of the itertools library, which takes two iterables as input and returns all the possible pairs of their elements. The result is passed to the Dataframe constructor, which generates a Dataframe containing one pair per row.

In [4]:
import itertools
from lenskit.algorithms import Recommender
from lenskit.algorithms.user_knn import UserUser

user_user = UserUser(15, min_nbrs=3)  # Minimum (3) and maximum (15) number of neighbors to consider
recsys = Recommender.adapt(user_user)
recsys.fit(ratings_df)
group_unseen_df = pd.DataFrame(list(itertools.product(group_users, group_unseen_movies)), columns=['user', 'item'])
group_unseen_df['predicted_rating'] = recsys.predict(group_unseen_df)
display(group_unseen_df)

Unnamed: 0,user,item,predicted_rating
0,226,76293,3.172668
1,226,6155,2.861319
2,226,69644,2.681574
3,226,41997,3.576444
4,226,14,3.650704
...,...,...,...
1535,599,7151,2.897302
1536,599,49649,1.215567
1537,599,5108,1.948049
1538,599,7162,2.951615
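
To make the role of `itertools.product` concrete, here is a minimal sketch on two short lists. The ids are taken from the output above only as examples, and `toy_users`, `toy_items` and `pairs` are hypothetical names:

```python
import itertools

toy_users = [226, 305]
toy_items = [14, 6155, 7151]

# Every user is paired with every item: len(toy_users) * len(toy_items) pairs.
pairs = list(itertools.product(toy_users, toy_items))
print(pairs)
# [(226, 14), (226, 6155), (226, 7151), (305, 14), (305, 6155), (305, 7151)]
```

The pairs follow the order of the inputs. In the cell above, the second input is a Python set, so the order of the items within each user is arbitrary.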


We now have our predicted ratings.
We can apply an aggregation strategy to generate the group recommendations.

#### 2. Aggregation Strategies for Group Recommenders

Let's implement some of the aggregation strategies seen in the lecture today.

##### Additive strategy

The Additive strategy uses the sum of all the individuals' ratings as the group rating. The recommended items are then the ones with the highest group rating. We can easily implement it by grouping our *group_unseen_df* Dataframe by *item*, and then computing the *sum*.

In [5]:
# Additive strategy

additive_df = group_unseen_df.groupby('item').sum()
additive_df = additive_df.join(movies_df['title'], on='item')
additive_df = additive_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(additive_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,21.217272
1,1217,ran,20.9169
2,3836,kelly's heroes,20.360848
3,1304,butch cassidy and the sundance kid,20.067616
4,1041,secrets & lies,19.753255
5,1273,down by law,19.524883
6,1235,harold and maude,19.447238
7,866,bound,19.42384
8,955,bringing up baby,19.413516
9,1227,once upon a time in america,19.209418
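
Note that `groupby('item').sum()` in the cell above sums every column, including the user ids; this is harmless here because only *predicted_rating* is kept afterwards. As a small sketch on made-up predictions (`toy_predictions` and `additive_scores` are hypothetical names), selecting the rating column before aggregating avoids the meaningless sum:

```python
import pandas as pd

# Made-up predictions for three users and two items.
toy_predictions = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3],
    "item": ["A", "B", "A", "B", "A", "B"],
    "predicted_rating": [5.0, 3.5, 1.0, 3.5, 5.0, 3.5],
})

# Select the rating column first, so that user ids are not summed.
additive_scores = (
    toy_predictions.groupby("item")["predicted_rating"]
    .sum()
    .sort_values(ascending=False)
)
print(additive_scores)
```

Item A wins with a total of 11.0 against 10.5 for item B, even though one user strongly dislikes it.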


##### Least Misery strategy

The Least Misery strategy uses the minimum of all the individuals' ratings as the group rating. The recommended items are then the ones with the highest group rating. As we did before, we can implement it by grouping our *group_unseen_df* Dataframe by *item*, and then computing the *min*.

In [6]:
# least misery

least_misery_df = group_unseen_df.groupby('item').min()
least_misery_df = least_misery_df.join(movies_df['title'], on='item')
least_misery_df = least_misery_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(least_misery_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,1217,ran,3.389185
1,3451,guess who's coming to dinner,3.221164
2,1304,butch cassidy and the sundance kid,3.121459
3,1273,down by law,3.076135
4,866,bound,3.071139
5,3836,kelly's heroes,3.061582
6,1041,secrets & lies,2.977731
7,1227,once upon a time in america,2.9425
8,3246,malcolm x,2.914512
9,417,barcelona,2.863815


##### Most Pleasure strategy

The Most Pleasure strategy uses the maximum of all the individuals' ratings as the group rating. The recommended items are then the ones with the highest group rating. Again, we can easily implement it by grouping our *group_unseen_df* Dataframe by *item*, and then computing the *max*.

In [7]:
# most pleasure

most_pleasure_df = group_unseen_df.groupby('item').max()
most_pleasure_df = most_pleasure_df.join(movies_df['title'], on='item')
most_pleasure_df = most_pleasure_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(most_pleasure_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,4.972993
1,955,bringing up baby,4.763494
2,3836,kelly's heroes,4.756509
3,1217,ran,4.738371
4,1041,secrets & lies,4.69117
5,905,it happened one night,4.676859
6,933,to catch a thief,4.621698
7,3504,network,4.611347
8,28,persuasion,4.594919
9,1273,down by law,4.574629
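
The three strategies seen so far differ only in the aggregation function, so they can be compared side by side. The following is a sketch on made-up predictions (all names and values are hypothetical) that uses the named-aggregation form of `agg` to compute the three group ratings at once:

```python
import pandas as pd

# Made-up predictions: item A is loved by two users and disliked by one,
# item B is moderately liked by everyone.
toy_predictions = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3],
    "item": ["A", "B", "A", "B", "A", "B"],
    "predicted_rating": [5.0, 3.5, 1.0, 3.5, 5.0, 3.5],
})

# One column per strategy, one row per item.
comparison = toy_predictions.groupby("item")["predicted_rating"].agg(
    additive="sum", least_misery="min", most_pleasure="max"
)
print(comparison)

# Best item according to each strategy.
winners = comparison.idxmax()
print(winners)
```

On this data, Additive and Most Pleasure both pick item A, while Least Misery picks item B because one user rates A very low.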


##### Fairness strategy

For the Fairness strategy, we have an ordering of the group members, and at each round one group member chooses the best item for them. Hence, we can compute the preference list of each group member separately. Then we iterate over the group members in order; at each iteration we take the top remaining item from the current member's list, add it to the result list, and remove it from everyone's lists so that it cannot be chosen twice. Finally, we create a dataframe and add the titles of the selected movies.

In [8]:
# Fairness

import pandas as pd

def generate_preference_list(user):
    individual_df = group_unseen_df.loc[group_unseen_df['user']==user]
    return list(individual_df.sort_values(by="predicted_rating", ascending=False).reset_index()['item'])

individual_preference_lists = dict()
for member in group_users:
    individual_preference_lists[member] = generate_preference_list(member)
    
result = list()
for i in range(10):
    user = group_users[i % len(group_users)]  # round-robin over the group members
    user_best = individual_preference_lists[user].pop(0)
    for member in group_users:
        if user_best in individual_preference_lists[member]:
            individual_preference_lists[member].remove(user_best)
    result.append(user_best)
    
fairness_df = pd.DataFrame(result, columns=['item']).join(movies_df['title'], on='item')
display(fairness_df)

Unnamed: 0,item,title
0,1217,ran
1,3451,guess who's coming to dinner
2,1304,butch cassidy and the sundance kid
3,1041,secrets & lies
4,3836,kelly's heroes
5,1227,once upon a time in america
6,955,bringing up baby
7,1273,down by law
8,905,it happened one night
9,1235,harold and maude
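
The same round-robin logic can be packaged as a function that does not depend on the group size or on global variables. This is only a sketch: `fairness_strategy` and the toy preference lists are hypothetical, and the function assumes each list is already sorted from most to least preferred.

```python
def fairness_strategy(preference_lists, turn_order, n_items):
    """Round-robin selection: members take turns picking their best remaining item."""
    # Copy the lists so that the caller's data is not modified.
    remaining = {user: list(items) for user, items in preference_lists.items()}
    result = []
    turn = 0
    while len(result) < n_items and any(remaining.values()):
        user = turn_order[turn % len(turn_order)]
        turn += 1
        if not remaining[user]:
            continue  # this member has nothing left to choose
        best = remaining[user][0]
        result.append(best)
        # Remove the chosen item from every member's list.
        for member in remaining:
            if best in remaining[member]:
                remaining[member].remove(best)
    return result


toy_preferences = {
    "u1": ["A", "B", "C", "D"],
    "u2": ["A", "C", "B", "D"],
    "u3": ["D", "A", "B", "C"],
}
print(fairness_strategy(toy_preferences, ["u1", "u2", "u3"], 3))
# ['A', 'C', 'D']
```

User u2 would also have chosen A, but it was already taken by u1, so u2 gets the next item on their list.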


In [9]:
# To check the individual predictions for a specific item (the result is empty if the item is not among the candidates)
group_unseen_df.loc[group_unseen_df['item']==3740]

Unnamed: 0,user,item,predicted_rating


#### EXERCISE

Implement the Approval Voting and Plurality Voting strategies for group recommendations.

##### Solution

###### Plurality Voting

For the Plurality Voting strategy, each user votes for the item (or items, in case of ties) with the highest predicted rating among those not yet selected. Then, the votes are counted and the items with the highest number of votes are chosen. We repeat the process until we have selected at least 10 items, and keep the first 10.

In [10]:
def generate_user_votes(user, selected):
    # select the ratings for the considered user
    individual_df = group_unseen_df.loc[group_unseen_df['user']==user]
    
    # remove the ratings for the already selected movies
    individual_df = individual_df.loc[~individual_df['item'].isin(selected)]
    
    # compute the max rating for the considered movies
    max_eval = max(list(individual_df['predicted_rating']))
    
    # return the items for which the user has the maximum predicted rating
    voted_items_df = individual_df.loc[individual_df['predicted_rating']==max_eval]
    return voted_items_df

result = list()


i = 1 # to print the current iteration
while len(result) < 10:
    print("###### ITERATION ", i)
    votes_df = pd.DataFrame()

    # computing votes for all the group members
    for member in group_users:
        user_voted_items_df = generate_user_votes(member, result)
        votes_df = pd.concat([votes_df, user_voted_items_df])

    # votes_df contains the items that each user rated with the highest rating
    display(votes_df)

    # We can group it by item and count the number of rows, to obtain the number of votes for each item
    count_df = votes_df.groupby('item').count()
    display(count_df)

    # We need to select all the items with the highest number of votes.
    # We compute the max of all the votes, and then select the items having the maximum number of votes
    max_votes = max(list(count_df['user']))
    selected_items = list(count_df.loc[count_df['user']==max_votes].index.values)

    print(selected_items)

    # We add the selected items to the result list
    result = result + selected_items
    print(result)
    
    i = i + 1
    
# We could have selected more than 10 items, so we just keep the first 10
result = result[:10]

plurality_df = pd.DataFrame(result, columns=['item']).join(movies_df['title'], on='item')
display(plurality_df)

###### ITERATION  1


Unnamed: 0,user,item,predicted_rating
111,226,1217,4.5455
536,305,3451,4.972993
727,567,1217,3.389185
1152,45,3451,4.884474
1460,599,3451,3.618163


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
1217,2,2
3451,3,3


[3451]
[3451]
###### ITERATION  2


Unnamed: 0,user,item,predicted_rating
111,226,1217,4.5455
584,305,955,4.763494
727,567,1217,3.389185
1035,45,1217,4.694823
1343,599,1217,3.549021


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
955,1,1
1217,4,4


[1217]
[3451, 1217]
###### ITERATION  3


Unnamed: 0,user,item,predicted_rating
173,226,1304,4.469978
584,305,955,4.763494
789,567,1304,3.121459
931,45,1041,4.69117
1405,599,1304,3.485169


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
955,1,1
1041,1,1
1304,3,3


[1304]
[3451, 1217, 1304]
###### ITERATION  4


Unnamed: 0,user,item,predicted_rating
150,226,3836,4.413464
584,305,955,4.763494
764,567,1273,3.076135
931,45,1041,4.69117
1382,599,3836,3.483627


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
955,1,1
1041,1,1
1273,1,1
3836,2,2


[3836]
[3451, 1217, 1304, 3836]
###### ITERATION  5


Unnamed: 0,user,item,predicted_rating
118,226,1227,4.309181
584,305,955,4.763494
764,567,1273,3.076135
931,45,1041,4.69117
1354,599,1235,3.424292


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
955,1,1
1041,1,1
1227,1,1
1235,1,1
1273,1,1


[955, 1041, 1227, 1235, 1273]
[3451, 1217, 1304, 3836, 955, 1041, 1227, 1235, 1273]
###### ITERATION  6


Unnamed: 0,user,item,predicted_rating
257,226,417,4.202029
570,305,933,4.621698
830,567,866,3.071139
1163,45,905,4.676859
1476,599,914,3.419559


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
417,1,1
866,1,1
905,1,1
914,1,1
933,1,1


[417, 866, 905, 914, 933]
[3451, 1217, 1304, 3836, 955, 1041, 1227, 1235, 1273, 417, 866, 905, 914, 933]


Unnamed: 0,item,title
0,3451,guess who's coming to dinner
1,1217,ran
2,1304,butch cassidy and the sundance kid
3,3836,kelly's heroes
4,955,bringing up baby
5,1041,secrets & lies
6,1227,once upon a time in america
7,1235,harold and maude
8,1273,down by law
9,417,barcelona
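
For comparison, the same iterative procedure can be written without dataframes. The sketch below is an alternative formulation, not the solution above: `plurality_voting` and the toy scores are hypothetical, and the scores are assumed to be given as one dictionary of predicted ratings per user.

```python
from collections import Counter


def plurality_voting(scores, n_items):
    """scores maps each user to a dict {item: predicted rating}."""
    selected = []
    all_items = set().union(*(user_scores.keys() for user_scores in scores.values()))
    while len(selected) < n_items and len(selected) < len(all_items):
        votes = Counter()
        for user_scores in scores.values():
            # Only items that have not been selected yet can receive votes.
            candidates = {i: r for i, r in user_scores.items() if i not in selected}
            if not candidates:
                continue
            top = max(candidates.values())
            # The user votes for every item tied at their top rating.
            votes.update(i for i, r in candidates.items() if r == top)
        max_votes = max(votes.values())
        # sorted() makes the order of tied items deterministic.
        selected.extend(sorted(i for i, v in votes.items() if v == max_votes))
    return selected[:n_items]


toy_scores = {
    "u1": {"A": 4.5, "B": 4.0, "C": 2.0},
    "u2": {"A": 3.0, "B": 5.0, "C": 1.0},
    "u3": {"A": 4.8, "B": 3.0, "C": 4.0},
}
print(plurality_voting(toy_scores, 2))
# ['A', 'B']
```

In the first round A collects two votes and B one, so A is selected; in the second round B collects two votes and C one.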


###### Approval Voting

Voters are allowed to vote for as many alternatives as they wish. We assume that each user votes for the items with a predicted rating above a threshold of 3.

In [11]:
group_unseen_temp_df = group_unseen_df.copy()
group_unseen_temp_df['voted'] = group_unseen_temp_df['predicted_rating'].apply(lambda x: 1 if x>3 else 0)

approval_df = group_unseen_temp_df.groupby('item').sum()
approval_df = approval_df.sort_values(by="voted", ascending=False)
approval_df = approval_df.join(movies_df['title'], on='item')
display(approval_df.head(10))

Unnamed: 0_level_0,user,predicted_rating,voted,title
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
3451,1742,21.217272,5,guess who's coming to dinner
1217,1742,20.9169,5,ran
866,1742,19.42384,5,bound
1273,1742,19.524883,5,down by law
3836,1742,20.360848,5,kelly's heroes
1304,1742,20.067616,5,butch cassidy and the sundance kid
3359,1742,19.146668,4,breaking away
1358,1742,17.925523,4,sling blade
8366,1742,18.359198,4,saved!
8958,1742,18.133374,4,ray
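
The outcome of Approval Voting depends on the chosen threshold. As a sketch on made-up predictions (`approval_voting` and `toy_predictions` are hypothetical names), the threshold can be made a parameter so that different values are easy to compare:

```python
import pandas as pd


def approval_voting(predictions, threshold):
    """Count, for each item, how many predicted ratings exceed the threshold."""
    approved = predictions["predicted_rating"] > threshold
    # Summing booleans counts the True values, i.e. the votes per item.
    votes = approved.groupby(predictions["item"]).sum()
    return votes.sort_values(ascending=False)


toy_predictions = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3],
    "item": ["A", "B", "A", "B", "A", "B"],
    "predicted_rating": [5.0, 3.5, 1.0, 3.5, 5.0, 3.5],
})

print(approval_voting(toy_predictions, threshold=3))
print(approval_voting(toy_predictions, threshold=4))
```

With a threshold of 3, item B is approved by all three users and item A by two; with a threshold of 4, B receives no votes and A becomes the winner.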


#### 3. Explanations for Group Recommenders

Let's now see a simple way to generate basic explanations for the group recommendation strategies implemented before. For the Additive, Least Misery and Most Pleasure strategies, we will use social-choice based explanations as defined in [Barile et al., 2021](http://ceur-ws.org/Vol-2955/paper11.pdf). For the Fairness strategy, we will use a generic formulation:

- Additive: "i_k has been recommended to the group since it achieves the highest total rating."
- Least Misery: "i_k has been recommended to the group since no group members has a real problem with it."
- Most Pleasure: "i_k has been recommended to the group since it achieves the highest of all individual group members."
- Fairness: "i_k has been recommended to the group since it is the favourite for u_j, and it's his/her turn to choose."

In [12]:
explanations = {
    "ADD" : "<item> has been recommended to the group since it achieves the highest total rating.\n",
    "LMS" : "<item> has been recommended to the group since no group members has a real problem with it.\n",
    "MPL" : "<item> has been recommended to the group since it achieves the highest of all individual group members.\n",
    "FAI" : "<item> has been recommended to the group since it is the favourite for <user>, and it's his/her turn to choose.\n"
}

# Present explanations for the first item of each strategy
movie_title = additive_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["ADD"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = least_misery_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["LMS"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = most_pleasure_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["MPL"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = fairness_df['title'].iloc[0]
user = group_users[0]
print("Recommendation: " + movie_title.title())
print(explanations["FAI"]
      .replace("<item>", "The movie \"" + movie_title.title() + "\"")
      .replace("<user>", "the user with id " + str(user)))

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it achieves the highest total rating.

Recommendation: Ran
The movie "Ran" has been recommended to the group since no group members has a real problem with it.

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it achieves the highest of all individual group members.

Recommendation: Ran
The movie "Ran" has been recommended to the group since it is the favourite for the user with id 226, and it's his/her turn to choose.
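
The output above shows "Who'S" because `str.title()` capitalizes the letter after an apostrophe. A possible alternative, sketched below, is `string.capwords`, which only capitalizes the first letter of each whitespace-separated word. The helper `explain` and the dictionary `EXPLANATION_TEMPLATES` are hypothetical names, and only two of the templates are repeated here:

```python
import string

EXPLANATION_TEMPLATES = {
    "ADD": "{item} has been recommended to the group since it achieves the highest total rating.",
    "FAI": "{item} has been recommended to the group since it is the favourite for {user}, and it's his/her turn to choose.",
}


def explain(strategy, movie_title, user_id=None):
    """Fill the template of the given strategy with a movie title and, if needed, a user id."""
    item = 'The movie "{}"'.format(string.capwords(movie_title))
    user = "the user with id {}".format(user_id)
    # str.format ignores keyword arguments that the template does not use.
    return EXPLANATION_TEMPLATES[strategy].format(item=item, user=user)


print(explain("ADD", "guess who's coming to dinner"))
print(explain("FAI", "ran", user_id=226))
```

Using named placeholders with `str.format` keeps the substitution in one place instead of chaining several `replace` calls.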



##### EXERCISE

Implement explanations for the Approval Voting and Plurality Voting strategies for group recommendations, and print the corresponding explanation for the best movie for the group.

##### Solution

In [13]:
explanations["PLU"] = "<item> has been recommended to the group since it is the preferred item for most of the group members.\n"
explanations["APP"] = "<item> has been recommended to the group since it achieves the highest number of ratings which are above <threshold>.\n"


movie_title = plurality_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["PLU"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = approval_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["APP"].replace("<item>", "The movie \"" + movie_title.title() + "\"")
     .replace("<threshold>", str(3)))

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it is the preferred item for most of the group members.

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since it achieves the highest number of ratings which are above 3.

