# Group Recommender Systems - Tutorial 2 (Lab 2)

In this tutorial, we will focus on Group Recommender Systems. After completing this tutorial, you will be able to: 
- Implement the basic aggregation strategies for group recommendations.
- Generate a simple textual explanation for such strategies.

#### Summary

1. Selection of a random group in our dataset
2. Aggregation Strategies for Group Recommenders
3. Explanations for Group Recommenders


#### 1. Selection of a random group in our dataset

First, we need a group! We select a random set of five users from our dataset. For simplicity, we only consider users with more than 200 ratings.

In [1]:
preprocessed_dataset_folder = "../preprocessed_dataset"

import pandas as pd
ratings_df = pd.read_csv(preprocessed_dataset_folder+"/ratings.csv") 
movies_df = pd.read_csv(preprocessed_dataset_folder+"/movies.csv", index_col="item")


In [2]:
users_ratings = ratings_df.groupby(['user']).count()
selected = users_ratings['rating'] > 200
selected_users = users_ratings.loc[selected]
random_selected = selected_users.sample(n=5) # sample() returns n random rows, so the result is a dataframe with five rows
select_column_df = random_selected.reset_index()['user'] # reset_index() creates a new index and turns the user id into a column, which we then select by name
group_users = list(select_column_df) # convert the selected column into a plain Python list of user ids
print(group_users)

[51, 57, 89, 111, 580]
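
Because `sample()` draws a different group on every run, the numbers shown in this notebook will change when the cell is re-executed. The following is a minimal sketch, on hypothetical toy data with the same column names as `ratings.csv`, of the same filter-then-sample step with a fixed `random_state` so that the selected group is reproducible. The names `toy_ratings`, `min_ratings` and `group_size` are illustrative and not part of the tutorial code.

```python
import pandas as pd

# Toy ratings table (hypothetical data, same column names as ratings.csv).
toy_ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "item":   [10, 11, 12, 10, 11, 10, 11, 12, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 4.0, 3.0, 5.0],
})

min_ratings = 2   # the tutorial uses 200
group_size = 2    # the tutorial uses 5

# Number of ratings per user, as a Series indexed by user id.
ratings_per_user = toy_ratings.groupby("user")["rating"].count()

# Keep only the users with more than `min_ratings` ratings.
active_users = ratings_per_user[ratings_per_user > min_ratings].index

# A fixed random_state makes the sampled group identical across runs.
sampled = pd.Series(active_users).sample(n=group_size, random_state=0)
toy_group = sorted(sampled.tolist())
print(toy_group)
```

Only users 1 and 3 have more than two ratings in the toy data, so they form the group regardless of the seed; on the real dataset the seed determines which five users are drawn.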


Let us assume we want to recommend to this group a list of 10 movies that nobody in the group has seen yet. We first need to determine the list of possible candidates. For simplicity, we will only consider movies for which we have more than 10 ratings.

In [3]:
group_ratings = ratings_df.loc[ratings_df['user'].isin(group_users)]
all_movies = set(movies_df.index.tolist())
num_ratings_df = ratings_df.groupby(['item']).count()
considered_movies = set(num_ratings_df.loc[num_ratings_df['user'] > 10].reset_index()['item'])

group_seen_movies = set(group_ratings['item'].tolist())
group_unseen_movies = considered_movies - group_seen_movies

print(len(all_movies))
print(len(considered_movies))
print(len(group_seen_movies))
print(len(group_unseen_movies))

4633
1421
1116
645
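
Note that the sizes above do not simply subtract (1421 - 1116 is not 645): the set difference only removes seen movies that are also candidates, and some of the movies seen by the group have 10 or fewer ratings. A small plain-Python sketch on hypothetical toy data illustrates the same candidate selection; the names `toy_pairs`, `toy_group` and `min_count` are illustrative only.

```python
from collections import Counter

# Toy (user, item) rating pairs; hypothetical data.
toy_pairs = [
    (1, "a"), (2, "a"), (3, "a"),
    (1, "b"), (2, "b"),
    (2, "c"), (3, "c"),
    (3, "d"),
    (1, "e"),
]
toy_group = {1}   # the group consists of user 1 only
min_count = 1     # the tutorial uses 10

# Count how many ratings each item received.
item_counts = Counter(item for _, item in toy_pairs)

# Candidates: items with more than `min_count` ratings.
considered = {item for item, n in item_counts.items() if n > min_count}

# Items rated by at least one group member.
seen_by_group = {item for user, item in toy_pairs if user in toy_group}

# Candidates that nobody in the group has rated.
unseen = considered - seen_by_group

print(sorted(considered))
print(sorted(seen_by_group))
print(sorted(unseen))
```

Here item `e` was seen by the group but is not a candidate, so removing the seen items shrinks the candidate set by two items rather than three.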


Now we need to estimate each individual's preferences for the unseen movies. To do so, we use the Lenskit library, with the same CF recommender used in the previous example. To generate the DataFrame of user-item pairs to pass to the *predict* function, we use the [product](https://docs.python.org/3/library/itertools.html#itertools.product) function of the itertools library, which takes two lists as input and returns all possible pairs of elements from the two lists (their Cartesian product). The result is passed to the DataFrame constructor, which generates a DataFrame with one pair per row.

In [4]:
import itertools
from lenskit.algorithms import Recommender
from lenskit.algorithms.user_knn import UserUser

user_user = UserUser(15, min_nbrs=3)  # Minimum (3) and maximum (15) number of neighbors to consider
recsys = Recommender.adapt(user_user)
recsys.fit(ratings_df)
group_unseen_df = pd.DataFrame(list(itertools.product(group_users, group_unseen_movies)), columns=['user', 'item'])
group_unseen_df['predicted_rating'] = recsys.predict(group_unseen_df)
display(group_unseen_df)

Unnamed: 0,user,item,predicted_rating
0,51,2052,3.339245
1,51,2053,2.556409
2,51,9,2.919039
3,51,12,2.927437
4,51,122892,3.753387
...,...,...,...
3220,580,55282,3.237170
3221,580,4084,3.417977
3222,580,4085,3.531973
3223,580,55290,3.801478
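
To make the role of `itertools.product` concrete, here is a small sketch with hypothetical toy lists (`toy_users` and `toy_items` are illustrative names). It shows that every user is paired with every item, so the number of pairs is the product of the two list lengths; with five users and 645 candidate movies this should give 3225 rows.

```python
import itertools

toy_users = [51, 57]
toy_items = [9, 12, 2052]

# Cartesian product: each user is paired with each item, users varying slowest.
pairs = list(itertools.product(toy_users, toy_items))
print(pairs)

# The number of pairs is the product of the two list lengths.
print(len(pairs) == len(toy_users) * len(toy_items))
```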


We now have our predicted ratings.
We can apply an aggregation strategy to generate the group recommendations.

#### 2. Aggregation Strategies for Group Recommenders

Let's implement some of the aggregation strategies seen in the lecture today.

##### Additive strategy

The Additive strategy uses the sum of all the individuals' ratings as the group rating. The recommended items are then the ones that score best on this group rating. We can easily implement it by grouping our *group_unseen_df* DataFrame by *item* and then computing the *sum*.

In [5]:
# Additive strategy

additive_df = group_unseen_df.groupby('item').sum()
additive_df = additive_df.join(movies_df['title'], on='item')
additive_df = additive_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(additive_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,1178,paths of glory,21.733127
1,1217,ran,21.500615
2,6440,barton fink,21.062484
3,57669,in bruges,21.057702
4,168252,logan,20.945547
5,1411,hamlet,20.934546
6,1041,secrets & lies,20.870429
7,3328,ghost dog: the way of the samurai,20.74908
8,914,my fair lady,20.58432
9,1945,on the waterfront,20.547766
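
Calling `sum()` on the whole grouped DataFrame also adds up the `user` column, which is harmless here because that column is dropped afterwards, but the summed user ids carry no meaning. A possible variant, sketched on hypothetical toy predictions (`toy_predictions` is an illustrative name), selects the `predicted_rating` column before aggregating:

```python
import pandas as pd

# Toy predictions for three users and three items (hypothetical data).
toy_predictions = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "item": [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "predicted_rating": [4.0, 3.0, 5.0, 4.0, 3.5, 1.0, 4.5, 3.0, 4.5],
})

# Additive strategy: sum only the predicted ratings of each item,
# then rank the items from the highest to the lowest total.
additive = (
    toy_predictions.groupby("item")["predicted_rating"]
    .sum()
    .sort_values(ascending=False)
)
print(additive)
```

In the toy data item 10 comes first with a total of 12.5, followed by item 30 (10.5) and item 20 (9.5).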


##### Least Misery strategy

The Least Misery strategy uses the minimum of all the individuals' ratings as the group rating. The recommended items are then the ones that score best on this group rating. As before, we can implement it by grouping our *group_unseen_df* DataFrame by *item* and then computing the *min*.

In [6]:
# least misery

least_misery_df = group_unseen_df.groupby('item').min()
least_misery_df = least_misery_df.join(movies_df['title'], on='item')
least_misery_df = least_misery_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(least_misery_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,3451,guess who's coming to dinner,4.351304
1,6440,barton fink,3.930697
2,56782,there will be blood,3.91889
3,1273,down by law,3.908903
4,1217,ran,3.875841
5,97921,silver linings playbook,3.875332
6,55442,persepolis,3.841354
7,139385,the revenant,3.835525
8,3328,ghost dog: the way of the samurai,3.825552
9,1950,in the heat of the night,3.811778
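
The two strategies seen so far can disagree. The following plain-Python sketch uses two hypothetical movies (`toy_scores` is an illustrative name) to show a case in which the Additive strategy prefers an item that one member strongly dislikes, while Least Misery prefers the item nobody objects to.

```python
# Predicted ratings of three group members for two movies (hypothetical data).
toy_scores = {
    "movie_a": [5.0, 5.0, 1.0],   # two fans, one member who dislikes it
    "movie_b": [3.5, 3.5, 3.5],   # everybody finds it acceptable
}

# Additive: rank by the sum of the individual ratings.
additive_best = max(toy_scores, key=lambda m: sum(toy_scores[m]))

# Least Misery: rank by the lowest individual rating.
least_misery_best = max(toy_scores, key=lambda m: min(toy_scores[m]))

print(additive_best)
print(least_misery_best)
```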


##### Most Pleasure strategy

The Most Pleasure strategy uses the maximum of all the individuals' ratings as the group rating. The recommended items are then the ones that score best on this group rating. Again, we can easily implement it by grouping our *group_unseen_df* DataFrame by *item* and then computing the *max*.

In [7]:
# most pleasure

most_pleasure_df = group_unseen_df.groupby('item').max()
most_pleasure_df = most_pleasure_df.join(movies_df['title'], on='item')
most_pleasure_df = most_pleasure_df.sort_values(by="predicted_rating", ascending=False).reset_index()[['item', 'title', 'predicted_rating']]
display(most_pleasure_df.head(10))

Unnamed: 0,item,title,predicted_rating
0,222,circle of friends,4.889165
1,1178,paths of glory,4.844014
2,3451,guess who's coming to dinner,4.812309
3,57669,in bruges,4.80122
4,168252,logan,4.726154
5,1217,ran,4.711662
6,106100,dallas buyers club,4.701459
7,914,my fair lady,4.672444
8,1041,secrets & lies,4.662626
9,176371,blade runner 2049,4.617937
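
Since the three strategies above differ only in the aggregation function, they can also be computed side by side in a single pass. The sketch below uses pandas named aggregation on hypothetical toy predictions; the column names `additive`, `least_misery` and `most_pleasure` are illustrative choices, not names used elsewhere in this tutorial.

```python
import pandas as pd

# Toy predictions for three users and three items (hypothetical data).
toy_predictions = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "item": [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "predicted_rating": [4.0, 3.0, 5.0, 4.0, 3.5, 1.0, 4.5, 3.0, 4.5],
})

# One output column per strategy, one row per item.
group_scores = toy_predictions.groupby("item")["predicted_rating"].agg(
    additive="sum",
    least_misery="min",
    most_pleasure="max",
)
print(group_scores)

# Best item according to each strategy.
print(group_scores.idxmax())
```

In the toy data item 10 wins under Additive and Least Misery, while item 30 wins under Most Pleasure because one member rates it 5.0.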


##### Fairness strategy

For the Fairness strategy, we fix an ordering of the group members and, at each round, one group member chooses the item he/she likes best. Hence, we first compute the preference list of each group member separately. Then we iterate over the group members in turn: at each iteration we take the top item from the current member's list, add it to the result list, and remove it from everyone's list so that it cannot be picked twice. Finally, we create a dataframe and add the titles of the selected movies.

In [8]:
# Fairness

import pandas as pd

def generate_preference_list(user):
    individual_df = group_unseen_df.loc[group_unseen_df['user']==user]
    return list(individual_df.sort_values(by="predicted_rating", ascending=False).reset_index()['item'])

individual_preference_lists = dict()
for member in group_users:
    individual_preference_lists[member] = generate_preference_list(member)
    
result = list()
for i in range(10):
    user = group_users[i % len(group_users)]  # members take turns in a fixed order
    user_best = individual_preference_lists[user].pop(0)
    for member in group_users:
        if user_best in individual_preference_lists[member]:
            individual_preference_lists[member].remove(user_best)
    result.append(user_best)
    
fairness_df = pd.DataFrame(result, columns=['item']).join(movies_df['title'], on='item')
display(fairness_df)

Unnamed: 0,item,title
0,222,circle of friends
1,1178,paths of glory
2,3246,malcolm x
3,3451,guess who's coming to dinner
4,1217,ran
5,57669,in bruges
6,1411,hamlet
7,986,fly away home
8,1963,take the money and run
9,6440,barton fink
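
The same turn-taking logic can be wrapped in a reusable function. This is only a sketch under stated assumptions: the function name `fairness_round_robin` and its arguments are hypothetical, preference lists are assumed to be sorted from most to least preferred, and the function additionally records which member picked each item, which is convenient when generating explanations later.

```python
def fairness_round_robin(preference_lists, members, k):
    """Pick up to k items by letting members take turns choosing
    their best remaining item. Returns a list of (member, item) pairs."""
    # Work on copies so that the caller's lists are left untouched.
    remaining = {m: list(preference_lists[m]) for m in members}
    chosen = []
    turn = 0
    while len(chosen) < k and any(remaining.values()):
        member = members[turn % len(members)]
        turn += 1
        if not remaining[member]:
            continue  # this member has nothing left to choose from
        best = remaining[member].pop(0)
        chosen.append((member, best))
        # Remove the chosen item from every list so it is not picked again.
        for m in members:
            if best in remaining[m]:
                remaining[m].remove(best)
    return chosen


# Toy preference lists, most preferred item first (hypothetical data).
toy_preferences = {
    "ann": ["x", "y", "z"],
    "bob": ["x", "z", "y"],
    "cat": ["y", "x", "z"],
}
picks = fairness_round_robin(toy_preferences, ["ann", "bob", "cat"], k=3)
print(picks)
```

Bob's favourite `x` is already taken by Ann when his turn comes, so he gets his second choice `z`.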


In [9]:
# To check individual evaluations on a specific item
group_unseen_df.loc[group_unseen_df['item']==3740]

Unnamed: 0,user,item,predicted_rating


#### EXERCISE

Implement the Approval Voting and Plurality Voting strategies for group recommendations.

##### Solution

###### Plurality Voting

For the plurality voting strategy each user votes for all the items having the maximum score. Then, the votes are counted and the items with the highest number of votes are chosen. We iterate the process until we have selected 10 items.

In [10]:
def generate_user_votes(user, selected):
    # select the ratings for the considered user
    individual_df = group_unseen_df.loc[group_unseen_df['user']==user]
    
    # remove the ratings for the already selected movies
    individual_df = individual_df.loc[~individual_df['item'].isin(selected)]
    
    # compute the max rating for the considered movies
    max_eval = max(list(individual_df['predicted_rating']))
    
    # return the items for which the user has the maximum predicted rating
    voted_items_df = individual_df.loc[individual_df['predicted_rating']==max_eval]
    return voted_items_df

result = list()


i = 1 # to print the current iteration
while len(result) < 10:
    print("###### ITERATION ", i)
    votes_df = pd.DataFrame()

    # computing votes for all the group members
    for member in group_users:
        user_voted_items_df = generate_user_votes(member, result)
        votes_df = pd.concat([votes_df, user_voted_items_df])

    # votes_df contains the items that each user rated with the highest rating
    display(votes_df)

    # We can group it by item and count the number of rows, to obtain the number of votes for each item
    count_df = votes_df.groupby('item').count()
    display(count_df)

    # We need to select all the items with the highest number of votes.
    # We compute the max of all the votes, and then select the items having the maximum number of votes
    max_votes = max(list(count_df['user']))
    selected_items = list(count_df.loc[count_df['user']==max_votes].index.values)

    print(selected_items)

    # We add the selected items to the result list
    result = result + selected_items
    print(result)
    
    i = i + 1
    
# We could have selected more than 10 items, so we just keep the first 10
result = result[:10]

plurality_df = pd.DataFrame(result, columns=['item']).join(movies_df['title'], on='item')
display(plurality_df)

###### ITERATION  1


Unnamed: 0,user,item,predicted_rating
85,51,222,4.889165
1053,57,1178,4.449443
1703,89,3246,4.432686
2412,111,3451,4.654449
2988,580,1178,4.656918


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
222,1,1
1178,2,2
3246,1,1
3451,1,1


[1178]
[1178]
###### ITERATION  2


Unnamed: 0,user,item,predicted_rating
85,51,222,4.889165
1129,57,1411,4.367733
1703,89,3246,4.432686
2412,111,3451,4.654449
3007,580,1217,4.546401


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
222,1,1
1217,1,1
1411,1,1
3246,1,1
3451,1,1


[222, 1217, 1411, 3246, 3451]
[1178, 222, 1217, 1411, 3246, 3451]
###### ITERATION  3


Unnamed: 0,user,item,predicted_rating
130,51,57669,4.80122
1112,57,1387,4.338817
1635,89,986,4.362379
2552,111,1963,4.309166
2701,580,6440,4.455883


Unnamed: 0_level_0,user,predicted_rating
item,Unnamed: 1_level_1,Unnamed: 2_level_1
986,1,1
1387,1,1
1963,1,1
6440,1,1
57669,1,1


[986, 1387, 1963, 6440, 57669]
[1178, 222, 1217, 1411, 3246, 3451, 986, 1387, 1963, 6440, 57669]


Unnamed: 0,item,title
0,1178,paths of glory
1,222,circle of friends
2,1217,ran
3,1411,hamlet
4,3246,malcolm x
5,3451,guess who's coming to dinner
6,986,fly away home
7,1387,jaws
8,1963,take the money and run
9,6440,barton fink
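
For readers who prefer to see the procedure without the intermediate DataFrames, here is a compact plain-Python sketch of the same iterative plurality vote, using `collections.Counter` on hypothetical toy predictions. The function name `plurality_voting` and the input layout (a dict mapping each user to a dict of item ratings) are assumptions made for this illustration; ties between winners are broken by sorting the item ids.

```python
from collections import Counter


def plurality_voting(predictions, k):
    """predictions: dict user -> dict item -> predicted rating.
    Returns up to k items chosen by repeated plurality votes."""
    all_items = {item for scores in predictions.values() for item in scores}
    selected = []
    while len(selected) < k and len(selected) < len(all_items):
        votes = Counter()
        for scores in predictions.values():
            # Each user only considers items that were not selected yet.
            remaining = {i: r for i, r in scores.items() if i not in selected}
            if not remaining:
                continue
            top = max(remaining.values())
            # The user votes for every item reaching his/her maximum rating.
            votes.update(i for i, r in remaining.items() if r == top)
        max_votes = max(votes.values())
        winners = sorted(i for i, n in votes.items() if n == max_votes)
        selected.extend(winners)
    # A round may add several tied items, so truncate to k.
    return selected[:k]


# Toy predictions (hypothetical data).
toy_predictions = {
    "u1": {"a": 4.5, "b": 4.0, "c": 2.0},
    "u2": {"a": 4.8, "b": 3.0, "c": 3.5},
    "u3": {"a": 3.0, "b": 4.9, "c": 3.1},
}
top_two = plurality_voting(toy_predictions, k=2)
print(top_two)
```

In the first round item `a` gets two votes and wins; in the second round, with `a` removed, item `b` gets two votes.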


###### Approval Voting

Voters are allowed to vote for as many alternatives as they wish. We assume that each user votes for every item with a predicted rating above a threshold of 3.

In [11]:
group_unseen_temp_df = group_unseen_df.copy()
group_unseen_temp_df['voted'] = group_unseen_temp_df['predicted_rating'].apply(lambda x: 1 if x>3 else 0)

approval_df = group_unseen_temp_df.groupby('item').sum()
approval_df = approval_df.sort_values(by="voted", ascending=False)
approval_df = approval_df.join(movies_df['title'], on='item')
display(approval_df.head(10))

Unnamed: 0_level_0,user,predicted_rating,voted,title
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
3418,888,17.249818,5,thelma & louise
96610,888,18.990815,5,looper
1587,888,18.496399,5,conan the barbarian
1589,888,17.988572,5,cop land
55442,888,20.168272,5,persepolis
1680,888,17.096302,5,sliding doors
4571,888,17.868263,5,bill & ted's excellent adventure
94864,888,17.290022,5,prometheus
55765,888,19.194157,5,american gangster
1754,888,17.074041,5,fallen
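
In the output above, all ten items receive five votes, so their relative order is arbitrary, and the `user` column simply contains the sum of the five user ids. One possible refinement, shown here only as a sketch on hypothetical toy data, keeps the threshold in a variable and breaks ties by the total predicted rating. Using the total rating as a tie-breaker is an assumption of this example, not part of the Approval Voting definition.

```python
import pandas as pd

# Toy predictions for three users and three items (hypothetical data).
toy_predictions = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "item": [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "predicted_rating": [4.0, 3.2, 5.0, 4.0, 3.5, 1.0, 4.5, 3.1, 4.5],
})
threshold = 3.0  # a user approves an item if its predicted rating exceeds this

approval = (
    toy_predictions
    # True where the user approves the item.
    .assign(voted=toy_predictions["predicted_rating"] > threshold)
    .groupby("item")
    # Count approvals and keep the total rating as a tie-breaker.
    .agg(votes=("voted", "sum"), total_rating=("predicted_rating", "sum"))
    .sort_values(by=["votes", "total_rating"], ascending=False)
)
print(approval)
```

Items 10 and 20 both get three votes, and item 10 is ranked first because its total rating is higher; item 30 gets only two votes despite containing the single highest rating.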


#### 3. Explanations for Group Recommenders

Let us now look at some simple ways to generate basic explanations for the group recommendation strategies implemented above. For the Additive, Least Misery and Most Pleasure strategies, we will use social-choice based explanations as defined in [Barile et al., 2021](http://ceur-ws.org/Vol-2955/paper11.pdf). For the Fairness strategy, we will use a generic formulation:

- Additive: "i_k has been recommended to the group since it achieves the highest total rating."
- Least Misery: "i_k has been recommended to the group since no group members has a real problem with it."
- Most Pleasure: "i_k has been recommended to the group since it achieves the highest of all individual group members."
- Fairness: "i_k has been recommended to the group since it is the favourite for u_j, and it's his/her turn to choose."

In [12]:
explanations = {
    "ADD" : "<item> has been recommended to the group since it achieves the highest total rating.\n",
    "LMS" : "<item> has been recommended to the group since no group members has a real problem with it.\n",
    "MPL" : "<item> has been recommended to the group since it achieves the highest of all individual group members.\n",
    "FAI" : "<item> has been recommended to the group since it is the favourite for <user>, and it's his/her turn to choose.\n"
}

# Present explanations for the first item of each strategy
movie_title = additive_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["ADD"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = least_misery_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["LMS"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = most_pleasure_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["MPL"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = fairness_df['title'].iloc[0]
user = group_users[0]
print("Recommendation: " + movie_title.title())
print(explanations["FAI"]
      .replace("<item>", "The movie \"" + movie_title.title() + "\"")
      .replace("<user>", "the user with id " + str(user)))

Recommendation: Paths Of Glory
The movie "Paths Of Glory" has been recommended to the group since it achieves the highest total rating.

Recommendation: Guess Who'S Coming To Dinner
The movie "Guess Who'S Coming To Dinner" has been recommended to the group since no group members has a real problem with it.

Recommendation: Circle Of Friends
The movie "Circle Of Friends" has been recommended to the group since it achieves the highest of all individual group members.

Recommendation: Circle Of Friends
The movie "Circle Of Friends" has been recommended to the group since it is the favourite for the user with id 51, and it's his/her turn to choose.
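
Two details of the cell above can be tightened. First, `str.title()` capitalizes the letter after an apostrophe, which is why the output shows "Who'S"; `string.capwords` from the standard library only capitalizes the first letter of each whitespace-separated word. Second, the repeated `replace` calls can be folded into one helper. The sketch below illustrates both points; the helper name `explain`, the dictionary `EXPLANATION_TEMPLATES` and the `{item}`/`{user}` placeholders are hypothetical and differ from the `<item>`/`<user>` markers used in the notebook.

```python
import string

# Two of the templates from above, rewritten with str.format placeholders.
EXPLANATION_TEMPLATES = {
    "ADD": "{item} has been recommended to the group since it achieves "
           "the highest total rating.",
    "FAI": "{item} has been recommended to the group since it is the "
           "favourite for {user}, and it's his/her turn to choose.",
}


def explain(strategy, title, **details):
    """Fill the template of `strategy` for the movie `title`.
    Extra placeholders (e.g. user) are passed as keyword arguments."""
    # capwords keeps "who's" as "Who's", unlike str.title().
    item = 'The movie "' + string.capwords(title) + '"'
    return EXPLANATION_TEMPLATES[strategy].format(item=item, **details)


print(explain("ADD", "guess who's coming to dinner"))
print(explain("FAI", "circle of friends", user="the user with id 51"))
```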



#### EXERCISE

Implement the explanation for the Approval Voting and Plurality Voting strategies for group recommendations, and print the corresponding explanation for the best movie for the group.

##### Solution

In [13]:
explanations["PLU"] = "<item> has been recommended to the group since it is the preferred item for most of the group members.\n"
explanations["APP"] = "<item> has been recommended to the group since it achieves the highest number of ratings which are above <threshold>.\n"


movie_title = plurality_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["PLU"].replace("<item>", "The movie \"" + movie_title.title() + "\""))

movie_title = approval_df['title'].iloc[0]
print("Recommendation: " + movie_title.title())
print(explanations["APP"].replace("<item>", "The movie \"" + movie_title.title() + "\"")
     .replace("<threshold>", str(3)))

Recommendation: Paths Of Glory
The movie "Paths Of Glory" has been recommended to the group since it is the preferred item for most of the group members.

Recommendation: Thelma & Louise
The movie "Thelma & Louise" has been recommended to the group since it achieves the highest number of ratings which are above 3.

