# Social Computing - Summer 2018
# Exercise 5 - Group Recommender System 

### Background
In exercise 3 we built a simple collaborative filtering recommender system for movies using the MovieLens dataset. In this exercise we will reuse and extend this system to build a group recommender system for restaurants. A group recommender provides recommendations for a group of users instead of a single user. A group recommendation is obtained by aggregating the preferences of the individual group members. There are two ways to perform this aggregation:
* The first is to generate recommendations (predicted ratings) for the individual members of the group (as a single-user recommender would), then aggregate those individual predicted ratings into predicted ratings (recommendations) for the group.
* The second is to aggregate the individual users' actual ratings into a group model, and then create predicted ratings for that model, i.e. treat the group model as a single user and create predicted ratings for it with the single-user recommender system.
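The toy sketch below contrasts the two designs. `predict_single` is a hypothetical stand-in for a real single-user recommender: it returns a known rating, or a neutral 0.5 if there is none. `min` serves as the aggregation strategy. All names and numbers are illustrative and do not come from the dataset.

```python
# Toy comparison of the two aggregation designs (made-up ratings, not from the dataset).

def predict_single(profile, item):
    """Hypothetical stand-in for a single-user recommender: return the known
    rating for the item, or a neutral 0.5 if the profile has none."""
    return profile.get(item, 0.5)


def design_1(members, item, strategy=min):
    """Predict for each member first, then aggregate the predictions."""
    return strategy(predict_single(profile, item) for profile in members)


def design_2(members, item, strategy=min):
    """Aggregate the members' actual ratings into one group profile,
    then predict for that profile as if it were a single user."""
    rated = [profile[item] for profile in members if item in profile]
    group_profile = {item: strategy(rated)} if rated else {}
    return predict_single(group_profile, item)


alice = {"r1": 0.9, "r2": 0.8}
bob = {"r1": 0.6}
print(design_1([alice, bob], "r2"))  # 0.5: bob's neutral prediction is the minimum
print(design_2([alice, bob], "r2"))  # 0.8: only alice's actual rating enters the group profile
```

Even with this trivial predictor, the two designs can disagree. Design 1 lets a member's prediction drag the group rating down. Design 2 only sees the ratings members actually gave.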

In both cases, the aggregation is done according to a social choice strategy (e.g. maximum satisfaction, least misery, etc.).
For more information about group recommenders and aggregation strategies, please refer to the paper "Group Recommender Systems: New Perspectives in the Social Web" by Cantador and Castells (you will find it in the reading material for this exercise).
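A social choice strategy maps the list of member ratings for one item to a single group rating. The sketch below treats strategies as interchangeable functions. The strategy names follow common usage in the group recommender literature (average, least misery, most pleasure), and the ratings are made up.

```python
from statistics import mean

# Each strategy maps the members' ratings for one restaurant to one group rating.
STRATEGIES = {
    "average": mean,        # average satisfaction of the members
    "least_misery": min,    # the group is only as happy as its least happy member
    "most_pleasure": max,   # follow the happiest member
}

member_ratings = [0.8, 0.3, 0.6]  # made-up predicted ratings of three members for one restaurant
for name, strategy in STRATEGIES.items():
    print(name, round(strategy(member_ratings), 3))
```

Because every strategy has the same shape, switching strategy in a recommender only means passing a different function.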

In this exercise, we will use the first design option: the system should generate predicted ratings for each member of the group, and those ratings will be aggregated into a group rating based on the "Least Misery" strategy.

The group recommender system we will build in this exercise is a social-context-aware recommender system. Its input will be a subset of the anonymized restaurant rating dataset that the students submitted in the experiment part of this class. "Restaurant domain expertise" will be used as the social context parameter that influences the output of the group recommender.

The dataset is available in two comma-separated files: ratings.csv and domain-expertise.csv.
* ratings.csv contains the individual participants' restaurant ratings for the rating attributes: price, clumsiness, service, hippieness, location, and social overlap. The first two columns in the CSV are the participant ID and the restaurant ID, respectively. The rating value for each attribute is between 0 and 100.
* domain-expertise.csv: each record contains the restaurant expertise rating of a participant as estimated by another participant. The file has 3 columns: from_participant_id (the ID of the participant who gave the rating), to_participant_id (the ID of the participant being rated), and domain_expertise (the rating value, between 0 and 100).
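Here is a minimal parsing sketch for the two layouts described above, using Python's `csv` module on in-memory sample text. The rows are made up. The header names of ratings.csv are an assumption, as is the assumption that the attribute columns appear in the order listed. Header rows are recognized by a first field that is not an integer ID.

```python
import csv
import io

# Made-up rows in the layout described above (the ratings.csv header names are assumed).
RATINGS_CSV = """participant_id,restaurant_id,price,clumsiness,service,hippieness,location,social_overlap
1,r1,40,20,80,60,70,50
2,r1,60,10,90,30,80,40
"""
DOMAIN_EXPERTISE_CSV = """from_participant_id,to_participant_id,domain_expertise
1,2,75
2,1,40
"""


def read_rows(text):
    """Yield CSV rows, skipping any header row whose first field is not an integer ID."""
    for row in csv.reader(io.StringIO(text)):
        if row and row[0].strip().isdigit():
            yield row


ratings = {}  # {participant_id: {restaurant_id: [six attribute values in 0..100]}}
for row in read_rows(RATINGS_CSV):
    ratings.setdefault(int(row[0]), {})[row[1]] = [float(v) for v in row[2:8]]

expertise = {}  # {from_participant_id: {to_participant_id: rating in 0..100}}
for row in read_rows(DOMAIN_EXPERTISE_CSV):
    expertise.setdefault(int(row[0]), {})[int(row[1])] = float(row[2])
```

For the real files, `io.StringIO(text)` would be replaced by a file opened with `open(path, newline="")`.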


### The Exercise
Write a simple group recommender system in Python. The entry point to the program should be a method that takes the following arguments:
* group: a list of integers representing the participant IDs of the participants forming a group
* ratings_path: path to the file ratings.csv
* domain_expertise_path: path to the file domain-expertise.csv

Pre-processing: 

The restaurant ratings in the dataset are multi-valued (there are several rating attributes for a single restaurant). Your program should calculate a single value that represents the overall rating of each restaurant by each participant. This single-value rating should be between 0 and 1 (divide the individual rating values by 100 and calculate their average).
The same applies to the domain expertise ratings (divide the rating values by 100).
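The pre-processing arithmetic described above fits in two small functions. Each of the six attribute values is divided by 100 and the results are averaged; an expertise rating is simply divided by 100. The function names and the sample values are illustrative.

```python
def overall_rating(attribute_values):
    """Collapse the 0-100 attribute ratings of one restaurant into one value in [0, 1]:
    divide each value by 100, then take the plain average."""
    scaled = [v / 100.0 for v in attribute_values]
    return sum(scaled) / len(scaled)


def normalize_expertise(value):
    """Scale a 0-100 domain-expertise rating to [0, 1]."""
    return value / 100.0


print(overall_rating([50, 70, 80, 60, 90, 30]))  # about 0.633 = 3.8 / 6
print(normalize_expertise(75))
```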

Single-user recommendations:

The program should generate individual recommendations for each participant in the group, using a delegation-based method that takes into account the domain expertise of the other participants in the group. The idea is that a member's preference is influenced by the opinion of another group member, depending on how much expertise she thinks this other person has in the domain in question. For example, when choosing a restaurant for a group dinner, a person who considers another group member a restaurant expert will be influenced by that member's opinion. The delegation-based method is formulated as follows:
$$pred'(u,i) = \frac{1}{\left|\sum_{v \in G} d_{u,v}\right|}\sum_{v \in G \wedge v \neq u} d_{u,v} \cdot pred(v,i)$$
* $pred'(u,i)$ is the social-context-aware predicted rating of participant $u$ for restaurant $i$
* $v$ is another member of the group $G$
* $d_{u,v}$ is the domain expertise rating given by participant $u$ to participant $v$
* $pred(v,i)$ is the predicted rating of participant $v$ for restaurant $i$
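The formula can be computed directly, as in the sketch below. It rests on assumptions spelled out in the docstring: $d_{u,u}$ is treated as 0, so the normalizer runs over the other members. A member without a base prediction for the restaurant contributes nothing to the weighted sum but still counts in the normalizer. The function name and the toy data are hypothetical.

```python
def delegation_prediction(u, item, group, expertise, base_pred):
    """Literal sketch of pred'(u, i) from the formula above.

    expertise[u][v] holds d_{u,v} (already scaled to [0, 1]) and
    base_pred[v][item] holds pred(v, i). The normaliser |sum of d_{u,v}| runs
    over all other group members (d_{u,u} is taken as 0); a member without a
    base prediction for the item contributes nothing to the weighted sum.
    """
    weights = expertise.get(u, {})
    others = [v for v in group if v != u]
    norm = abs(sum(weights.get(v, 0.0) for v in others))
    if norm == 0:
        return None  # u rated none of the other members: pred'(u, i) is undefined
    weighted_sum = sum(
        weights.get(v, 0.0) * base_pred[v][item]
        for v in others
        if item in base_pred.get(v, {})
    )
    return weighted_sum / norm


# Made-up toy data: member 1 rates member 2's restaurant expertise far higher than member 3's.
group = [1, 2, 3]
expertise = {1: {2: 0.8, 3: 0.2}}
base_pred = {2: {"r1": 0.9}, 3: {"r1": 0.4}}
print(delegation_prediction(1, "r1", group, expertise, base_pred))  # about 0.8 = (0.8*0.9 + 0.2*0.4) / 1.0
```

Because member 1 weights member 2 four times as heavily as member 3, the result (0.8) sits much closer to member 2's base prediction (0.9) than to member 3's (0.4).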

According to the formula, there are two different predicted ratings: pred and pred'.
* pred: a normal collaborative filtering predicted rating of a certain user for a certain item. We call this the base rating.
* pred': the social-context-aware predicted rating, which is a function of the base ratings of the other group members and of those members' domain expertise as perceived by the current user.

This means your program should calculate two different ratings for each group member.
The program should start by calculating the base predicted ratings of all group members for all restaurants (re-use your code from exercise 3). Then, for each group member, it should calculate the social-context-aware predicted ratings for all restaurants.

The final step of the program is the calculation of the group recommendation, i.e. the application of the aggregation strategy. The strategy we are going to use in this exercise is "Least Misery".

The output should be a list of Python tuples, sorted by the group's predicted restaurant ratings (highest first). Each tuple has two elements: the restaurant's ID and the group's predicted rating, i.e. the aggregated social-context-aware rating. You are free to design your recommendation engine however you want (the code below is just a suggested design). Clean, readable, and documented code is expected, and these aspects will be part of the overall grade of the exercise.
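Below is a compact sketch of the final step and the expected output shape: Least Misery aggregation followed by sorting into `(restaurant_id, group_rating)` tuples. It assumes that members without a prediction for a restaurant are simply ignored. The participant IDs and ratings are made up.

```python
def least_misery(member_predictions):
    """Aggregate {member: {restaurant: rating}} into [(restaurant, group_rating)], highest first.

    The group rating is the minimum rating over the members that have a prediction
    for the restaurant (assumption: members without a prediction are ignored).
    """
    per_restaurant = {}  # {restaurant: [ratings of the members who have a prediction]}
    for predictions in member_predictions.values():
        for restaurant, rating in predictions.items():
            per_restaurant.setdefault(restaurant, []).append(rating)
    group_ratings = {r: min(ratings) for r, ratings in per_restaurant.items()}
    return sorted(group_ratings.items(), key=lambda item: item[1], reverse=True)


# Made-up social-context-aware predictions for a three-member group.
social = {
    1: {"r1": 0.7, "r2": 0.9},
    2: {"r1": 0.6, "r2": 0.3},
    3: {"r1": 0.8},
}
print(least_misery(social))  # [('r1', 0.6), ('r2', 0.3)]
```

Restaurant r2 has the highest individual rating (0.9), yet it ranks last because one member's predicted rating for it is only 0.3. This is exactly the behavior Least Misery is meant to produce.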

Note: You can test your recommender with the following groups (identified by the participant IDs): [63,117,116], [160,161,162], [178,134,91], [67,198,197]


```python
import csv
import math


def least_misery(social_pred_dict, restaurants):
    """Aggregate the members' social-context-aware ratings with the "Least Misery" strategy.

    The group rating of a restaurant is the lowest predicted rating among the members
    that have a prediction for it; restaurants without any prediction are skipped.
    Returns a list of (restaurant_id, group_rating) tuples, highest rating first.
    """
    group_rating_dict = {}
    for restaurant in restaurants:
        member_ratings = [predictions[restaurant]
                          for predictions in social_pred_dict.values()
                          if restaurant in predictions]
        if member_ratings:
            group_rating_dict[restaurant] = min(member_ratings)
    return sorted(group_rating_dict.items(), key=lambda x: x[1], reverse=True)


def calculate_base_predictions(participants, ratings, restaurants):
    """Compute the base (collaborative filtering) predicted ratings of each participant.

    For every restaurant the participant has not rated yet, the prediction is the
    similarity-weighted average of the ratings given by all other participants.
    Returns {participant_id: {restaurant_id: predicted_rating}}.
    """
    base_pred_dict = {}
    for part1 in participants:
        weighted_ratings = {}   # {restaurant_id: sum of similarity * rating}
        similarity_scores = {}  # {restaurant_id: sum of similarities}
        for part2 in ratings:
            if part2 == part1:
                continue
            # A non-zero similarity implies that part1 has ratings of its own
            similarity = calculate_similarity_score(ratings, part1, part2)
            if similarity == 0:
                continue
            for restaurant_id in restaurants:
                # Only predict restaurants the target participant has not rated yet
                if restaurant_id not in ratings[part1] and restaurant_id in ratings[part2]:
                    weighted_ratings[restaurant_id] = (weighted_ratings.get(restaurant_id, 0)
                                                       + ratings[part2][restaurant_id] * similarity)
                    similarity_scores[restaurant_id] = similarity_scores.get(restaurant_id, 0) + similarity
        base_pred_dict[part1] = {restaurant: weighted_ratings[restaurant] / similarity_scores[restaurant]
                                 for restaurant in weighted_ratings}
    return base_pred_dict


def calculate_social_context_aware_predictions(participants, domain_exp_dict, base_pred_dict):
    """Compute pred'(u, i) for each group member u with the delegation-based method.

    pred'(u, i) is the average of the other members' base predictions pred(v, i),
    weighted by the domain expertise d(u, v) that u assigned to v. The weights are
    normalised over the members v that u rated and that have a base prediction for i.
    Returns {participant_id: {restaurant_id: social_rating}}.
    """
    social_pred_dict = {}
    for part1 in participants:
        numerator = {}    # {restaurant_id: sum of d(u, v) * pred(v, i)}
        denominator = {}  # {restaurant_id: sum of d(u, v)}
        ratings_given = domain_exp_dict.get(part1, {})  # expertise ratings given by part1
        for part2 in participants:
            # Skip the member themselves and members part1 did not rate for domain expertise
            if part2 == part1 or part2 not in ratings_given:
                continue
            expertise = ratings_given[part2]
            for restaurant_id, prediction in base_pred_dict.get(part2, {}).items():
                numerator[restaurant_id] = numerator.get(restaurant_id, 0) + expertise * prediction
                denominator[restaurant_id] = denominator.get(restaurant_id, 0) + expertise
        social_pred_dict[part1] = {restaurant: numerator[restaurant] / abs(denominator[restaurant])
                                   for restaurant in numerator
                                   if denominator[restaurant] != 0}
    return social_pred_dict


def get_restaurants(ratings):
    """Return a list of the unique restaurant IDs that appear in the ratings."""
    return list({restaurant for participant_ratings in ratings.values() for restaurant in participant_ratings})


def calculate_similarity_score(ratings, participant_id1, participant_id2):
    """Euclidean-distance-based similarity of two participants, in (0, 1].

    Computed over the restaurants both participants rated; returns 0 if either
    participant has no ratings or if they have no restaurant in common.
    """
    if participant_id1 not in ratings or participant_id2 not in ratings:
        return 0
    common_restaurants = [r for r in ratings[participant_id1] if r in ratings[participant_id2]]
    if not common_restaurants:
        return 0
    # Sum of squared rating differences over the common restaurants
    sum_of_squares_of_differences = sum(
        (ratings[participant_id1][r] - ratings[participant_id2][r]) ** 2 for r in common_restaurants)
    return 1 / (1 + math.sqrt(sum_of_squares_of_differences))


def read_data_rows(path):
    """Yield the rows of a CSV file, skipping header and empty rows (first field is not an integer ID)."""
    with open(path, newline="") as csv_file:
        for row in csv.reader(csv_file):
            try:
                int(row[0])
            except (ValueError, IndexError):
                continue
            yield row


def load_ratings(ratings_path):
    """Parse ratings.csv into {participant_id: {restaurant_id: overall rating in [0, 1]}}."""
    ratings_dict = {}
    for row in read_data_rows(ratings_path):
        participant_id, restaurant_id = int(row[0]), row[1]
        price, clumsiness, service, hippieness, location, social_overlap = (float(v) / 100.0 for v in row[2:8])
        # Note: price and clumsiness enter the average inverted, as (1 - x);
        # the exercise text asks for a plain average of the six values.
        ratings_dict.setdefault(participant_id, {})[restaurant_id] = (
            (1.0 - price) + (1.0 - clumsiness) + service + hippieness + location + social_overlap) / 6.0
    return ratings_dict


def load_domain_expertise(domain_expertise_path):
    """Parse domain-expertise.csv into {from_participant_id: {to_participant_id: expertise in [0, 1]}}."""
    domain_exp_dict = {}
    for row in read_data_rows(domain_expertise_path):
        from_id, to_id, domain_exp = int(row[0]), int(row[1]), float(row[2]) / 100.0
        domain_exp_dict.setdefault(from_id, {})[to_id] = domain_exp
    return domain_exp_dict


def group_recommender(group, ratings_path, domain_expertise_path, verbose=False):
    """Group recommender (main program).

    Returns a list of (restaurant_id, group_rating) tuples, highest rating first.
    With verbose=True, the intermediate base and social-context-aware predictions are printed.
    """
    ratings_dict = load_ratings(ratings_path)
    domain_exp_dict = load_domain_expertise(domain_expertise_path)
    restaurants = get_restaurants(ratings_dict)

    base = calculate_base_predictions(group, ratings_dict, restaurants)
    social = calculate_social_context_aware_predictions(group, domain_exp_dict, base)
    if verbose:
        print(base)
        print(social)
    return least_misery(social, restaurants)


# Test (call the main program with one of the sample groups from the exercise description above)
print(group_recommender([160, 161, 162], "ratings.csv", "domain_expertise.csv", verbose=True))
```

{160: {'ChIJidSoDBzgnUcRny_WAQ7PaCI': 0.6766666666666667, 'ChIJ7cha3E9NdEcRm3ZJ_tKCNZ0': 0.5523175339537075, 'ChIJHeg9F1_fnUcRodyOlcSDA7Y': 0.5583333333333332, 'ChIJ7QZjgV_fnUcRQYr2Ii8RwOY': 0.5933333333333334, 'ChIJIT-4CnWxdUcRqrb5UcasMbU': 0.4554149186488071, 'ChIJYWUkRh_gnUcR3VU29ofuXVw': 0.6783333333333333, 'ChIJf8-tUy_fnUcRmtMuAQeXRE8': 0.45, 'ChIJab0H05-xdUcR-JzdcY5tXnQ': 0.5433333333333333}, 161: {'ChIJ8edul6AHbUcR6faY3loHwaI': 0.4983333333333334, 'ChIJbfe_W4ffnUcRLvUQJ-XgyJ8': 0.47500000000000003, 'ChIJab0H05-xdUcR-JzdcY5tXnQ': 0.5433333333333333}, 162: {'ChIJGWAg7zdznkcRVPNxQCF7d-o': 0.5236922572992678, 'ChIJbfe_W4ffnUcRLvUQJ-XgyJ8': 0.47500000000000003, 'ChIJ8edul6AHbUcR6faY3loHwaI': 0.4983333333333334}}
{160: {'ChIJ8edul6AHbUcR6faY3loHwaI': 0.49833333333333346, 'ChIJbfe_W4ffnUcRLvUQJ-XgyJ8': 0.47500000000000003, 'ChIJGWAg7zdznkcRVPNxQCF7d-o': 0.5236922572992678, 'ChIJab0H05-xdUcR-JzdcY5tXnQ': 0.5433333333333333}, 161: {'ChIJidSoDBzgnUcRny_WAQ7PaCI': 0.6766666666666667, 'ChIJ