# Laboratorium 5 - rekomendacje grupowe

## Przygotowanie

 * pobierz i wypakuj dataset: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
   * więcej możesz poczytać tutaj: https://grouplens.org/datasets/movielens/
 * [opcjonalnie] Utwórz wirtualne środowisko
 `python3 -m venv ./recsyslab5`
 * zainstaluj potrzebne biblioteki:
 `pip install numpy pandas matplotlib`

## Część 1. - przygotowanie danych

In [1]:
# importujemy wszystkie potrzebne pakiety

import pandas as pd
from random import sample

from reco_utils import *

In [2]:
# wczytujemy oceny uytkownikow i obliczamy (za pomocą collaborative filtering) wszystkie przewidywane oceny filmow

raw_ratings = pd.read_csv('ml-latest-small/ratings.csv').drop(columns=['timestamp'])
movies = list(raw_ratings['movieId'].unique())
users = list(raw_ratings['userId'].unique())
ratings = get_predicted_ratings(raw_ratings)
ratings

Total error: 215067.2923138497
Total error: 208273.9028455605
Total error: 201952.6848327329
Total error: 196056.71228876026
Total error: 190544.86374070394
Total error: 185380.97885609418
Total error: 180533.15612699883
Total error: 175973.16471367877
Total error: 171675.9493260957
Total error: 167619.21142391817
Total error: 163783.05340007946
Total error: 160149.67503616813
Total error: 156703.1135670967
Total error: 153429.0203051623
Total error: 150314.46805183607
Total error: 147347.78454558997
Total error: 144518.40801326072
Total error: 141816.76155440713
Total error: 139234.14362613967
Total error: 136762.63233562806
Total error: 134395.00160872555
Total error: 132124.64760140347
Total error: 129945.52396806458
Total error: 127852.0848068786
Total error: 125839.23427467182
Total error: 123902.28200865047
Total error: 122036.90361423152
Total error: 120239.10558139798
Total error: 118505.19407950502
Total error: 116831.74715490236
Total error: 115215.58991924905
Total error: 11

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,6,7,6,7,5,5,10,4,8,6,...,7,10,8,4,10,4,5,5,6,6
2,4,2,6,0,10,10,1,10,3,8,...,8,0,1,8,0,10,4,2,5,7
3,10,10,5,9,0,5,0,8,7,9,...,3,10,1,10,10,0,0,5,2,0
4,7,6,3,2,4,10,6,7,5,4,...,7,5,8,3,1,5,3,7,4,7
5,5,4,1,0,5,10,10,10,10,0,...,6,3,0,4,10,1,0,0,0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,6,6,5,6,6,6,6,6,6,6,...,6,7,6,6,6,5,6,6,6,6
607,8,6,7,5,0,4,3,6,3,5,...,3,6,8,4,4,5,1,7,4,6
608,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,7,6,6,6
609,3,10,6,10,2,0,10,0,4,9,...,6,10,6,7,10,9,10,5,10,10


In [3]:
# wczytujemy nazwy filmow i kategorie

movies_metadata = pd.read_csv('ml-latest-small/movies.csv').set_index('movieId')
movies_metadata

Unnamed: 0_level_0,title,genres
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
...,...,...
193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
193585,Flint (2017),Drama
193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [4]:
# wczytujemy przykladowe grupy uzytkownikow
groups = pd.read_csv('groups.csv').values.tolist()
groups

[[111, 307, 474, 599, 414],
 [469, 182, 232, 448, 600],
 [508, 581, 497, 402, 566],
 [300, 515, 245, 568, 507],
 [2, 371, 252, 518, 37],
 [269, 360, 469, 287, 308],
 [243, 527, 418, 118, 370],
 [186, 559, 327, 553, 314]]

In [5]:
# przygotowujemy funkcje pomocnicza

def describe_group(group, N=10):
    print(f'\n\nUser ids: {group}')

    mean_stdev = ratings.loc[group].std(axis=0).mean()
    median_stdev = ratings.loc[group].std(axis=0).median()
    std_stdev = ratings.loc[group].std(axis=0).std()
    print(f'\nMean ratings deviation: {mean_stdev}')
    print(f'Median ratings deviation: {median_stdev}')
    print(f'Standard deviation of ratings deviation: {std_stdev}')

    average_scores = ratings.iloc[group].mean(axis=0)
    average_scores = average_scores.sort_values()
    best_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in
                   list(average_scores[-N:].index)]
    worst_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in
                    list(average_scores[:N].index)]

    print('\nBest movies:')
    for movie, score in best_movies[::-1]:
        print(f'{movie}, {score}*')
    print('\nWorst movies:')
    for movie, score in worst_movies:
        print(f'{movie}, {score}*')


describe_group(groups[2])



User ids: [508, 581, 497, 402, 566]

Mean ratings deviation: 3.378309503367405
Median ratings deviation: 3.507135583350036
Standard deviation of ratings deviation: 0.9510345634922845

Best movies:
Arthur Christmas (2011), 9.2*
Gleaners & I, The (Les glaneurs et la glaneuse) (2000), 9.2*
Drift (2013), 9.0*
Uncle Buck (1989), 9.0*
Decoy Bride, The (2011), 9.0*
Secrets & Lies (1996), 9.0*
First Snow (2006), 9.0*
Dark Half, The (1993), 9.0*
Cashback (2006), 9.0*
Robin-B-Hood (Bo bui gai wak) (2006), 9.0*

Worst movies:
Look Who's Talking Too (1990), 1.4*
Wet Hot American Summer (2001), 1.6*
Marked for Death (1990), 1.8*
Alex and Emma (2003), 2.0*
Going in Style (1979), 2.0*
High Heels and Low Lifes (2001), 2.0*
Bogus (1996), 2.2*
Once Bitten (1985), 2.2*
Fun (1994), 2.2*
1900 (Novecento) (1976), 2.2*


## Część 2. - algorytmy proste

In [6]:
# zdefiniujmy interfejs dla wszystkich algorytmow rekomendacyjnych

class Recommender:
    def recommend(self, movies, ratings, group, size):
        pass


# jako pierwszy zaimplementujemy algorytm losowy - dla porownania

class RandomRecommender(Recommender):
    def __init__(self):
        self.name = 'random'

    def recommend(self, movies, ratings, group, size):
        return sample(movies, size)

In [7]:
RandomRecommender().recommend(movies, ratings, groups[4], 5)

[5466, 1562, 7027, 3917, 56003]

In [8]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen

class AverageRecommender(Recommender):
    def __init__(self):
        self.name = 'average'

    def recommend(self, movies, ratings, group, size):
        return ratings.loc[group].mean(axis=0).nlargest(size).index.tolist()

In [9]:
AverageRecommender().recommend(movies, ratings, groups[2], 5)

[1734, 2330, 3766, 4770, 5443]

In [10]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen,
#   ale rownoczesnie wykluczajacy te filmy, ktore otrzymaly choc jedna ocene ponizej thresholdu

class AverageWithoutMiseryRecommender(Recommender):
    def __init__(self, score_threshold):
        self.name = 'average_without_misery'
        self.score_threshold = score_threshold

    def recommend(self, movies, ratings, group, size):
        filtered_columns = (ratings.loc[group] >= self.score_threshold).all()
        return ratings.loc[group, filtered_columns.tolist()].mean(axis=0).nlargest(size).index.tolist()

In [11]:
AverageWithoutMiseryRecommender(4).recommend(movies, ratings, groups[1], 5)

[3833, 4826, 6187, 7037, 69251]

In [12]:
# algorytm uwzgledniajacy preferencje tylko jednego uzytkownika w kazdej iteracji

class FairnessRecommender(Recommender):
    def __init__(self):
        self.name = 'fairness'

    def recommend(self, movies, ratings, group, size):
        recommendation = []
        ratings_copy = ratings.copy(deep=True)

        for i in range(size):
            # Determine the user from the group based on the current iteration
            user = i % len(group)

            # Find the movie with the highest rating for the selected user
            chosen_movie = ratings_copy.loc[group[user]].idxmax()
            recommendation.append(chosen_movie)

            # Set the rating for the chosen movie to 0 for all users in the group
            ratings_copy.loc[group, [chosen_movie]] = 0

        return recommendation

In [13]:
FairnessRecommender().recommend(movies, ratings, groups[6], 5)

[5, 9, 1, 8, 3]

In [14]:
# wybrany algorytm wyborczy (dyktatura, glosowanie proste, Borda, Copeland)

class VotingRecommender(Recommender):
    def __init__(self):
        self.name = 'borda'

    def recommend(self, movies, ratings, group, size):
        user_points = ratings.loc[group].copy(deep=True)

        for user_id in group:
            # Sort user ratings in descending order
            sorted_user_ratings = ratings.loc[user_id].sort_values(ascending=False)

            # Initialize counter for Borda count
            counter = len(sorted_user_ratings)

            # Assign Borda count values to user_points DataFrame
            for i, value in sorted_user_ratings.items():
                user_points.loc[user_id, i] = counter
                counter -= 1

        # Sum Borda count values across users, recommend top movies and return them
        return user_points.sum(axis=0).nlargest(size).index.tolist()

In [15]:
VotingRecommender().recommend(movies, ratings, groups[3], 5)

[4610, 3598, 2474, 3990, 2922]

In [16]:
# algorytm zachlanny, aproksymujacy metode Proportional Approval Voting
# w kazdej iteracji wybieramy ten film, ktory najbardziej zwieksza zadowolenie zgodnie z punktacja PAV

class ProportionalApprovalVotingRecommender(Recommender):
    def __init__(self, threshold):
        self.threshold = threshold
        self.name = 'PAV'

    def recommend(self, movies, ratings, group, size):
        # Initialize user points for each user in the group
        user_points = {user: 1 for user in group}

        # Identify user preferences based on the threshold
        user_preferences = ratings.loc[group] >= self.threshold

        # Initialize user satisfactions count for each user in the group
        user_satisfactions_number = {user: 0 for user in group}

        # Initialize a DataFrame to store user satisfaction values
        user_satisfaction = pd.DataFrame(0, index=ratings.loc[group].index, columns=ratings.columns)

        recommendation = []
        for _ in range(size):
            # Loop through movies to calculate user satisfaction
            for movie in ratings.columns:
                for user in group:
                    # If the user prefers the movie, assign user points to the satisfaction
                    if user_preferences.loc[user, movie]:
                        user_satisfaction.loc[user, movie] = user_points[user]

            # Select the movie with the maximum total user satisfaction
            recommended_movie = user_satisfaction.loc[group, ratings.columns.difference(recommendation).tolist()].sum(
                axis=0).idxmax()
            recommendation.append(recommended_movie)

            # Update user satisfactions count based on the recommendation
            for user in group:
                if user_preferences.loc[user, recommended_movie]:
                    user_satisfactions_number[user] += 1
                    user_points[user] = 1 / (user_satisfactions_number[user] + 1)

        return recommendation

In [17]:
ProportionalApprovalVotingRecommender(7).recommend(movies, ratings, groups[0], 5)

  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]


[325, 662, 102058, 2272, 5767]

## Część 3. - funkcje celu

In [18]:
# dwie funkcje pomocnicze:
#  - znajdujaca ulubione filmy danego uzytkownika
#  - obliczajaca sume ocen wystawionych przez uzytkownika wszystkim filmom w rekomendacji

def top_n_movies_for_user(ratings, movies, user_id, n):
    return ratings.loc[user_id].nlargest(n).index.tolist()


def total_score(recommendation, user_id, ratings):
    return ratings.loc[user_id, recommendation].sum()

In [19]:
top_n_movies_for_user(ratings, movies, 3, 5)

[1, 2, 18, 22, 26]

In [20]:
# funkcja obliczajaca zadowolenie pojedynczego uzytkownika
#  - iloraz zadowolenia z wygenerowanej rekomendacji oraz zadowolenia z hipotetycznej rekomendacji idealnej
def overall_user_satisfaction(recommendation, user_id, movies, ratings):
    real_score = total_score(recommendation, user_id, ratings)
    user_top_n_recommendation = top_n_movies_for_user(ratings, movies, user_id, len(recommendation))
    ideal_score = total_score(user_top_n_recommendation, user_id, ratings)
    return real_score / ideal_score


# funkcja celu - srednia z zadowolenia wszystkich uzytkownikow w grupie
def overall_group_satisfaction(recommendation, group, movies, ratings):
    return sum(overall_user_satisfaction(recommendation, user_id, movies, ratings) for user_id in group) / len(group)


# funkcja celu - roznica miedzy maksymalnym i minimalnym zadowolenie w grupie
def group_disagreement(recommendation, group, movies, ratings):
    each_user_satisfaction = [overall_user_satisfaction(recommendation, user_id, movies, ratings) for user_id in group]
    return max(each_user_satisfaction) - min(each_user_satisfaction)

## Część 4. - Sequential Hybrid Aggregation

In [21]:
# algorytm balansujacy pomiedzy wyborem elementow o najwyzszej sredniej ocen
#   i o najwyzszej minimalnej ocenie
#   wyliczajacy w kazdej iteracji parametr alfa - jak na wykladzie
class SequentialHybridAggregationRecommender(Recommender):
    def __init__(self):
        self.name = 'sequential_hybrid_aggregation'

    def recommend(self, movies, ratings, group, size):
        # Calculate average score and least score for each movie in the group
        avg_score = ratings.loc[group].mean(axis=0)
        least_score = ratings.loc[group].min()
        alpha = 1

        # Create a dictionary to store scores for each movie
        score = {movie: 0 for movie in ratings.columns}

        recommendation = []

        # Iterate through the specified number of recommendations
        for _ in range(size):
            # Calculate the score for each movie using the weighted average
            score.update(
                {movie: (1 - alpha) * avg_score.loc[movie] + alpha * least_score.loc[movie] for movie in movies})

            # Set the score to -1 for movies already recommended
            score.update({movie: -1 for movie in recommendation})

            # Find the movie with the maximum score
            max_score_movie = max(score.items(), key=lambda k: k[1])[0]
            recommendation.append(max_score_movie)

            # Update alpha using the group disagreement function
            alpha = group_disagreement(recommendation, group, movies, ratings)

        return recommendation

In [22]:
SequentialHybridAggregationRecommender().recommend(movies, ratings, groups[7], 5)

[2891, 4410, 6090, 6305, 45668]

## Część 5. - porównanie algorytmów

In [23]:
def calculate_statistics(data, length):
    avg = np.sum(data) / length
    std = np.sqrt(np.sum(pow(data - avg, 2))) / length

    return avg, std

In [24]:
recommenders = [
    RandomRecommender(),
    AverageRecommender(),
    AverageWithoutMiseryRecommender(5),
    FairnessRecommender(),
    VotingRecommender(),
    ProportionalApprovalVotingRecommender(5),
    SequentialHybridAggregationRecommender()
]

recommendation_size = 10

# dla kazdego algorytmu:
#  - wygenerujmy jedna rekomendacje dla kazdej grupy
#  - obliczmy wartosci obu funkcji celu dla kazdej rekomendacji
#  - obliczmy srednia i odchylenie standardowe dla obu funkcji celu
#  - wypiszmy wyniki na konsole

for recommender in recommenders:
    satisfaction = np.zeros(len(groups))
    disagreement = np.zeros(len(groups))

    for i, group in enumerate(groups):
        recommendation = recommender.recommend(movies, ratings, group, recommendation_size)
        satisfaction[i] = overall_group_satisfaction(recommendation, group, movies, ratings)
        disagreement[i] = group_disagreement(recommendation, group, movies, ratings)

    avg_satisfaction, std_satisfaction = calculate_statistics(satisfaction, len(groups))
    avg_disagreement, std_disagreement = calculate_statistics(disagreement, len(groups))

    print(f'Recommender: {recommender.name}')
    print(f'satisfaction: {avg_satisfaction} +/- {std_satisfaction}')
    print(f'disagreement: {avg_disagreement} +/- {std_disagreement}\n')

Recommender: random
satisfaction: 0.6026339285714285 +/- 0.04349198230837043
disagreement: 0.2677232142857143 +/- 0.030604685154857147

Recommender: average
satisfaction: 0.9681785714285713 +/- 0.00943772384516262
disagreement: 0.07696428571428572 +/- 0.023879463868413317

Recommender: average_without_misery
satisfaction: 0.9681785714285713 +/- 0.00943772384516262
disagreement: 0.07696428571428572 +/- 0.023879463868413317

Recommender: fairness
satisfaction: 0.696125 +/- 0.03690211462039312
disagreement: 0.16165178571428573 +/- 0.015588853278550627

Recommender: borda
satisfaction: 0.9498839285714287 +/- 0.014715358065445105
disagreement: 0.09040178571428571 +/- 0.030013246271767578



  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]
  user_satisfaction.loc[user, movie] = user_points[user]


Recommender: PAV
satisfaction: 0.7998214285714286 +/- 0.010965703433237394
disagreement: 0.19651785714285713 +/- 0.016652892772468283

Recommender: sequential_hybrid_aggregation
satisfaction: 0.9670714285714286 +/- 0.009925688494600094
disagreement: 0.07410714285714287 +/- 0.023576501077581986

