# Laboratorium 5 - rekomendacje grupowe

## Przygotowanie

 * pobierz i wypakuj dataset: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
   * więcej możesz poczytać tutaj: https://grouplens.org/datasets/movielens/
 * [opcjonalnie] Utwórz wirtualne środowisko
 `python3 -m venv ./recsyslab5`
 * zainstaluj potrzebne biblioteki:
 `pip install numpy pandas matplotlib`

## Część 1. - przygotowanie danych

In [2]:
# importujemy wszystkie potrzebne pakiety

import math
import numpy as np
import pandas

from random import choice, sample
from statistics import mean, stdev

from reco_utils import *

In [3]:
# wczytujemy oceny uytkownikow i obliczamy (za pomocą collaborative filtering) wszystkie przewidywane oceny filmow

raw_ratings = pandas.read_csv('ml-latest-small/ratings.csv').drop(columns=['timestamp'])
movies = list(raw_ratings['movieId'].unique())
users = list(raw_ratings['userId'].unique())
ratings = get_predicted_ratings(raw_ratings)
ratings

Total error: 220244.00197853832
Total error: 213140.64362706672
Total error: 206550.80499082754
Total error: 200420.87057426266
Total error: 194704.21733923288
Total error: 189360.13904032623
Total error: 184352.96191933163
Total error: 179651.3128471647
Total error: 175227.50990821366
Total error: 171057.05207583075
Total error: 167118.18964763024
Total error: 163391.56093204574
Total error: 159859.88361460945
Total error: 156507.6915079514
Total error: 153321.10916667702
Total error: 150287.65824681235
Total error: 147396.09059808814
Total error: 144636.2439620783
Total error: 141998.91685999243
Total error: 139475.75982843628
Total error: 137059.1806285939
Total error: 134742.26143617206
Total error: 132518.68633332648
Total error: 130382.67768298645
Total error: 128328.94018107114
Total error: 126352.61156128485
Total error: 124449.21907713416
Total error: 122614.64101171713
Total error: 120845.07257198417
Total error: 119136.99561391074
Total error: 117487.1517211558
Total error: 

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,4,10,4,10,2,7,5,5,4,8,...,2,5,4,10,9,3,7,7,7,10
2,6,10,2,10,1,8,10,8,8,10,...,8,9,6,0,10,4,1,4,10,4
3,6,8,0,10,10,4,10,10,0,7,...,10,0,10,0,10,8,8,10,10,2
4,6,6,4,7,9,7,2,5,10,5,...,6,7,4,9,7,5,7,6,2,7
5,3,3,2,0,9,1,10,9,10,8,...,9,5,3,3,2,10,10,2,3,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,6,7,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,7,6,6,6
607,5,4,5,4,3,7,6,3,4,7,...,3,7,7,8,4,8,3,10,8,5
608,6,7,6,6,6,6,7,7,6,6,...,6,6,6,6,6,6,6,6,6,7
609,6,1,6,10,9,1,1,0,2,10,...,9,6,10,2,9,10,4,10,6,2


In [5]:
# wczytujemy nazwy filmow i kategorie

movies_metadata = pandas.read_csv('ml-latest-small/movies.csv').set_index('movieId')
movies_metadata

Unnamed: 0_level_0,title,genres
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
...,...,...
193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
193585,Flint (2017),Drama
193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [28]:
# wczytujemy przykladowe grupy uzytkownikow
groups = pandas.read_csv('groups.csv').values.tolist()
groups

[[111, 307, 474, 599, 414],
 [469, 182, 232, 448, 600],
 [508, 581, 497, 402, 566],
 [300, 515, 245, 568, 507],
 [2, 371, 252, 518, 37],
 [269, 360, 469, 287, 308],
 [243, 527, 418, 118, 370],
 [186, 559, 327, 553, 314]]

In [29]:
# przygotowujemy funkcje pomocnicza

def describe_group(group, N=10):
    print(f'\n\nUser ids: {group}')
    group_size = len(group)
    
    mean_stdev = ratings.loc[group].std(axis=0).mean()
    median_stdev = ratings.loc[group].std(axis=0).median()
    std_stdev = ratings.loc[group].std(axis=0).std()
    print(f'\nMean ratings deviation: {mean_stdev}')
    print(f'Median ratings deviation: {median_stdev}')
    print(f'Standard deviation of ratings deviation: {std_stdev}')
    
    average_scores = ratings.iloc[group].mean(axis=0)
    average_scores = average_scores.sort_values()
    best_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[-N:].index)]
    worst_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[:N].index)]
    
    print('\nBest movies:')
    for movie, score in best_movies[::-1]:
        print(f'{movie}, {score}*')
    print('\nWorst movies:')
    for movie, score in worst_movies:
        print(f'{movie}, {score}*')

describe_group(groups[1])



User ids: [469, 182, 232, 448, 600]

Mean ratings deviation: 0.8335846435963976
Median ratings deviation: 0.8366600265340756
Standard deviation of ratings deviation: 0.45997338220139017

Best movies:
San Andreas (2015), 10.0*
Mercury Rising (1998), 10.0*
Dark Places (2015), 10.0*
Einstein and Eddington (2008), 9.8*
The Adventures of Sherlock Holmes and Dr. Watson: The Hound of the Baskervilles (1981), 9.8*
12 Chairs (1971), 9.8*
Man of Tai Chi (2013), 9.8*
Michael Collins (1996), 9.8*
Alice Adams (1935), 9.6*
Neal Brennan: 3 Mics (2017), 9.6*

Worst movies:
Maximum Overdrive (1986), 0.2*
Phantom of the Opera, The (2004), 0.2*
Lola Versus (2012), 0.4*
Pathology (2008), 0.6*
My Love (2006), 0.6*
Running Man, The (1987), 0.6*
Puppet Master (1989), 0.6*
Dragon Blade (2015), 0.8*
Eat Drink Man Woman (Yin shi nan nu) (1994), 0.8*
Justice League: Doom (2012) , 0.8*


## Część 2. - algorytmy proste

In [30]:
# zdefiniujmy interfejs dla wszystkich algorytmow rekomendacyjnych

class Recommender:
    def recommend(self, movies, ratings, group, size):
        pass

# jako pierwszy zaimplementujemy algorytm losowy - dla porownania
    
class RandomRecommender(Recommender):
    def __init__(self):
        self.name = 'random'
        
    def recommend(self, movies, ratings, group, size):
        return sample(movies, size)

In [32]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen

class AverageRecommender(Recommender):
    def __init__(self):
        self.name = 'average'
    
    def recommend(self, movies, ratings, group, size):
        average_scores = ratings.iloc[group].mean(axis=0)
        average_scores = average_scores.sort_values()
        return list(average_scores[-size:].index)
    
AverageRecommender().recommend(movies, ratings, groups[0], len(groups[0]))

[109483, 90384, 1972, 49735, 179491]

In [36]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen,
#   ale rownoczesnie wykluczajacy te filmy, ktore otrzymaly choc jedna ocene ponizej thresholdu

class AverageWithoutMiseryRecommender(Recommender):
    def __init__(self, score_threshold):
        self.name = 'average_without_misery'
        self.score_threshold = score_threshold
        
    def recommend(self, movies, ratings, group, size):
        average_scores = ratings.iloc[group].mean(axis=0)
        average_scores = average_scores.sort_values()
        average_scores = average_scores[average_scores >= self.score_threshold]
        return list(average_scores[-size:].index)
    
AverageWithoutMiseryRecommender(8).recommend(movies, ratings, groups[0], len(groups[0]))

[109483, 90384, 1972, 49735, 179491]

In [37]:
# algorytm uwzgledniajacy preferencje tylko jednego uzytkownika w kazdej iteracji

class FairnessRecommender(Recommender):
    def __init__(self):
        self.name = 'fairness'
        self.user_index = 0
        
    def recommend(self, movies, ratings, group, size):
        self.user_index = (self.user_index + 1) % len(group)
        user_id = group[self.user_index]
        user_ratings = ratings.loc[user_id]
        user_ratings = user_ratings.sort_values()
        return list(user_ratings[-size:].index)
    
FairnessRecommender().recommend(movies, ratings, groups[0], len(groups[0]))

[80572, 173197, 4190, 251, 32060]

In [64]:
# wybrany algorytm wyborczy (dyktatura, glosowanie proste, Borda, Copeland)

class VotingRecommender(Recommender):
    def __init__(self):
        self.name = 'borda'
    
    def recommend(self, movies, ratings, group, size):
        user_points = ratings.loc[group].copy()
        
        for user_id in group:
            sorted_user_ratings = user_points.loc[user_id].sort_values(ascending=False)
            sorted_user_ratings = sorted_user_ratings.rank(method='dense')
            user_points.loc[user_id] = sorted_user_ratings
            
        user_points = user_points.sum(axis=0)
        return list(user_points.sort_values()[-size:].index)

VotingRecommender().recommend(movies, ratings, groups[0], len(groups[0]))

[8966, 1117, 5338, 1621, 88069]

In [None]:
# algorytm zachlanny, aproksymujacy metode Proportional Approval Voting
#   w kazdej iteracji wybieramy ten film, ktory najbardziej zwieksza zadowolenie zgodnie z punktacja PAV

class ProportionalApprovalVotingRecommender(Recommender):
    def __init__(self, threshold):
        self.threshold = threshold
        self.name = 'PAV'
        
    def recommend(self, movies, ratings, group, size):
        

## Część 3. - funkcje celu

In [None]:
# dwie funkcje pomocnicze:
#  - znajdujaca ulubione filmy danego uzytkownika
#  - obliczajaca sume ocen wystawionych przez uzytkownika wszystkim filmom w rekomendacji

def top_n_movies_for_user(ratings, movies, user_id, n):
    raise NotImplementedError()

def total_score(recommendation, user_id, ratings):
    raise NotImplementedError()

In [None]:
# funkcja obliczajaca zadowolenie pojedynczego uzytkownika
#  - iloraz zadowolenia z wygenerowanej rekomendacji oraz zadowolenia z hipotetycznej rekomendacji idealnej
def overall_user_satisfaction(recommendation, user_id, movies, ratings):
    raise NotImplementedError()

# funkcja celu - srednia z zadowolenia wszystkich uzytkownikow w grupie
def overall_group_satisfaction(recommendation, group, movies, ratings):
    raise NotImplementedError()

# funkcja celu - roznica miedzy maksymalnym i minimalnym zadowolenie w grupie
def group_disagreement(recommendation, group, movies, ratings):
    raise NotImplementedError()

## Część 4. - Sequential Hybrid Aggregation

In [None]:
# algorytm balansujacy pomiedzy wyborem elementow o najwyzszej sredniej ocen
#   i o najwyzszej minimalnej ocenie
#   wyliczajacy w kazdej iteracji parametr alfa - jak na wykladzie
class SequentialHybridAggregationRecommender(Recommender):
     def __init__(self):
        self.name = 'sequential_hybrid_aggregation'
    
    def recommend(self, movies, ratings, group, size):
        raise NotImplementedError()

## Część 5. - porównanie algorytmów

In [None]:
recommenders = [
    RandomRecommender(),
    AverageRecommender(),
    AverageWithoutMiseryRecommender(5),
    FairnessRecommender(),
    VotingRecommender(),
    ProportionalApprovalVotingRecommender(5),
    SequentialHybridAggregationRecommender()
]

recommendation_size = 10

# dla kazdego algorytmu:
#  - wygenerujmy jedna rekomendacje dla kazdej grupy
#  - obliczmy wartosci obu funkcji celu dla kazdej rekomendacji
#  - obliczmy srednia i odchylenie standardowe dla obu funkcji celu
#  - wypiszmy wyniki na konsole

for recommender in recommenders:
    raise NotImplementedError()