In [None]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Group-movie-recommender-system

Mounted at /content/drive
/content/drive/MyDrive/Group-movie-recommender-system


# GroupRec

__Team__: Bhavik Ameta(225008988), Parul Priya(725006957), Shobhit Jain(625007846)

# Introduction and Problem Statement: 
In the past, a lot of effort has been put into computing recommendations for individual users in many domains. However, people often enjoy certain activities, such as camping or watching movies, in groups. In such cases there is a need for a recommender that takes the interests of every group member into account and suggests items the whole group is likely to enjoy. Here we present a Group Recommendation System for movies. 

We implement the idea using Matrix Factorization (MF) based Collaborative Filtering. The main idea of MF models is to factorize the original rating matrix into two or more matrices that represent the user-item interactions. We use several approaches to derive these factors for a group from the factors of its individual members.
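For reference, the prediction rule implemented later in the code (predict_user_rating and predict_group_rating) can be written as follows. The notation is ours, not the paper's: μ is the global mean rating, b_u, b_i and b_g are user, item and group biases, and p_u, q_i and p_g are user, item and group factor vectors.

$$\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^\top \mathbf{q}_i \qquad \hat{r}_{gi} = \mu + b_g + b_i + \mathbf{p}_g^\top \mathbf{q}_i$$

AF obtains $(\mathbf{p}_g, b_g)$ by (weighted) averaging of the members' $(\mathbf{p}_u, b_u)$. BF and WBF instead fit them by ridge regression on the group's aggregated ratings $\bar{r}_{gi}$ over the items $I_g$ watched by the group, with item weights $w_i$ ($w_i = 1$ for BF):

$$\min_{\mathbf{p}_g,\, b_g} \sum_{i \in I_g} w_i \left(\bar{r}_{gi} - \mu - b_i - \mathbf{p}_g^\top \mathbf{q}_i - b_g\right)^2 + \lambda \left(\lVert \mathbf{p}_g \rVert^2 + b_g^2\right)$$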

# Related Work:
The most popular method to compute recommendations for a group of users is KNN-based CF. The data sparsity issues of KNN can be tackled with a support vector machine model that computes similarities between items. Another approach defines the neighbors of a group as the intersection of the neighbor sets of its members.

Although MF-based CF has been studied extensively for single-user recommendation, its application to group recommendation has not been explored in depth. One such approach, proposed in an earlier paper, modifies the MF model to include a wide variety of sociological factors such as cohesion, social similarity and social centrality.

# Dataset :
We have used MovieLens 100k data set for our project.

https://grouplens.org/datasets/movielens/100k/

# Approach:
The system takes a group size as input, generates random groups, and produces movie recommendations from the group members' latent factors and biases. 
The following approaches are used:
<ul>
<li>AF (After Factorization): We first factorize the user × item rating matrix to obtain latent factors and biases for every user. We then aggregate the members' factors and biases to approximate those of the group, and use them to predict the group's ratings for items.</li>
<li>BF (Before Factorization): We aggregate the members' ratings before factorization to obtain group ratings, treating the group as a virtual user. We then derive the group's factors and bias from the learned item factors using ridge regression.</li>
<li>WBF (Weighted BF): Similar to BF, except that each item is weighted by how many group members have watched it and how much its ratings vary. The ridge-regression objective therefore becomes a weighted least-squares problem.</li>
</ul>

We consider three group sizes:
<ul>
<li>Small (2-4 users)</li>
<li>Medium (5-8 users)</li>
<li>Large (9-12 users)</li>
</ul>
![Flowchart](./res/flowchart.jpg)



# CODE: 
Here we present the code for the project. We first present the major class definitions, followed by the main script that runs the experiments and produces the output statistics and graphs.
These are the main classes:
<ul>
<li> Group: methods and parameters for a group of users </li>
<li> GroupRec: the main class, with methods for AF, BF and WBF </li>
<li> Aggregators: utility functions for different types of aggregation (average, weighted average, etc.) </li>
</ul>

<h4>Group Class :</h4>
The class 'Group' is responsible for generating random groups of different sizes (small, medium and large) and for evaluating the AF, BF and WBF recommendations made for these groups.

In [17]:
import configparser

ConfigParser_ = configparser.RawConfigParser()
ConfigParser_.read(r"./config.conf")
ConfigParser_

<configparser.RawConfigParser at 0x7fc88e76f8b0>

In [24]:
for keys in ConfigParser_:
    print(keys)

DEFAULT
Config


In [36]:
for values in ConfigParser_["Config"]:
    print(values,":",ConfigParser_["Config"][values])
    

training_file : ./data/u1.base
testing_file : ./data/u1.test
small_grp_size : 3
medium_grp_size : 5
large_grp_size : 10
max_iterations_mf : 3
no_of_small_grps : 50
no_of_medium_grps : 50
no_of_large_grps : 50
lambda_mf : 0.1
learning_rate_mf : 0.05
num_factors : 15
rating_threshold_af : 4
num_recos_af : 50
rating_threshold_bf : 4
num_recos_bf : 50
rating_threshold_wbf : 4
num_recos_wbf : 50
is_debug : False


In [49]:
import numpy as np
import configparser

class Group:
    def __init__(self, members, candidate_items, ratings):
        # member ids
        self.members = sorted(members)
        
        # List of items that can be recommended.
        # These should not have been watched by any member of group.
        self.candidate_items = candidate_items

        self.actual_recos = []
        self.false_positive = []
        
        self.ratings_per_member = [np.size(ratings[member].nonzero()) for member in self.members]
        
        # AF
        self.grp_factors_af = []
        self.bias_af = 0
        self.precision_af = 0
        self.recall_af = 0
        self.reco_list_af = [] 
        
        # BF
        self.grp_factors_bf = []
        self.bias_bf = 0
        self.precision_bf = 0
        self.recall_bf = 0
        self.reco_list_bf = []
        
        # WBF
        self.grp_factors_wbf = []
        self.bias_wbf = 0
        self.precision_wbf = 0
        self.recall_wbf = 0
        self.weight_matrix_wbf = []
        self.reco_list_wbf = []

#Configuration reader.
class Config:
    def __init__(self, config_file_path):
        self.config_file_path = config_file_path

        ConfigParser = configparser.RawConfigParser()
        ConfigParser.read(config_file_path)
        
        # MovieLens 100k dataset (80/20 train/test split); paths are read from the config file
        self.training_file = ConfigParser.get('Config', 'training_file')
        self.testing_file = ConfigParser.get('Config', 'testing_file')
        
        self.small_grp_size = int(ConfigParser.get('Config', 'small_grp_size'))
        self.medium_grp_size = int(ConfigParser.get('Config', 'medium_grp_size'))
        self.large_grp_size = int(ConfigParser.get('Config', 'large_grp_size'))
        
        self.max_iterations_mf = int(ConfigParser.get('Config', 'max_iterations_mf'))
        self.lambda_mf = float(ConfigParser.get('Config', 'lambda_mf'))
        self.learning_rate_mf = float(ConfigParser.get('Config', 'learning_rate_mf'))
        
        self.num_factors = int(ConfigParser.get('Config', 'num_factors'))
        
        #AF (after factorization)
        self.rating_threshold_af = float(ConfigParser.get('Config', 'rating_threshold_af'))
        self.num_recos_af = int(ConfigParser.get('Config', 'num_recos_af'))
        
        #BF (before factorization)
        self.rating_threshold_bf = float(ConfigParser.get('Config', 'rating_threshold_bf'))
        self.num_recos_bf = int(ConfigParser.get('Config', 'num_recos_bf'))
        
        #WBF (weighted before factorization)
        self.rating_threshold_wbf = float(ConfigParser.get('Config', 'rating_threshold_wbf'))
        self.num_recos_wbf = int(ConfigParser.get('Config', 'num_recos_wbf'))
        
        self.is_debug = ConfigParser.getboolean('Config', 'is_debug')

<h4>Creating list of movies which can be recommended :</h4>
<p>The following function creates a list of movies which have not been watched by any of the members of the group.</p>
Our recommendation will be made out of these movies only.
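find_candidate_items intersects the unwatched sets member by member. As a hedged sketch, the same set can be computed in one vectorized step. The function name find_candidate_items_vectorized and the toy matrix are illustrative assumptions, not part of the project.

```python
import numpy as np


def find_candidate_items_vectorized(ratings, members):
    """Hypothetical helper: indices of items no group member has rated (0 = not rated)."""
    if len(members) == 0:
        return np.array([], dtype=int)
    # Boolean matrix: True where a member has NOT rated the item.
    unrated = ratings[members] == 0
    # Keep only the items that every member left unrated.
    return np.flatnonzero(np.all(unrated, axis=0))


# Toy 3-user x 5-item rating matrix (0 = not rated).
toy_ratings = np.array([
    [5, 0, 0, 3, 0],
    [0, 0, 4, 0, 0],
    [1, 0, 0, 0, 0],
])
print(find_candidate_items_vectorized(toy_ratings, [0, 1]).tolist())  # [1, 4]
```

Either form yields the same item indices; the vectorized one avoids a Python loop over members.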

In [50]:
@staticmethod
def find_candidate_items(ratings, members):
    if len(members) == 0: return []

    unwatched_items = np.argwhere(ratings[members[0]] == 0)
    for member in members:
        cur_unwatched = np.argwhere(ratings[member] == 0)
        unwatched_items = np.intersect1d(unwatched_items, cur_unwatched)

    return unwatched_items
Group.find_candidate_items = find_candidate_items

In [51]:
@staticmethod
def non_testable_items(members, ratings): 
    non_eval_items = np.argwhere(ratings[members[0]] == 0)
    for member in members:
        cur_non_eval_items = np.argwhere(ratings[member] == 0)
        non_eval_items = np.intersect1d(non_eval_items, cur_non_eval_items)
    return non_eval_items
Group.non_testable_items = non_testable_items

<h4>Generating groups! </h4>
Now we generate groups from the available users. To evaluate our recommendation approaches reliably, each group must have enough items to test on. We therefore set testable_threshold to 50: a group is accepted only if at least 50 of its candidate movies (movies that no member has rated in the training set) have been rated by at least one member in the test set. 
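The acceptance test can be illustrated on toy data. This is a minimal sketch assuming 0 means "not rated"; the helper count_testable_items is hypothetical and simply counts the items that generate_groups calls testable (candidate items minus items no member rated in the test set).

```python
import numpy as np


def count_testable_items(train, test, members):
    """Hypothetical helper: count candidate items (unrated by every member in
    the training matrix) that at least one member rated in the test matrix."""
    unwatched_in_train = np.all(train[members] == 0, axis=0)
    rated_in_test = np.any(test[members] != 0, axis=0)
    return int(np.count_nonzero(unwatched_in_train & rated_in_test))


# Toy 2-member group over 4 items (0 = not rated).
train = np.array([[5, 0, 0, 0],
                  [0, 3, 0, 0]])
test = np.array([[0, 0, 4, 0],
                 [0, 0, 0, 0]])

n_testable = count_testable_items(train, test, [0, 1])
print(n_testable)        # 1 (only item 2)
print(n_testable >= 50)  # False: this group would be rejected
```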

In [52]:
@staticmethod
def generate_groups(cfg, ratings, test_ratings, num_users, count, size, disjoint=True):
    # cfg is currently unused; kept for interface compatibility
    avbl_users = np.arange(num_users)
    groups = []
    testable_threshold = 50

    # rejection sampling: keep drawing random groups until `count` valid ones are found
    while len(groups) < count:
        group_members = np.random.choice(avbl_users, size=size, replace=False)
        candidate_items = Group.find_candidate_items(ratings, group_members)
        non_eval_items = Group.non_testable_items(group_members, test_ratings)
        testable_items = np.setdiff1d(candidate_items, non_eval_items)

        if len(candidate_items) != 0 and len(testable_items) >= testable_threshold:
            groups.append(Group(group_members, candidate_items, ratings))
            if disjoint:
                # remove the chosen members so later groups cannot reuse them
                avbl_users = np.setdiff1d(avbl_users, group_members)

    return groups
    
Group.generate_groups = generate_groups

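<h4>Ground truth for evaluation :</h4>
<p>Now that the groups have been formed, this method builds, from the test ratings, the set of items the group actually liked. An item counts as an actual (positive) recommendation if at least one member rated it in the test set and no member rated it below the threshold; an item that any member rated below the threshold is recorded as a false positive.</p>
We set the rating threshold (user satisfaction threshold) to 4.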

In [53]:
def generate_actual_recommendations(self, ratings, threshold):
    non_eval_items = Group.non_testable_items(self.members, ratings)

    items = np.argwhere(np.logical_or(ratings[self.members[0]] >= threshold, ratings[self.members[0]] == 0)).flatten()
    fp = np.argwhere(np.logical_and(ratings[self.members[0]] > 0, ratings[self.members[0]] < threshold)).flatten()
    for member in self.members:
        cur_items = np.argwhere(np.logical_or(ratings[member] >= threshold, ratings[member] == 0)).flatten()
        fp = np.union1d(fp, np.argwhere(np.logical_and(ratings[member] > 0, ratings[member] < threshold)).flatten())
        items = np.intersect1d(items, cur_items)

    items = np.setdiff1d(items, non_eval_items)

    self.actual_recos = items
    self.false_positive = fp

Group.generate_actual_recommendations  = generate_actual_recommendations

<h4>Evaluation :</h4>
<p>The following three functions are used for the evaluation of the three methods AF, BF and WBF respectively.</p>
We are evaluating the methods using their Precision and Recall for different sizes of groups.
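To make the metrics concrete, here is a hedged toy example that mirrors the arithmetic in evaluate_af, evaluate_bf and evaluate_wbf. The helper name precision_recall and the item ids are illustrative assumptions.

```python
import numpy as np


def precision_recall(actual_recos, false_positive, reco_list):
    """Hypothetical helper: precision and recall as computed by evaluate_*."""
    tp = np.intersect1d(actual_recos, reco_list).size
    fp = np.intersect1d(false_positive, reco_list).size
    precision = tp / (tp + fp) if (tp + fp) > 0 else np.nan
    recall = tp / len(actual_recos) if len(actual_recos) > 0 else np.nan
    return precision, recall


actual = np.array([2, 5, 7, 9])       # items the group liked in the test set
disliked = np.array([1, 3])           # items some member rated below threshold
recommended = np.array([1, 2, 5, 8])  # item 8 has no test rating: neither tp nor fp

p, r = precision_recall(actual, disliked, recommended)
print(p, r)  # 0.6666666666666666 0.5
```

Recommended items with no test rating (item 8 here) are ignored by both metrics, which is why sparse test data limits what can be measured.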

In [54]:
def evaluate_af(self, is_debug=False):
    tp = float(np.intersect1d(self.actual_recos, self.reco_list_af).size)
    fp = float(np.intersect1d(self.false_positive, self.reco_list_af).size)

    try:
        self.precision_af = tp / (tp + fp)
    except ZeroDivisionError:
        self.precision_af = np.nan

    try:
        self.recall_af = tp / self.actual_recos.size
    except ZeroDivisionError:
        self.recall_af = np.nan

    if is_debug:
        print('tp: ', tp)
        print('fp: ', fp)
        print('precision_af: ', self.precision_af)
        print('recall_af: ', self.recall_af)

    return self.precision_af, self.recall_af, tp, fp
Group.evaluate_af = evaluate_af

In [55]:
def evaluate_bf(self, is_debug=False):
    tp = float(np.intersect1d(self.actual_recos, self.reco_list_bf).size)
    fp = float(np.intersect1d(self.false_positive, self.reco_list_bf).size)

    try:
        self.precision_bf = tp / (tp + fp)
    except ZeroDivisionError:
        self.precision_bf = np.nan

    try:
        self.recall_bf = tp / self.actual_recos.size
    except ZeroDivisionError:
        self.recall_bf = np.nan

    if is_debug:
        print('tp: ', tp)
        print('fp: ', fp)
        print('precision_bf: ', self.precision_bf)
        print('recall_bf: ', self.recall_bf)

    return self.precision_bf, self.recall_bf, tp, fp
Group.evaluate_bf = evaluate_bf

In [56]:
def evaluate_wbf(self, is_debug=False):
    tp = float(np.intersect1d(self.actual_recos, self.reco_list_wbf).size)
    fp = float(np.intersect1d(self.false_positive, self.reco_list_wbf).size)

    try:
        self.precision_wbf = tp / (tp + fp)
    except ZeroDivisionError:
        self.precision_wbf = np.nan

    try:
        self.recall_wbf = tp / self.actual_recos.size
    except ZeroDivisionError:
        self.recall_wbf = np.nan

    if is_debug:
        print('tp: ', tp)
        print('fp: ', fp)
        print('precision_wbf: ', self.precision_wbf)
        print('recall_wbf: ', self.recall_wbf)

    return self.precision_wbf, self.recall_wbf, tp, fp
Group.evaluate_wbf = evaluate_wbf

<h4>Aggregator Class :</h4>
This class defines different ways to aggregate ratings or factors across the members of a group.
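The difference between Aggregators.average and Aggregators.average_bf matters because a 0 in the rating matrix means "not rated". A small sketch on hypothetical ratings shows the two behaviours.

```python
import numpy as np

# Two members' ratings for three items (0 = not rated).
ratings = np.array([[5.0, 0.0, 2.0],
                    [3.0, 4.0, 0.0]])

# Like Aggregators.average: zeros are averaged in as if they were ratings.
plain = np.average(ratings, axis=0)

# Like Aggregators.average_bf: zeros are treated as missing (NaN) and skipped.
# np.where returns a new array, so `ratings` itself is left unchanged.
nonzero_mean = np.nanmean(np.where(ratings == 0, np.nan, ratings), axis=0)

print(plain.tolist())         # [4.0, 2.0, 1.0]
print(nonzero_mean.tolist())  # [4.0, 4.0, 2.0]
```

For BF, averaging only the non-zero entries avoids pulling group ratings toward 0 for items that only some members watched.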

In [57]:
import math
import numpy as np
import warnings

class Aggregators:
    def __init__(self):
        pass
    
    #pass ratings or factors as input
    @staticmethod
    def average(arr):
        return np.average(arr, axis = 0, weights = None)

    @staticmethod
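    def average_bf(arr):
        # mean over non-zero entries only (0 = "not rated"); np.where builds a copy,
        # so the caller's array is not modified
        masked = np.where(arr == 0, np.nan, arr.astype(float))
        with warnings.catch_warnings():
            # items no member rated give NaN; silence the "Mean of empty slice" warning
            warnings.simplefilter("ignore", category=RuntimeWarning)
            return np.nanmean(masked, axis=0)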
    
    @staticmethod
    def weighted_average(arr, weights):
        return np.average(arr, axis = 0, weights = weights)

<h4>GroupRec Class :</h4>
This is our main class: it reads the data, defines the methods for our approaches, and evaluates them.

In [58]:
from sklearn.metrics import mean_squared_error

import numpy as np
import pandas as ps


# overflow warnings should be raised as errors
np.seterr(over='raise')

class GroupRec:
    def __init__(self):
        self.cfg = Config(r"./config.conf")
        
        # training and testing matrices
        self.ratings = None
        self.test_ratings = None

        self.groups = []
        
        # read data into above matrices
        self.read_data()
        
        self.num_users = self.ratings.shape[0]
        self.num_items = self.ratings.shape[1]
        
        # predicted ratings matrix based on factors.
        self.predictions = np.zeros((self.num_users, self.num_items))
        
        # latent user and item factors, learned by SGD matrix factorization
        # initialize all unknowns with random values from -1 to 1
        self.user_factors = np.random.uniform(-1, 1, (self.ratings.shape[0], self.cfg.num_factors))
        self.item_factors = np.random.uniform(-1, 1, (self.ratings.shape[1], self.cfg.num_factors))

        self.user_biases = np.zeros(self.num_users)
        self.item_biases = np.zeros(self.num_items)
        
        # global mean of ratings a.k.a mu
        self.ratings_global_mean = 0

    # add list of groups
    def add_groups(self, groups):
        self.groups = groups
    
    # remove groups
    def remove_groups(self, groups):
        self.groups = []

<h4>Reading the data : </h4>
We use the 'pandas' library to read the training and testing data from tab-separated files.
We then build our user × item rating matrices from them.
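A minimal sketch of the same idea on three made-up lines in the MovieLens u.data layout (user id, item id, rating, timestamp, tab-separated, 1-based ids). Unlike hardcoded sizes, it infers the matrix dimensions from the largest ids; in the real code the maximum should be taken over both the training and the test file.

```python
import io

import numpy as np
import pandas as pd

# Toy data in the MovieLens layout: user_id <TAB> item_id <TAB> rating <TAB> timestamp.
raw = "1\t3\t4\t881250949\n2\t1\t5\t891717742\n2\t2\t3\t878887116\n"

columns = ["user_id", "item_id", "rating", "timestamp"]
df = pd.read_csv(io.StringIO(raw), sep="\t", names=columns)

# Infer the matrix size from the largest (1-based) ids.
num_users = int(df["user_id"].max())
num_items = int(df["item_id"].max())

ratings = np.zeros((num_users, num_items))
# Vectorized fill; subtract 1 to turn 1-based ids into 0-based indices.
ratings[df["user_id"].to_numpy() - 1, df["item_id"].to_numpy() - 1] = df["rating"].to_numpy()
print(ratings.tolist())  # [[0.0, 0.0, 4.0], [5.0, 3.0, 0.0]]
```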

In [59]:
# read training and testing data into matrices
def read_data(self):
    column_headers = ['user_id', 'item_id', 'rating',"timestamp"]

    print('Reading training data from ', self.cfg.training_file, '...')
    training_data = ps.read_csv(self.cfg.training_file, sep='\t', names=column_headers)

    print('Reading testing data from ', self.cfg.testing_file, '...')
    testing_data = ps.read_csv(self.cfg.testing_file, sep='\t', names=column_headers)

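    # infer matrix dimensions from the data (ids are 1-based);
    # MovieLens 100k has 943 users and 1682 items
    num_users = int(max(training_data.user_id.max(), testing_data.user_id.max()))
    num_items = int(max(training_data.item_id.max(), testing_data.item_id.max()))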

    self.ratings = np.zeros((num_users, num_items))
    self.test_ratings = np.zeros((num_users, num_items))

    for row in training_data.itertuples(index=False):
        self.ratings[row.user_id - 1, row.item_id - 1] = row.rating

    for row in testing_data.itertuples(index=False):
        self.test_ratings[row.user_id - 1, row.item_id - 1] = row.rating
        
GroupRec.read_data = read_data

<h4>Matrix Factorization : </h4>
Now we factorize the rating matrix into 15 latent factors (num_factors in the config), minimizing the regularized squared error with stochastic gradient descent (SGD).
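The SGD loop can be tried on a tiny matrix before running it on MovieLens. This is a hedged, self-contained sketch: sgd_mf is a hypothetical function, and the toy matrix, 2 factors, 100 epochs and the small initialization range are our own choices (the project uses 15 factors and initializes in [-1, 1]). It uses the same prediction formula and the same update order as sgd_factorize.

```python
import numpy as np


def sgd_mf(R, k=2, lr=0.05, reg=0.1, epochs=100, seed=0):
    """Fit a biased MF model to the nonzero entries of R with SGD.

    Prediction: mu + b_u + b_i + p_u . q_i. Returns the training MSE
    before training and after each epoch.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.uniform(-0.1, 0.1, (n_users, k))  # user factors
    Q = rng.uniform(-0.1, 0.1, (n_items, k))  # item factors
    bu = np.zeros(n_users)                    # user biases
    bi = np.zeros(n_items)                    # item biases
    rows, cols = R.nonzero()
    mu = R[rows, cols].mean()                 # global mean of observed ratings

    def mse():
        pred = mu + bu[rows] + bi[cols] + np.sum(P[rows] * Q[cols], axis=1)
        return float(np.mean((R[rows, cols] - pred) ** 2))

    history = [mse()]
    for _ in range(epochs):
        for idx in rng.permutation(len(rows)):  # visit ratings in random order
            u, i = rows[idx], cols[idx]
            err = R[u, i] - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            # same update order as sgd_factorize: the item update reuses
            # the freshly updated user factors
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
        history.append(mse())
    return history


R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
history = sgd_mf(R)
print("initial MSE: %.3f, final MSE: %.3f" % (history[0], history[-1]))
```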

In [60]:
def sgd_factorize(self):
    #solve for these for matrix ratings        
    ratings_row, ratings_col = self.ratings.nonzero()
    num_ratings = len(ratings_row)
    learning_rate = self.cfg.learning_rate_mf
    regularization = self.cfg.lambda_mf

    self.ratings_global_mean = np.mean(self.ratings[np.where(self.ratings != 0)])

    print('Doing matrix factorization...')
    try:
        for epoch in range(self.cfg.max_iterations_mf):
            print('Iteration: ', epoch)
            rating_indices = np.arange(num_ratings)
            np.random.shuffle(rating_indices)

            for idx in rating_indices:
                user = ratings_row[idx]
                item = ratings_col[idx]

                pred = self.predict_user_rating(user, item)
                error = self.ratings[user][item] - pred

                self.user_factors[user] += learning_rate \
                                            * ((error * self.item_factors[item]) - (regularization * self.user_factors[user]))
                self.item_factors[item] += learning_rate \
                                            * ((error * self.user_factors[user]) - (regularization * self.item_factors[item]))

                self.user_biases[user] += learning_rate * (error - regularization * self.user_biases[user])
                self.item_biases[item] += learning_rate * (error - regularization * self.item_biases[item])

            self.sgd_mse()

    except FloatingPointError:
        print('Floating point Error: ')
GroupRec.sgd_factorize = sgd_factorize


def sgd_mse(self):
    self.predict_all_ratings()
    predicted_training_ratings = self.predictions[self.ratings.nonzero()].flatten()
    actual_training_ratings = self.ratings[self.ratings.nonzero()].flatten()

    predicted_test_ratings = self.predictions[self.test_ratings.nonzero()].flatten()
    actual_test_ratings = self.test_ratings[self.test_ratings.nonzero()].flatten()

    training_mse = mean_squared_error(predicted_training_ratings, actual_training_ratings)
    print('training mse: ', training_mse)
    test_mse = mean_squared_error(predicted_test_ratings, actual_test_ratings)
    print('test mse: ', test_mse)
GroupRec.sgd_mse = sgd_mse


def predict_user_rating(self, user, item):
    prediction = self.ratings_global_mean + self.user_biases[user] + self.item_biases[item]
    prediction += self.user_factors[user, :].dot(self.item_factors[item, :].T)
    return prediction
GroupRec.predict_user_rating = predict_user_rating

def predict_group_rating(self, group, item, method):
    if method == 'af':
        factors, bias_group = group.grp_factors_af, group.bias_af
    elif method == 'bf':
        factors, bias_group = group.grp_factors_bf, group.bias_bf
    elif method == 'wbf':
        factors, bias_group = group.grp_factors_wbf, group.bias_wbf
    else:
        raise ValueError("method must be 'af', 'bf' or 'wbf', got %r" % method)

    # mu + group bias + item bias + group factors . item factors
    return self.ratings_global_mean + bias_group + self.item_biases[item] \
                                    + np.dot(factors, self.item_factors[item])
GroupRec.predict_group_rating = predict_group_rating

def predict_all_ratings(self):
    # vectorized form of predict_user_rating for every (user, item) pair
    self.predictions = (self.ratings_global_mean
                        + self.user_biases[:, np.newaxis]
                        + self.item_biases[np.newaxis, :]
                        + self.user_factors.dot(self.item_factors.T))
GroupRec.predict_all_ratings = predict_all_ratings

<h4>After Factorization (AF) Method Definition.....</h4>
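A toy illustration of the AF step, assuming made-up factors, biases and rating counts for three members. As in run_all_methods, the members are weighted by their number of ratings, and the aggregated factors score an item with the same formula as predict_group_rating.

```python
import numpy as np

# Hypothetical toy values: 3 group members, 2 latent factors.
member_factors = np.array([[0.2, 1.0],
                           [0.4, 0.0],
                           [0.6, 0.5]])
member_biases = np.array([0.1, -0.2, 0.4])
ratings_per_member = np.array([10, 30, 60])  # aggregation weights

# AF: aggregate the members' factors and biases into group factors and bias.
group_factors = np.average(member_factors, axis=0, weights=ratings_per_member)
group_bias = np.average(member_biases, axis=0, weights=ratings_per_member)

# Score one candidate item: mu + group bias + item bias + group factors . item factors.
mu, item_bias = 3.5, 0.2
item_factors = np.array([1.0, 2.0])
score = mu + group_bias + item_bias + group_factors @ item_factors
print(group_factors, group_bias, score)
```

Here the score is about 5.19, above a threshold of 4, so this item would be a candidate recommendation.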

In [61]:
#AF method
def af_runner(self, groups = None, aggregator = Aggregators.average):
    #if groups is not passed, use self.groups
    if (groups is None):
        groups = self.groups

    #calculate factors
    for group in groups:
        member_factors = self.user_factors[group.members, :]
        member_biases = self.user_biases[group.members]

        #aggregate the factors
        if (aggregator == Aggregators.average):
            group.grp_factors_af = aggregator(member_factors)
            group.bias_af = aggregator(member_biases)
        elif (aggregator == Aggregators.weighted_average):
            group.grp_factors_af = aggregator(member_factors, weights = group.ratings_per_member)
            group.bias_af = aggregator(member_biases, weights = group.ratings_per_member)

        #predict ratings for all candidate items
        group_candidate_ratings = {}
        for idx, item in enumerate(group.candidate_items):
            cur_rating = self.predict_group_rating(group, item, 'af')

            if (cur_rating > self.cfg.rating_threshold_af):
                group_candidate_ratings[item] = cur_rating

        #sort and filter to keep top 'num_recos_af' recommendations
        group_candidate_ratings = sorted(list(group_candidate_ratings.items()), key=lambda x: x[1], reverse=True)[:self.cfg.num_recos_af]

        group.reco_list_af = np.array([rating_tuple[0] for rating_tuple in group_candidate_ratings])

GroupRec.af_runner = af_runner

<h4>Before Factorization(BF) Method.....</h4>
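The BF step is a ridge regression with the group's factors and bias as unknowns. A hedged sketch on random toy data (3 factors, λ = 0.1, 6 watched items; all values are illustrative) solves the normal equations and checks that the gradient of the objective vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(42)
k, lam = 3, 0.1                         # hypothetical: 3 factors, lambda = 0.1
item_factors = rng.normal(size=(6, k))  # factors of 6 items the group watched
s_g = rng.normal(size=6)                # aggregated rating - mu - item bias

# A = [item factors | 1]; the last unknown is the group bias.
A = np.column_stack([item_factors, np.ones(len(item_factors))])

# Closed-form ridge regression: (A^T A + lam*I) x = A^T s_g.
x = np.linalg.solve(A.T @ A + lam * np.identity(k + 1), A.T @ s_g)
group_factors, group_bias = x[:-1], x[-1]

# Sanity check: half the gradient of ||A x - s_g||^2 + lam*||x||^2
# is A^T (A x - s_g) + lam * x, which should vanish at the solution.
grad = A.T @ (A @ x - s_g) + lam * x
print(np.allclose(grad, 0.0))  # True
```

Solving the linear system gives the same result as multiplying by the explicit inverse but is numerically more stable.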

In [62]:
def bf_runner(self, groups=None, aggregator=Aggregators.average_bf):
    # aggregate user ratings into a virtual group user,
    # then solve a ridge regression for the group's factors and bias
    if groups is None:
        groups = self.groups
    lamb = self.cfg.lambda_mf
    k = self.cfg.num_factors

    for group in groups:
        # items rated by at least one member in training = all items minus candidates
        watched_items = np.setdiff1d(np.arange(self.num_items), group.candidate_items)

        agg_rating = aggregator(self.ratings[group.members, :])
        # regression target: aggregated rating minus global mean and item bias
        s_g = agg_rating[watched_items] - self.ratings_global_mean - self.item_biases[watched_items]

        # matrix A: one row per watched item = [item factors, 1] (the 1 multiplies the group bias)
        A = np.column_stack([self.item_factors[watched_items], np.ones(len(watched_items))])

        # closed-form ridge regression: (A^T A + lambda I) x = A^T s_g
        factor_n_bias = np.linalg.solve(A.T @ A + lamb * np.identity(k + 1), A.T @ s_g)
        group.grp_factors_bf = factor_n_bias[:-1]
        group.bias_bf = factor_n_bias[-1]

        # making recommendations on the candidate list
        group_candidate_ratings = {}
        for item in group.candidate_items:
            cur_rating = self.predict_group_rating(group, item, 'bf')
            if cur_rating > self.cfg.rating_threshold_bf:
                group_candidate_ratings[item] = cur_rating

        # sort and filter to keep top 'num_recos_bf' recommendations
        top = sorted(group_candidate_ratings.items(), key=lambda x: x[1], reverse=True)[:self.cfg.num_recos_bf]
        group.reco_list_bf = np.array([item for item, _ in top])
        
GroupRec.bf_runner = bf_runner

<h4>Weighted Before Factorization Method (WBF).....</h4>
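The WBF item weights can be checked by hand on a toy matrix. In this hedged sketch, users 0 and 1 form the group; each item's weight is the fraction of members who rated it divided by 1 plus the standard deviation of all its ratings, as in wbf_runner. The last lines confirm that scaling the columns of A^T by the weights is the same as multiplying by the diagonal weight matrix.

```python
import numpy as np

# Hypothetical 4 users x 3 items (0 = not rated); users 0 and 1 form the group.
ratings = np.array([[5, 0, 3],
                    [4, 2, 0],
                    [5, 4, 1],
                    [3, 0, 5]], dtype=float)
members = [0, 1]

# Fraction of group members who rated each item.
frac_watched = np.mean(ratings[members] != 0, axis=0)
# Standard deviation of each item's ratings over all users who rated it.
std_dev = np.nanstd(np.where(ratings != 0, ratings, np.nan), axis=0)
# WBF weight: widely watched items with consistent ratings count more.
weights = frac_watched / (1.0 + std_dev)

# A^T diag(w) A equals (A^T * w) A, without building the diagonal matrix.
A = np.column_stack([np.arange(6.0).reshape(3, 2), np.ones(3)])
same = np.allclose(A.T @ np.diag(weights) @ A, (A.T * weights) @ A)
print(weights, same)
```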

In [63]:
def wbf_runner(self, groups=None, aggregator=Aggregators.average_bf):
    # like BF, but each watched item gets a weight in the ridge regression
    if groups is None:
        groups = self.groups
    lamb = self.cfg.lambda_mf
    k = self.cfg.num_factors  # number of latent factors K

    for group in groups:
        # items rated by at least one member in training = all items minus candidates
        watched_items = np.setdiff1d(np.arange(self.num_items), group.candidate_items)

        agg_rating = aggregator(self.ratings[group.members, :])
        s_g = agg_rating[watched_items] - self.ratings_global_mean - self.item_biases[watched_items]

        # matrix A: one row per watched item = [item factors, 1]
        A = np.column_stack([self.item_factors[watched_items], np.ones(len(watched_items))])

        # weight of item j = (fraction of members who rated j) / (1 + std dev of all ratings of j)
        item_cols = self.ratings[:, watched_items]
        frac_watched = np.mean(item_cols[group.members] != 0, axis=0)
        std_dev = np.nanstd(np.where(item_cols != 0, item_cols, np.nan), axis=0)
        wt = frac_watched / (1.0 + std_dev)

        # weighted ridge regression: (A^T W A + lambda I) x = A^T W s_g with W = diag(wt);
        # A.T * wt equals A^T W without building the n x n diagonal matrix
        AtW = A.T * wt
        factor_n_bias = np.linalg.solve(AtW @ A + lamb * np.identity(k + 1), AtW @ s_g)
        group.grp_factors_wbf = factor_n_bias[:-1]
        group.bias_wbf = factor_n_bias[-1]

        # making recommendations on the candidate list
        group_candidate_ratings = {}
        for item in group.candidate_items:
            cur_rating = self.predict_group_rating(group, item, 'wbf')
            if cur_rating > self.cfg.rating_threshold_wbf:
                group_candidate_ratings[item] = cur_rating

        # sort and filter to keep top 'num_recos_wbf' recommendations
        top = sorted(group_candidate_ratings.items(), key=lambda x: x[1], reverse=True)[:self.cfg.num_recos_wbf]
        group.reco_list_wbf = np.array([item for item, _ in top])

GroupRec.wbf_runner = wbf_runner

<h4>Evaluating our methods......</h4>

In [64]:
def evaluation(self):
    # For AF
    af_precision_list = []
    af_recall_list = []
    print("\n#########-------For AF-------#########")
    for grp in self.groups:
        grp.generate_actual_recommendations(self.test_ratings, self.cfg.rating_threshold_af)
        (precision, recall, tp, fp) = grp.evaluate_af()
        af_precision_list.append(precision)
        af_recall_list.append(recall)

    af_mean_precision = np.nanmean(np.array(af_precision_list))
    af_mean_recall = np.nanmean(np.array(af_recall_list))
    print('\nAF method: mean precision: ', af_mean_precision)
    print('AF method: mean recall: ', af_mean_recall)

    # For BF
    bf_precision_list = []
    bf_recall_list = []
    print("\n#########-------For BF-------#########")
    for grp in self.groups:
        grp.generate_actual_recommendations(self.test_ratings, self.cfg.rating_threshold_bf)
        (precision, recall, tp, fp) = grp.evaluate_bf()
        bf_precision_list.append(precision)
        bf_recall_list.append(recall)

    bf_mean_precision = np.nanmean(np.array(bf_precision_list))
    bf_mean_recall = np.nanmean(np.array(bf_recall_list))
    print('\nBF method: mean precision: ', bf_mean_precision)
    print('BF method: mean recall: ', bf_mean_recall)

    # For WBF
    wbf_precision_list = []
    wbf_recall_list = []
    print("\n#########-------For WBF-------#########")
    for grp in self.groups:
        grp.generate_actual_recommendations(self.test_ratings, self.cfg.rating_threshold_wbf)
        (precision, recall, tp, fp) = grp.evaluate_wbf()
        wbf_precision_list.append(precision)
        wbf_recall_list.append(recall)

    wbf_mean_precision = np.nanmean(np.array(wbf_precision_list))
    wbf_mean_recall = np.nanmean(np.array(wbf_recall_list))
    print('\nWBF method: mean precision: ', wbf_mean_precision)
    print('WBF method: mean recall: ', wbf_mean_recall)
GroupRec.evaluation = evaluation

This method runs all our proposed methods and evaluates them together.

In [65]:
def run_all_methods(self, groups=None):
    if (groups is None):
        groups = self.groups
    #PS: could call them without passing groups as we have already added groups to grouprec object
    self.af_runner(groups, Aggregators.weighted_average)
    self.bf_runner(groups, Aggregators.average_bf)
    self.wbf_runner(groups, Aggregators.average_bf)

    #evaluation
    self.evaluation()
GroupRec.run_all_methods = run_all_methods

***Our class definitions end here***

### Starting from here, this can be treated as the main script for the entire code.

First, we run matrix factorization with SGD. The number of iterations is read from the config file, and the training and test MSE are reported after each iteration.
This demo runs only 3 iterations, so the MSE is relatively high; for our reported results we ran more iterations to obtain a lower MSE.

In [66]:
gr = GroupRec()
print(gr.cfg.max_iterations_mf)
gr.sgd_factorize()

Reading training data from  ./data/EN120K_train.tsv ...
Reading testing data from  ./data/EN120K_test.tsv ...
3
Doing matrix factorization...
Iteration:  0
training mse:  0.7334063282022386
test mse:  1.3423805433883131
Iteration:  1
training mse:  0.5876426705854596
test mse:  1.2316889241680935
Iteration:  2
training mse:  0.5222136052003068
test mse:  1.191864712112312


We generate small, medium and large groups.

In [None]:
# generate groups programmatically
# disjoint=True: groups of the same size share no members
small_groups = Group.generate_groups(gr.cfg, gr.ratings, gr.test_ratings, gr.num_users, 10, gr.cfg.small_grp_size, disjoint=True)
medium_groups = Group.generate_groups(gr.cfg, gr.ratings, gr.test_ratings, gr.num_users, 10, gr.cfg.medium_grp_size, disjoint=True)
large_groups = Group.generate_groups(gr.cfg, gr.ratings, gr.test_ratings, gr.num_users, 10, gr.cfg.large_grp_size, disjoint=True)

group_set = [small_groups, medium_groups, large_groups]
group_type = ['small', 'medium', 'large']

for idx, groups in enumerate(group_set):
    if not groups:
        continue

    # generated groups
    n = 11
    print('\n******* Running for ', group_type[idx], ' groups *************')
    print('generated groups (only first %d are getting printed here): ' % n)
    for group in groups[:n]:
        print(group.members)

In [29]:
# inspect the group setting

print(len(small_groups), len(medium_groups), len(large_groups))


10 10 10


We run all the methods (AF, BF and WBF) for all 3 group sizes and report the results: 

In [None]:
for idx, groups in enumerate(group_set):
    if not groups:
        continue
    print('\n******* Running for ', group_type[idx], ' groups *************')

    gr.add_groups(groups)
    gr.run_all_methods(groups)
    gr.remove_groups(groups)

## Evaluation Methodology:

We follow an evaluation methodology similar to the paper's.

These are the evaluation metrics:

![Evaluation metrics](./res/evaluation_metrics.png)

To put it simply, __we count a recommended movie as positive if every group member who has a test rating for it rated it at or above the user satisfaction threshold (a rating of 4)__, and at least one member of the group has a test rating for it. (Note that the test dataset is very sparse.)
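A hedged toy example of this labelling rule, assuming 0 means "no test rating". The helper label_items is hypothetical; it reproduces the positive and false-positive sets that generate_actual_recommendations computes.

```python
import numpy as np


def label_items(test_ratings, members, threshold=4):
    """Hypothetical helper: split items into positives and false positives."""
    r = test_ratings[members]
    rated_by_someone = np.any(r > 0, axis=0)
    nobody_below = np.all((r == 0) | (r >= threshold), axis=0)
    positives = np.flatnonzero(rated_by_someone & nobody_below)
    false_pos = np.flatnonzero(np.any((r > 0) & (r < threshold), axis=0))
    return positives, false_pos


test = np.array([[5, 0, 3, 0],
                 [4, 0, 0, 0],
                 [0, 2, 5, 0]])
pos, neg = label_items(test, [0, 1, 2])
print(pos.tolist(), neg.tolist())  # [0] [1, 2]
```

Item 3 has no test rating from any member, so it is neither positive nor negative.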

For evaluation, we run all 3 methods (AF, BF and WBF) on 50 randomly generated groups of users from the MovieLens dataset.
These groups are non-disjoint: the same user can be present in multiple groups.

Here are all the configuration parameters in one place:

__Matrix Factorization hyperparameters__:
<ul>
<li> No. of factors = 15 </li>
<li> lambda (regularization) = 0.1 </li>
<li> eta (learning rate) = 0.07 </li>
</ul>

__User Satisfaction Threshold__:
<ul>
<li> 4 </li>
</ul>

__Group Parameters__:
<li> small group = 3 users </li>
<li> medium group = 5 users </li>
<li> large group = 8 users </li>
<li> No. of Recommendations per user = 50 </li>
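To show where the three matrix factorization hyperparameters enter, here is a minimal SGD matrix factorization sketch in NumPy. It is an illustration under assumptions, not the notebook's implementation: it has no bias terms, initializes the factors from a normal distribution with standard deviation 0.1, and takes 0-based `(user, item, rating)` triples; `factorize` and `rmse` are hypothetical names.

```python
import numpy as np

N_FACTORS = 15  # No. of factors
LAMBDA = 0.1    # regularization
ETA = 0.07      # learning rate


def factorize(ratings, n_users, n_items, n_epochs=20, k=N_FACTORS,
              lam=LAMBDA, eta=ETA, seed=0):
    """Plain SGD matrix factorization, R[u, i] ~ P[u] . Q[i] (no bias terms).

    ratings: list of (user, item, rating) triples with 0-based indices.
    """
    rng = np.random.default_rng(seed)
    # Random initialization of the factors
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):  # visit ratings in random order
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()
            # Gradient step on the regularized squared error
            P[u] += eta * (err * Q[i] - lam * P[u])
            Q[i] += eta * (err * p_old - lam * Q[i])
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error of P[u] . Q[i] against the given ratings."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))


toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(toy, n_users=3, n_items=3, n_epochs=200)
print(rmse(P, Q, toy))  # training error on this toy data
```

Lambda shrinks every factor toward zero on each update, and eta scales each step; a larger eta speeds up training but makes overfitting and divergence more likely.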


### Results:

#### Precision and Recall for all groups:

![title](./res/result.png)
    
We note that the AF method works best for small groups, followed by the weighted before factorization (WBF) method.
The same ordering holds for medium groups, but the margin between AF and WBF is smaller than for small groups.

For large groups, the WBF method outperforms AF and achieves better accuracy. Increasing the group size further (more than 15 users) should widen WBF's margin even more.

The BF method (no weights) performs worse than both AF and WBF for all three group sizes.

Recall is low for all methods; these low values are consistent with those reported in the paper.

The following plot shows precision versus recall for all three methods on large groups. Due to time limitations, it was evaluated at only 5 sample points (5 iterations over multiple groups), so the curve looks somewhat jagged. Plots for the other group sizes are available in the ./res/ subdirectory of the GitHub repo.
![title](./res/large_grp.png)

###### Reason for low recall:
Note that we report only the top 50 recommendations per group.

Also, because the test data is very sparse, many movies in our recommendations have no test rating from any member of the group. We cannot compute precision or recall for these movies, since we do not know whether to count them as positive or negative matches.

In short, because the data is sparse, our total set of true positives is unknown.
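One way to handle the unknown labels is to compute precision only over the recommendations the test set can judge. The sketch below is hypothetical (`group_precision_recall` is not from the notebook): each label is `True`, `False` or `None` (unjudged), and recall's denominator is assumed to be the number of test-set movies that satisfy the group rule. With only 50 recommendations per group and many of them unjudged, recall tends to stay small.

```python
from typing import List, Optional, Tuple


def group_precision_recall(labels: List[Optional[bool]],
                           n_test_positives: int) -> Tuple[Optional[float], Optional[float]]:
    """Precision and recall for one group's recommendation list.

    labels: one entry per recommended movie; True/False when the test set can
    judge it, None when no member has a test rating for it.
    n_test_positives: number of test-set movies that satisfy the group rule.
    """
    judged = [lab for lab in labels if lab is not None]  # drop unjudged movies
    hits = sum(judged)  # True counts as 1
    precision = hits / len(judged) if judged else None
    recall = hits / n_test_positives if n_test_positives else None
    return precision, recall


p, r = group_precision_recall([True, None, None, False, True, None], n_test_positives=10)
print(p, r)  # 2 of 3 judged movies are hits; 2 of 10 test positives are found
```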

### Final Comments:
Our reported precision metrics match those reported in the paper. One difference is that the authors report far better results for BF and WBF with small and large groups.

This could be because the datasets differ: the authors use the MovieLens 1M dataset, while we use the 100K dataset.

The authors note that AF performs quite well when the data is sparse, which is the case for our MovieLens 100K dataset.

With more data, more ratings are involved in the group recommendation process and the AF approach does not work as well. In this case, a virtual user is a better representation of the group than an aggregation of the users' factors.
Both BF and WBF make better recommendations on denser (less sparse) datasets.

A similar effect occurs when the number of users per group increases, since more data is involved. Here too, AF gradually falls behind WBF.
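To make the AF/BF distinction concrete, here is a NumPy sketch of the two ways of building a group factor. It follows the general idea described above rather than the notebook's code or the paper's exact formulas. AF averages the members' user factors (one possible aggregation). BF builds a virtual user whose rating for an item is assumed to be the mean of the members' ratings, then fits its factor against the fixed item factors `Q` by ridge regression. WBF is shown with a hypothetical per-item weight (the fraction of members who rated the item); the weighting used in the paper may differ.

```python
from typing import Optional

import numpy as np


def af_group_factor(member_factors: np.ndarray) -> np.ndarray:
    """AF: aggregate the members' already-learned user factors (mean here)."""
    return member_factors.mean(axis=0)


def bf_group_factor(member_ratings: np.ndarray, Q: np.ndarray, lam: float = 0.1,
                    weights: Optional[np.ndarray] = None) -> np.ndarray:
    """BF / WBF: fit a factor for a virtual user built from the members' ratings.

    member_ratings: (n_members, n_items) array, np.nan where a member has no rating.
    Q: (n_items, k) item factors, kept fixed.
    weights: optional per-item weights (WBF); None gives plain BF.
    """
    rated = ~np.isnan(member_ratings)
    seen = rated.any(axis=0)  # items rated by at least one member
    # Assumption: the virtual user's rating is the mean of the members' ratings
    virtual = np.nanmean(member_ratings[:, seen], axis=0)
    Qs = Q[seen]
    w = np.ones(len(virtual)) if weights is None else weights[seen]
    W = np.diag(w)
    k = Q.shape[1]
    # Weighted ridge regression: (Qs^T W Qs + lam I) p = Qs^T W r
    A = Qs.T @ W @ Qs + lam * np.eye(k)
    b = Qs.T @ W @ virtual
    return np.linalg.solve(A, b)


# Toy example: 2 members, 3 items, k = 2 factors
Q = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ratings = np.array([[5.0, np.nan, 4.0],
                    [3.0, 2.0, np.nan]])
p_bf = bf_group_factor(ratings, Q)
# Hypothetical WBF weight: fraction of members who rated each item
p_wbf = bf_group_factor(ratings, Q, weights=(~np.isnan(ratings)).mean(axis=0))
scores = Q @ p_bf  # predicted group scores, used to rank items
```

The AF factor only sees what each member's factor already encodes, whereas the BF factor is fitted to the pooled ratings directly, which is why BF-style methods benefit more as the amount of rating data grows.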

__Note__: The reported results were generated over a large number of iterations. However, the working version of this notebook uses far fewer matrix factorization iterations and groups to keep the running time manageable, so the mean precision and recall produced by running this notebook may not align exactly with the reported results. Moreover, the factors in matrix factorization are randomly initialized, which leads to slightly different results between runs.


### Conclusion and Future Work:

#### Learning through hands-on experience
(a) This was an exciting project with a lot of practical learning. We became familiar with Python libraries for data processing (Pandas) and matrix computation.

(b) We gained insight into the hyperparameter tuning process and overfitting. Initially, we set the learning rate very high and saw that the mean squared error (MSE) on the training data dropped very low while the MSE on the test data kept increasing. We then decreased the learning rate and carefully checked for overfitting across the matrix factorization iterations.
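The overfitting check described above can be automated with early stopping: record the held-out MSE after each matrix factorization iteration (ideally on a validation split kept separate from the final test set) and keep the iteration where it was lowest. This is a minimal sketch; `best_epoch` and `patience` are hypothetical names, not from the notebook.

```python
from typing import List


def best_epoch(heldout_mse: List[float], patience: int = 3) -> int:
    """Return the index of the epoch with the lowest held-out MSE, stopping the
    scan once the MSE has not improved for `patience` consecutive epochs.
    Returns -1 for an empty history."""
    best, best_idx, waited = float("inf"), -1, 0
    for epoch, mse in enumerate(heldout_mse):
        if mse < best:
            best, best_idx, waited = mse, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # held-out error keeps rising: likely overfitting
    return best_idx


print(best_epoch([1.2, 1.0, 0.95, 0.97, 1.01, 1.05, 0.9]))  # 2
```

The scan stops at the third consecutive non-improving epoch, so the later value 0.9 is never reached; `patience` controls how much noise in the held-out error is tolerated before stopping.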

#### Concepts learned in class
(a) We applied the matrix factorization technique learned in the course to a different domain, i.e. group recommendations.
Since group recommendation is a relatively unexplored field (we found very few research papers on it), there is a lot of
scope for improving the techniques.

(b) Moreover, although we applied it to a movie dataset, the technique of aggregating individual user ratings to evaluate group satisfaction is fairly general and can be applied to other domains, e.g. travel destination suggestions.

(c) We learned that real data is sparse and that traditional metrics, i.e. precision and recall, cannot be evaluated in all cases. In such cases, the metrics need to be redefined for the domain.

(d) The precision of all the methods depends heavily on the initial matrix factorization step. The embedding methods we learned in class could make this factorization more accurate; specifically, neural network models with learned embedding layers may reach a lower mean squared error than plain SGD-based factorization. The following notebook gives a good illustration of this:
https://github.com/bradleypallen/keras-movielens-cf/blob/master/MovieLens%201M%20Recommendations.ipynb

#### Future Improvements
In the future, we would like to improve the following:
##### Dataset:
We used the MovieLens 100K dataset for this project. We tried the larger, complete MovieLens dataset (100 MB) but
could not process this much data due to computational limitations, since matrix factorization takes much longer as the number of
entries grows. The Netflix dataset is even larger.

##### Better Data Structures:
In the future, we could try SciPy's sparse matrix data structures and store the results of different parts of the computation
as files on disk, so that matrix factorization runs only once and the user and item factors can be reused over and over.
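A minimal sketch of both ideas, assuming SciPy and NumPy are available and that user and item ids are 0-based (MovieLens 100K ids start at 1, so they would need shifting). The function names and the `user_factors.npy` / `item_factors.npy` file names are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def ratings_to_csr(users, items, ratings, n_users, n_items):
    """Build a CSR rating matrix; only the observed ratings are stored."""
    return sparse.csr_matrix((ratings, (users, items)), shape=(n_users, n_items))


def cache_factors(P, Q, directory):
    """Write user and item factors to disk so later runs can skip factorization."""
    np.save(os.path.join(directory, "user_factors.npy"), P)
    np.save(os.path.join(directory, "item_factors.npy"), Q)


def load_factors(directory):
    """Read back the factors written by cache_factors."""
    P = np.load(os.path.join(directory, "user_factors.npy"))
    Q = np.load(os.path.join(directory, "item_factors.npy"))
    return P, Q


R = ratings_to_csr([0, 0, 2], [1, 3, 0], [4.0, 5.0, 3.0], n_users=3, n_items=4)
print(R.nnz)  # 3 stored ratings out of 12 cells

with tempfile.TemporaryDirectory() as cache_dir:
    cache_factors(np.ones((3, 2)), np.zeros((4, 2)), cache_dir)
    P, Q = load_factors(cache_dir)
```

CSR keeps memory proportional to the number of ratings rather than users × items, which matters because the rating matrix is mostly empty. In a real run the cache directory would be a persistent path (for example inside the mounted Drive folder) rather than a temporary one.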
    
    
    

## References

1. MovieLens 100K Dataset. https://grouplens.org/datasets/movielens/100k/

2. Ortega, Fernando, et al. "Recommending items to group of users using Matrix Factorization based Collaborative Filtering."
Information Sciences 345 (2016): 313-324.
http://www.sciencedirect.com/science/article/pii/S0020025516300196