# Sequential Group Recommendations in MovieLens 100K


**Authors**: Ashkan Khademian, Ujunwa Edum\
**Project Part**: Part I\
**Course**: DATA.ML.360-2024-2025-1 Recommender Systems

# Foundations

## Introduction
Lorem ipsum

## Install Requirements

Use the commented template below to install any packages that are not already available in the Google Colab environment.

In [20]:
# !pip install <package-name>

## Import Libraries

### Main Libraries

In [21]:
import random
import typing
from time import sleep
from collections import defaultdict
from functools import lru_cache

from tqdm import tqdm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations

### Part II Utils

In [22]:
from part2_utils.predict_user_rating import predict_user_rating

### Typing

In [23]:
from typing import *
from pandas import DataFrame, Series

## Define Constants

In [24]:
RATING_DATASET = "data/ml-latest-small/ratings.csv"
MOVIES_DETAILS_DATASET = "data/ml-latest-small/movies.csv"

# Sequential Group Recommendation

## Load Data

In [25]:
raw_df = pd.read_csv(RATING_DATASET)

In [26]:
movies_details_df = pd.read_csv(MOVIES_DETAILS_DATASET)

### Transform CSV DataFrame to User-Movies Matrix
The `transform_csv_dataframe_to_user_movies_matrix` function pivots the DataFrame loaded from the ratings CSV into a user-movies matrix: each row represents a user, each column a movie, and each value the rating that user gave that movie (`NaN` where no rating exists). After the pivot, the index is reset so that `userId` becomes a regular column.

In [27]:
def transform_csv_dataframe_to_user_movies_matrix(csv_df: DataFrame) -> DataFrame:
  user_movie_matrix = csv_df.pivot(index='userId', columns='movieId', values='rating')
  user_movie_matrix.reset_index(inplace=True)
  user_movie_matrix.columns.name = None
  return user_movie_matrix

In [28]:
user_movie_matrix = transform_csv_dataframe_to_user_movies_matrix(raw_df)
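
To make the shape of the result concrete, here is a minimal sketch that applies the same three steps to a tiny, made-up ratings table. The name `toy_ratings` and its values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as ratings.csv
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
})

# Same steps as transform_csv_dataframe_to_user_movies_matrix
toy_matrix = toy_ratings.pivot(index="userId", columns="movieId", values="rating")
toy_matrix.reset_index(inplace=True)   # userId becomes a regular column
toy_matrix.columns.name = None         # drop the leftover "movieId" axis label

print(toy_matrix)
```

User 2 never rated movie 20, so that cell is `NaN`. Note that `pivot` raises an error if the same (user, movie) pair appears more than once.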

### Get Movie Genres
The `get_movie_genres` function retrieves the genres of movies based on the movie IDs. It filters the DataFrame to include only rows with the specified movie IDs and creates a dictionary mapping movie IDs to their genres (as sets).

In [29]:
def get_movie_genres(movie_ids) -> Dict[int, Set[str]]:
    df = movies_details_df.copy()
    # Filter the DataFrame to include only rows with the specified movie IDs
    filtered_df = df[df['movieId'].isin(movie_ids)]

    # Create a dictionary mapping movie IDs to their genres (as sets)
    movie_genres = {
        row['movieId']: set(row['genres'].split('|'))
        for _, row in filtered_df.iterrows()
    }

    return movie_genres
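
As an illustration of the mapping this produces, the sketch below runs the same filter-and-split logic on a small made-up movies table. `toy_movies` and `toy_get_movie_genres` are hypothetical names; unlike the notebook function, this version takes the DataFrame as an argument instead of reading a global.

```python
import pandas as pd

# Hypothetical toy table in the same format as movies.csv (title column omitted)
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Adventure|Comedy", "Drama", "Comedy|Romance"],
})

def toy_get_movie_genres(movies_df, movie_ids):
    """Map each requested movie ID to the set of its pipe-separated genres."""
    filtered = movies_df[movies_df["movieId"].isin(movie_ids)]
    return {
        movie_id: set(genres.split("|"))
        for movie_id, genres in zip(filtered["movieId"], filtered["genres"])
    }

toy_genres = toy_get_movie_genres(toy_movies, [1, 3])
print(toy_genres)
```

IDs that are missing from the table are simply absent from the returned dictionary.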

## Sequential Group Recommendation Implementation

### Preference Score
The `preference_score` function returns a user's preference score for a movie, defined as the rating that `predict_user_rating` predicts from the user-movies matrix. Results are memoized with `lru_cache`, so each (user, movie) pair is predicted only once.

In [30]:
@lru_cache(maxsize=None)
def preference_score(user_id, movie_id):
    return predict_user_rating(user_movie_matrix, user_id, movie_id)
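
The effect of the cache can be seen in a small sketch. Here `slow_predict` is a hypothetical stand-in for `predict_user_rating` that records every call, and its return value is arbitrary.

```python
from functools import lru_cache

calls = []  # records every time the underlying predictor actually runs

def slow_predict(user_id, movie_id):
    """Hypothetical stand-in for an expensive rating prediction."""
    calls.append((user_id, movie_id))
    return 3.0 + 0.1 * (movie_id % 5)

@lru_cache(maxsize=None)
def cached_score(user_id, movie_id):
    return slow_predict(user_id, movie_id)

cached_score(1, 10)
cached_score(1, 10)  # served from the cache, slow_predict is not called again
cached_score(2, 10)

print(len(calls), cached_score.cache_info())
```

Because the cache key is only `(user_id, movie_id)`, cached values would go stale if the underlying matrix changed; in that case `cache_clear()` resets the cache.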

### Calculate User Satisfaction
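The `calculate_user_satisfaction` function measures how satisfied a single user is with the group recommendations. It sums the user's preference scores over the recommended movies and divides that sum by the user's ideal satisfaction, returning 0 when the ideal satisfaction is not positive.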

In [31]:
def calculate_user_satisfaction(user_id, group_rec, user_ratings, user_satisfactions):
    """
    Calculate user satisfaction based on the group recommendations.
    """
    user_ideal_satisfaction = user_satisfactions[user_id]
    group_satisfaction = sum([preference_score(user_id, movie) for movie in group_rec])
    return group_satisfaction / user_ideal_satisfaction if user_ideal_satisfaction > 0 else 0
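
A worked example of the ratio, using made-up numbers: `predicted` stands in for the values `preference_score` would return, and `ideal` for the entry in `user_satisfactions`. Both are assumptions for illustration.

```python
# Hypothetical predicted scores for one user on three recommended movies
predicted = {101: 4.0, 102: 3.5, 103: 2.5}

# Hypothetical ideal satisfaction: sum of the user's three highest ratings
ideal = 5.0 + 5.0 + 4.5

def toy_user_satisfaction(group_rec, predicted, ideal):
    """Sum of predicted scores over the list, relative to the user's ideal."""
    group_sat = sum(predicted[m] for m in group_rec)
    return group_sat / ideal if ideal > 0 else 0

sat = toy_user_satisfaction([101, 102, 103], predicted, ideal)
print(round(sat, 3))  # 10.0 / 14.5
```

Since the numerator uses predicted scores and the denominator uses the user's own ratings, the ratio is not strictly bounded by 1.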

### Calculate Group Satisfaction
The `calculate_group_satisfaction` function calculates the average satisfaction across all users in the group. It uses the `calculate_user_satisfaction` function to calculate the satisfaction of each user in the group.

In [32]:
def calculate_group_satisfaction(group, group_rec, user_ratings, user_satisfactions):
    """
    Calculate average satisfaction across all users in the group.
    """
    individual_satisfactions = [calculate_user_satisfaction(user, group_rec, user_ratings, user_satisfactions) for user in group]
    return np.mean(individual_satisfactions)


### Calculate Group Disagreement
The `calculate_group_disagreement` function calculates the disagreement within the group as the difference between the maximum and minimum satisfaction.

In [33]:
def calculate_group_disagreement(group, group_rec, user_ratings, user_satisfactions):
    """
    Calculate disagreement within the group as the difference between max and min satisfaction.
    """
    individual_satisfactions = [calculate_user_satisfaction(user, group_rec, user_ratings, user_satisfactions) for user in group]
    return max(individual_satisfactions) - min(individual_satisfactions)
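
Both group metrics are simple aggregates of the per-user satisfactions. A minimal sketch with made-up satisfaction values (the dictionary `toy_sats` is hypothetical):

```python
import numpy as np

# Hypothetical per-user satisfaction values for one recommendation list
toy_sats = {597: 0.9, 217: 0.7, 66: 0.8}
values = list(toy_sats.values())

group_sat = float(np.mean(values))    # group satisfaction: average over users
group_dis = max(values) - min(values) # group disagreement: spread between extremes

print(round(group_sat, 3), round(group_dis, 3))
```

A list can therefore score well on satisfaction while still showing high disagreement if one member is served much worse than the rest.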

### Calculate Genre Diversity
The `calculate_genre_diversity` function calculates a genre diversity score for a set of movies; a higher score means a more diverse genre representation. It computes the Shannon entropy of the genre frequency distribution and normalizes it by the maximum possible entropy (the logarithm of the number of distinct genres), so the score lies between 0 and 1.

In [34]:
def calculate_genre_diversity(group, movies, movie_genres, user_ratings):
    """
    Calculate genre diversity score for a set of movies.
    Higher score means more diverse genre representation.

    Args:
        group: List of user IDs
        movies: List of movie IDs to evaluate
        movie_genres: Dict mapping movie IDs to their genres
        user_ratings: Dict of user ratings

    Returns:
        Float between 0 and 1 indicating genre diversity
    """
    if not movies:
        return 0

    # Collect all genres in the recommended movies
    movie_genre_set = set()
    for movie in movies:
        if movie in movie_genres:
            movie_genre_set.update(movie_genres[movie])

    # Count genre occurrences
    genre_counts = defaultdict(int)
    for movie in movies:
        if movie in movie_genres:
            for genre in movie_genres[movie]:
                genre_counts[genre] += 1

    if not genre_counts:
        return 0

    # Calculate entropy-based diversity score
    total_genres = sum(genre_counts.values())
    proportions = [count / total_genres for count in genre_counts.values()]
    entropy = -sum(p * np.log(p) if p > 0 else 0 for p in proportions)
    max_entropy = np.log(len(movie_genre_set)) if movie_genre_set else 1

    # Normalize to [0,1]
    diversity_score = entropy / max_entropy if max_entropy > 0 else 0
    return diversity_score
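
The entropy calculation can be followed on a two-movie example. The sketch below is a condensed re-implementation under the same definitions, not the notebook function itself; `toy_movie_genres` and `toy_genre_diversity` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical genre mapping: Comedy appears twice, Drama once
toy_movie_genres = {1: {"Comedy", "Drama"}, 2: {"Comedy"}}

def toy_genre_diversity(movies, movie_genres):
    """Normalized Shannon entropy of genre occurrences across the movies."""
    counts = Counter(g for m in movies for g in movie_genres.get(m, ()))
    if not counts:
        return 0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    max_entropy = math.log(len(counts))  # reached when all genres are equally frequent
    return entropy / max_entropy if max_entropy > 0 else 0

mixed = toy_genre_diversity([1, 2], toy_movie_genres)   # proportions 2/3 and 1/3
single = toy_genre_diversity([2], toy_movie_genres)     # only one genre present

print(round(mixed, 3), single)
```

With proportions 2/3 and 1/3 the score is about 0.918. A list containing a single genre scores 0, because the maximum entropy is `log(1) = 0`.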


### Calculate Average and Least Score
The `avgScore` function calculates the average preference score for the movie among the group members, while the `leastScore` function calculates the minimum preference score for the movie among the group members.

In [35]:
def avgScore(group, movie, user_ratings):
    """Calculate the average preference score for the movie among the group members."""
    return np.mean([preference_score(user, movie) for user in group])

def leastScore(group, movie, user_ratings):
    """Calculate the minimum preference score for the movie among the group members."""
    return min([preference_score(user, movie) for user in group])
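
The two aggregations can rank the same movies differently. The sketch below uses made-up preference scores; `toy_pref` is a hypothetical lookup table replacing `preference_score`.

```python
import numpy as np

# Hypothetical predicted scores keyed by (user, movie)
toy_pref = {
    (1, 50): 5.0, (2, 50): 2.0, (3, 50): 4.0,
    (1, 60): 3.5, (2, 60): 3.0, (3, 60): 3.5,
}
toy_group = [1, 2, 3]

def toy_avg_score(group, movie):
    """Average predicted score of the movie across the group."""
    return float(np.mean([toy_pref[(u, movie)] for u in group]))

def toy_least_score(group, movie):
    """Lowest predicted score of the movie across the group."""
    return min(toy_pref[(u, movie)] for u in group)

for movie in (50, 60):
    print(movie, round(toy_avg_score(toy_group, movie), 3), toy_least_score(toy_group, movie))
```

Movie 50 wins on average, but movie 60 wins on least score because no member strongly dislikes it. This is why the least score serves as the fairness component.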

### Generate Sequential Recommendations
The `generate_sequential_recommendations` function generates sequential recommendations for a group of users. Each movie's score is a weighted combination of three components: least score (fairness, weight `alpha`), average score (group preference, weight `beta`), and genre diversity (variety, weight `gamma`). The function runs for the specified number of iterations. After each iteration it computes group satisfaction, disagreement, and genre diversity for the top-k list, adjusts the weights against fixed thresholds, and renormalizes them to sum to 1. It prints the weights, the recommended movies, and the three metrics for every iteration.

In [36]:
def generate_sequential_recommendations(
        group,
        user_ratings,
        movies,
        movie_genres,
        iterations=10,
        top_k=10,
        alpha=0,  # weight for least score
        beta=1,  # weight for average score
        gamma=0,  # weight for diversity score
):
    """
    Enhanced sequential recommendation generator with three components:
    1. Average Score (group preference)
    2. Least Score (fairness)
    3. Genre Diversity (variety)

    Args:
        group: List of user IDs
        user_ratings: Dict of user ratings
        movies: List of available movies
        movie_genres: Dict mapping movie IDs to their genres
        iterations: Number of recommendation rounds
        top_k: Number of recommendations per round
        alpha: Weight for least score
        beta: Weight for average score
        gamma: Weight for diversity score
    """
    assert abs((alpha + beta + gamma) - 1.0) < 1e-6, "Weights must sum to 1"

    group_recommendations = []
    movie_scores = defaultdict(float)
    user_satisfactions = defaultdict(float)

    # Calculate ideal satisfaction for each user
    for user in group:
        top_user_prefs = sorted(user_ratings[user].items(), key=lambda x: x[1], reverse=True)[:top_k]
        user_satisfactions[user] = sum([score for _, score in top_user_prefs])

    for iteration in range(iterations):
        print(f"Iteration {iteration + 1} progress")
        print(f"Current weights - α: {alpha:.2f}, β: {beta:.2f}, γ: {gamma:.2f}\n")
        movies_iterator = movies
        if iteration == 0:
            sleep(0.001)
            movies_iterator = tqdm(movies_iterator, desc="Calculating scores for each user-movie pair and storing them")

        # Calculate scores for each movie
        for movie in movies_iterator:
            avg_score = avgScore(group, movie, user_ratings)
            least_score = leastScore(group, movie, user_ratings)

            # Calculate temporary recommendations with this movie
            temp_recommendations = group_recommendations[-1][:top_k - 1] + [movie] if group_recommendations else [movie]
            diversity_score = calculate_genre_diversity(group, temp_recommendations, movie_genres, user_ratings)

            # Combine scores with weights
            movie_scores[movie] = (alpha * least_score +
                                   beta * avg_score +
                                   gamma * diversity_score)

        # Select top movies based on combined score
        top_movies = sorted(movie_scores, key=movie_scores.get, reverse=True)[:top_k]
        group_recommendations.append(top_movies)

        # Calculate metrics
        group_sat = calculate_group_satisfaction(group, top_movies, user_ratings, user_satisfactions)
        group_dis = calculate_group_disagreement(group, top_movies, user_ratings, user_satisfactions)
        group_div = calculate_genre_diversity(group, top_movies, movie_genres, user_ratings)

        # Update weights based on metrics with distinct strategies
        # If satisfaction is low, increase beta (satisfaction weight)
        if group_sat < 0.7:  # threshold for "low" satisfaction
            beta = min(0.6, beta + 0.05)  # increase but cap at 0.6

        # If disagreement is high, increase alpha (fairness weight)
        if group_dis > 0.20:  # threshold for "high" disagreement
            alpha = min(0.5, alpha + 0.05)  # increase but cap at 0.5

        # If diversity is low, increase gamma (diversity weight)
        if group_div < 0.9:  # threshold for "low" diversity
            gamma = min(0.4, gamma + 0.05)  # increase but cap at 0.4

        # Normalize weights to sum to 1
        total = alpha + beta + gamma
        alpha = alpha / total
        beta = beta / total
        gamma = gamma / total

        print(f"\nIteration {iteration + 1}")
        print(f"Top Movies: {top_movies}")
        print(f"Group Satisfaction: {group_sat:.3f}")
        print(f"Group Disagreement: {group_dis:.3f}")
        print(f"Genre Diversity: {group_div:.3f}")

        sleep(0.001)

    return group_recommendations
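
The weight-update step can be isolated for inspection. The sketch below copies the thresholds, increments, and caps from the loop above into a hypothetical helper, `update_weights`, which is not part of the notebook.

```python
def update_weights(alpha, beta, gamma, group_sat, group_dis, group_div):
    """Hypothetical helper mirroring the weight-update step of the loop."""
    if group_sat < 0.7:                    # low satisfaction: favor average score
        beta = min(0.6, beta + 0.05)
    if group_dis > 0.20:                   # high disagreement: favor least score
        alpha = min(0.5, alpha + 0.05)
    if group_div < 0.9:                    # low diversity: favor genre diversity
        gamma = min(0.4, gamma + 0.05)
    total = alpha + beta + gamma           # renormalize so the weights sum to 1
    return alpha / total, beta / total, gamma / total

# Metrics of a first round run with the default weights (0, 1, 0)
alpha, beta, gamma = update_weights(0.0, 1.0, 0.0, 0.815, 0.314, 0.855)
print(f"α: {alpha:.2f}, β: {beta:.2f}, γ: {gamma:.2f}")
```

With a satisfaction of 0.815, a disagreement of 0.314, and a diversity of 0.855, only `alpha` and `gamma` are increased, and renormalizing by 1.1 gives roughly 0.05, 0.91, and 0.05. Note that the caps are applied before renormalization, so the normalized weights can end up above the cap values.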

## Example

In [41]:
sample_group = [597, 217, 66, 177, 274, 391, 483, 561, 414, 509, 160]

user_ratings = raw_df[raw_df["userId"].isin(sample_group)].groupby('userId')[['movieId', 'rating']].apply(
    lambda x: dict(zip(x['movieId'], x['rating']))).to_dict()
group_users_movies = user_movie_matrix[user_movie_matrix['userId'].isin(sample_group)].dropna(axis=1, how='all').columns.tolist()[1:]
movies = random.sample([m for m in raw_df['movieId'].unique() if m not in group_users_movies], 100)
movie_genres = get_movie_genres(movies)

In [42]:
# Run the sequential group recommendation process
recommendations = generate_sequential_recommendations(sample_group, user_ratings, movies, movie_genres)

Iteration 1 progress
Current weights - α: 0.00, β: 1.00, γ: 0.00



Calculating scores for each user-movie pair and storing them: 100%|██████████| 100/100 [00:19<00:00,  5.09it/s]


Iteration 1
Top Movies: [np.int64(87234), np.int64(59143), np.int64(84414), np.int64(7706), np.int64(3046), np.int64(116897), np.int64(165947), np.int64(4399), np.int64(117887), np.int64(4263)]
Group Satisfaction: 0.815
Group Disagreement: 0.314
Genre Diversity: 0.855
Iteration 2 progress
Current weights - α: 0.05, β: 0.91, γ: 0.05


Iteration 2
Top Movies: [np.int64(87234), np.int64(59143), np.int64(84414), np.int64(7706), np.int64(3046), np.int64(116897), np.int64(165947), np.int64(4399), np.int64(4263), np.int64(115122)]
Group Satisfaction: 0.814
Group Disagreement: 0.283
Genre Diversity: 0.855
Iteration 3 progress
Current weights - α: 0.09, β: 0.83, γ: 0.09


Iteration 3
Top Movies: [np.int64(87234), np.int64(59143), np.int64(84414), np.int64(7706), np.int64(3046), np.int64(116897), np.int64(4263), np.int64(115122), np.int64(165947), np.int64(4399)]
Group Satisfaction: 0.814
Group Disagreement: 0.283
Genre Diversity: 0.855
Iteration 4 progress
Current weights - α: 0.12, β: 0.75, γ


