# User-Based Collaborative Filtering Analysis

This notebook evaluates a user-based collaborative filtering recommender with leave-one-out cross-validation. The recommender uses cosine similarity to find each user's nearest neighbours, scores candidate movies from those neighbours' ratings, and is judged by how often a held-out movie appears in that user's top-N list.

### Setup and Imports

First, let's import the necessary libraries and load our data.

In [1]:
from recsys.MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter
from recsys.RecommenderMetrics import RecommenderMetrics
from recsys.EvaluationDataset import EvaluationDataset

lens, ratings_data, rankings = MovieLens.load()

### Prepare Evaluation Data

Build the evaluation data with leave-one-out cross-validation: one rating per user is held out for testing (`loo_testset`) and the model is trained on everything else (`loo_trainset`). A good recommender should place each user's held-out movie somewhere in their top-N list.
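
To make the split concrete, here is a minimal standalone sketch of leave-one-out on a handful of toy ratings. It is not the `EvaluationDataset` implementation; the function name `leave_one_out_split`, the tuple layout, and the toy data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings, seed=0):
    """Hold out one randomly chosen rating per user (hypothetical helper).

    `ratings` is a list of (user_id, item_id, rating) tuples.
    Returns (train, test). Users with a single rating stay entirely in
    train, because holding it out would leave them with no history.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        if len(rows) < 2:
            train.extend(rows)
            continue
        idx = rng.randrange(len(rows))
        test.append(rows[idx])                     # the held-out rating
        train.extend(rows[:idx] + rows[idx + 1:])  # everything else
    return train, test


# Toy data, invented for this example
toy_ratings = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u1", "m3", 4.0),
    ("u2", "m1", 4.0), ("u2", "m3", 2.0),
    ("u3", "m2", 1.0),
]
toy_train, toy_test = leave_one_out_split(toy_ratings)
```

In this toy run `u1` and `u2` each contribute one test rating, while `u3`, who has only one rating, stays entirely in the training data.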

In [2]:
evaluation_dataset = EvaluationDataset(ratings_data, rankings)

# Leave-one-out split: train on loo_trainset, evaluate against loo_testset
loo_trainset = evaluation_dataset.loo_trainset
loo_testset = evaluation_dataset.loo_testset

### Train the Model

Fit a `KNNBasic` model with cosine similarity and `user_based` set to `True`, so that similarities are computed between users rather than between items. Only the resulting user-user similarity matrix is used in the steps that follow.
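
As a rough illustration of what one cell of that matrix measures, the sketch below computes cosine similarity between two users over the movies both have rated. The helper `cosine_similarity`, the dict-based rating layout, and the two toy users are assumptions for this example, not Surprise's internal code.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity over the items both users rated (hypothetical helper).

    Each argument maps item_id -> rating. Returns 0.0 when the two
    users have no items in common.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy users, invented for this example; they share m1 and m3
alice = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
bob = {"m1": 4.0, "m3": 2.0, "m4": 5.0}
sim_alice_bob = cosine_similarity(alice, bob)
```

Because ratings are all positive, this measure tends to run high, and two users with exactly one movie in common always score 1.0 regardless of how they rated it. That is worth keeping in mind when reading the neighbour lists.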

In [3]:
sim_options = {
    'name': 'cosine',
    'user_based': True
}

model = KNNBasic(sim_options=sim_options)
model.fit(loo_trainset)
# fit() has already computed the user-user similarity matrix and stored it on the model
similarity_matrix = model.sim

### Generate Recommendations

For each user, we'll:
1. Find their k most similar users
2. Get movies rated by similar users
3. Weight the ratings by user similarity
4. Generate top-N recommendations

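Each candidate movie's score is the sum, over the k neighbours who rated it, of that neighbour's rating divided by 5 and multiplied by their similarity to the target user. Movies the target user has already rated are excluded from the final list.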
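
The steps above can be sketched on toy data before running them on the full training set. The function `recommend_from_neighbours`, the dict-based inputs, and the example users and similarity values are all hypothetical and exist only for illustration.

```python
import heapq
from collections import defaultdict


def recommend_from_neighbours(user, similarities, ratings, k=2, n=3):
    """Rank unseen items by similarity-weighted neighbour ratings.

    similarities: neighbour id -> similarity to `user`
    ratings: user id -> {item id: rating on a 5-point scale}
    """
    # Step 1: keep the k most similar users
    neighbours = heapq.nlargest(k, similarities.items(), key=lambda t: t[1])

    # Steps 2 and 3: accumulate each neighbour's ratings, weighted by similarity
    scores = defaultdict(float)
    for neighbour, sim in neighbours:
        for item, rating in ratings[neighbour].items():
            scores[item] += (rating / 5.0) * sim

    # Step 4: drop items the user already rated and keep the top n
    seen = set(ratings[user])
    ranked = sorted(scores.items(), key=lambda t: t[1], reverse=True)
    return [item for item, _ in ranked if item not in seen][:n]


# Toy data, invented for this example
toy_user_ratings = {
    "ann": {"m1": 5.0},
    "ben": {"m1": 4.0, "m2": 5.0, "m3": 2.0},
    "cat": {"m2": 4.0, "m4": 5.0},
    "dan": {"m5": 5.0},
}
toy_sims_to_ann = {"ben": 0.9, "cat": 0.8, "dan": 0.1}
toy_recs = recommend_from_neighbours("ann", toy_sims_to_ann, toy_user_ratings)
```

With `k=2`, only `ben` and `cat` contribute. `m2` ranks first because both neighbours rated it, `m1` is dropped because `ann` has already rated it, and `m5` never becomes a candidate because `dan` is not among the two nearest neighbours.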

In [4]:
# Map each raw user id to a list of (raw movie id, placeholder score) pairs
top_n = defaultdict(list)
k = 10          # Number of similar users to consider
max_items = 40  # Number of recommendations to keep per user

for uiid in range(loo_trainset.n_users):
    # Pair every other user with their similarity to this user
    similarity_row = similarity_matrix[uiid]
    similar_users = [
        (inner_id, score)
        for inner_id, score in enumerate(similarity_row)
        if inner_id != uiid
    ]

    # Keep the k most similar users
    k_neighbours = heapq.nlargest(k, similar_users, key=itemgetter(1))

    # Sum the neighbours' ratings per item, divided by 5 and weighted by user similarity
    candidates = defaultdict(float)
    for neighbour_id, user_similarity_score in k_neighbours:
        for item_id, rating in loo_trainset.ur[neighbour_id]:
            candidates[item_id] += (rating / 5.0) * user_similarity_score

    # Items this user has already rated must not be recommended
    watched = {item_id for item_id, _ in loo_trainset.ur[uiid]}

    # Walk the candidates from highest to lowest score, keeping unseen items
    raw_uid = int(loo_trainset.to_raw_uid(uiid))
    pos = 0
    for item_id, rating_sum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
        if item_id not in watched:
            movie_id = loo_trainset.to_raw_iid(item_id)
            # 0.0 is a placeholder: this approach ranks items but predicts no rating
            top_n[raw_uid].append((int(movie_id), 0.0))
            pos += 1
            if pos >= max_items:
                break

### Evaluate Performance

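Measure how often the recommendations recover each user's held-out movie. The hit rate is the fraction of held-out ratings whose movie appears in that user's top-N list; the same call also returns the cumulative hit rate, the average reciprocal hit rank, and the rating hit rate.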
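
For intuition, the two simplest of these metrics can be computed by hand on toy data. This is a sketch of the standard definitions, not the `RecommenderMetrics` implementation; the function `hit_rate_and_arhr` and its simplified inputs (plain item lists rather than the `(movie_id, score)` pairs built above) are assumptions for illustration.

```python
def hit_rate_and_arhr(top_n_lists, left_out):
    """Return (hit rate, average reciprocal hit rank).

    top_n_lists: user id -> ordered list of recommended item ids
    left_out: list of (user id, held-out item id) pairs
    """
    total = len(left_out)
    if total == 0:
        return 0.0, 0.0

    hits = 0
    reciprocal_rank_sum = 0.0
    for user, held_out_item in left_out:
        recommended = top_n_lists.get(user, [])
        if held_out_item in recommended:
            hits += 1
            rank = recommended.index(held_out_item) + 1  # ranks start at 1
            reciprocal_rank_sum += 1.0 / rank
    return hits / total, reciprocal_rank_sum / total


# Toy data, invented for this example
toy_top_n = {1: [10, 20, 30], 2: [40, 50], 3: [60]}
toy_left_out = [(1, 20), (2, 99), (3, 60)]
toy_hit_rate, toy_arhr = hit_rate_and_arhr(toy_top_n, toy_left_out)
```

Here users 1 and 3 are hits and user 2 is a miss, so the hit rate is 2/3. The reciprocal ranks are 1/2 for user 1 and 1 for user 3, giving an average reciprocal hit rank of 0.5 across the three held-out ratings. Unlike the plain hit rate, this second metric rewards placing the held-out movie near the top of the list.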

In [5]:
hit_rate, cumulative_hit_rate, average_reciprocal_hit_rank, rating_hit_rate = (
    RecommenderMetrics.hit_rate_metrics(top_n, loo_testset, 4.0)
)

# Print the metrics one per line
print(
    f"Hit Rate: {hit_rate}\n"
    f"Cumulative Hit Rate: {cumulative_hit_rate}\n"
    f"Average Reciprocal Hit Rank: {average_reciprocal_hit_rank}\n"
    f"Rating Hit Rate: {rating_hit_rate}"
)