# Evaluating User-Based Collaborative Filtering

This notebook demonstrates how to evaluate a user-based collaborative filtering recommender system using leave-one-out cross-validation and hit rate metrics.

The evaluation process includes:
1. Loading the MovieLens dataset
2. Setting up leave-one-out cross-validation
3. Training a user-based KNN model
4. Generating recommendations for each user
5. Measuring the hit rate on the test set

### Import Required Libraries

In [1]:
from recsys.MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter
from surprise.model_selection import LeaveOneOut
from recsys.RecommenderMetrics import RecommenderMetrics
from recsys.EvaluationDataset import EvaluationDataset

### Load Data and Initialize Evaluator

In [2]:
lens, ratings_data, rankings = MovieLens.load()

# Initialize evaluation data with leave-one-out cross-validation
evaluation_dataset = EvaluationDataset(ratings_data, rankings)
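
`EvaluationDataset` is a project-specific helper, so its internals are not shown in this notebook. As a rough illustration of what a leave-one-out split does, the pure-Python sketch below holds out one rating per user for testing and keeps the rest for training. The function name `leave_one_out_split` and the toy ratings are hypothetical; the real helper presumably delegates to Surprise's `LeaveOneOut`, which is imported above.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings, seed=0):
    """Hold out one randomly chosen rating per user (hypothetical helper).

    ratings: list of (user_id, item_id, rating) tuples.
    Returns (train, test); test holds one rating for every user
    who has at least two ratings.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for user_id, rows in by_user.items():
        if len(rows) < 2:
            # A user with a single rating stays entirely in the training set
            train.extend(rows)
            continue
        held_out = rng.choice(rows)
        test.append(held_out)
        train.extend(r for r in rows if r is not held_out)
    return train, test


toy_ratings = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u1", "m3", 4.0),
    ("u2", "m1", 4.0), ("u2", "m3", 2.0),
    ("u3", "m2", 1.0),
]
train, test = leave_one_out_split(toy_ratings)
```

Here `test` ends up with one rating each for `u1` and `u2`, while `u3` has only one rating and so contributes nothing to the test set.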

### Train the Model

We'll train a user-based KNN model using cosine similarity on the leave-one-out training set.
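
To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two users, computed only over the items both have rated (which is how Surprise documents its `cosine` option). The function `cosine_similarity` and the sample users are illustrative assumptions, not part of the notebook's code.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity between two users over their co-rated items.

    ratings_u, ratings_v: dicts mapping item_id -> rating.
    Returns 0.0 when the users share no items.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0.0 or norm_v == 0.0:
        # Guard against all-zero ratings on the shared items
        return 0.0
    return dot / (norm_u * norm_v)


alice = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
bob = {"m1": 4.0, "m3": 2.0, "m4": 5.0}
carol = {"m5": 1.0}

sim_alice_bob = cosine_similarity(alice, bob)      # shared items: m1, m3
sim_alice_carol = cosine_similarity(alice, carol)  # no shared items
```

Because ratings are all positive, cosine scores between users with shared items fall in (0, 1] and tend to cluster near the top of that range.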

In [6]:
# Get the leave-one-out training set
loo_trainset = evaluation_dataset.loo_trainset

# Configure and train the model
sim_options = {
    'name': 'cosine',
    'user_based': True
}

model = KNNBasic(sim_options=sim_options, verbose=False)
model.fit(loo_trainset)
# fit() has already computed the similarity matrix and stored it on the model
similarity_matrix = model.sim

### Generate Recommendations

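For each user, we'll:
1. Find their k most similar users (k = 10)
2. Collect the items those neighbours rated
3. Score each item by summing the neighbours' ratings, scaled to [0, 1] and weighted by similarity
4. Recommend the highest-scoring items the user has not already rated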
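
The steps above can be sketched on toy data as follows. This is a simplified stand-in that uses plain dictionaries instead of a Surprise trainset; the `recommend` function, its parameters, and the sample similarities are hypothetical.

```python
import heapq
from collections import defaultdict


def recommend(target, similarities, user_ratings, k=2, n=3, max_rating=5.0):
    """Similarity-weighted scoring over the k nearest neighbours (sketch).

    similarities: dict neighbour_id -> similarity to the target user.
    user_ratings: dict user_id -> {item_id: rating}.
    """
    # Step 1: the k most similar users, excluding the target
    neighbours = heapq.nlargest(
        k,
        ((u, s) for u, s in similarities.items() if u != target),
        key=lambda pair: pair[1],
    )

    # Steps 2 and 3: accumulate scaled ratings weighted by similarity
    scores = defaultdict(float)
    for neighbour, similarity in neighbours:
        for item, rating in user_ratings[neighbour].items():
            scores[item] += (rating / max_rating) * similarity

    # Drop items the target has already rated, then keep the top n
    seen = set(user_ratings[target])
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked if item not in seen][:n]


user_ratings = {
    "alice": {"m1": 5.0},
    "bob": {"m1": 4.0, "m2": 5.0, "m3": 2.0},
    "carol": {"m2": 4.0, "m4": 5.0},
    "dave": {"m5": 5.0},
}
similarities = {"bob": 0.9, "carol": 0.8, "dave": 0.1}

recommendations = recommend("alice", similarities, user_ratings)
# recommendations == ["m2", "m4", "m3"]
```

`m2` ranks first because both neighbours rated it highly, and `m5` never appears because `dave` falls outside the two nearest neighbours.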

In [4]:
# Get the leave-one-out test set
loo_testset = evaluation_dataset.loo_testset

# Generate recommendations for each user
top_n = defaultdict(list)
k = 10

for uiid in range(loo_trainset.n_users):
    # Find the k most similar users, excluding the user themselves
    similarity_row = similarity_matrix[uiid]
    similar_users = [
        (inner_id, score)
        for inner_id, score in enumerate(similarity_row)
        if inner_id != uiid
    ]
    k_neighbours = heapq.nlargest(k, similar_users, key=itemgetter(1))

    # Score each item the neighbours rated: rating scaled to [0, 1], weighted by similarity
    candidates = defaultdict(float)
    for neighbour_id, similarity_score in k_neighbours:
        for item_id, rating in loo_trainset.ur[neighbour_id]:
            candidates[item_id] += (rating / 5.0) * similarity_score

    # Track items the user has already rated
    watched = {item_id for item_id, _ in loo_trainset.ur[uiid]}

    # Keep the highest-scoring items the user has not rated
    raw_uid = int(loo_trainset.to_raw_uid(uiid))
    pos = 0
    for item_id, rating_sum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
        if item_id not in watched:
            movie_id = loo_trainset.to_raw_iid(item_id)
            # 0.0 is a placeholder score; only the item's position in the list is used here
            top_n[raw_uid].append((int(movie_id), 0.0))
            pos += 1
            # Note: this stops after 41 items; use `pos >= 40` for exactly 40
            if pos > 40:
                break

### Evaluate Performance

Let's measure the hit rate of our recommendations against the test set.
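
`RecommenderMetrics` is also a project-specific helper, so the sketch below only illustrates the usual definition of hit rate under leave-one-out: the fraction of held-out ratings whose item shows up in that user's top-N list. The function `compute_hit_rate` and the toy inputs are assumptions for illustration and may differ from what `hit_rate_metrics` does internally.

```python
def compute_hit_rate(top_n, left_out):
    """Fraction of held-out ratings whose item appears in the user's top-N list.

    top_n: dict user_id -> list of (item_id, score) tuples.
    left_out: list of (user_id, item_id, rating) tuples, one per user.
    """
    if not left_out:
        return 0.0
    hits = 0
    for user_id, item_id, _rating in left_out:
        recommended = {item for item, _score in top_n.get(user_id, [])}
        if item_id in recommended:
            hits += 1
    return hits / len(left_out)


toy_top_n = {
    1: [(10, 0.0), (11, 0.0)],
    2: [(12, 0.0)],
    3: [(13, 0.0), (14, 0.0)],
}
# User 4 has a held-out rating but no recommendations, which counts as a miss
toy_left_out = [(1, 11, 4.0), (2, 10, 5.0), (3, 14, 3.5), (4, 10, 2.0)]

toy_hit_rate = compute_hit_rate(toy_top_n, toy_left_out)
# toy_hit_rate == 0.5 (users 1 and 3 are hits, users 2 and 4 are misses)
```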

In [8]:
hit_rate, cumulative_hit_rate, average_reciprocal_hit_rank = (
    RecommenderMetrics.hit_rate_metrics(top_n, loo_testset, 4.0)
)

(
    f"Hit Rate: {hit_rate}\n"
    f"Cumulative Hit Rate: {cumulative_hit_rate}\n"
    f"Average Reciprocal Hit Rank: {average_reciprocal_hit_rank}"
)

'Hit Rate: 0.05514157973174367'