# Evaluating User-Based Collaborative Filtering

This notebook evaluates a user-based collaborative filtering recommender: it builds a top-N recommendation list for every user, then uses leave-one-out cross-validation to measure the hit rate of those lists.

The evaluation process includes:
1. Loading the MovieLens dataset
2. Setting up leave-one-out cross-validation
3. Training a user-based KNN model
4. Computing recommendations for each user
5. Measuring the hit rate on the test set
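
To make the leave-one-out step concrete, here is a minimal plain-Python sketch of the idea: hold out one rating per user for testing and train on the rest. The function name `leave_one_out_split` and the toy ratings are hypothetical, and this is not how `EvaluationData` or Surprise implement the split. It only illustrates the principle.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings, seed=1):
    """Hold out one randomly chosen rating per user.

    `ratings` is a list of (user, item, rating) tuples. In this sketch,
    users with a single rating stay entirely in the training set.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        if len(rows) < 2:
            train.extend(rows)
            continue
        held_out = rng.randrange(len(rows))
        for idx, row in enumerate(rows):
            (test if idx == held_out else train).append(row)
    return train, test


# Hypothetical toy data: (user, item, rating)
toy_ratings = [
    ("u1", "A", 5.0), ("u1", "B", 3.0), ("u1", "C", 4.0),
    ("u2", "A", 4.0), ("u2", "C", 2.0),
    ("u3", "B", 1.0),
]
train, test = leave_one_out_split(toy_ratings)
```

Each user with at least two ratings contributes exactly one test rating, so the test set asks a single question per user: does the held-out item show up in that user's recommendations?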

In [None]:
# Import required libraries
from recsys.MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter
from surprise.model_selection import LeaveOneOut
from recsys.RecommenderMetrics import RecommenderMetrics
from recsys.EvaluationData import EvaluationData

In [None]:
# Load the MovieLens dataset
ml, data, rankings = MovieLens.load()

# Initialize evaluation data with leave-one-out cross-validation
evalData = EvaluationData(data, rankings)

## Train the Model

We'll train a user-based KNN model using cosine similarity on the leave-one-out training set.
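
As a rough illustration of what cosine similarity between two users means, the sketch below computes it over the items both users have rated. The helper `cosine_similarity` and the sample ratings are hypothetical. In the notebook itself, Surprise computes the full similarity matrix.

```python
from math import sqrt


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity between two users over their co-rated items.

    Each argument maps item id -> rating. Returns 0.0 when the users
    share no items.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical users: only items "A" and "C" are rated by both
alice = {"A": 5.0, "B": 3.0, "C": 4.0}
bob = {"A": 4.0, "C": 2.0, "D": 5.0}
sim = cosine_similarity(alice, bob)
```

Note that two users who overlap on a single item always get a similarity of 1.0 under this definition, regardless of how they rated it, so sparse overlaps can produce misleadingly close neighbors.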

In [None]:
# Get the leave-one-out training set
trainSet = evalData.GetLOOCVTrainSet()

# Configure and train the model
sim_options = {
    'name': 'cosine',
    'user_based': True
}

model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
# fit() already computed the similarity matrix and stored it on the model
simsMatrix = model.sim

## Generate Recommendations

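For each user, we'll:
1. Find their k most similar users (here k = 10) in the similarity matrix
2. Collect every item those neighbors rated in the training set
3. Score each item by summing the neighbors' ratings, scaled to 0-1 and weighted by each neighbor's similarity
4. Drop items the user has already rated and keep the highest-scoring ones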
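
The steps above can be sketched on toy data as follows. The similarity values, the ratings, and the `recommend` helper are all hypothetical and use plain dictionaries rather than a Surprise training set. The scoring rule (rating divided by 5, multiplied by the neighbor's similarity, summed per item) mirrors the notebook's code.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: similarity of user 0 to every other user, and
# each user's (item, rating) pairs on a 1-5 scale.
similarity_to_user0 = {1: 0.9, 2: 0.5, 3: 0.1}
user_ratings = {
    0: [("A", 4.0)],
    1: [("A", 5.0), ("B", 4.0)],
    2: [("B", 2.0), ("C", 5.0)],
    3: [("D", 5.0)],
}


def recommend(target, sims, ratings, k=2, n=2):
    # Step 1: keep the k most similar users
    neighbors = heapq.nlargest(k, sims.items(), key=lambda t: t[1])

    # Steps 2 and 3: accumulate similarity-weighted, normalized ratings
    scores = defaultdict(float)
    for neighbor, sim in neighbors:
        for item, rating in ratings[neighbor]:
            scores[item] += (rating / 5.0) * sim

    # Skip anything the target user has already rated, then take the top n
    seen = {item for item, _ in ratings[target]}
    ranked = sorted(scores.items(), key=lambda t: t[1], reverse=True)
    return [(item, score) for item, score in ranked if item not in seen][:n]


recs = recommend(0, similarity_to_user0, user_ratings)
```

With `k=2`, user 3 is never consulted, so item "D" cannot be recommended. Item "A" scores well but is filtered out because user 0 already rated it, which leaves "B" ahead of "C".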

In [None]:
# Get the leave-one-out test set
leftOutTestSet = evalData.GetLOOCVTestSet()

# Generate recommendations for each user
topN = defaultdict(list)
k = 10

for uiid in range(trainSet.n_users):
    # Find the k most similar users (excluding the user themselves)
    similarityRow = simsMatrix[uiid]
    
    similarUsers = []
    for innerID, score in enumerate(similarityRow):
        if (innerID != uiid):
            similarUsers.append((innerID, score))
    
    kNeighbors = heapq.nlargest(k, similarUsers, key=lambda t: t[1])
    
    # Score items rated by the neighbors: rating scaled to 0-1, weighted by similarity
    candidates = defaultdict(float)
    for similarUser in kNeighbors:
        innerID = similarUser[0]
        userSimilarityScore = similarUser[1]
        theirRatings = trainSet.ur[innerID]
        for rating in theirRatings:
            candidates[rating[0]] += (rating[1] / 5.0) * userSimilarityScore
    
    # Track items the user has already rated
    watched = {itemID for itemID, _ in trainSet.ur[uiid]}
    
    # Keep the 40 highest-scoring items the user has not rated yet
    pos = 0
    for itemID, ratingSum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
        if itemID not in watched:
            movieID = trainSet.to_raw_iid(itemID)
            # The 0.0 is a placeholder where a predicted rating would go
            topN[int(trainSet.to_raw_uid(uiid))].append((int(movieID), 0.0))
            pos += 1
            if pos >= 40:
                break

## Evaluate Performance

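Hit rate is the fraction of left-out test ratings whose item appears in that user's top-N list. Let's measure it for our recommendations against the leave-one-out test set.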
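
For reference, here is a minimal sketch of how a hit rate can be computed. The `hit_rate` function and the toy inputs are hypothetical, and the sketch assumes user and item ids can be compared directly. `RecommenderMetrics.HitRate` may differ in its details.

```python
def hit_rate(top_n, left_out):
    """Fraction of held-out (user, item) pairs found in that user's top-N list.

    `top_n` maps user id -> list of (item id, score) pairs. `left_out` is a
    list of tuples whose first two fields are the user id and the item id.
    """
    if not left_out:
        return 0.0
    hits = 0
    for row in left_out:
        user, item = row[0], row[1]
        if any(item == rec_item for rec_item, _ in top_n.get(user, [])):
            hits += 1
    return hits / len(left_out)


# Hypothetical data: user 1's held-out item (20) is recommended, user 2's
# (99) is not, and user 3 received no recommendations at all.
toy_top_n = {1: [(10, 0.0), (20, 0.0)], 2: [(30, 0.0)]}
toy_left_out = [(1, 20, 4.0), (2, 99, 3.0), (3, 10, 5.0)]
rate = hit_rate(toy_top_n, toy_left_out)
```

Only membership in the list matters here, not the position or the score, which is why a placeholder score is enough for this metric.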

In [None]:
# Calculate and print the hit rate
print("Hit Rate:", RecommenderMetrics.HitRate(topN, leftOutTestSet))