# User-Based Collaborative Filtering with Exploration

This notebook evaluates a user-based collaborative filtering recommender that includes an exploration step for new movies. It uses cosine similarity to find each user's nearest neighbors and scores candidate movies by those neighbors' ratings. It also reserves one slot in each user's recommendation list for a randomly chosen new movie, so that new titles get exposure.

## Setup and Imports

First, let's import the necessary libraries and load our data.

In [1]:
from recsys.MovieLens import MovieLens
from surprise import KNNBasic
import heapq
import random
from collections import defaultdict
from operator import itemgetter
from recsys.RecommenderMetrics import RecommenderMetrics
from recsys.EvaluationData import EvaluationData

ml, data, rankings = MovieLens.load()

## Prepare Evaluation Data

Set up the evaluation data with leave-one-out cross-validation (LOOCV): one rating per user is held out as the test set, and the model is trained on the remaining ratings.
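To make the split concrete, here is a minimal sketch of a leave-one-out split on plain `(user, item, rating)` tuples. It is not the code inside `EvaluationData`, which is not shown here; that class may well rely on something like Surprise's `LeaveOneOut` splitter, but treat that as an assumption. The function name `leave_one_out_split` and the toy ratings are hypothetical.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings, seed=1):
    """Hold out one randomly chosen rating per user (hypothetical helper).

    ratings: list of (user_id, item_id, rating) tuples.
    Returns (train, test) as two lists with the same tuple shape.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))

    train, test = [], []
    for user_ratings in by_user.values():
        held_out = rng.randrange(len(user_ratings))
        for idx, row in enumerate(user_ratings):
            # Exactly one row per user goes to the test set
            (test if idx == held_out else train).append(row)
    return train, test


ratings = [
    ("u1", "m1", 4.0), ("u1", "m2", 3.0), ("u1", "m3", 5.0),
    ("u2", "m1", 2.0), ("u2", "m4", 4.5),
]
train, test = leave_one_out_split(ratings)
print(len(train), len(test))  # 3 2
```

This sketch keeps every user, so a user with a single rating ends up with nothing left in the training set; a real splitter would typically filter such users out first.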

In [2]:
evalData = EvaluationData(data, rankings)

# Leave-one-out split: train on all ratings except one held-out rating per user
trainSet = evalData.GetLOOCVTrainSet()
leftOutTestSet = evalData.GetLOOCVTestSet()

## Train the Model

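Train a KNN model that uses cosine similarity between users (`'user_based': True`). We only use the model's user-user similarity matrix; neighbor selection and scoring are done by hand in the next steps.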
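As an illustration of what "cosine similarity between users" means here, the sketch below compares two users over the movies both of them rated. Restricting the sum to co-rated items is, to my understanding, how Surprise's `cosine` measure works, but check the Surprise documentation before relying on that detail. The function and the toy users are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users over co-rated items only.

    ratings_a, ratings_b: dicts mapping item_id -> rating.
    Returns 0.0 when the users share no items.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # guard against all-zero ratings
    return dot / (norm_a * norm_b)


alice = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
bob = {"m1": 4.0, "m2": 2.0, "m4": 1.0}
print(cosine_similarity(alice, bob))
# A single co-rated movie always gives exactly 1.0, whatever the ratings:
print(cosine_similarity({"m1": 3.0}, {"m1": 5.0}))  # 1.0
```

Because star ratings are all positive, raw cosine similarity is never negative, and users who share only one or two movies can look like near-perfect matches. That is worth keeping in mind when inspecting which neighbors the model picks.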

In [3]:
sim_options = {
    'name': 'cosine',
    'user_based': True
}

model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
simsMatrix = model.sim  # user-user similarity matrix, already computed by fit()

## Get New Movies for Exploration

Retrieve the list of new movies (titles with little or no rating data yet) that we will mix into the recommendations to encourage exploration.

In [4]:
# New movies that still need rating data
newMovies = ml.getNewMovies()
explorationSlot = 9  # 0-based position (the 10th recommendation) that gets a new movie
random.seed(0)  # make the random exploration picks reproducible
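The exploration slot can be isolated into a small function. This sketch assumes `newMovies` is a list of movie IDs, which the notebook does not state explicitly; the helper name and the toy IDs are hypothetical. Like the loop in the next cell, it *replaces* the candidate at the slot rather than inserting an extra item, and leaves lists that are too short untouched.

```python
import random


def apply_exploration_slot(recommendations, new_movies, slot, rng):
    """Return a copy of `recommendations` with index `slot` replaced by a
    randomly chosen new movie (hypothetical helper).

    Lists with `slot` or fewer entries are returned unchanged.
    """
    recs = list(recommendations)
    if len(recs) > slot and new_movies:
        recs[slot] = rng.choice(new_movies)
    return recs


rng = random.Random(42)             # seeded so the pick is reproducible
ranked = list(range(100, 115))      # 15 hypothetical movie IDs, best first
new_movies = [9001, 9002, 9003]     # hypothetical "new" movie IDs
print(apply_exploration_slot(ranked, new_movies, slot=9, rng=rng))
```

A fuller version might also skip new movies the user has already rated or that already appear in the list; the notebook's loop does neither.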

## Generate Recommendations with Exploration

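For each user, we'll:
1. Find their k most similar users.
2. Collect the movies those users rated.
3. Score each movie by summing the neighbors' ratings, scaled to between 0 and 1 by dividing by 5 and weighted by user similarity.
4. Walk the candidates from highest to lowest score, skipping movies the user has already rated.
5. Build the top-N list, replacing the candidate at the exploration slot with a randomly chosen new movie.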
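Steps 1 to 4 can be sketched with plain dictionaries instead of a Surprise trainset, which makes the scoring easy to check by hand. The exploration step is left out to keep the focus on scoring. The function name, the dict layout, and the toy data are hypothetical; `max_rating=5.0` assumes the same 5-star scale the notebook divides by.

```python
import heapq
from collections import defaultdict


def recommend_for_user(user, ratings, sims, k=10, n=10, max_rating=5.0):
    """Score unseen items for `user` from its k nearest neighbors' ratings.

    ratings: dict user -> dict item -> rating
    sims:    dict user -> dict other_user -> similarity
    Returns up to n item IDs, highest score first.
    """
    # Step 1: the k most similar users, excluding the user themself
    others = [(v, s) for v, s in sims[user].items() if v != user]
    neighbors = heapq.nlargest(k, others, key=lambda t: t[1])

    # Steps 2-3: sum each neighbor's scaled rating, weighted by similarity
    scores = defaultdict(float)
    for neighbor, sim in neighbors:
        for item, rating in ratings[neighbor].items():
            scores[item] += (rating / max_rating) * sim

    # Step 4: drop items the user already rated, then rank by score
    seen = ratings[user].keys()
    unseen = [(item, score) for item, score in scores.items() if item not in seen]
    unseen.sort(key=lambda t: t[1], reverse=True)
    return [item for item, _ in unseen[:n]]


ratings = {
    "alice": {"m1": 5.0, "m2": 3.0},
    "bob":   {"m1": 4.0, "m3": 5.0, "m4": 2.0},
    "carol": {"m2": 4.0, "m4": 5.0},
}
sims = {"alice": {"alice": 1.0, "bob": 0.9, "carol": 0.4}}
print(recommend_for_user("alice", ratings, sims, k=2, n=5))  # ['m3', 'm4']
```

Here `m3` scores 1.0 × 0.9 = 0.9, while `m4` collects 0.4 × 0.9 + 1.0 × 0.4 = 0.76 from two neighbors; `m1` and `m2` are dropped because Alice has already rated them.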

In [5]:
# Map each raw user ID to a list of (movieID, predicted rating) pairs.
# The rating is a 0.0 placeholder: hit rate only checks which movies are in the list.
topN = defaultdict(list)
k = 10  # Number of similar users to consider

for uiid in range(trainSet.n_users):
    # Similarities between this user and every user, indexed by inner user ID
    similarityRow = simsMatrix[uiid]
    
    # Every other user paired with their similarity to this one
    similarUsers = [(innerID, score) for innerID, score in enumerate(similarityRow) if innerID != uiid]
    
    # Keep the k most similar users
    kNeighbors = heapq.nlargest(k, similarUsers, key=lambda t: t[1])
    
    # Get the stuff they rated, and add up ratings for each item, weighted by user similarity
    candidates = defaultdict(float)
    for innerID, userSimilarityScore in kNeighbors:
        theirRatings = trainSet.ur[innerID]
        for itemID, rating in theirRatings:
            # Scale the rating to 0..1 (5-star maximum), then weight by similarity
            candidates[itemID] += (rating / 5.0) * userSimilarityScore
    
    # Set of items this user has already rated
    watched = {itemID for itemID, _ in trainSet.ur[uiid]}
    
    # Walk candidates from highest to lowest score, skipping items already watched
    pos = 0
    for itemID, ratingSum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
        if itemID not in watched:
            if pos == explorationSlot:
                # Exploration: recommend a random new movie in place of this candidate
                movieID = random.choice(newMovies)
            else:
                movieID = trainSet.to_raw_iid(itemID)
            topN[int(trainSet.to_raw_uid(uiid))].append((int(movieID), 0.0))
            pos += 1
            if pos >= 40:  # stop once the list holds 40 recommendations
                break

## Evaluate Performance

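Measure the hit rate: the fraction of users whose left-out movie appears in their recommendation list. Because the exploration slot displaces a candidate the model ranked highly, expect the hit rate to be the same as or slightly lower than it would be without exploration; that is the price of giving new movies exposure.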
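For reference, hit rate can be computed in a few lines. This is a minimal sketch, not the code of `RecommenderMetrics.HitRate`. It assumes `topN` has the shape built above and that each left-out entry is a `(user_id, movie_id, rating)` tuple whose IDs may be strings, hence the `int()` casts; the function name is hypothetical.

```python
def hit_rate(top_n, left_out):
    """Fraction of left-out (user, movie) pairs whose movie is in that
    user's top-N list (hypothetical helper).

    top_n:    dict int user_id -> list of (int movie_id, predicted_rating)
    left_out: list of (user_id, movie_id, actual_rating) tuples
    """
    if not left_out:
        return 0.0
    hits = 0
    for user, movie, _rating in left_out:
        # Assumption: test-set IDs may be strings, while topN uses ints
        recommended = {m for m, _ in top_n.get(int(user), [])}
        if int(movie) in recommended:
            hits += 1
    return hits / len(left_out)


top_n = {1: [(10, 0.0), (20, 0.0)], 2: [(30, 0.0)]}
left_out = [(1, 20, 4.0), (2, 40, 3.5)]
print(hit_rate(top_n, left_out))  # 0.5
```

User 1's held-out movie 20 is in their list and user 2's movie 40 is not, so one hit out of two gives 0.5.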

In [6]:
# Measure hit rate
print("HR", RecommenderMetrics.HitRate(topN, leftOutTestSet))