# Simple User-Based Collaborative Filtering

This notebook demonstrates a simple implementation of user-based collaborative filtering using the MovieLens dataset.

The process includes:
1. Loading the MovieLens dataset
2. Training a user-based KNN model
3. Finding similar users to a test subject
4. Generating personalized recommendations based on similar users' ratings

In [1]:
from recsys.MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter

### Setup and Data Loading

We'll use user ID 85 as our test subject and set k=10 for the number of similar users to consider.

In [None]:
# Set parameters
test_uid = 85
k = 10

# Load the MovieLens dataset
lens, data, ratings = MovieLens.load()

# Build the training set
train_set = data.build_full_trainset()

### Train the Model

We'll train a user-based KNN model using cosine similarity.

In [None]:
# Configure and train the model
sim_options = {
    'name': 'cosine',
    'user_based': True
}

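model = KNNBasic(sim_options=sim_options)
model.fit(train_set)

# fit() already computes the user-user similarity matrix and stores it in
# model.sim, so reuse it rather than calling compute_similarities() again
similarity_matrix = model.sim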
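
To make the similarity measure concrete, here is a minimal pure-Python sketch of cosine similarity between two users, computed only over the items both have rated (which is how Surprise documents its `cosine` option). The function name `cosine_similarity` and the toy users `alice`, `bob`, and `carol` are made up for illustration and are not part of the notebook or of Surprise.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity over co-rated items; 0.0 when there is no overlap."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


# Hypothetical toy ratings: item ID -> rating on a 1-5 scale
alice = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
bob = {"m1": 4.0, "m2": 2.0, "m4": 5.0}
carol = {"m4": 1.0}

sim_alice_bob = cosine_similarity(alice, bob)      # overlap: m1, m2
sim_alice_carol = cosine_similarity(alice, carol)  # no overlap
print(sim_alice_bob, sim_alice_carol)
```

Because ratings are all positive, cosine similarity between users with any overlap tends to be high, so even users with only one or two items in common can rank among the nearest neighbours.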

### Generate Recommendations

For our test user, we'll:
1. Find their k most similar users
2. Get items rated by similar users
3. Score each candidate item by summing the similar users' ratings, each scaled to the 0-1 range and weighted by that user's similarity

In [None]:
# Get the test user's inner ID
test_inner_uid = train_set.to_inner_uid(str(test_uid))

# Find similar users
user_similarities = similarity_matrix[test_inner_uid]

# Pair every other user with their similarity to the test user
similar_users = [
    (inner_uid, score)
    for inner_uid, score in enumerate(user_similarities)
    if inner_uid != test_inner_uid
]

k_neighbours = heapq.nlargest(k, similar_users, key=lambda t: t[1])

# Score each item rated by a similar user: scale the rating to 0-1,
# weight it by that user's similarity, and sum across neighbours
candidates = defaultdict(float)
for inner_uid, user_similarity_score in k_neighbours:
    for item_id, rating in train_set.ur[inner_uid]:
        candidates[item_id] += (rating / 5.0) * user_similarity_score

# Track items the user has already rated
watched = {item_id for item_id, rating in train_set.ur[test_inner_uid]}

print(f"Top 10 recommendations for user {test_uid}:\n")
pos = 0
for item_id, rating_sum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
    if item_id not in watched:
        movie_id = train_set.to_raw_iid(item_id)
        print(lens.get_movie_name(int(movie_id)), rating_sum)
        pos += 1
        # Stop after exactly 10 recommendations
        if pos >= 10:
            break
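
The same three steps can be tried without MovieLens or Surprise. The sketch below is a self-contained toy version, assuming similarities are already available as a plain dictionary; the `recommend` function, its parameters, and the users and items are all hypothetical names chosen for illustration.

```python
import heapq
from collections import defaultdict


def recommend(target, ratings, similarities, k=2, n=3, max_rating=5.0):
    """Rank unseen items by similarity-weighted, 0-1 scaled neighbour ratings."""
    # Step 1: the k users most similar to the target
    neighbours = heapq.nlargest(
        k,
        ((user, sim) for user, sim in similarities[target].items() if user != target),
        key=lambda pair: pair[1],
    )

    # Steps 2 and 3: accumulate a weighted score for every item they rated
    scores = defaultdict(float)
    for user, sim in neighbours:
        for item, rating in ratings[user].items():
            scores[item] += (rating / max_rating) * sim

    # Drop items the target has already rated, then keep the n best
    seen = set(ratings[target])
    unseen = [(item, score) for item, score in scores.items() if item not in seen]
    return heapq.nlargest(n, unseen, key=lambda pair: pair[1])


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"A": 5.0, "B": 4.0},
    "ben": {"A": 5.0, "C": 4.0, "D": 2.0},
    "cat": {"B": 4.0, "C": 5.0},
    "dan": {"D": 5.0},
}
# Assumed precomputed similarities for the target user only
similarities = {"ann": {"ben": 0.9, "cat": 0.8, "dan": 0.1}}

top = recommend("ann", ratings, similarities, k=2, n=2)
for item, score in top:
    print(item, round(score, 2))
```

Because the scores are summed rather than averaged, an item rated by several neighbours (here `C`) outranks an item rated by only one, even if that single rating is high. That favours widely rated items; dividing by the sum of the contributing similarities is one possible alternative if that bias is unwanted.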