# Movie Recommendation System using Collaborative Filtering

This notebook implements a movie recommendation system using collaborative filtering techniques. We'll use the MovieLens dataset to build and evaluate our recommendation model.

## 1. Import Libraries

In [65]:
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate, train_test_split
from surprise import accuracy

## 2. Load and Explore the Dataset

We'll use the MovieLens 100K dataset, which contains 100,000 ratings from 943 users on 1,682 movies.

In [66]:
data = Dataset.load_builtin('ml-100k')

## 3. Train-Test Split for Evaluation

In [67]:
trainset, testset = train_test_split(data, test_size=0.2)

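## 4. Collaborative Filtering Implementation

We use Surprise's **SVD** algorithm as our model-based collaborative filtering method. Despite the name (Singular Value Decomposition), it does not compute an exact decomposition of the rating matrix: it learns latent factors and bias terms for users and items by stochastic gradient descent on the observed ratings, the approach popularized by Simon Funk during the Netflix Prize.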
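
To make the idea behind the `SVD` model concrete, here is a minimal pure-Python sketch of a Funk-style matrix factorization. It predicts a rating as global mean + user bias + item bias + the dot product of the user and item latent factors, and fits those parameters with stochastic gradient descent. The function name `train_funk_svd`, the toy ratings, and the hyperparameter values are illustrative assumptions, not Surprise's implementation or defaults.

```python
import random

def train_funk_svd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors on (user, item, rating) triples with SGD.

    Returns a predict(user, item) function. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                       # user biases
    bi = {i: 0.0 for i in items}                       # item biases
    # Small random starting values for the latent factors.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + bu[u] + bi[i] + dot

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient step with L2 regularization on every parameter.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict

# Toy ratings (made up for this example).
toy_ratings = [
    ("alice", "m1", 5), ("alice", "m2", 4), ("alice", "m3", 1),
    ("bob", "m1", 4), ("bob", "m3", 2),
    ("carol", "m2", 5), ("carol", "m3", 1),
]
predict = train_funk_svd(toy_ratings)
print(f"Predicted rating for bob on m2: {predict('bob', 'm2'):.2f}")
```

Bob never rated `m2`, so the prediction comes entirely from the learned biases and factors. That is how the model generalizes to unseen user-item pairs.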

In [68]:
model = SVD()
model.fit(trainset)
predictions = model.test(testset)

## 5. Generate Recommendations for Sample User

In [69]:
from collections import defaultdict

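def get_top_n(predictions, n=5):
    """Map each user id to its n highest (item id, estimated rating) pairs."""
    # Group the estimated ratings by user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))
    # Keep only the n best-scoring items per user.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n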


In [70]:
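# u.item is pipe-delimited: the first field is the movie id, the second the title.
# Ids are kept as strings so they match the raw item ids in the predictions.
movie_names = {}
with open('db/ml-100k/u.item', encoding='ISO-8859-1') as f: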
    for line in f:
        parts = line.strip().split('|')
        movie_id = parts[0]
        movie_title = parts[1]
        movie_names[movie_id] = movie_title


In [71]:
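# Build the per-user top-5 lists from the test-set predictions.
top_n = get_top_n(predictions, n=5)

user_id = '889'
print(f"Movie recommendation for the user {user_id}\n")
for iid, est_rating in top_n[user_id]:
    print(f"{movie_names[iid]}: {est_rating:.2f}")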


Movie recommendation for the user 889

Rear Window (1954): 4.27
One Flew Over the Cuckoo's Nest (1975): 4.26
L.A. Confidential (1997): 4.22
Manchurian Candidate, The (1962): 4.21
Jean de Florette (1986): 4.20
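
Because these lists are built from test-set predictions, they rank movies the user has already rated (the held-out ratings), not movies that are new to the user. To recommend unseen movies, one would score every item the user has not rated. Surprise provides `trainset.build_anti_testset()` for generating such user-item pairs. The sketch below shows the same idea in plain Python. The helper `recommend_unseen`, the `toy_predict` scoring function, and the toy data are hypothetical stand-ins, not part of this notebook's pipeline.

```python
import heapq

def recommend_unseen(user, rated_by_user, all_items, predict, n=5):
    """Return the n highest-scoring (item, score) pairs the user has not rated."""
    seen = rated_by_user.get(user, set())
    candidates = (item for item in all_items if item not in seen)
    scored = ((item, predict(user, item)) for item in candidates)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

# Toy data (made up): four items, one user who has already rated two of them.
all_items = ["m1", "m2", "m3", "m4"]
rated_by_user = {"u1": {"m1", "m3"}}
toy_scores = {"m1": 4.8, "m2": 3.9, "m3": 2.0, "m4": 4.1}

def toy_predict(user, item):
    # Stand-in for a trained model's estimate.
    return toy_scores[item]

print(recommend_unseen("u1", rated_by_user, all_items, toy_predict, n=2))
# [('m4', 4.1), ('m2', 3.9)]
```

Item `m1` has the highest score but is excluded because the user already rated it.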


## 6. Model Evaluation

We'll evaluate our recommendation model with two common rating-prediction metrics: RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). Lower is better for both, and RMSE penalizes large errors more heavily than MAE.
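
For reference, both metrics can be computed by hand from pairs of true and estimated ratings. The following is a small sketch on made-up numbers; the helper names `rmse` and `mae` and the `toy_pairs` list are illustrative and separate from the `surprise.accuracy` functions used in this notebook.

```python
import math

def rmse(pairs):
    """Root mean square error over (true, estimated) rating pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Errors are 1, 0, 2 and -1, so the squared errors sum to 6.
toy_pairs = [(4.0, 3.0), (2.0, 2.0), (5.0, 3.0), (3.0, 4.0)]

print(f"RMSE: {rmse(toy_pairs):.4f}")  # RMSE: 1.2247
print(f"MAE: {mae(toy_pairs):.4f}")    # MAE: 1.0000
```

The single error of 2 pulls RMSE above MAE, which shows how RMSE weights large misses more heavily.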

In [72]:
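# accuracy.rmse and accuracy.mae print their result by default;
# verbose=False avoids showing each metric twice.
rmse = accuracy.rmse(predictions, verbose=False)
mae = accuracy.mae(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")


RMSE: 0.9333
MAE: 0.7351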


In [73]:
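def precision_recall_at_k(predictions, k=5, threshold=4.0):
    """Return precision@k and recall@k averaged over all users in predictions.

    An item counts as relevant when its true rating is at least threshold.
    """
    user_est_true = defaultdict(list)
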
    for uid, iid, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = {}
    recalls = {}

    for uid, user_ratings in user_est_true.items():
        # Sort by estimated rating
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        # Top-K
        top_k = user_ratings[:k]

        # Count relevant items
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)
        # Count recommended items that are relevant
        n_rec_k = sum((true_r >= threshold) for (_, true_r) in top_k)
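        # Precision@k: share of the k slots filled with relevant items
        # (users with fewer than k test ratings are still divided by k).
        # Recall@k: share of the user's relevant items that made the top k.
        precisions[uid] = n_rec_k / k
        recalls[uid] = n_rec_k / n_rel if n_rel != 0 else 0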

    avg_precision = sum(prec for prec in precisions.values()) / len(precisions)
    avg_recall = sum(rec for rec in recalls.values()) / len(recalls)
    return avg_precision, avg_recall


In [74]:
precision, recall = precision_recall_at_k(predictions, k=5, threshold=4.0)
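# Harmonic mean of precision and recall; fall back to 0 when both are 0.
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0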

print(f"Precision@5: {precision:.4f}")
print(f"Recall@5: {recall:.4f}")
print(f"F1@5: {f1_score:.4f}")



Precision@5: 0.6995
Recall@5: 0.5145
F1@5: 0.5929
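
To see how these ranking metrics behave, here is a worked example for a single user with made-up ratings. It follows the same definitions as `precision_recall_at_k` (relevance is decided by the true rating, and precision is divided by `k`). The helper names `precision_recall_for_user` and `f1` and the toy data are assumptions for illustration.

```python
def precision_recall_for_user(est_true_pairs, k, threshold):
    """Precision@k and recall@k for one user's (estimated, true) rating pairs."""
    ranked = sorted(est_true_pairs, key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    n_rel = sum(true_r >= threshold for _, true_r in ranked)
    n_rel_in_top_k = sum(true_r >= threshold for _, true_r in top_k)
    precision = n_rel_in_top_k / k
    recall = n_rel_in_top_k / n_rel if n_rel else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall, 0 when both are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Five test ratings for one user: (estimated, true).
toy_user = [(4.8, 5), (4.5, 3), (4.1, 4), (3.9, 4), (3.0, 2)]

# Top 2 by estimate are (4.8, 5) and (4.5, 3): one of the two is relevant,
# and the user has three relevant items in total.
precision, recall = precision_recall_for_user(toy_user, k=2, threshold=4.0)
print(f"Precision@2: {precision:.4f}")  # Precision@2: 0.5000
print(f"Recall@2: {recall:.4f}")        # Recall@2: 0.3333
print(f"F1@2: {f1(precision, recall):.4f}")  # F1@2: 0.4000
```

Recall is capped by `k`: with three relevant items and only two slots, it cannot exceed 2/3 for this user, which is worth keeping in mind when reading Recall@5 above.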
