# Movie Recommendation System using Collaborative Filtering (SVD)

This notebook implements a collaborative filtering–based recommendation system
using matrix factorization (SVD) on the MovieLens dataset.

The goal is to predict user preferences and generate top-N movie recommendations
to improve user engagement and content discovery.


## 1. Load and inspect the MovieLens dataset

We use the MovieLens ratings dataset, which contains user–movie interactions
in the form of explicit ratings.


In [1]:
import pandas as pd

ratings_path = 'ml-latest-small/ratings.csv'
ratings_df = pd.read_csv(ratings_path, usecols=["userId", "movieId", "rating"])     # Load user–movie rating data

ratings_df.head()

,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0
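
Before modelling, it helps to know how sparse the rating matrix is, since collaborative filtering only learns from the observed cells. The sketch below computes the number of users, the number of items, and the matrix density on a small hand-made list of `(userId, movieId, rating)` tuples. The toy rows and the helper name `rating_matrix_stats` are illustrative assumptions, not values taken from MovieLens.

```python
def rating_matrix_stats(rows):
    """Return (n_users, n_items, density) for a list of (user, item, rating) tuples.

    Assumes each (user, item) pair appears at most once.
    """
    users = {user for user, _, _ in rows}
    items = {item for _, item, _ in rows}
    n_cells = len(users) * len(items)
    density = len(rows) / n_cells if n_cells else 0.0
    return len(users), len(items), density


# Hypothetical toy data, same column order as ratings.csv
toy_rows = [
    (1, 1, 4.0), (1, 3, 4.0), (1, 6, 4.0),
    (2, 1, 3.5), (2, 47, 5.0),
    (3, 50, 2.0),
]

n_users, n_items, density = rating_matrix_stats(toy_rows)
print(f"{n_users} users, {n_items} items, density = {density:.0%}")
# 3 users, 5 items, density = 40%
```

On the notebook's DataFrame, the same counts should be available through `ratings_df["userId"].nunique()` and `ratings_df["movieId"].nunique()`.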


## 2. Prepare data for the Surprise library

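Surprise's `Dataset.load_from_df` expects a DataFrame with exactly three columns
in the order user ID, item ID, rating, plus a `Reader` that declares the rating scale.
MovieLens ratings range from 0.5 to 5.0, so we pass that scale to the `Reader`
and convert the ratings DataFrame into a Surprise Dataset object.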


In [None]:
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split

# Convert ratings data to Surprise-compatible format
reader = Reader(rating_scale=(0.5, 5.0))

data = Dataset.load_from_df(ratings_df[['userId', 'movieId', 'rating']], reader)

# Hold out 25% of the ratings as a test set; the fixed seed makes the split reproducible
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)
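
Conceptually, a random hold-out split shuffles the ratings and sets a fraction aside for testing. The following is a minimal stdlib sketch of that idea on toy data. The function name `simple_train_test_split` and the toy ratings are assumptions for illustration, and this does not reproduce Surprise's internal procedure.

```python
import random


def simple_train_test_split(rows, test_size=0.25, seed=42):
    """Shuffle a copy of `rows` and hold out a `test_size` fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = int(round(len(shuffled) * test_size))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# 20 hypothetical ratings: 4 users x 5 movies
toy_rows = [(user, movie, 3.0) for user in range(1, 5) for movie in range(1, 6)]

train_rows, test_rows = simple_train_test_split(toy_rows)
print(len(train_rows), len(test_rows))
# 15 5
```

Because the split is made over individual ratings rather than over users, a user's ratings can end up in both sets, which is what lets the model be evaluated on users it has already seen.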

## 3. Train a baseline SVD model

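We start with a baseline SVD model to establish a reference level of performance
before hyperparameter tuning. Apart from the random seed, the model uses Surprise's
default settings (100 latent factors, 20 epochs, learning rate 0.005, regularization 0.02).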


In [None]:
from surprise import SVD
from surprise.accuracy import rmse

# Initialize and train the SVD model
model = SVD(random_state=42)
model.fit(trainset)

# Predict ratings for the test set
predictions = model.test(testset)
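
To make the model less of a black box: Surprise documents its `SVD` prediction as the global mean plus a user bias, an item bias, and the dot product of a user and an item factor vector, fitted by stochastic gradient descent. The sketch below trains that kind of biased matrix factorization on toy data using only the standard library. The training loop, the toy ratings, and the name `train_biased_mf` are illustrative assumptions; this is not Surprise's implementation.

```python
import random


def train_biased_mf(rows, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD; return a predict function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in rows) / len(rows)            # global mean rating
    bu = {u: 0.0 for u, _, _ in rows}                      # user biases
    bi = {i: 0.0 for _, i, _ in rows}                      # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in bu}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in bi}

    def predict(u, i):
        # Only valid for users and items seen in `rows` (KeyError otherwise).
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + bu[u] + bi[i] + dot

    for _ in range(n_epochs):
        for u, i, r in rows:
            err = r - predict(u, i)
            # Regularized gradient steps on biases and factors
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict


# Hypothetical toy ratings: (user, item, rating)
toy_rows = [
    (1, "a", 5.0), (1, "b", 3.0),
    (2, "a", 4.0), (2, "b", 2.0), (2, "c", 1.0),
    (3, "a", 5.0), (3, "c", 1.0),
]
predict = train_biased_mf(toy_rows)


def mse(predict_fn):
    """Mean squared error of `predict_fn` on the toy training ratings."""
    return sum((r - predict_fn(u, i)) ** 2 for u, i, r in toy_rows) / len(toy_rows)


global_mean = sum(r for _, _, r in toy_rows) / len(toy_rows)
baseline_mse = mse(lambda u, i: global_mean)   # always predict the mean
fitted_mse = mse(predict)
print(f"mean-only MSE: {baseline_mse:.3f}, fitted MSE: {fitted_mse:.3f}")
```

The comparison is made on the training ratings only, so it shows that the updates reduce the fitting error, not that the model generalizes.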

## 4. Model evaluation

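Model performance is evaluated using Root Mean Squared Error (RMSE):
the square root of the mean squared difference between predicted and actual ratings.
Lower is better, and the value is expressed in rating units (stars on the 0.5–5.0 scale).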


In [4]:
rmse(predictions)

RMSE: 0.8820


0.8820442070964672
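
For reference, the metric itself is short enough to write out by hand. This is a minimal sketch on made-up `(true, estimated)` pairs; the helper name `rmse_from_pairs` and the toy values are assumptions for illustration.

```python
import math


def rmse_from_pairs(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (true, estimated) ratings
toy_pairs = [(4.0, 3.5), (5.0, 4.0), (2.0, 3.0), (3.0, 3.0)]
toy_rmse = rmse_from_pairs(toy_pairs)
print(toy_rmse)
# 0.75
```

Each Surprise prediction carries the true rating as `r_ui` and the estimate as `est`, so `rmse_from_pairs([(p.r_ui, p.est) for p in predictions])` should agree with `rmse(predictions)` up to floating-point rounding.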

## 5. Hyperparameter tuning with cross-validation

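We use GridSearchCV to tune the SVD hyperparameters by 3-fold cross-validated RMSE. The grid below has 3 × 2 × 3 × 3 = 54 combinations, so the search fits 162 models, and it runs on the full dataset. Because the baseline RMSE was computed on a single train/test split, the two scores come from different evaluation protocols and are not strictly comparable.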

In [None]:
from surprise.model_selection import GridSearchCV

# Define the hyperparameter grid for tuning
param_grid = {
    'n_epochs': [5, 10, 20],      # Number of training epochs
    'lr_all': [0.002, 0.005],     # Learning rate
    'n_factors': [50, 100, 200],  # Number of latent factors
    'reg_all': [0.02, 0.05, 0.1]  # Regularization strength
}

# Run GridSearchCV for SVD using the defined parameter grid
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3)
gs.fit(data)

# Best hyperparameters and corresponding RMSE
best_params_SVD = gs.best_params['rmse']
best_score_SVD = gs.best_score['rmse']

print(f"Best RMSE (CV): {best_score_SVD:.4f}")
print("Best SVD parameters:", best_params_SVD)

# Retrain with the best parameters on all ratings (no held-out test set remains afterwards)
full_trainset = data.build_full_trainset()
best_algo = SVD(**best_params_SVD, random_state=42)
best_algo.fit(full_trainset)

Best RMSE (CV): 0.8757
Best SVD parameters: {'n_epochs': 20, 'lr_all': 0.005, 'n_factors': 100, 'reg_all': 0.1}


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1d600dcdb90>
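
To see how much work the search does, the grid can be expanded by hand. The sketch below enumerates every combination with `itertools.product` and counts the resulting model fits. It only reproduces the counting, not Surprise's internal search, and the names `combos` and `n_folds` are illustrative.

```python
from itertools import product

# Same grid as in the cell above
param_grid = {
    'n_epochs': [5, 10, 20],
    'lr_all': [0.002, 0.005],
    'n_factors': [50, 100, 200],
    'reg_all': [0.02, 0.05, 0.1],
}

names = list(param_grid)
# One dict per point in the Cartesian product of all value lists
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 3  # matches cv=3
total_fits = len(combos) * n_folds
print(len(combos), total_fits)
# 54 162
```

Adding one more value to any list multiplies the cost, which is a reason to keep exhaustive grids small.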

## 6. Example: generating recommendations for a single user

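Below is a minimal example demonstrating how the trained SVD model
can be used to generate top-N movie recommendations for a given user.
Note that the loop scores every movie in the dataset, including those the user
has already rated, so a real recommender would filter those out before ranking.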



In [None]:
# Select a user for demonstration
user_id = ratings_df["userId"].iloc[0]

# Get all unique movie IDs
movie_ids = ratings_df["movieId"].unique()

# Predict ratings for all movies
all_predictions = [
    (movie_id, best_algo.predict(user_id, movie_id).est)
    for movie_id in movie_ids
]
# Select top-N recommendations
top_n = sorted(all_predictions, key=lambda x: x[1], reverse=True)[:5]

pd.DataFrame(top_n, columns=["movie_id", "predicted_rating"])

,movie_id,predicted_rating
0,318,5.0
1,3451,4.986815
2,1204,4.986193
3,2959,4.984047
4,750,4.977597
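
Recommendations are usually restricted to items the user has not interacted with yet. The sketch below shows one way to do that filtering before ranking. The helper `top_n_unseen`, the toy ratings, and the stand-in scorer `toy_predict` are hypothetical; in the notebook the scorer would wrap `best_algo.predict(user_id, movie_id).est`.

```python
import heapq


def top_n_unseen(user_id, rows, predict, n=5):
    """Return the n highest-scoring (item, score) pairs among items the user has not rated."""
    seen = {item for user, item, _ in rows if user == user_id}
    candidates = {item for _, item, _ in rows} - seen
    scored = ((item, predict(user_id, item)) for item in candidates)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical toy ratings: (userId, movieId, rating)
toy_rows = [
    (1, 10, 5.0), (1, 20, 4.0),
    (2, 10, 3.0), (2, 30, 4.5),
    (3, 40, 2.0), (3, 50, 5.0),
]


def toy_predict(user_id, item_id):
    # Stand-in for a trained model: fixed scores per item (assumption for the demo)
    return {10: 4.9, 20: 4.1, 30: 4.4, 40: 2.5, 50: 4.7}[item_id]


recs = top_n_unseen(1, toy_rows, toy_predict, n=2)
print(recs)
# [(50, 4.7), (30, 4.4)]
```

Movie 10 has the highest score but is excluded because user 1 already rated it. With the notebook's objects, `rows` could be built as `list(ratings_df.itertuples(index=False, name=None))`.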


## Conclusion

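This notebook demonstrates a complete collaborative filtering pipeline
using matrix factorization (SVD), including data preparation, model training,
evaluation, and hyperparameter tuning.

The baseline SVD reached an RMSE of 0.8820 on the 25% hold-out split, and the tuned
configuration reached 0.8757 in 3-fold cross-validation. These figures come from
different evaluation protocols, so they are indicative rather than directly comparable.
SVD provides a solid baseline against which more complex recommendation models
can be measured.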

