# Collaborative Filtering Recommender System

A recommender system is an algorithm that suggests items to users based on their preferences. Common applications include movie recommendations on streaming services, product recommendations on e-commerce websites, and friend suggestions on social networks.  

This notebook demonstrates how to build a simple collaborative filtering recommender system using the `scikit-surprise` library. Collaborative filtering is a technique used in recommendation systems to identify user preferences and suggest items that users might like based on the preferences of similar users.
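To make the idea of "preferences of similar users" concrete, here is a minimal pure-Python sketch of user-based collaborative filtering. It is only an illustration: the `ratings` dictionary, the user names, and the helper functions are all made up for this example, and this neighbourhood approach is not the SVD model trained later in the notebook.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    neighbours = [
        (cosine_similarity(user, other), user_ratings[item])
        for other, user_ratings in ratings.items()
        if other != user and item in user_ratings
    ]
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in neighbours) / total_similarity

sim_bob = cosine_similarity("alice", "bob")
sim_carol = cosine_similarity("alice", "carol")
predicted = predict("alice", "D")
print(f"alice~bob: {sim_bob:.2f}, alice~carol: {sim_carol:.2f}, predicted D: {predicted:.2f}")
```

Because alice's ratings are closer to bob's than to carol's, bob's rating of item `D` carries more weight in the estimate.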

## Overview
In this notebook, we will walk through:

1. Installing the scikit-surprise library
2. Importing libraries
3. Loading and preprocessing the dataset
4. Using the Surprise library to create a collaborative filtering model
5. Making Top-N recommendations for users

## Installation
First, we need to install the `scikit-surprise` library. scikit-surprise (imported as `surprise`) is a Python library for building and analyzing recommender systems. It provides ready-made collaborative filtering algorithms, including Singular Value Decomposition (SVD), Non-negative Matrix Factorization (NMF), and K-Nearest Neighbors (KNN), among others.

In [19]:
# Installing the scikit-surprise package
!pip install scikit-surprise




## Importing libraries

Next, we import the necessary libraries for building and evaluating a recommender system using the `scikit-surprise` library:

- `Dataset` and `Reader` from `surprise`: These classes are used to handle the dataset and define the format of the data.
- `SVD` from `surprise`: This is the Singular Value Decomposition algorithm, which is a matrix factorization technique commonly used in recommender systems.
- `accuracy` from `surprise`: This module provides functions for evaluating the accuracy of the recommender system.
- `train_test_split` from `surprise.model_selection`: This function is used to split the dataset into training and testing sets.
- `heapq` and `pandas`: These are used later to select the top-N predictions and to display the recommendations as a table.
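To give a feel for what "matrix factorization" means for the `SVD` entry above, here is a small sketch of the prediction rule that the Surprise documentation describes for its biased SVD: the estimate is the global mean rating plus a user bias, an item bias, and the dot product of the user and item factor vectors. The function name and all numbers below are invented for illustration; in practice the library learns the biases and factors during `fit()`.

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Estimate a rating as mean + biases + dot product of latent factors."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + interaction

# Made-up values: a slightly generous user, a slightly underrated item,
# and two latent factors per user and item.
estimate = predict_rating(
    global_mean=3.5,
    user_bias=0.2,
    item_bias=-0.1,
    user_factors=[0.5, -0.3],
    item_factors=[0.4, 0.1],
)
print(round(estimate, 2))
```

The dot product is what lets the model capture taste: it is large when the user's factors line up with the item's factors.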


In [20]:
# Import necessary libraries
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split
import heapq
import pandas as pd

## Loading and preprocessing the dataset

The dataset we will be using is the built-in MovieLens 100k dataset (if it is not already downloaded, `load_builtin` will prompt you to download it). This dataset contains 100,000 ratings from 943 users on 1,682 movies, with ratings ranging from 1 to 5. It is part of the larger MovieLens collection, which also includes bigger datasets such as MovieLens 1M and MovieLens 20M.

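We will load this dataset using the `load_builtin` method from Surprise. If you have your own ratings in a pandas DataFrame (here called `ratings`, with `userId`, `movieId`, and `rating` columns), you can load them as follows, setting `rating_scale` to match your data:  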
    reader = Reader(rating_scale=(0.5, 5.0))  
    data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

In [21]:
# Load the movielens-100k dataset (download it if needed)
data = Dataset.load_builtin('ml-100k')

In [22]:
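import os
from surprise import get_dataset_dir

def load_movie_titles():
    """Map raw movie IDs (strings) to movie titles using the u.item file."""
    movie_titles = {}
    # Assumes load_builtin stored ml-100k in Surprise's dataset directory
    item_path = os.path.join(get_dataset_dir(), "ml-100k", "ml-100k", "u.item")
    with open(item_path, encoding='ISO-8859-1') as file:
        for line in file:
            # Fields are pipe-separated: movie id | movie title | ...
            parts = line.strip().split('|')
            movie_id = parts[0]
            movie_title = parts[1]
            movie_titles[movie_id] = movie_title
    return movie_titles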

movie_titles = load_movie_titles()


## Creating a collaborative filtering model using the SVD Algorithm
To build our collaborative filtering model using the SVD (Singular Value Decomposition) algorithm from the Surprise library, we first split the data into a training set and a test set with `train_test_split`, and then follow these steps:
1. Create an instance of the SVD algorithm using the `SVD()` constructor.
2. Use the `fit()` method to train the algorithm on the training set (`trainset`).
3. Use the `test()` method to predict user ratings for the test set (`testset`).
4. Evaluate the model using `accuracy.rmse(predictions)`, which computes the root mean squared error (RMSE) of the predictions on the test set.
5. Retrain the algorithm on the full dataset, built with the `build_full_trainset()` method, to prepare it for making predictions.
6. Predict a rating for a single user using the `predict(user_id, item_id)` method, where `user_id` and `item_id` are the identifiers of the user and item for which the rating is to be predicted.
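The evaluation step relies on RMSE, so it may help to see what that number measures. The sketch below computes it by hand on a few made-up (true rating, estimated rating) pairs; the `rmse` helper and the toy data are hypothetical and only mirror the standard formula, not the library's internal code.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Made-up test-set results: (true rating, estimated rating)
toy_pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
toy_rmse = rmse(toy_pairs)
print(f"RMSE: {toy_rmse:.4f}")
```

RMSE is expressed in the same units as the ratings, so on a 1 to 5 scale a lower value means the estimates sit closer to the true ratings, and large individual errors are penalized more heavily than small ones.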


In [23]:
# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.20, random_state = 42)

In [24]:
# Creating an SVD algorithm.
algo = SVD()

# Training the algorithm on the trainset
algo.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x142b74c7890>

In [25]:
# Generating predictions for the testset
predictions = algo.test(testset)

# Assessing Model performance
accuracy.rmse(predictions)

RMSE: 0.9363


0.9363053235531996

In [26]:
# Retraining algorithm on the full dataset
trainset = data.build_full_trainset()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x142b74c7890>

In [27]:
# Predict a specific rating for a single user
user_id = '196'
item_id = '302'
predicted_rating = algo.predict(user_id, item_id)
print(predicted_rating)


user: 196        item: 302        r_ui = None   est = 4.06   {'was_impossible': False}


## Making Top-N Recommendations for a User

Once we have trained our collaborative filtering model, we can use it to recommend a list of items (movies) that a user is likely to be interested in but has not yet rated. To achieve this, we need to:  
1. Get a list of all movies in the dataset.
2. Identify which movies the user has already rated or interacted with.
3. Filter out the movies a user has already rated from the list of all movies, leaving only the unrated movies.
4. Use the trained collaborative filtering model to predict ratings for movies a user has yet to rate or interact with.
5. Sort the predicted ratings for the unrated movies in descending order, based on the likelihood of the user's interest.
6. Choose and display the top-N items with the highest predicted ratings as recommendations for the user.

In [28]:
# Function that returns the top-N movies for a specific user
def get_top_n_recommendations(algo, user_id, n=10):
    """
    Get top N movie recommendations for a specific user.

    Args:
    - algo: Trained collaborative filtering algorithm (e.g., SVD).
    - user_id: ID of the user for whom recommendations are generated.
    - n: Number of recommendations to return (default is 10).

    Returns:
    - List of tuples containing movie ID and predicted rating, sorted by predicted rating in descending order.
    """
    # Use the trainset the algorithm was fitted on, and get all inner movie IDs
    trainset = algo.trainset
    all_movie_ids = trainset.all_items()
    
    # Get the (inner movie ID, rating) pairs the user has already rated
    user_ratings = trainset.ur[trainset.to_inner_uid(user_id)]
    
    # Exclude already rated movies (a set keeps the membership test fast)
    movie_already_rated = {item_id for item_id, rating in user_ratings}
    movie_ids_to_predict = [mid for mid in all_movie_ids if mid not in movie_already_rated]
    
    # Predict ratings for all unrated movies
    predictions = [algo.predict(user_id, trainset.to_raw_iid(mid)) for mid in movie_ids_to_predict]
    
    # Get the top N movie predictions
    top_n_predictions = heapq.nlargest(n, predictions, key=lambda x: x.est)
    
    # Extract the movie IDs and ratings
    top_n_movie_ids = [(pred.iid, pred.est) for pred in top_n_predictions]
    
    return top_n_movie_ids

In [29]:
# Get the top-N movies and predicted ratings for a user
user_id = '196'
top_n_recommendations = get_top_n_recommendations(algo, user_id, n=10)


recommendations_data = []

for movie_id, rating in top_n_recommendations:
    movie_title = movie_titles.get(movie_id, "Unknown Title")
    recommendations_data.append({"Movie ID": movie_id, "Movie Title": movie_title, "Predicted Rating": rating})

recommendations_df = pd.DataFrame(recommendations_data)


In [30]:
recommendations_df

|   | Movie ID | Movie Title | Predicted Rating |
|---|----------|-------------|------------------|
| 0 | 96 | Terminator 2: Judgment Day (1991) | 4.63477 |
| 1 | 100 | Fargo (1996) | 4.577455 |
| 2 | 169 | Wrong Trousers, The (1993) | 4.562219 |
| 3 | 963 | Some Folks Call It a Sling Blade (1993) | 4.483926 |
| 4 | 408 | Close Shave, A (1995) | 4.480231 |
| 5 | 646 | Once Upon a Time in the West (1969) | 4.402501 |
| 6 | 187 | Godfather: Part II, The (1974) | 4.396875 |
| 7 | 114 | Wallace & Gromit: The Best of Aardman Animatio... | 4.375298 |
| 8 | 190 | Henry V (1989) | 4.341503 |
| 9 | 79 | Fugitive, The (1993) | 4.339178 |
