### Installing Scikit-Surprise

In [1]:
pip install scikit-surprise

Note: you may need to restart the kernel to use updated packages.


### Importing Packages

In [None]:
import numpy as np
import pandas as pd
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate, train_test_split

### Dataset

Movie ratings data from MovieLens

Each row of the ratings file records one rating and has the following fields:
- User IDs
- Movie IDs
- Ratings
- Timestamps

### Data Preprocessing

In [74]:
# Loading the ratings.csv dataset
df = pd.read_csv("C:/Users/dipti/Downloads/ml-latest-small/ml-latest-small/ratings.csv")
df.head()

   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931


In [75]:
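# Note: MovieLens ratings run from 0.5 to 5.0 in half-star steps, so
# (0.5, 5) would match the data more closely than (1, 5).
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)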

In [76]:
# 80/20 split; pass random_state=<int> to make the split reproducible
trainset, testset = train_test_split(data, test_size=0.2, train_size=0.8)

In [77]:
trainset.rating_scale

(1, 5)
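
The rating scale is more than metadata: by default Surprise clips every estimate returned by `predict` to this range (its `clip` argument defaults to `True`). The following is a minimal plain-Python sketch of that clipping step; `clip_to_scale` is a hypothetical helper written for illustration, not part of Surprise.

```python
def clip_to_scale(estimate, rating_scale=(1, 5)):
    """Clamp a raw estimate to the (low, high) rating scale."""
    low, high = rating_scale
    return max(low, min(high, estimate))

print(clip_to_scale(5.7))   # above the scale
print(clip_to_scale(0.5))   # below the scale, e.g. a half-star rating
print(clip_to_scale(3.2))   # already inside the scale
```

With a scale of (1, 5), an estimate can never drop below 1, even though MovieLens ratings can be as low as 0.5.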

In [80]:
numitems = trainset.n_items

In [81]:
numusers = trainset.n_users

### Print number of items and users in the training set

In [82]:
print("Number of items in training set:", numitems)
print("Number of users in training set:", numusers)

Number of items in training set: 8973
Number of users in training set: 610


### Algorithms for User-based and Item-based Collaborative Filtering

In [83]:
from surprise import KNNBasic

In [84]:
# User-based individual recommender system
# Note: "shrinkage" only affects the pearson_baseline similarity; MSD ignores it.
sim_options = {"name": "MSD", "user_based": True, "shrinkage": 100}
algo1 = KNNBasic(sim_options=sim_options)
algo1.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x1a6a75a88e0>
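
To make the fitted model less of a black box, here is a minimal sketch of how a basic user-based KNN estimate can be formed: a similarity-weighted average of the ratings that the k most similar users gave to the target item. The function name, the toy similarities, and the ratings are all made up for illustration; Surprise's own implementation adds details such as a minimum number of neighbors and a fallback when there are too few.

```python
import heapq

def knn_estimate(neighbors, k=40):
    """Similarity-weighted average over the k most similar neighbors.

    neighbors: list of (similarity, rating) pairs, one per user who
    rated the target item.
    """
    top_k = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    # Only neighbors with positive similarity contribute.
    top_k = [(sim, r) for sim, r in top_k if sim > 0]
    total_sim = sum(sim for sim, _ in top_k)
    if total_sim == 0:
        return None  # no usable neighbor; a real system falls back to a default
    return sum(sim * r for sim, r in top_k) / total_sim

# Three hypothetical neighbors who rated the target movie.
toy_neighbors = [(0.5, 4.0), (0.25, 2.0), (0.125, 5.0)]
print(knn_estimate(toy_neighbors, k=2))  # uses only the two closest neighbors
print(knn_estimate(toy_neighbors, k=3))  # uses all three
```

For the item-based variant the same average is taken over the most similar items that the target user has already rated.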

In [85]:
# Item-based individual recommender system
# Note: "shrinkage" only affects the pearson_baseline similarity; MSD ignores it.
sim_options = {"name": "MSD", "user_based": False, "shrinkage": 100}
algo2 = KNNBasic(sim_options=sim_options)
algo2.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x1a6a5b6cf40>
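
Both models above rely on the MSD (mean squared difference) similarity. As a rough sketch of what that measure computes, the snippet below follows the definition in the Surprise documentation: the mean squared difference over co-rated items, turned into a similarity as 1 / (msd + 1). The helper name and the toy ratings are assumptions for illustration.

```python
def msd_similarity(ratings_a, ratings_b):
    """MSD similarity between two rating dicts mapping id -> rating."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # nothing co-rated, so no evidence of similarity
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

alice = {"m1": 4.0, "m2": 5.0, "m3": 1.0}
bob = {"m1": 4.0, "m2": 3.0}
print(msd_similarity(alice, bob))  # msd = (0 + 4) / 2 = 2, similarity = 1/3
```

For the user-based model the dicts are keyed by movie; for the item-based model they are keyed by user.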

In [86]:
# Evaluate the user-based model on the test data
from surprise import accuracy
predictions = algo1.test(testset)
RMSE = accuracy.rmse(predictions)

RMSE: 0.9533


In [87]:
# Evaluate the item-based model on the test data
predictions = algo2.test(testset)
RMSE = accuracy.rmse(predictions)

RMSE: 0.9184
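
For reference, the RMSE that `accuracy.rmse` reports is the square root of the mean squared error between true and estimated ratings. A minimal plain-Python sketch, with a hypothetical helper name and toy numbers:

```python
import math

def rmse(true_ratings, estimates):
    """Root mean squared error between two equal-length sequences."""
    if len(true_ratings) != len(estimates):
        raise ValueError("sequences must have the same length")
    squared_errors = [(t - e) ** 2 for t, e in zip(true_ratings, estimates)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

print(rmse([4.0, 3.0, 5.0], [3.0, 3.0, 3.0]))  # errors are 1, 0 and 2
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.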


#### Observation:
The item-based recommender has a <b>lower</b> RMSE (0.9184) than the user-based recommender (0.9533), so it predicts the held-out ratings more accurately on this split.

Possible ways to improve the recommender system:

- Incorporate more data sources, such as social network data, browsing history, and search queries, to make recommendations more accurate and diverse.
- Use more sophisticated algorithms that can handle more complex data and give more precise predictions, and incorporate user feedback and preferences to personalize the results.
- Evaluate and test the system regularly to find areas for improvement and to confirm that recommendation quality holds up over time.
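
As one concrete way to act on the last point, evaluation can be repeated over several train/test splits instead of a single one. The sketch below shows the index bookkeeping behind k-fold cross-validation in plain Python; `k_fold_indices` is a hypothetical helper, and Surprise's `cross_validate` (imported at the top of this notebook) provides a ready-made version for its own datasets.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each of n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index, starting at offset f.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

for train, test in k_fold_indices(10, n_folds=5):
    print(len(train), len(test))
```

Averaging the RMSE over the folds gives a steadier estimate than the single 80/20 split used here.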

### Bonus Task: Improving the Recommender System

In [88]:
from surprise import KNNBasic, SVD

# train a user-based KNN algorithm on the training set
def train_user_based(trainset):
    user_based_sim_options = {'name': 'cosine', 'user_based': True}
    user_based_algo = KNNBasic(sim_options = user_based_sim_options)
    user_based_algo.fit(trainset)
    return user_based_algo

# train an item-based KNN algorithm on the training set
def train_item_based(trainset):
    item_based_sim_options = {'name': 'cosine', 'user_based': False}
    item_based_algo = KNNBasic(sim_options = item_based_sim_options)
    item_based_algo.fit(trainset)
    return item_based_algo

# train an SVD algorithm on the training set
def train_svd(trainset):
    svd_algo = SVD()
    svd_algo.fit(trainset)
    return svd_algo

# define a hybrid recommender system that averages the estimates of the user-based KNN, item-based KNN, and SVD algorithms
def hybrid_recommender(user_based_algo, item_based_algo, svd_algo):
    def estimate(user_id, item_id):
        user_based_rating = user_based_algo.predict(user_id, item_id).est
        item_based_rating = item_based_algo.predict(user_id, item_id).est
        svd_rating = svd_algo.predict(user_id, item_id).est
        return (user_based_rating + item_based_rating + svd_rating) / 3
    return estimate

# train the algorithms on the training set
user_based_algo = train_user_based(trainset)
item_based_algo = train_item_based(trainset)
svd_algo = train_svd(trainset)

# create a hybrid recommender system function
hybrid_algo = hybrid_recommender(user_based_algo, item_based_algo, svd_algo)

# make predictions on the test set using the hybrid algorithm
predictions = []
for user_id, item_id, rating in testset:
    predicted_rating = hybrid_algo(user_id, item_id)
    predictions.append(predicted_rating)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
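
The bonus models switch from MSD to cosine similarity. A minimal sketch of that measure, computed over co-rated items only (which is how the Surprise documentation defines it); the helper name and toy data are assumptions:

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two rating dicts, over common keys."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

alice = {"m1": 4.0, "m2": 5.0, "m3": 1.0}
bob = {"m1": 4.0, "m2": 3.0}
print(cosine_similarity(alice, bob))
```

Because ratings are all positive, cosine values fall between 0 and 1 and are often close to 1, so this measure can rank neighbors differently from MSD.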


In [89]:
from sklearn.metrics import mean_squared_error

# compute and print the RMSE score of the predictions
rmse = np.sqrt(mean_squared_error([rating for user_id, item_id, rating in testset], predictions))
print("RMSE for Hybrid system:", rmse)

RMSE for Hybrid system: 0.8914867249029854


### Comparing performance of Hybrid and Individual Recommender Systems
#### RMSE Scores for the systems:
- RMSE Score for the Individual User-Based Recommender System: 0.9533
- RMSE Score for the Individual Item-Based Recommender System: 0.9184
- RMSE Score for the Hybrid Recommender System: 0.8915
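
To put the gap in perspective, the relative reduction in RMSE can be worked out directly from the scores reported above. A small sketch (the helper name is made up for illustration):

```python
def relative_reduction(baseline, improved):
    """Fractional drop in error when moving from baseline to improved."""
    return (baseline - improved) / baseline

rmse_user, rmse_item, rmse_hybrid = 0.9533, 0.9184, 0.8914867249029854

print(f"vs user-based: {relative_reduction(rmse_user, rmse_hybrid):.1%}")
print(f"vs item-based: {relative_reduction(rmse_item, rmse_hybrid):.1%}")
```

That is roughly a 6.5% reduction relative to the user-based model and 2.9% relative to the item-based one.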

#### Observation:
The hybrid recommender has the lowest RMSE of the three systems, so on this test split it predicts ratings more accurately than either individual recommender.

The improvement likely comes from combining complementary signals: the user-based model draws on similarities between users, the item-based model on similarities between items, and SVD on latent factors learned from the whole rating matrix. Averaging the three estimates tends to cancel out some of their individual errors.
Note that the comparison is not strictly like-for-like: the KNN models inside the hybrid use cosine similarity, whereas the individual models use MSD, and SVD was not evaluated on its own, so the gain cannot be attributed to the averaging alone.
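
The hybrid above gives each model an equal weight of one third. A natural variation, sketched here under the assumption that each model exposes a `predict(user_id, item_id)` call returning an object with an `.est` attribute (as the Surprise models in this notebook do), is to weight the models unequally. `weighted_hybrid` and the stub models are hypothetical and only serve to make the example self-contained.

```python
from collections import namedtuple

Prediction = namedtuple("Prediction", ["est"])

class ConstantModel:
    """Stand-in for a fitted model; always predicts the same rating."""
    def __init__(self, value):
        self.value = value

    def predict(self, user_id, item_id):
        return Prediction(est=self.value)

def weighted_hybrid(models, weights):
    """Return an estimate function that blends the models' estimates."""
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)

    def estimate(user_id, item_id):
        blended = sum(w * m.predict(user_id, item_id).est
                      for m, w in zip(models, weights))
        return blended / total

    return estimate

# Toy stand-ins for the user-based, item-based, and SVD models.
models = [ConstantModel(3.0), ConstantModel(4.0), ConstantModel(5.0)]
equal = weighted_hybrid(models, [1, 1, 1])
svd_heavy = weighted_hybrid(models, [1, 1, 2])
print(equal(1, 50), svd_heavy(1, 50))
```

In practice the weights would be chosen on a separate validation split, not on the test set, so that the reported RMSE stays an honest estimate.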