# ***Netflix Recommendation System using SVD***

# Problem Statement

Users face difficulty discovering relevant content from large streaming catalogs. Build a personalized recommendation system using collaborative filtering to suggest movies based on user rating behavior.

# Objective

The objective of this project is to build a collaborative filtering recommendation system using matrix factorization (Funk SVD) implemented via the Surprise library.

Specifically, the project aims to:
- Learn latent user and item representations from rating data
- Minimize prediction error using stochastic gradient descent
- Predict missing ratings in the user-item interaction matrix
- Generate personalized movie recommendations
- Evaluate model performance using RMSE
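
The SGD procedure behind these objectives can be sketched in plain Python. This is a minimal illustration of biased matrix factorization, not Surprise's implementation: the names `train_funk_svd`, `predict_rating`, and `rmse_on`, the toy ratings, and the hyperparameter values are all assumptions made for the example.

```python
import math
import random

def train_funk_svd(ratings, n_factors=2, n_epochs=300, lr=0.01, reg=0.02, seed=0):
    """Fit biased matrix factorization with SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                       # user biases
    bi = {i: 0.0 for i in items}                       # item biases
    # Latent factors start as small random values
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(a * b for a, b in zip(p[u], q[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            # Move each parameter against the gradient of the regularized squared error
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return {"mu": mu, "bu": bu, "bi": bi, "p": p, "q": q}

def predict_rating(model, u, i):
    """Predicted rating = global mean + user bias + item bias + factor dot product."""
    dot = sum(a * b for a, b in zip(model["p"][u], model["q"][i]))
    return model["mu"] + model["bu"][u] + model["bi"][i] + dot

def rmse_on(model, ratings):
    squared = [(r - predict_rating(model, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))

# Made-up (user, item, rating) triples, for illustration only
toy_ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 1), (1, 0, 4), (1, 2, 1),
               (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4)]

toy_model = train_funk_svd(toy_ratings)
baseline_rmse = math.sqrt(
    sum((r - toy_model["mu"]) ** 2 for _, _, r in toy_ratings) / len(toy_ratings)
)
trained_rmse = rmse_on(toy_model, toy_ratings)
print(f"global-mean baseline RMSE: {baseline_rmse:.3f}")
print(f"training RMSE after SGD:   {trained_rmse:.3f}")
```

Comparing the training RMSE against the global-mean baseline is only a sanity check that the updates reduce error; it says nothing about generalization, which is why the notebook relies on cross-validation.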

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#!pip install numpy==1.26.4
#SVD
!pip install scikit-surprise

from surprise import Reader,Dataset,SVD
from surprise.model_selection import cross_validate

In [None]:
df=pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Netflix/Copy of combined_data_1.txt.zip',header=None,names=['Cust_ID','Ratings'],usecols=[0,1])
df.head(5)
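
The raw file interleaves movie header lines such as `1:` with `customer,rating,date` lines, which is why the `Ratings` column contains NaN wherever a header sits. The sketch below reproduces that layout with a made-up five-line excerpt (the IDs and dates are invented) and the same `read_csv` arguments used above.

```python
import io
import pandas as pd

# Made-up excerpt in the same layout as the raw file: a "<movie_id>:" header
# line, followed by "customer_id,rating,date" lines for that movie.
sample_text = (
    "1:\n"
    "101,3,2005-09-06\n"
    "102,5,2005-05-13\n"
    "2:\n"
    "103,4,2005-09-05\n"
)

sample = pd.read_csv(
    io.StringIO(sample_text),
    header=None,
    names=['Cust_ID', 'Ratings'],
    usecols=[0, 1],          # ignore the date column
)
print(sample)
# Header rows carry the movie ID in Cust_ID and NaN in Ratings
print(sample['Ratings'].isna().sum())
```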

In [None]:
#to calculate:
#Movie count
#customer count
#ratings count
#stars count - number of ratings at each star value (value_counts)

## **EDA**

In [None]:
df.isnull().sum()

In [None]:
movie_count=df.isnull().sum()['Ratings']
movie_count

In [None]:
df['Cust_ID'].nunique()

In [None]:
# Cust_ID also holds the movie header rows ('1:', '2:', ...), so subtract them from the unique count
customer_count=(df['Cust_ID'].nunique()) - movie_count
customer_count

In [None]:
ratings_count=df['Ratings'].count()  # count() already skips the NaN header rows, so no subtraction is needed
ratings_count

In [None]:
stars_count=df['Ratings'].value_counts()
stars_count

## **Data Cleaning**

In [None]:
# build a Movie_ID value for every row: each row below a header such as '1:' gets that header's movie ID (here 1)
movie_id=None
movie_list=[]

for cust in df['Cust_ID']:
  if ":" in cust:
    movie_id= int(cust.replace(":",""))
  movie_list.append(movie_id)
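
The loop above walks the column row by row, which can be slow on a file with millions of lines. A vectorized alternative is sketched below on a made-up five-row frame: mark the header rows, strip the colon, and forward-fill the movie ID down to the rating rows. The names `toy`, `is_header`, and `ratings_only` are hypothetical.

```python
import pandas as pd

# Made-up frame in the same shape as df right after loading
toy = pd.DataFrame({
    'Cust_ID': ['1:', '101', '102', '2:', '103'],
    'Ratings': [None, 3.0, 5.0, None, 4.0],
})

is_header = toy['Cust_ID'].str.endswith(':')

# Keep the ID only on header rows, then carry it forward to the rows below
toy['Movie_ID'] = (
    toy['Cust_ID']
    .where(is_header)      # non-header rows become NaN
    .str.rstrip(':')       # '1:' -> '1'
    .ffill()               # propagate each movie ID downward
    .astype(int)
)

# Header rows have served their purpose and can be dropped
ratings_only = toy[~is_header]
print(ratings_only)
```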


In [None]:
df['Movie_ID']=movie_list
df.head(5)

In [None]:
df.dropna(inplace=True) # dropping rows with Nulls i.e., 1: 2:
df.info()

In [None]:
df['Cust_ID']=df['Cust_ID'].astype(int)
df.info()

In [None]:
# movies with too few ratings -> drop those movies
# customers who gave too few ratings -> drop those customers

In [None]:
movie_rat_count=df['Movie_ID'].value_counts()

In [None]:
benchmark1=round(movie_rat_count.quantile(0.6),0)
benchmark1

In [None]:
#drop movies which have rating count less than this benchmark
drop_movie_index=movie_rat_count[movie_rat_count<benchmark1].index
drop_movie_index

In [None]:
#drop customers who gave very few ratings
cust_rate_count=df['Cust_ID'].value_counts()

In [None]:
benchmark2=round(cust_rate_count.quantile(0.6),0)
benchmark2

In [None]:
drop_cust_index=cust_rate_count[cust_rate_count<benchmark2].index
drop_cust_index

In [None]:
df=df[~df['Movie_ID'].isin(drop_movie_index)]
df=df[~df['Cust_ID'].isin(drop_cust_index)]
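
To make the quantile benchmark concrete, the sketch below applies the same steps to a made-up nine-row frame. The names `toy`, `movie_cutoff`, `sparse_movies`, and `filtered` are hypothetical; the customer-side filter works the same way with `Cust_ID`.

```python
import pandas as pd

# Made-up ratings: movie 1 has 5 ratings, movie 2 has 1, movie 3 has 3
toy = pd.DataFrame({
    'Movie_ID': [1, 1, 1, 1, 1, 2, 3, 3, 3],
    'Cust_ID':  [10, 11, 12, 13, 14, 10, 10, 11, 12],
    'Ratings':  [4, 5, 3, 4, 2, 1, 5, 4, 3],
})

movie_counts = toy['Movie_ID'].value_counts()

# 60th percentile of the counts [1, 3, 5] is 3.4 with linear interpolation
movie_cutoff = round(movie_counts.quantile(0.6), 0)   # 3.0

# Movies strictly below the cutoff are dropped; movie 3 (count 3) survives
sparse_movies = movie_counts[movie_counts < movie_cutoff].index
filtered = toy[~toy['Movie_ID'].isin(sparse_movies)]

print(movie_cutoff)
print(sorted(filtered['Movie_ID'].unique()))
```

Because the comparison is a strict `<`, a movie whose count equals the rounded cutoff is kept.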

In [None]:
movie_title=pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Netflix/Copy of movie_titles.csv',encoding='ISO-8859-1',header=None,names=['Movie_ID','Year','Name'],usecols=[0,1,2])
movie_title.head(5)

In [None]:
movie_title.info()

## **SVD**

In [None]:
reader=Reader()

In [None]:
# Surprise expects the columns in (user, item, rating) order
data=Dataset.load_from_df(df[['Cust_ID','Movie_ID','Ratings']][:100000],reader)
data

In [None]:
model=SVD()

In [None]:
cross_validate(model,data,measures=['RMSE'],cv=3,verbose=True)
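
For reference, the RMSE reported for each fold is the square root of the mean squared difference between actual and predicted ratings. A minimal sketch with made-up numbers (the helper name `rmse` and the sample values are assumptions):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings and predictions: errors are 0.5, 0.0 and 1.0
example_rmse = rmse([4, 3, 5], [3.5, 3.0, 4.0])
print(round(example_rmse, 4))
```

In Surprise, `cross_validate` returns a dictionary whose `test_rmse` entry holds one such value per fold, so averaging that array gives a single summary figure.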

In [None]:
#1331154
user_1331154=df[df['Cust_ID']==1331154]
user_1331154.head(5)

In [None]:
dummy_title=movie_title.copy()

In [None]:
dummy_title=dummy_title[~dummy_title['Movie_ID'].isin(drop_movie_index)]
dummy_title.head(5)

In [None]:
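# refit on the whole 100,000-row subset, with columns in (user, item, rating) order, before predicting
data=Dataset.load_from_df(df[['Cust_ID','Movie_ID','Ratings']][:100000],reader)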
trainset=data.build_full_trainset()
model.fit(trainset)

In [None]:
# Predict the rating user 1331154 would give to every movie that survived filtering

est_rating=[]
for movie in dummy_title['Movie_ID']:
  rating=model.predict(1331154,movie).est  # predict() returns a Prediction object; .est is the estimated rating on the 1-5 scale
  est_rating.append(rating)

dummy_title['Rating']=est_rating
dummy_title.head(5)

In [None]:
top_ratings_user1331154=dummy_title.sort_values('Rating',ascending=False).head(5)
top_ratings_user1331154=top_ratings_user1331154.drop(['Year','Movie_ID'],axis=1)  # keep only Name and Rating
top_ratings_user1331154
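
The ranking above scores every remaining movie, including ones the user has already rated. One possible refinement, sketched here with a stand-in predictor so it runs without a trained model, is to exclude already-seen titles before taking the top N. The helper `top_n_unseen`, the `fake_scores` table, and all IDs are hypothetical.

```python
def top_n_unseen(predict_fn, user_id, candidate_items, seen_items, n=5):
    """Return the n highest-scoring (item, score) pairs the user has not rated yet."""
    seen = set(seen_items)
    scored = [
        (item, predict_fn(user_id, item))
        for item in candidate_items
        if item not in seen
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Stand-in predictor with made-up scores; with a fitted Surprise model one
# could pass: lambda u, i: model.predict(u, i).est
fake_scores = {10: 3.2, 20: 4.8, 30: 4.1, 40: 2.5}

def fake_predict(user_id, item_id):
    return fake_scores[item_id]

recommendations = top_n_unseen(
    fake_predict,
    user_id=1,
    candidate_items=[10, 20, 30, 40],
    seen_items=[20],       # the user already rated item 20
    n=2,
)
print(recommendations)
```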

In [None]:
#most popular movies (by number of ratings)
top10 = df.groupby("Movie_ID")["Ratings"].count().sort_values(ascending=False).head(10) #returns a Series not a DF
top10 = top10.reset_index()
top10.columns = ["Movie_ID", "Ratings_Count"]
top10_popular_movies=movie_title.merge(top10,on="Movie_ID").sort_values("Ratings_Count",ascending=False)
top10_popular_movies #based on Ratings count

In [None]:
#highest-rated movies (by mean rating)
top10_highly_rated=df.groupby("Movie_ID")["Ratings"].mean().sort_values(ascending=False).head(10)
top10_highly_rated=top10_highly_rated.reset_index()  # reset_index returns a new DataFrame, so assign it
top10_highly_rated.columns=['Movie_ID','Ratings']
top10_highly_rated_movies=movie_title.merge(top10_highly_rated,on="Movie_ID").sort_values("Ratings",ascending=False)
top10_highly_rated_movies #based on mean rating

In [None]:
#lowest-rated movies (by mean rating)
least_rated=df.groupby("Movie_ID")["Ratings"].mean().sort_values(ascending=False).tail(10)
least_rated=least_rated.reset_index()  # reset_index returns a new DataFrame, so assign it
least_rated.columns=['Movie_ID','Ratings']
least_rated_movies=movie_title.merge(least_rated,on="Movie_ID").sort_values("Ratings",ascending=False)
least_rated_movies
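
Rankings by mean rating are easier to interpret when the rating count sits next to the mean. The sketch below computes both in one pass on a made-up frame and applies a hypothetical `min_ratings` floor; the column names `Avg_Rating` and `Ratings_Count` are chosen for the example.

```python
import pandas as pd

# Made-up ratings: movie 2 has a perfect mean but only a single rating
toy = pd.DataFrame({
    'Movie_ID': [1, 1, 1, 2, 3, 3],
    'Ratings':  [4, 5, 3, 5, 2, 3],
})

# Named aggregation gives the mean and the count per movie as labeled columns
movie_stats = (
    toy.groupby('Movie_ID')['Ratings']
    .agg(Avg_Rating='mean', Ratings_Count='count')
    .reset_index()
)

min_ratings = 2  # hypothetical floor; tune to the dataset
reliable = (
    movie_stats[movie_stats['Ratings_Count'] >= min_ratings]
    .sort_values('Avg_Rating', ascending=False)
)
print(reliable)
```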

## Conclusion

In this project, a collaborative filtering recommendation system was developed using matrix factorization (Funk SVD) through the Surprise library.

The model learned latent user and item representations by minimizing rating prediction error with stochastic gradient descent. These latent factors capture preference patterns that are not explicitly labeled in the dataset.

The system predicted unseen ratings and generated personalized Top-N recommendations. Model performance was evaluated with 3-fold cross-validated RMSE on a 100,000-rating subset of the filtered data, a common accuracy measure for rating prediction on sparse user-item interaction data.

### Business Impact
A well-implemented recommendation system can:
- Increase user watch time
- Improve content discovery
- Enhance personalization
- Reduce churn
- Drive subscription retention and revenue growth

Effective recommendation systems are a core competitive advantage for streaming platforms.
