# 📺 Netflix Recommendation System Project

## **Project Overview**
In this project, we are building a **Netflix-style movie recommendation system** using real-world data. The goal is to predict **which movies a user might like** based on their past ratings and the ratings of other users.

We will focus on:

1. **Data Understanding & Cleaning**  
   - The dataset comes from the **Netflix Prize competition**, containing millions of ratings.  
   - The raw data is messy: each movie ID sits on its own header line (such as `1:`) instead of in a column, the ratings are stored in plain text files, and there are missing or duplicate values.  

2. **Exploratory Data Analysis (EDA)**  
   - Understand the distribution of ratings.  
   - Find the top-rated movies and the most active users.  
   - Visualize the sparsity of the user-movie matrix.  

3. **Building the Recommendation System**  
   - Use a **model-based collaborative filtering** technique.  
   - Implement **SVD (Singular Value Decomposition)** to handle sparse user-movie matrices.  
   - Use the **Surprise library** for easy model training, prediction, and evaluation.  

4. **Evaluation**  
   - Evaluate the model performance using **RMSE (Root Mean Squared Error)**.  
   - Test how well the system predicts ratings for unseen movies.  

5. **Generating Recommendations**  
   - Recommend the top movies that a user hasn’t watched yet.  
   - Optionally, compare recommendations for multiple users.  
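
The RMSE metric named in step 4 can be sketched in plain Python. This is a minimal illustration, not the notebook's evaluation code: the `rmse` helper and the sample ratings are hypothetical, and in practice the Surprise library's `accuracy.rmse` would normally be used on a list of predictions instead.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true ratings and model predictions for four user-movie pairs.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]

score = rmse(actual, predicted)
print(score)  # 0.75
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same units as the ratings (stars).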

---

## **Key Concepts in This Project**
- **User-Item Matrix:** A table in which rows are users, columns are movies, and each cell holds a rating (most cells are empty).  
- **Collaborative Filtering:** Recommending movies to a user based on the ratings of **similar users**.  
- **SVD:** Factorizes the large, sparse user-movie matrix into low-dimensional **latent factors** that represent user and movie features.  
- **Surprise Library:** A Python library that simplifies **building and evaluating recommendation systems**.  
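
To make the idea of latent factors concrete, here is a small sketch using `numpy.linalg.svd` on a hypothetical, fully observed 4x3 rating matrix. It is only an illustration of the concept. Surprise's `SVD` algorithm instead learns its factors with stochastic gradient descent on the observed ratings only, so it does not need a complete matrix.

```python
import numpy as np

# Hypothetical toy user-item matrix: 4 users (rows) x 3 movies (columns).
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Thin SVD: ratings == U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

k = 2  # number of latent factors to keep
user_factors = U[:, :k] * s[:k]   # one k-dimensional vector per user
movie_factors = Vt[:k, :]         # one k-dimensional vector per movie

# A predicted rating is the dot product of a user vector and a movie vector.
approx = user_factors @ movie_factors
print(np.round(approx, 2))
```

Keeping only `k` factors compresses each user and each movie into a short vector, and the reconstruction `approx` stays close to the original ratings when a few factors capture most of the structure.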

---

## **Why This Project is Useful**
- Real-world applications: Netflix, Amazon, Spotify, YouTube, etc.  
- Helps you understand **how large platforms recommend content** based on user behavior.  
- Introduces **data cleaning, matrix factorization, and model evaluation** on a real dataset.  

---



In [122]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [123]:
# Keep only the first two columns; movie header lines such as "1:" get a missing Rating
df = pd.read_csv('combined_data_1.txt.zip', header=None, names=['Cust_Id', 'Rating'], usecols=[0, 1])
df

|  | Cust_Id | Rating |
|---|---|---|
| 0 | 1: | NaN |
| 1 | 1488844 | 3.0 |
| 2 | 822109 | 5.0 |
| 3 | 885013 | 4.0 |
| 4 | 30878 | 4.0 |
| ... | ... | ... |
| 24058258 | 2591364 | 2.0 |
| 24058259 | 1791000 | 2.0 |
| 24058260 | 512536 | 5.0 |
| 24058261 | 988963 | 3.0 |


In [124]:
# Copy the value into Movie_Id only on header rows (those ending with ':'); other rows get NA
df['Movie_Id'] = df['Cust_Id'].where(df['Cust_Id'].str.endswith(':'), pd.NA)

In [125]:
# Forward-fill so every rating row carries the ID of the header above it
df['Movie_Id'] = df['Movie_Id'].ffill()

In [126]:
# Drop the movie header rows, which have no rating
df.dropna(inplace=True)

In [127]:
# Strip the trailing ':' so only the numeric movie ID remains
df['Movie_Id'] = df['Movie_Id'].str.replace(':', '')

In [128]:
# All three columns now contain only digits, so cast them to integers
df = df.astype('int')

In [129]:
df

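|  | Cust_Id | Rating | Movie_Id |
|---|---|---|---|
| 1 | 1488844 | 3 | 1 |
| 2 | 822109 | 5 | 1 |
| 3 | 885013 | 4 | 1 |
| 4 | 30878 | 4 | 1 |
| 5 | 823519 | 3 | 1 |
| ... | ... | ... | ... |
| 24058258 | 2591364 | 2 | 4499 |
| 24058259 | 1791000 | 2 | 4499 |
| 24058260 | 512536 | 5 | 4499 |
| 24058261 | 988963 | 3 | 4499 |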
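
The cleaning cells above can be replayed on a tiny in-memory sample to see what each step does. This is a condensed sketch of the same idea, not the notebook's exact code. The `toy` frame and its values are hypothetical, and it mimics the shape of `df` right after `read_csv` (header rows such as `1:` with a missing rating).

```python
import pandas as pd

# Hypothetical miniature of the frame right after loading:
# header rows ("1:", "2:") have no rating, other rows are customer ratings.
toy = pd.DataFrame({
    "Cust_Id": ["1:", "1488844", "822109", "2:", "2059652"],
    "Rating": [float("nan"), 3.0, 5.0, float("nan"), 4.0],
})

# 1. Keep the value only on header rows, 2. forward-fill it down to the
#    rating rows below, 3. strip the trailing ':'.
is_header = toy["Cust_Id"].str.endswith(":")
toy["Movie_Id"] = toy["Cust_Id"].where(is_header).ffill().str.rstrip(":")

# 4. Drop the header rows (they have no rating) and cast everything to int.
toy = toy.dropna(subset=["Rating"]).astype(int)

print(toy)
```

Each rating row ends up tagged with the movie ID from the nearest header line above it, which is the same long format as the full `df`.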


In [130]:
# Rows read from the file minus rating rows kept = movie header rows dropped
24058263 - 24053764

4499
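
With the data in long format (one row per rating), the user-item matrix and its sparsity mentioned in the overview can be sketched on a small sample. The `sample` frame below is hypothetical and uses the same column names as `df`. On the full dataset a dense pivot like this may not fit in memory, so treat it as an illustration of the idea only.

```python
import pandas as pd

# Hypothetical cleaned ratings in the same long format as `df`.
sample = pd.DataFrame({
    "Cust_Id": [10, 10, 20, 30, 30],
    "Movie_Id": [1, 2, 1, 2, 3],
    "Rating": [4, 5, 3, 2, 5],
})

# Rows become users, columns become movies; unrated pairs show up as NaN.
user_item = sample.pivot(index="Cust_Id", columns="Movie_Id", values="Rating")

# Sparsity = share of user-movie pairs that have no rating.
n_users, n_movies = user_item.shape
sparsity = 1 - len(sample) / (n_users * n_movies)

print(user_item)
print(f"sparsity: {sparsity:.3f}")
```

The sparsity can also be computed without pivoting at all, using `sample["Cust_Id"].nunique()` and `sample["Movie_Id"].nunique()` for the two dimensions, which scales better to millions of ratings.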