# **Movie Recommender System**

Welcome to the **Movie Recommender System** project!  
In this notebook, we’ll build a **hybrid recommender system** that combines:

- **Content-Based Filtering** (recommend movies based on genre similarity).
- **Collaborative Filtering** (recommend movies based on user rating patterns).
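
To make the idea of "combining" the two approaches concrete before we touch the real data, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the four movies, the helper `cosine_similarity_matrix`, the `recommend` function, and the blending weight `alpha` are hypothetical and not necessarily how the later sections implement the hybrid. It also treats an unrated movie as a rating of 0, which is a simplification.

```python
import numpy as np

def cosine_similarity_matrix(matrix: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = matrix / norms
    return unit @ unit.T

# Toy data (hypothetical): 4 movies, 3 genre flags, 3 users.
titles = ["A", "B", "C", "D"]
genres = np.array([
    [1, 1, 0],  # A
    [1, 1, 0],  # B shares both genres with A
    [0, 0, 1],  # C shares no genre with A
    [1, 0, 0],  # D shares one genre with A
], dtype=float)

# Rows are movies, columns are users; 0 stands for "not rated" (a simplification).
ratings = np.array([
    [5, 4, 0],
    [4, 5, 0],
    [1, 0, 5],
    [0, 3, 4],
], dtype=float)

content_sim = cosine_similarity_matrix(genres)   # content-based signal
collab_sim = cosine_similarity_matrix(ratings)   # collaborative signal

def recommend(title: str, k: int = 2, alpha: float = 0.5) -> list:
    """Blend both similarity scores and return the k most similar titles."""
    idx = titles.index(title)
    scores = alpha * content_sim[idx] + (1 - alpha) * collab_sim[idx]
    scores[idx] = -np.inf  # never recommend the query movie itself
    top = np.argsort(scores)[::-1][:k]
    return [titles[i] for i in top]

print(recommend("A"))
```

With `alpha=1.0` the sketch reduces to pure content-based filtering, and with `alpha=0.0` to pure item-item collaborative filtering.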

---

### Project Goal

- **Input:** A movie title (e.g., *"Toy Story (1995)"*).  
- **Output:** Top 5 recommended movies.

---

### Dataset: MovieLens 100k

We’ll use the **MovieLens 100k dataset**, which contains:

- **100,000 ratings**
- **943 users**
- **1,682 movies**

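This dataset, published by GroupLens Research, is a standard benchmark in recommender-system research. Ratings are integers from 1 to 5, and every user has rated at least 20 movies.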

---

### Roadmap

1. **Setup & Dataset** (load + clean data)
2. **Exploratory Data Analysis (EDA)**
3. **Content-Based Filtering**
4. **Collaborative Filtering**
5. **Hybrid Recommender**
6. **Evaluation**



## **Setup & Dataset**

In this step, we will:

- Load the **ratings file (`u.data`)**
- Load the **movies file (`u.item`)**
- Merge them into a single DataFrame
- Perform a quick sanity check
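
As a rough idea of what a "quick sanity check" could cover, the sketch below counts a few common problems in a ratings table. The function name `sanity_check`, the specific checks, and the toy DataFrames are assumptions for illustration; the column names match the ones used in this notebook.

```python
import pandas as pd

def sanity_check(ratings: pd.DataFrame, movies: pd.DataFrame) -> dict:
    """Count common data problems; every value should be 0 for clean data."""
    return {
        # Any missing cell in the ratings table
        "missing_values": int(ratings.isna().sum().sum()),
        # MovieLens ratings are expected to lie between 1 and 5
        "ratings_out_of_range": int((~ratings["rating"].between(1, 5)).sum()),
        # The same user rating the same movie more than once
        "duplicate_user_movie_pairs": int(
            ratings.duplicated(subset=["user_id", "movie_id"]).sum()
        ),
        # Ratings that point at a movie_id missing from the movies table
        "ratings_without_movie": int(
            (~ratings["movie_id"].isin(movies["movie_id"])).sum()
        ),
    }

# Toy stand-ins for the real files (hypothetical values).
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 11, 10],
    "rating": [3, 5, 4],
    "timestamp": [881250949, 891717742, 878887116],
})
toy_movies = pd.DataFrame({"movie_id": [10, 11], "title": ["Movie A", "Movie B"]})

report = sanity_check(toy_ratings, toy_movies)
print(report)
```

Once the real `ratings` and `movies` DataFrames are loaded, the same function can be called on them directly.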



In [1]:
import pandas as pd
import numpy as np


### Loading Data


In [3]:
# u.data is tab-separated and has no header row, so we supply the column names.
ratings_columns = ["user_id", "movie_id", "rating", "timestamp"]

ratings = pd.read_csv(
    "data/u.data",
    sep="\t",
    names=ratings_columns,
    encoding="latin-1"
)

print("Ratings DataFrame shape:", ratings.shape)
ratings.head()


Ratings DataFrame shape: (100000, 4)


| | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |


In [4]:
# u.item is pipe-separated with no header row: 5 metadata columns
# followed by 19 binary genre flags.
movies_columns = [
    "movie_id", "title", "release_date", "video_release_date", "IMDb_URL",
    "unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"
]

movies = pd.read_csv(
    "data/u.item",
    sep="|",
    names=movies_columns,
    encoding="latin-1"  # some titles contain non-ASCII characters
)

print("Movies DataFrame shape:", movies.shape)
movies.head()


Movies DataFrame shape: (1682, 24)


| | movie_id | title | release_date | video_release_date | IMDb_URL | unknown | Action | Adventure | Animation | Children | ... | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


### Merge Ratings and Movies

We’ll join both DataFrames on **`movie_id`** so each rating has a corresponding movie title.


In [5]:
data = pd.merge(ratings, movies[["movie_id", "title"]], on="movie_id")

print("Merged DataFrame shape:", data.shape)
data.head(10)


Merged DataFrame shape: (100000, 5)


| | user_id | movie_id | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 1 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) |
| 2 | 22 | 377 | 1 | 878887116 | Heavyweights (1994) |
| 3 | 244 | 51 | 2 | 880606923 | Legends of the Fall (1994) |
| 4 | 166 | 346 | 1 | 886397596 | Jackie Brown (1997) |
| 5 | 298 | 474 | 4 | 884182806 | Dr. Strangelove or: How I Learned to Stop Worr... |
| 6 | 115 | 265 | 2 | 881171488 | Hunt for Red October, The (1990) |
| 7 | 253 | 465 | 5 | 891628467 | Jungle Book, The (1994) |
| 8 | 305 | 451 | 3 | 886324817 | Grease (1978) |
| 9 | 6 | 86 | 3 | 883603013 | Remains of the Day, The (1993) |
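
The merged shape above has the same number of rows as the ratings file, which suggests every rating found its movie. A more explicit way to guard against silently dropped or duplicated rows is sketched below on toy data. The toy DataFrames and the unmatched `movie_id` 99 are hypothetical; `how`, `validate`, and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Toy stand-ins (hypothetical values); movie_id 99 has no entry in toy_movies.
toy_ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "movie_id": [10, 11, 99],
    "rating": [4, 5, 2],
})
toy_movies = pd.DataFrame({"movie_id": [10, 11], "title": ["Movie A", "Movie B"]})

merged = pd.merge(
    toy_ratings,
    toy_movies,
    on="movie_id",
    how="left",              # keep every rating, even without a matching movie
    validate="many_to_one",  # raise if movie_id is duplicated in toy_movies
    indicator=True,          # add a _merge column describing each row's origin
)

# Ratings whose movie_id was not found in the movies table
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows,", len(unmatched), "unmatched")
```

The default inner join used in the cell above would drop the unmatched rating without any warning, so a left join with `indicator=True` makes such cases visible.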


In [6]:
# Quick stats
print("Unique users:", data['user_id'].nunique())
print("Unique movies:", data['movie_id'].nunique())
print("Average rating:", round(data['rating'].mean(), 2))


Unique users: 943
Unique movies: 1682
Average rating: 3.53
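
These counts also tell us how sparse the user-item rating matrix is, which matters for collaborative filtering. The short calculation below uses only the numbers printed above and assumes each user rates a given movie at most once; the variable names are illustrative.

```python
# Counts taken from the outputs above.
n_ratings = 100_000
n_users = 943
n_movies = 1682

# Fraction of all possible (user, movie) pairs that actually have a rating.
density = n_ratings / (n_users * n_movies)
sparsity = 1 - density

print(f"Density: {density:.2%}, sparsity: {sparsity:.2%}")
```

Roughly 6% of the possible user-movie pairs have a rating, so about 94% of the matrix is empty.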
