# 🎬 Movie Recommendation System

This project recommends similar movies based on genres using Natural Language Processing.

### ✅ What it does:
- Cleans genre text
- Uses **TF-IDF Vectorizer** to extract features
- Computes **cosine similarity** between movies
- Recommends **Top 5 similar movies**

### 📁 Dataset:
- `movies.csv` from MovieLens (columns: `movieId`, `title`, `genres`)

### 🔍 Example:
Query → `Toy Story (1995)`  
Output → Movies like `Toy Story 2 (1999)`, `Antz (1998)`, etc.

---

Made with ❤️ using **pandas**, **scikit-learn**, and **Jupyter Notebook**


## 🎯 Step 1: Importing Essential Libraries

To build our movie recommendation engine, we first import the necessary libraries:
- `pandas` to handle data.
- `TfidfVectorizer` to convert text data (genres) into meaningful numerical features.
- `cosine_similarity` to measure how similar two movies are based on their genres.


```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```


## 📥 Step 2: Load and Explore the Dataset

We load the movie data from a CSV file and take a quick look at the first 5 rows to understand its structure. The columns are:
- `movieId`: Unique ID for each movie.
- `title`: Name of the movie.
- `genres`: Pipe-separated genres associated with each movie.


```python
# Load the dataset
movies = pd.read_csv("data/movies.csv")

# Show the first 5 rows
movies.head()
```


|   | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
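Because a movie can have several pipe-separated genres, it can help to count how often each genre occurs before vectorizing. A minimal sketch, using a small hypothetical DataFrame that mirrors the rows shown above (on the real data you would apply the same expression to `movies`); `Series.str.split`, `Series.explode`, and `Series.value_counts` are standard pandas methods:

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of movies.csv
sample = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
              "Waiting to Exhale (1995)", "Father of the Bride Part II (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy", "Comedy|Romance",
               "Comedy|Drama|Romance", "Comedy"],
})

# Split each pipe-separated string into a list, give each genre its own row,
# then count occurrences (most frequent first)
genre_counts = sample["genres"].str.split("|").explode().value_counts()
print(genre_counts)
```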


## 🧹 Step 3: Clean Genre Data for Better Analysis

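Genres are originally separated by `|`.
We replace the `|` with spaces so that each genre is an explicit, space-separated word (feature) for the vectorizer. (The vectorizer's default token pattern would also split on `|`, but spaces make the intent clear and the text easier to read.)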


```python
# Replace '|' with a space to treat each genre as a separate word
movies['genres'] = movies['genres'].str.replace('|', ' ', regex=False)
movies[['title', 'genres']].head()
```


|   | title | genres |
|---|---|---|
| 0 | Toy Story (1995) | Adventure Animation Children Comedy Fantasy |
| 1 | Jumanji (1995) | Adventure Children Fantasy |
| 2 | Grumpier Old Men (1995) | Comedy Romance |
| 3 | Waiting to Exhale (1995) | Comedy Drama Romance |
| 4 | Father of the Bride Part II (1995) | Comedy |
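To see which tokens the vectorizer will actually produce from these genre strings, you can call `build_analyzer()` on a `TfidfVectorizer` configured the same way. This sketch uses a few hand-written genre strings in the MovieLens style (assumed examples, not rows from the file). It shows that hyphenated genres are split into two tokens, that the stop word "no" is dropped from the `(no genres listed)` placeholder, and that an uncleaned `|`-separated string tokenizes the same way:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same settings as the TF-IDF step: lowercasing, the default token pattern
# (runs of 2+ word characters), and English stop-word removal
analyzer = TfidfVectorizer(stop_words="english").build_analyzer()

examples = [
    "Action Sci-Fi Thriller",   # hyphenated genre
    "Crime Film-Noir",          # another hyphenated genre
    "(no genres listed)",       # placeholder; "no" is an English stop word
    "Action|Sci-Fi|Thriller",   # uncleaned string: '|' also separates tokens
]
for genres in examples:
    print(f"{genres!r:28} -> {analyzer(genres)}")
```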


### Convert Movie Genres into TF-IDF Features

In this step, we convert the genre text into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).

- **TF-IDF Vectorizer** weights each term by how often it appears in a movie's genre list (term frequency) and down-weights terms that appear in many movies (inverse document frequency), so common genres such as `Comedy` or `Drama` count for less than rarer ones.
- We use `stop_words='english'` to remove common English words. For genre text this mainly drops the word "no" from the `(no genres listed)` placeholder.
- The result is a sparse matrix (`tfidf_matrix`) in which each row is a movie and each column is a genre token, weighted by its TF-IDF score. With the default tokenizer, hyphenated genres are split into separate tokens (for example, `Sci-Fi` becomes `sci` and `fi`), so there are more columns than genres.

Finally, we print the shape of this matrix: the number of movies and the number of unique genre tokens.


```python
# Convert genre text to TF-IDF features
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])

# Show the shape of the TF-IDF matrix: (number of movies, number of genre tokens)
print(tfidf_matrix.shape)
```


```
(9742, 23)
```


### Compute Similarity Between Movie Genres Using Cosine Similarity

Now that we have transformed the movie genres into TF-IDF feature vectors, we calculate the similarity between all movies based on these features.

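- **Cosine similarity** measures the cosine of the angle between two vectors: their dot product divided by the product of their lengths.
- Because TF-IDF weights are never negative, scores range from 0 (no genre tokens in common) to 1 (identical genres). A higher score means the movies share more similar genres.
- This results in a square similarity matrix (`cosine_sim`), where each entry `[i, j]` is the similarity score between movie `i` and movie `j`. With 9,742 movies it holds about 95 million values, roughly 760 MB as 64-bit floats.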


```python
# Compute pairwise similarity between all movies' genre vectors
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
```
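As a sanity check on what `cosine_similarity` computes, this sketch compares it with the formula `(a · b) / (‖a‖ ‖b‖)` on three toy genre strings (assumed examples). Because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), cosine similarity here equals the plain dot product, so `sklearn.metrics.pairwise.linear_kernel` returns the same matrix:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Toy genre strings (assumed examples): movies 0 and 1 overlap, movie 2 shares nothing with movie 0
toy_genres = [
    "Adventure Children Fantasy",
    "Adventure Animation Children Comedy Fantasy",
    "Comedy Drama Romance",
]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_genres)

sims = cosine_similarity(toy_matrix)  # 3 x 3 dense array, movie vs movie

# Manual cosine similarity for movies 0 and 1
a = toy_matrix[0].toarray().ravel()
b = toy_matrix[1].toarray().ravel()
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(sims[0, 1], 3), round(manual, 3))

# Rows are already unit length, so a plain dot product gives the same matrix
print(np.allclose(sims, linear_kernel(toy_matrix)))
```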


### Create a Mapping from Movie Titles to DataFrame Indices

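To quickly access movie information by title, we create a mapping (`indices`) from each movie title to its corresponding row index in the DataFrame.

- This lets us look up a movie's row index from its title.
- If a title appears more than once, we keep only its first row (`~indices.index.duplicated(keep='first')`), so every title maps to exactly one index. `Series.drop_duplicates()` would not do this: it compares the Series *values*, which are the row indices and therefore always unique, not the titles in the index.

```python
# Map each movie title to its row index
indices = pd.Series(movies.index, index=movies['title'])

# Keep only the first row for any title that appears more than once
indices = indices[~indices.index.duplicated(keep='first')]
```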
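A small sketch of why the title mapping has to be deduplicated on its index labels rather than its values. It uses a hypothetical three-row DataFrame with one repeated title: `Series.drop_duplicates()` looks at the values (row positions, always unique), so both copies survive and a lookup returns a Series instead of a single index, which the recommendation function below is not written to handle. `Index.duplicated()` works on the titles themselves:

```python
import pandas as pd

# Hypothetical data with one title listed twice
df = pd.DataFrame({"title": ["Movie A (2000)", "Movie B (2001)", "Movie A (2000)"]})

by_title = pd.Series(df.index, index=df["title"])

# drop_duplicates() compares the values 0, 1, 2, which are all unique
print(len(by_title.drop_duplicates()))      # 3: the repeated title is still there twice
print(by_title["Movie A (2000)"].tolist())  # [0, 2]: the lookup returns two rows

# Deduplicate on the index (the titles) instead, keeping the first occurrence
dedup = by_title[~by_title.index.duplicated(keep="first")]
print(dedup["Movie A (2000)"])              # 0
```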


### Movie Recommendation Function Based on Genre Similarity

This function `recommend_movie` takes a movie title as input and returns the top 5 movies with the most similar genres.

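- It first checks whether the movie title exists in the dataset.
- It looks up the index of the given movie.
- It retrieves that movie's row of similarity scores from the cosine similarity matrix.
- It sorts the movies by similarity score in descending order.
- It skips the first result on the assumption that it is the movie itself (a movie's similarity to itself is 1), then keeps the next 5. Movies with exactly the same genres can also score 1, and Python's `sorted` is stable, so tied movies stay in dataset order. If the query movie is not first among the ties, a different movie is skipped and the query movie itself can appear in the results.
- It returns the titles of the recommended movies as a pandas Series, or the string `"Movie not found."` if the title is missing.

This function helps users discover movies whose genres are similar to those of their chosen movie.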


```python
# Recommend the 5 movies whose genres are most similar to `title`
def recommend_movie(title, cosine_sim=cosine_sim):
    if title not in indices:
        return "Movie not found."

    idx = indices[title]                            # row of the query movie
    sim_scores = list(enumerate(cosine_sim[idx]))   # (row index, score) pairs
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:6]  # Top 5 similar (skip the first entry, normally the movie itself)
    movie_indices = [i[0] for i in sim_scores]

    return movies['title'].iloc[movie_indices]
```
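The function above skips the first sorted entry on the assumption that it is the query movie. A minimal alternative sketch (the name `recommend_by_genre` and its parameters are hypothetical, not part of the project) removes the query movie by its index instead, returns the scores alongside the titles, and computes only the single similarity row it needs with `cosine_similarity(features[idx], features)`, which avoids building the full movies × movies matrix. It runs on a toy dataset so it is self-contained:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy dataset (hypothetical titles) in the cleaned, space-separated genre format
toy_movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Comedy Romance", "Comedy Romance", "Action Thriller", "Comedy Drama Romance"],
})
toy_tfidf = TfidfVectorizer(stop_words="english").fit_transform(toy_movies["genres"])


def recommend_by_genre(title, movies_df, features, n=5):
    """Return up to n (title, score) rows most similar to `title`, excluding it."""
    matches = np.flatnonzero(movies_df["title"].to_numpy() == title)
    if matches.size == 0:
        return None  # title not in the dataset
    idx = matches[0]  # use the first row if the title is duplicated

    # One row of similarities: shape (1, n_movies), flattened to (n_movies,)
    scores = cosine_similarity(features[idx], features).ravel()
    scores[idx] = -1.0  # exclude the query movie regardless of ties

    # Stable sort by descending score keeps dataset order among ties
    top = np.argsort(-scores, kind="stable")[:n]
    top = top[scores[top] >= 0]  # drop the query movie if n covers every row
    return pd.DataFrame({
        "title": movies_df["title"].iloc[top].to_numpy(),
        "score": scores[top],
    })


print(recommend_by_genre("Movie B", toy_movies, toy_tfidf, n=2))
```

On the notebook's data the equivalent call would pass `movies` and `tfidf_matrix`; returning the scores makes ties at 1.0 (identical genres) easy to spot.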


### Example: Get Movie Recommendations

Let’s test our recommendation function with a popular movie — **"Toy Story (1995)"** — to see the top 5 movies with similar genres.



```python
# Try with any movie title from your dataset
recommend_movie("Toy Story (1995)")
```


```
1706                                       Antz (1998)
2355                                Toy Story 2 (1999)
2809    Adventures of Rocky and Bullwinkle, The (2000)
3000                  Emperor's New Groove, The (2000)
3568                             Monsters, Inc. (2001)
Name: title, dtype: object
```

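```python
# Jupyter displays only the value of the last expression in a cell,
# so only the recommendations for "Matrix, The (1999)" appear below.
# Wrap each call in print() to see both lists.
recommend_movie("Jumanji (1995)")
recommend_movie("Matrix, The (1999)")
```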


```
68           Screamers (1995)
144    Johnny Mnemonic (1995)
296         Virtuosity (1995)
336            Timecop (1994)
474       Blade Runner (1982)
Name: title, dtype: object
```

### Interactive Movie Recommendation

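You can enter any movie title from the dataset to get genre-based recommendations.

- The input prompt asks for a movie title (e.g., *Titanic (1997)*). The title must match the dataset exactly, including the year and the MovieLens convention of moving a leading article to the end (e.g., `Matrix, The (1999)`).
- The `recommend_movie` function then returns the top 5 movies with similar genres.
- If the movie title is not found, it returns the message `Movie not found.`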



```python
user_input = input("Enter a movie title (e.g., Titanic (1997)): ")
print(recommend_movie(user_input))
```


```
Enter a movie title (e.g., Titanic (1997)):  Matrix, The (1999)

68           Screamers (1995)
144    Johnny Mnemonic (1995)
296         Virtuosity (1995)
336            Timecop (1994)
474       Blade Runner (1982)
Name: title, dtype: object
```
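Because the lookup requires an exact title, including the year and the `Matrix, The (1999)` article convention, a user who types `The Matrix` gets "Movie not found." One possible refinement, sketched here with Python's standard `difflib.get_close_matches` (the helper name `suggest_titles` and the default `cutoff=0.6` are assumptions, not part of the project), is to suggest the closest matching titles instead:

```python
import difflib


def suggest_titles(query, titles, n=3, cutoff=0.6):
    """Return up to n titles that closely match `query`, ignoring case."""
    lowered = {t.lower(): t for t in titles}  # lowercase title -> original title
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]      # best match first


# Hypothetical title list in the MovieLens style
titles = ["Matrix, The (1999)", "Toy Story (1995)", "Jumanji (1995)"]
print(suggest_titles("matrix (1999)", titles))
```

In the interactive cell, you could call `suggest_titles(user_input, movies['title'])` whenever `recommend_movie` returns `"Movie not found."`.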
