# 📺 Netflix Recommendation System

This is a basic content-based recommendation system built using the **Netflix Movies and TV Shows** dataset. The goal is to help users discover similar titles based on features like title, cast, director, genre, and description.

## 🔍 Dataset

- **Source:** `netflix_titles.csv`
- **Columns:** `show_id`, `type`, `title`, `director`, `cast`, `country`, `date_added`, `release_year`, `rating`, `duration`, `listed_in` (genres), `description`

## 🧰 Tools Used

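- **Pandas** for loading and manipulating the data
- **NumPy** for storing the embeddings as arrays
- **FAISS** for vector similarity search
- **Gemini Embedding API** (called through `requests` and `google.generativeai`) for text embeddings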

## 📦 Step 1: Load the Dataset

The first step is loading the dataset into a pandas DataFrame:


In [26]:
import pandas as pd

df = pd.read_csv("netflix_titles.csv")
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


## 🧱 Step 2: Create Textual Representation

To build a content-based recommendation system, we first represent each title as a single piece of text that includes its relevant metadata: type, title, director, cast, release year, genres, and description. The function below combines these fields into a structured string for each row:


In [27]:
def create_textual_representation(row):
    """Combine one row's metadata fields into a single labeled text block."""
    textual_representation = f"""
Type: {row['type']},
Title: {row['title']},
Director: {row['director']},
Cast: {row['cast']},
Released: {row['release_year']},
Genres: {row["listed_in"]},

Description: {row["description"]}

"""
    return textual_representation

In [28]:
df["textual_representations"] = df.apply(create_textual_representation, axis=1)

In [29]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,textual_representations
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...","\nType: Movie,\nTitle: Dick Johnson Is Dead,\n..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...","\nType: TV Show,\nTitle: Blood & Water,\nDirec..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,"\nType: TV Show,\nTitle: Ganglands,\nDirector:..."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...","\nType: TV Show,\nTitle: Jailbirds New Orleans..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,"\nType: TV Show,\nTitle: Kota Factory,\nDirect..."


In [30]:
print(df.loc[0]["textual_representations"])


Type: Movie,
Title: Dick Johnson Is Dead,
Director: Kirsten Johnson,
Cast: nan,
Released: 2020,
Genres: Documentaries,

Description: As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
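
The `Cast: nan` line above appears because a missing value in the `cast` column is formatted straight into the string. One way to avoid this, sketched below, is to substitute a placeholder before formatting. The helper `clean_field`, the function `create_clean_textual_representation`, and the placeholder text `Unknown` are hypothetical and not part of the original notebook.

```python
import math

def clean_field(value, placeholder="Unknown"):
    """Return `placeholder` for missing values (None or float NaN), else the value as text.

    Hypothetical helper, not part of the original notebook.
    """
    if value is None:
        return placeholder
    # pandas represents missing text cells as float NaN
    if isinstance(value, float) and math.isnan(value):
        return placeholder
    return str(value)

def create_clean_textual_representation(row):
    """Same layout as create_textual_representation, with missing fields replaced."""
    return f"""
Type: {clean_field(row['type'])},
Title: {clean_field(row['title'])},
Director: {clean_field(row['director'])},
Cast: {clean_field(row['cast'])},
Released: {clean_field(row['release_year'])},
Genres: {clean_field(row['listed_in'])},

Description: {clean_field(row['description'])}

"""

# A plain dict stands in for a DataFrame row; a pandas Series is indexed the same way.
example_row = {
    "type": "Movie",
    "title": "Dick Johnson Is Dead",
    "director": "Kirsten Johnson",
    "cast": float("nan"),
    "release_year": 2020,
    "listed_in": "Documentaries",
    "description": "As her father nears the end of his life, filmmaker Kirsten Johnson "
                   "stages his death in inventive and comical ways to help them both "
                   "face the inevitable.",
}
print(create_clean_textual_representation(example_row))
```

It could be applied in the same way as the original function, with `df.apply(create_clean_textual_representation, axis=1)`. Changing the text changes the embeddings, so the index would need to be rebuilt afterwards.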




## 🧠 Step 3: Generate Embeddings with Gemini API

We use the **Gemini Embedding API** to convert each show's textual representation into a numerical vector. These embeddings are 768-dimensional and allow us to compute semantic similarity between shows.

### ⚙️ Setup

- `FAISS` is used for fast similarity search.
- Each show is embedded using Gemini's `embedding-001` model.
- Vectors are stored in a FAISS `IndexFlatL2` index.

**Note:** You must replace the `GEMINI_API_KEY` with your actual API key.
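
To make the role of `IndexFlatL2` concrete, the sketch below reproduces an exact (brute-force) L2 search in plain NumPy on a tiny made-up set of 4-dimensional vectors. The function name `brute_force_l2_search` and the toy data are illustrative assumptions. A flat FAISS index performs the same kind of exhaustive comparison, only much faster, and also reports squared L2 distances.

```python
import numpy as np

def brute_force_l2_search(vectors, queries, k):
    """Exact nearest-neighbour search by squared L2 distance.

    vectors: (n, dim) array of stored embeddings
    queries: (m, dim) array of query embeddings
    Returns (distances, indices), each of shape (m, k), nearest first.
    """
    # Pairwise differences via broadcasting: (m, 1, dim) - (1, n, dim) -> (m, n, dim)
    diffs = queries[:, None, :] - vectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Sort each query's distances and keep the k smallest
    order = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order

# Toy "embeddings": rows 0 and 1 are close together, row 2 is far away
toy_vectors = np.array(
    [[1.0, 0.0, 0.0, 0.0],
     [0.9, 0.1, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]],
    dtype="float32",
)
toy_query = np.array([[1.0, 0.0, 0.0, 0.0]], dtype="float32")

distances, indices = brute_force_l2_search(toy_vectors, toy_query, k=2)
print(indices)    # row positions, nearest first
print(distances)  # squared L2 distances for those rows
```

The returned positions play the same role as the `I` array from `index.search()` later in the notebook: they are row positions that can be used to look titles up in the DataFrame.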


In [None]:
import faiss
import requests
import numpy as np

# Gemini API setup
dim = 768  # Gemini text embeddings are 768-dimensional
index = faiss.IndexFlatL2(dim)
X = np.zeros((len(df["textual_representations"]), dim), dtype="float32")

# Gemini API configuration
GEMINI_API_KEY = "GEMINI_API_KEY"  # Replace with your actual API key
GEMINI_API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent?key={GEMINI_API_KEY}"

def get_embedding(text):
    payload = {
        "model": "models/embedding-001",
        "content": {
            "parts": [{"text": text}]
        }
    }
    
    headers = {
        "Content-Type": "application/json"
    }
    
    try:
        # A timeout keeps one stalled request from blocking the whole loop
        response = requests.post(GEMINI_API_URL, headers=headers, json=payload, timeout=30)
        if response.status_code == 200:
            result = response.json()
            embedding = result["embedding"]["values"]
            return embedding
        else:
            print(f"Error: {response.status_code}, {response.text}")
            return None
    except Exception as e:
        print(f"Request failed: {e}")
        return None

# Embed every title and store its vector in the matching row of X
for i, representation in enumerate(df['textual_representations']):
    if i % 30 == 0:
        print(f'Processed {i}/{len(df)} instances')
    
    embedding = get_embedding(representation)
    if embedding is not None:
        embedding_array = np.array(embedding, dtype=np.float32)
        if len(embedding_array) == dim:
            X[i] = embedding_array
        else:
            print(f"Warning: Embedding dimension mismatch at index {i}. Expected {dim}, got {len(embedding_array)}")
    else:
        print(f"Failed to get embedding for index {i}")

# Add every row of X to the index; rows whose embedding failed are still all zeros
index.add(X)
print(f"Index now contains {index.ntotal} vectors")

Processed 0/8807 instances
Processed 30/8807 instances
Processed 60/8807 instances
Processed 90/8807 instances
Processed 120/8807 instances
Processed 150/8807 instances
Processed 180/8807 instances
Processed 210/8807 instances
Processed 240/8807 instances
Processed 270/8807 instances
Processed 300/8807 instances
Processed 330/8807 instances
Processed 360/8807 instances
Processed 390/8807 instances
Processed 420/8807 instances
Processed 450/8807 instances
Processed 480/8807 instances
Processed 510/8807 instances
Processed 540/8807 instances
Processed 570/8807 instances
Processed 600/8807 instances
Processed 630/8807 instances
Processed 660/8807 instances
Processed 690/8807 instances
Processed 720/8807 instances
Processed 750/8807 instances
Processed 780/8807 instances
Processed 810/8807 instances
Processed 840/8807 instances
Processed 870/8807 instances
Processed 900/8807 instances
Processed 930/8807 instances
Processed 960/8807 instances
Processed 990/8807 instances
Processed 1020/8807
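
Embedding several thousand titles means several thousand HTTP calls, and any call that fails leaves an all-zero row in `X`. One possible mitigation, sketched here under the assumption that most failures are transient, is to retry each call a few times with exponential backoff. `embed_with_retry`, its parameters, and the stand-in `flaky_embed` function are hypothetical names for illustration; in the notebook the wrapped function would be `get_embedding`.

```python
import time

def embed_with_retry(embed_fn, text, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call embed_fn(text) until it returns a non-None result or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts.
    Returns the embedding, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        result = embed_fn(text)
        if result is not None:
            return result
        # Only wait if another attempt is still to come
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None

# Stand-in for get_embedding: fails twice, then succeeds
calls = {"count": 0}

def flaky_embed(text):
    calls["count"] += 1
    if calls["count"] < 3:
        return None
    return [0.1, 0.2, 0.3]

waits = []  # record the delays instead of actually sleeping in this demo
embedding = embed_with_retry(flaky_embed, "Type: Movie, ...", sleep=waits.append)
print(embedding, waits)
```

Rows for which this still returns `None` could be collected in a list and left out of the index rather than added as zero vectors.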

## 💾 Step 4: Save the FAISS Index

After building the FAISS index with all the vector embeddings, we save it to disk so it can be reused without recomputing the embeddings.

In [42]:
faiss.write_index(index, "index")

## 📂 Step 5: Load the FAISS Index

To perform recommendations without rebuilding the entire index, we can load the saved FAISS index from disk using `faiss.read_index()`.

In [43]:
index = faiss.read_index('index')

In [47]:
favorite_movie = df.iloc[1358]

## 🎯 Step 6: Recommend Similar Titles

To recommend similar shows or movies:

1. Use **Gemini** to embed the favorite movie's textual representation.
2. Perform a **vector similarity search** using FAISS.
3. Retrieve and display the top 5 similar titles.
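
The three steps above can be wrapped in one function. The sketch below is an assumption about how that might look, not code from the original notebook: `recommend`, `embed_fn`, and `exclude_position` are hypothetical names, and a small NumPy class stands in for the FAISS index so the example runs on its own. Because the favorite title is itself in the index, it is likely to come back as one of its own nearest neighbours, so the sketch asks for one extra result and drops that row.

```python
import numpy as np

class TinyL2Index:
    """Stand-in exposing search(queries, k) -> (distances, indices), like a flat FAISS index."""

    def __init__(self, vectors):
        self.vectors = np.asarray(vectors, dtype="float32")

    def search(self, queries, k):
        sq_dists = ((queries[:, None, :] - self.vectors[None, :, :]) ** 2).sum(axis=2)
        order = np.argsort(sq_dists, axis=1)[:, :k]
        return np.take_along_axis(sq_dists, order, axis=1), order

def recommend(text, embed_fn, index, titles, k=5, exclude_position=None):
    """Embed `text`, search `index`, and return up to k titles, nearest first.

    exclude_position: row position of the query title itself, if it is in the index.
    """
    query = np.array([embed_fn(text)], dtype="float32")  # shape (1, dim)
    # Ask for one extra hit when a row is going to be dropped
    n_hits = k + 1 if exclude_position is not None else k
    _, positions = index.search(query, n_hits)
    hits = [int(p) for p in positions.flatten() if int(p) != exclude_position]
    return [titles[p] for p in hits[:k]]

# Toy data: 2-dimensional "embeddings" for four made-up titles
toy_titles = ["Title A", "Title B", "Title C", "Title D"]
toy_index = TinyL2Index([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]])

def toy_embed(text):
    return [0.0, 0.0]  # pretend every text embeds to the origin

print(recommend("query text", toy_embed, toy_index, toy_titles, k=2, exclude_position=0))
```

With the notebook's objects, and assuming the embedding call succeeds, this might be called with `favorite_movie["textual_representations"]` as the text, `df["title"].tolist()` as the titles, and `exclude_position=1358`.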

In [54]:
import os

import google.generativeai as genai

# Read the API key from the GEMINI_API_KEY environment variable; never hard-code a real key
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Use genai.embed_content() directly, not model.embed_content()
response = genai.embed_content(
    model="models/embedding-001",
    content=favorite_movie['textual_representations'],
    task_type="retrieval_document"  # or "retrieval_query", etc.
)

# FAISS expects a 2-D float32 array of shape (n_queries, dim)
embedding = np.array([response["embedding"]], dtype="float32")


In [57]:
# D holds the squared L2 distances, I the row positions of the 5 nearest titles
D, I = index.search(embedding, 5)
best_matches = df["textual_representations"].iloc[I.flatten()]
for match in best_matches:
    print("Next Movie:")
    print(match)
    print("--------------------------------------------------------------------------")

Next Movie:

Type: Movie,
Title: Shutter Island,
Director: Martin Scorsese,
Cast: Leonardo DiCaprio, Mark Ruffalo, Ben Kingsley, Max von Sydow, Michelle Williams, Emily Mortimer, Patricia Clarkson, Jackie Earle Haley, Ted Levine, John Carroll Lynch, Elias Koteas,
Released: 2010,
Genres: Thrillers,

Description: A U.S. marshal's troubling visions compromise his investigation into the disappearance of a patient from a hospital for the criminally insane.


--------------------------------------------------------------------------
Next Movie:

Type: Movie,
Title: The Departed,
Director: Martin Scorsese,
Cast: Leonardo DiCaprio, Matt Damon, Jack Nicholson, Mark Wahlberg, Martin Sheen, Ray Winstone, Vera Farmiga, Anthony Anderson, Alec Baldwin, Kevin Corrigan,
Released: 2006,
Genres: Dramas, Thrillers,

Description: Two rookie Boston cops are sent deep undercover – one inside the gang of a charismatic Irish mob boss and the other double-crossing his own department.


------------------------

## 🎬 Step 7: Test the Recommender with a Custom Movie

Instead of selecting a title from the dataset, we can write a movie description by hand and find similar titles. Here we test the system using the movie **IT (2017)**.
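
A hand-written query is most comparable to the indexed titles when it follows the same layout that `create_textual_representation` produces. As a sketch, the hypothetical helper `format_title` below (its name and keyword arguments are assumptions for illustration) builds such a string from plain values:

```python
def format_title(type_, title, director, cast, released, genres, description):
    """Build a query string in the same layout as the dataset's textual representations."""
    return f"""
Type: {type_},
Title: {title},
Director: {director},
Cast: {cast},
Released: {released},
Genres: {genres},

Description: {description}

"""

query_text = format_title(
    type_="Movie",
    title="IT",
    director="Andy Muschietti",
    cast="Bill Skarsgard, Jaeden Martell, Finn Wolfhard",
    released=2017,
    genres="Monster Horror, Horror",
    description=(
        "In the summer of 1989, a group of bullied kids band together to destroy "
        "a shape-shifting monster, which disguises itself as a clown and preys on "
        "the children of Derry, their small Maine town."
    ),
)
print(query_text)
```

The results shown in this step were produced with the hand-written `test_movie` string, not with this helper, so output from a query built this way may differ.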

In [61]:
test_movie = """
Type: 'Movie',
Title: 'IT',
Director: Andy Muschietti,
Cast: 'Bill Skarsgard , Jaeden Martell , Finn Wolfhard',
Released: '2017',
Genres: 'Monster Horror , Horror',

Description: 'In the summer of 1989, a group of bullied kids band together to destroy a shape-shifting monster, which disguises itself as a clown and preys on the children of Derry, their small Maine town.'
"""

In [62]:
import os

import google.generativeai as genai

# Read the API key from the GEMINI_API_KEY environment variable; never hard-code a real key
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Use genai.embed_content() directly, not model.embed_content()
response = genai.embed_content(
    model="models/embedding-001",
    content=test_movie,
    task_type="retrieval_document"  # or "retrieval_query", etc.
)

# FAISS expects a 2-D float32 array of shape (n_queries, dim)
embedding = np.array([response["embedding"]], dtype="float32")

# D holds the squared L2 distances, I the row positions of the 5 nearest titles
D, I = index.search(embedding, 5)
best_matches = df["textual_representations"].iloc[I.flatten()]
for match in best_matches:
    print("Next Movie:")
    print(match)
    print("--------------------------------------------------------------------------")

Next Movie:

Type: Movie,
Title: Hubie Halloween,
Director: Steve Brill,
Cast: Adam Sandler, Kevin James, Julie Bowen, Ray Liotta, Steve Buscemi, Maya Rudolph, Rob Schneider, June Squibb, Kenan Thompson, Tim Meadows, Michael Chiklis, Karan Brar, George Wallace, Paris Berelc, Noah Schnapp, China Anne McClain, Colin Quinn, Kym Whitley, Lavell Crawford, Mikey Day, Jackie Sandler, Sadie Sandler, Sunny Sandler,
Released: 2020,
Genres: Comedies, Horror Movies,

Description: Hubie's not the most popular guy in Salem, Mass., but when Halloween turns truly spooky, this good-hearted scaredy-cat sets out to keep his town safe.


--------------------------------------------------------------------------
Next Movie:

Type: Movie,
Title: House of the Witch,
Director: Alex Merkin,
Cast: Emily Bader, Darren Mann, Michelle Randolph, Coy Stewart, Jesse Pepe, Arden Belle, Joel Nagle, Nolan Bateman,
Released: 2017,
Genres: Horror Movies,

Description: A group of daring teens finds themselves in a fight fo