# 🎬 Netflix Content-Based Recommendation System ✨

> _"Where your next favorite show finds you automatically!"_

---

**👨‍💻 Author:** Godlove SHIMA
**📅 Date:** 20th/05/2025
**🛠 Tools Used:**  
`Python` | `Pandas` | `Scikit-learn (sklearn)` | `TF-IDF` | `Cosine Similarity`

**📂 Dataset:** Netflix TV Shows and Movies Metadata

---

## 🎯 Project Objective
> Build a lightweight recommendation system that suggests similar shows/movies based on their metadata (title, cast, director, genre, and description).

---

## 🧰 Key Techniques
- 🍲 Metadata Soup Creation
- ✍️ Text Preprocessing
- 🧠 TF-IDF Vectorization
- 📈 Cosine Similarity Computation
- 🛠 Building a Custom Recommendation Function

---

## 🌟 Expected Outcome
> A system that recommends **10 similar TV shows/movies** when you input your favorite title!

---




# 📑 Table of Contents

1. [Load and Inspect the Data](#load-and-inspect-the-data)
2. [Data Preprocessing](#data-preprocessing)
3. [TF-IDF Vectorization](#tf-idf-vectorization)
4. [Compute Cosine Similarity](#compute-cosine-similarity)
5. [Mapping Title Indices](#mapping-title-indices)
6. [Build the Recommendation Function](#build-the-recommendation-function)
7. [Test the System](#test-the-system)
8. [Conclusion and Reflections](#conclusion-and-reflections)




# 📚 Project Overview

Welcome to my **Netflix Recommendation System** project!  
In this project, I explored a dataset of Netflix titles to build a simple, content-based recommendation system.

By combining metadata such as **title**, **director**, **cast**, **genres**, and **description** into a "content profile," I applied Natural Language Processing (NLP) techniques like **TF-IDF Vectorization** and **Cosine Similarity** to find shows or movies similar to a given title.

🎯 **Main Goal:**  
Create a functional and understandable recommender system that can suggest Netflix titles based on content similarity.

🛠️ **Skills Practiced:**  
- Data Preprocessing
- TF-IDF Vectorization
- Cosine Similarity Computation
- Building a Custom Recommendation Function

✨ **Why This Project Matters:**  
This project helped me practice important data science skills and offered a fun introduction to building recommender systems from scratch!

---

📅 **Date Completed:** 20th April 2025  
👨‍💻 **Author:** Godlove SHIMA


# 🚀 Let's Get Started!

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


# Load and Inspect the Data

In [2]:

netglitch = pd.read_csv('C:/Users/user/Desktop/netflix_titles.csv')

netglitch.head()


| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


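**Observation**:  
`head()` shows the basic structure of the dataset. The text columns `title`, `director`, `cast`, `listed_in`, and `description` are the ones used to build the content profiles; note that `director` and `cast` already contain missing values in these first five rows.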


# Data Preprocessing

In [5]:

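# Only `description` has its missing values filled at this point
netglitch['description'] = netglitch['description'].fillna('')

# Build the "metadata soup". Caution: str + NaN gives NaN, so any row with a
# missing director or cast ends up with a NaN content_profile here.
netglitch['content_profile'] = netglitch['title'] + ' ' + netglitch['director'] + ' ' + netglitch['cast'] + ' ' + netglitch['listed_in'] + ' ' + netglitch['description']

# Lowercase so that "Drama" and "drama" count as the same token
netglitch['content_profile'] = netglitch['content_profile'].str.lower()


**Note**:  
We combine several text fields into one `content_profile` and lowercase everything so the model doesn't treat "Drama" and "drama" as different terms. Because `director` and `cast` are not filled before the concatenation, a title missing either field gets no usable profile: its NaN is replaced with a blank string in the vectorization step.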
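
To make the missing-value behaviour of the profile step concrete, here is a minimal sketch on a small made-up DataFrame (`toy` and its rows are hypothetical; only the column names follow the dataset). It compares plain concatenation with one possible NaN-safe alternative that fills every text column before joining. Treat it as an illustration under those assumptions rather than the code that produced the outputs in this notebook.

```python
import pandas as pd

# Toy stand-in for the Netflix data (hypothetical rows, real column names).
toy = pd.DataFrame({
    "title": ["Show A", "Movie B"],
    "director": [None, "Jane Doe"],
    "cast": ["Actor One, Actor Two", None],
    "listed_in": ["TV Dramas", "Documentaries"],
    "description": ["A tense drama.", "A quiet documentary."],
})

text_cols = ["title", "director", "cast", "listed_in", "description"]

# Plain concatenation: a single missing value makes the whole profile missing.
naive = toy["title"] + " " + toy["director"] + " " + toy["cast"]

# NaN-safe version: fill every text column first, then join, tidy, lowercase.
safe = (
    toy[text_cols]
    .fillna("")
    .agg(" ".join, axis=1)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
    .str.lower()
)

print(naive.isna().tolist())
print(safe.tolist())
```

With the second approach, a title that lacks a director or cast still keeps its title, genres, and description in the profile.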


# TF-IDF Vectorization

In [9]:
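# TfidfVectorizer was already imported in the first cell

# Drop common English stop words ("the", "and", ...) before weighting terms
tfidf = TfidfVectorizer(stop_words='english')
# Replace NaN profiles with a blank string; such rows become all-zero vectors
netglitch['content_profile'] = netglitch['content_profile'].fillna(' ')
# One row per title, one column per vocabulary term
tfidf_matrix = tfidf.fit_transform(netglitch['content_profile'])
tfidf_matrix.shape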

(8807, 42824)
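
A quick way to see what a blank profile does to the TF-IDF matrix is to count the stored values per row. The sketch below uses a hypothetical three-document corpus (`docs`), where the last entry mimics a profile that was filled with a blank string; the same check could be run on `tfidf_matrix`, but that is left as an assumption here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus; the blank entry mimics a profile that was NaN.
docs = [
    "dark sci-fi anthology series",
    "sci-fi thriller about technology",
    " ",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)

# Rows with no stored values carry no information for similarity.
nonzero_per_row = matrix.getnnz(axis=1)
empty_rows = np.flatnonzero(nonzero_per_row == 0)

print(matrix.shape)
print(empty_rows)
```

Any row listed in `empty_rows` has zero cosine similarity with every other row, so it cannot produce meaningful recommendations.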

# Compute Cosine Similarity

In [10]:
# Similarity between every pair of titles: a dense 8807 x 8807 matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
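
The full pairwise matrix is convenient but large. As a hedged alternative, the sketch below computes the similarities for a single row on demand. It relies on the fact that `TfidfVectorizer` L2-normalizes rows by default, so a plain dot product (`linear_kernel`) matches cosine similarity. The corpus `docs` is a made-up example, not the Netflix data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical profiles used only for illustration.
docs = [
    "dystopian technology anthology",
    "technology thriller anthology",
    "romantic comedy wedding",
    "baking competition",
]
matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Full pairwise matrix, as in the notebook: shape (n, n).
full = cosine_similarity(matrix, matrix)

# Similarities for one title only: shape (n,), computed when needed.
row = linear_kernel(matrix[1], matrix).ravel()

print(np.allclose(full[1], row))
```

For 8807 titles, the dense matrix holds 8807 × 8807 float64 values, roughly 620 MB, whereas a single row needs only 8807 values.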


# Mapping Title Indices

In [11]:
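# Make sure rows are numbered 0..n-1; drop=True avoids adding an extra 'index' column
netglitch = netglitch.reset_index(drop=True)
# Map each title to its row position
indices = pd.Series(netglitch.index, index=netglitch['title'])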
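
One thing to watch with a title-to-row mapping: if a title appears more than once, looking it up returns several positions instead of one, and the lookup is also case-sensitive. The sketch below, on a hypothetical DataFrame with a deliberately repeated title, shows the behaviour and one possible normalized lookup (`lookup` and `find_row` are illustrative names, not part of the notebook).

```python
import pandas as pd

# Hypothetical titles with a deliberate duplicate.
toy = pd.DataFrame({"title": ["Black Mirror", "Sankofa", "Black Mirror"]})

# Same construction as in the notebook.
indices = pd.Series(toy.index, index=toy["title"])
print(type(indices["Black Mirror"]))  # a Series, because the label repeats

# Normalized lookup: lowercase, stripped, first occurrence wins.
lookup = pd.Series(toy.index, index=toy["title"].str.lower().str.strip())
lookup = lookup[~lookup.index.duplicated(keep="first")]

def find_row(title):
    """Return the row position for a title, or None if it is unknown."""
    return lookup.get(title.lower().strip())

print(find_row("BLACK MIRROR"), find_row("Unknown Title"))
```

Keeping the first occurrence is an arbitrary choice; another option would be to disambiguate repeated titles by release year.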


# Build the Recommendation Function

In [18]:
# Rebuild the positional index; drop=True keeps re-runs of this cell from
# adding a new 'index' column each time
netglitch = netglitch.reset_index(drop=True)

# Create a Series to map title to row position
indices = pd.Series(netglitch.index, index=netglitch['title'])

def get_recommendations(title, cosine_sim=cosine_sim):
    """Return the 10 titles most similar to `title` by cosine similarity."""
    # Row position of the requested title (raises KeyError if the title is absent)
    idx = indices[title]

    # Pair each row position with its similarity to this title
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort titles by similarity score (high to low)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry, normally the title itself (score 1.0), and keep the next 10
    sim_scores = sim_scores[1:11]

    # Row positions of the most similar titles
    show_indices = [i[0] for i in sim_scores]

    # Return the top 10 recommended titles
    return netglitch['title'].iloc[show_indices]
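
The function above assumes the requested title sorts first and that every score is informative. A slightly more defensive variant is sketched below on a made-up four-title dataset. `recommend`, `toy`, and the `top_n` parameter are hypothetical names introduced for illustration, and the sketch assumes titles are unique. It removes the query row explicitly, drops zero-score matches, and returns the scores alongside the titles.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: the last title has an empty profile.
toy = pd.DataFrame({
    "title": ["Alpha", "Beta", "Gamma", "Delta"],
    "content_profile": [
        "space crew explores distant planet",
        "space crew fights alien invasion",
        "baking contest with amateur bakers",
        "",
    ],
})
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["content_profile"])
toy_sim = cosine_similarity(toy_matrix)
toy_indices = pd.Series(toy.index, index=toy["title"])

def recommend(title, sim, indices, titles, top_n=10):
    """Return up to top_n similar titles with their scores (unique titles assumed)."""
    if title not in indices.index:
        raise KeyError(f"Title not found: {title!r}")
    idx = indices[title]
    # Scores against every row, with the query row removed explicitly.
    scores = pd.Series(sim[idx], index=titles.index).drop(idx)
    # Keep only informative matches, best first.
    scores = scores[scores > 0].sort_values(ascending=False).head(top_n)
    return pd.DataFrame({"title": titles.loc[scores.index], "score": scores})

print(recommend("Alpha", toy_sim, toy_indices, toy["title"]))
print(recommend("Delta", toy_sim, toy_indices, toy["title"]))
```

Returning an empty result for a title with no usable profile makes the problem visible instead of silently listing unrelated rows.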


# Test the System

In [30]:
recommendation = get_recommendations('Black Mirror')
print("Because you watched 'Black Mirror', you may also watch:\n")
recommendation


Because you watched 'Black Mirror', you may also watch:



1                           Blood & Water
2                               Ganglands
3                   Jailbirds New Orleans
4                            Kota Factory
5                           Midnight Mass
6        My Little Pony: A New Generation
7                                 Sankofa
8           The Great British Baking Show
9                            The Starling
10    Vendetta: Truth, Lies and The Mafia
Name: title, dtype: object

**Observation**:  
These ten titles are simply rows 1 to 10 of the dataset, in order, rather than titles that resemble 'Black Mirror'. That is what the function returns when every similarity score is tied, which is consistent with 'Black Mirror' having an empty `content_profile`: a missing `director` or `cast` turns the concatenated profile into NaN, and the NaN is replaced with a blank string before vectorization.

In [29]:
recommendation = get_recommendations('Sankofa')
print("Because you enjoyed 'Sankofa', you may also like:\n")
recommendation

Because you enjoyed 'Sankofa', you may also like:



8238        The Carter Effect
5044        When We First Met
8425                The Model
7037            I Am Jane Doe
3094                  The App
5689    Reggie Watts: Spatial
7904        Running for Grace
227               Really Love
6510             Coach Carter
7976    Secrets of Selfridges
Name: title, dtype: object

In [28]:
recommendation = get_recommendations('The Starling')
print("Since you liked 'The Starling', you may also enjoy:\n")
recommendation

Since you liked 'The Starling', you may also enjoy:



604                  The Life of David Gale
5596                        Growing Up Wild
3733    Adam Devine: Best Time of Our Lives
3957                               The Trap
1088                          Thunder Force
2035                     The Social Dilemma
5326          Colin Quinn: Unconstitutional
614                    What Dreams May Come
7827                   Ram Teri Ganga Maili
797                                I Am Sam
Name: title, dtype: object

# Conclusion and Reflections

✅ We successfully built a content-based recommendation engine!  
It suggests shows based on textual metadata similarities.

# Future Improvements
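- Fill missing `director` and `cast` values before building `content_profile`, so titles with incomplete metadata still get meaningful recommendations.
- Include more metadata (e.g., release year, ratings).
- Try hybrid models (combine content and user ratings).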
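
As a rough idea of how extra metadata could be folded in, the sketch below appends a one-hot encoding of `rating` to the TF-IDF features before computing similarity. The data, the `rating_weight` value, and the choice of one-hot encoding are all assumptions made for illustration; other encodings or weights may work better.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: titles 0 and 2 share a profile but differ in rating.
toy = pd.DataFrame({
    "content_profile": ["crime drama city", "crime drama family", "crime drama city"],
    "rating": ["TV-MA", "TV-MA", "PG"],
})

text_features = TfidfVectorizer().fit_transform(toy["content_profile"])
rating_features = OneHotEncoder(handle_unknown="ignore").fit_transform(toy[["rating"]])

# Hypothetical weight controlling how much the rating matters next to the text.
rating_weight = 0.5
combined = hstack([text_features, rating_weight * rating_features]).tocsr()

sim = cosine_similarity(combined)
print(sim.round(2))
```

With this setup, two titles with identical text but different ratings no longer score a perfect 1.0, so the rating acts as a mild tie-breaker.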

# Thank You! 🙌

