🎬 Netflix Content Recommender with Diversity Re-Ranking

A content-based recommendation system for Netflix titles that tackles the similarity-diversity paradox — delivering recommendations that are both relevant and varied, avoiding the "echo chamber" effect common in traditional recommenders.

Overview

Traditional content-based recommenders rank candidates purely by cosine similarity, which tends to produce highly homogeneous suggestions. This system adds a tunable re-ranking step that balances relevance against diversity, giving users more engaging and exploratory recommendations.

Based on the research: "Improving Recommendation Diversity with Tunable Re-Ranking Techniques" — Arvin Subramanian (2025)

How It Works

The pipeline processes ~45,000 Netflix movies and TV shows through the following stages:

Content Catalog → Feature Engineering → TF-IDF Vectorization → Cosine Similarity → Re-Ranking → Top-N Recommendations
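
As a rough illustration of the vectorization and similarity stages, the sketch below scores a toy catalog against a seed item with scikit-learn. The catalog entries, the `similarity_to_seed` helper, and the choice of English stop words are assumptions made for the example, not the project's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog standing in for the real dataset (titles and text are made up).
catalog = [
    {"title": "Space Heist", "text": "crew robs a space station heist sci-fi thriller"},
    {"title": "Orbit Down", "text": "astronauts stranded on a space station sci-fi drama"},
    {"title": "Bake Battle", "text": "amateur bakers compete in a baking contest reality"},
]

def similarity_to_seed(items, seed_index):
    """Return the cosine similarity of every item to the seed item."""
    texts = [item["text"] for item in items]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # Slice (rather than index) so the seed stays a 2-D, one-row matrix.
    seed_row = tfidf[seed_index:seed_index + 1]
    return cosine_similarity(seed_row, tfidf).ravel()

seed = 0
scores = similarity_to_seed(catalog, seed)

# Rank every item except the seed itself, most similar first.
ranked = sorted(
    (i for i in range(len(catalog)) if i != seed),
    key=lambda i: scores[i],
    reverse=True,
)
top_titles = [catalog[i]["title"] for i in ranked]
```

Ranking on similarity alone is the baseline that the re-ranking stage then adjusts.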

Scoring Formula

Each candidate item is scored as:

Score(i) = Similarity(i, seed) - λ × Overlap_Penalty(i)

Similarity — cosine similarity between the candidate and seed item (TF-IDF vectors)
λ (diversity_weight) — tunable parameter controlling the relevance/diversity trade-off
Overlap_Penalty — penalizes genre and keyword overlap with already-selected items
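
A minimal sketch of how this score could drive a greedy re-ranker follows. The source does not define Overlap_Penalty precisely, so the example assumes it is the Jaccard overlap between a candidate's genre and keyword tags and the tags of everything selected so far. The `rerank` and `jaccard` helpers and the candidate data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rerank(candidates, top_n, diversity_weight):
    """Greedily pick top_n titles by similarity - diversity_weight * penalty.

    candidates: list of dicts with 'title', 'similarity' (to the seed),
    and 'tags' (a set of genres and keywords).
    """
    selected, seen_tags = [], set()
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        # The penalty is recomputed every round because it depends on
        # which items have already been selected.
        best = max(
            remaining,
            key=lambda c: c["similarity"]
            - diversity_weight * jaccard(c["tags"], seen_tags),
        )
        selected.append(best)
        seen_tags |= best["tags"]
        remaining.remove(best)
    return [c["title"] for c in selected]

# Made-up candidates: A and B are near-duplicates, C is less similar but different.
candidates = [
    {"title": "A", "similarity": 0.90, "tags": {"sci-fi", "space"}},
    {"title": "B", "similarity": 0.85, "tags": {"sci-fi", "space"}},
    {"title": "C", "similarity": 0.60, "tags": {"comedy", "baking"}},
]

pure_relevance = rerank(candidates, top_n=2, diversity_weight=0.0)
diversified = rerank(candidates, top_n=2, diversity_weight=0.5)
```

With `diversity_weight=0.0` the ranking reduces to plain similarity order. With `0.5`, B's full tag overlap with A costs it more than its similarity advantage over C, so C takes the second slot.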

Features

📄 TF-IDF Vectorization of overview, genres, and keywords
🎛️ Tunable diversity via a single diversity_weight parameter
📊 Multi-faceted evaluation — cosine similarity, genre diversity index, keyword diversity index
🖥️ Interactive Streamlit UI for real-time configuration
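
The diversity indices are not defined in this README. One plausible reading, used purely as an assumption here, is the share of distinct tags among all tag mentions in a recommendation list. The `diversity_index` function and the sample lists below are hypothetical.

```python
def diversity_index(tag_lists):
    """Share of distinct tags among all tag mentions in a recommendation list.

    1.0 means no tag is repeated across items; lower values mean more
    repetition. Returns 0.0 when there are no tags at all.
    """
    all_tags = [tag for tags in tag_lists for tag in tags]
    if not all_tags:
        return 0.0
    return len(set(all_tags)) / len(all_tags)

# Genres of three recommended titles in each list (made-up data).
homogeneous = [["Drama", "Crime"], ["Drama", "Crime"], ["Drama", "Thriller"]]
varied = [["Drama", "Crime"], ["Comedy"], ["Documentary", "Nature"]]

print(diversity_index(homogeneous))
print(diversity_index(varied))
```

The same function would apply to keyword lists, giving a keyword diversity figure on the same scale.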

Installation

git clone https://github.com/arvinaiengineer/netflix.git
cd netflix
pip install -r requirements.txt

Requirements

pandas
scikit-learn
streamlit
numpy

Usage

Run the Streamlit App

streamlit run main.py

Dataset

This project uses a Netflix titles dataset with the following fields:

title — movie/show name
overview — plot description
genres — comma-separated genre list
keywords — associated tags/keywords

A compatible dataset can be found on Kaggle: Netflix Movies and TV Shows.
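
The fields above can be merged into a single text column before vectorization. The sketch below shows one way to do that with pandas, assuming the column names listed above. The `build_text_feature` helper and the sample rows are illustrative only; in practice the frame would be loaded from the dataset file with `pd.read_csv`.

```python
import pandas as pd

def build_text_feature(df):
    """Combine overview, genres, and keywords into one lowercase text column."""
    fields = ["overview", "genres", "keywords"]
    # Missing values become empty strings so concatenation never yields NaN.
    cleaned = df[fields].fillna("").astype(str)
    combined = cleaned["overview"] + " " + cleaned["genres"] + " " + cleaned["keywords"]
    # Turn commas into spaces so "Drama, Crime" tokenizes as two words,
    # then collapse repeated whitespace.
    combined = combined.str.replace(",", " ", regex=False).str.lower()
    return combined.str.split().str.join(" ")

# Made-up rows; the second has no overview.
titles = pd.DataFrame({
    "title": ["Example One", "Example Two"],
    "overview": ["A detective returns home.", None],
    "genres": ["Drama, Crime", "Comedy"],
    "keywords": ["small town,murder", "stand-up"],
})
titles["text"] = build_text_feature(titles)
```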

Limitations

Relies solely on content metadata (no user interaction data)
Static TF-IDF representations (no semantic embeddings)
Evaluated on offline metrics only

Future Work

Neural embeddings (BERT / sentence transformers)
Reinforcement learning for adaptive diversity weights
A/B testing in live environments
Multi-stakeholder optimization

License

MIT License — feel free to use, modify, and distribute.
