A content-based recommendation system for Netflix titles that tackles the similarity-diversity paradox — delivering recommendations that are both relevant and varied, avoiding the "echo chamber" effect common in traditional recommenders.
Traditional content-based recommenders maximize cosine similarity, which results in highly homogeneous suggestions. This system introduces a tunable re-ranking mechanism that balances relevance with diversity, giving users more engaging and explorative recommendations.
Based on the research: "Improving Recommendation Diversity with Tunable Re-Ranking Techniques" — Arvin Subramanian (2025)
The pipeline processes ~45,000 Netflix movies and TV shows through the following stages:
Content Catalog → Feature Engineering → TF-IDF Vectorization → Cosine Similarity → Re-Ranking → Top-N Recommendations
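The middle stages of this pipeline (feature engineering, TF-IDF vectorization, cosine similarity) can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the three-row `catalog`, the `build_text` helper, and the `stop_words="english"` setting are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog: hypothetical rows standing in for the real dataset.
catalog = [
    {"title": "Space Quest", "overview": "A crew explores a distant planet.",
     "genres": "Sci-Fi, Adventure", "keywords": "space, planet, crew"},
    {"title": "Star Voyage", "overview": "Astronauts travel to a distant planet.",
     "genres": "Sci-Fi, Drama", "keywords": "space, astronaut, planet"},
    {"title": "Baking Battles", "overview": "Home bakers compete for a prize.",
     "genres": "Reality", "keywords": "baking, competition, cake"},
]


def build_text(row):
    """Feature engineering (assumed): join the text fields into one document."""
    return " ".join([row["overview"], row["genres"], row["keywords"]])


documents = [build_text(row) for row in catalog]

# TF-IDF vectorization: one sparse row vector per title.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(documents)

# Pairwise cosine similarity between all titles.
similarity = cosine_similarity(tfidf_matrix)

# Rank every other title by similarity to the seed (no re-ranking yet).
seed_index = 0
ranked = sorted(
    (i for i in range(len(catalog)) if i != seed_index),
    key=lambda i: similarity[seed_index, i],
    reverse=True,
)
top_titles = [catalog[i]["title"] for i in ranked]
print(top_titles)
```

The ranking produced here is the pure-similarity baseline; the re-ranking stage then reorders these candidates to trade some relevance for diversity.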
Each candidate item is scored as:
Score(i) = Similarity(i, seed) - λ × Overlap_Penalty(i)
- Similarity — cosine similarity between the candidate and seed item (TF-IDF vectors)
- λ (diversity_weight) — tunable parameter controlling the relevance/diversity trade-off
- Overlap_Penalty — penalizes genre and keyword overlap with already-selected items
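The scoring rule above can be sketched as a greedy re-ranking loop. The exact form of Overlap_Penalty is not specified here, so this example assumes it is the mean Jaccard overlap between a candidate's genre/keyword tags and those of the items already selected; the function names (`jaccard`, `overlap_penalty`, `rerank`) and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_penalty(candidate_tags, selected_tags):
    """Assumed penalty: mean Jaccard overlap with already-selected items."""
    if not selected_tags:
        return 0.0
    total = sum(jaccard(candidate_tags, tags) for tags in selected_tags)
    return total / len(selected_tags)


def rerank(similarities, tags, diversity_weight=0.5, top_n=3):
    """Greedily pick top_n items by similarity - diversity_weight * penalty."""
    remaining = set(similarities)
    selected = []
    while remaining and len(selected) < top_n:
        selected_tags = [tags[name] for name in selected]
        best = max(
            sorted(remaining),  # sorted so ties break deterministically
            key=lambda name: similarities[name]
            - diversity_weight * overlap_penalty(tags[name], selected_tags),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical similarities to the seed, plus genre/keyword tags per title.
similarities = {"A": 0.9, "B": 0.85, "C": 0.6, "D": 0.5}
tags = {
    "A": {"sci-fi", "space"},
    "B": {"sci-fi", "space"},
    "C": {"comedy", "family"},
    "D": {"sci-fi", "drama"},
}

baseline = rerank(similarities, tags, diversity_weight=0.0)
diverse = rerank(similarities, tags, diversity_weight=0.5)
print(baseline, diverse)
```

With `diversity_weight=0.0` the order is purely by similarity; raising it to `0.5` moves the dissimilar title `C` ahead of the near-duplicate `B`, which is the trade-off λ is meant to control.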
- 📄 TF-IDF Vectorization of overview, genres, and keywords
- 🎛️ Tunable diversity via a single diversity_weight parameter
- 📊 Multi-faceted evaluation: cosine similarity, genre diversity index, keyword diversity index
- 🖥️ Interactive Streamlit UI for real-time configuration
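The genre and keyword diversity indices are named above but not defined, so the following is only one plausible reading: the number of distinct tags in a recommendation list divided by the total number of tag mentions. The `diversity_index` function and the sample lists are assumptions for illustration.

```python
def diversity_index(tag_lists):
    """Assumed index: distinct tags / total tag mentions, in [0.0, 1.0]."""
    all_tags = [tag for tags in tag_lists for tag in tags]
    if not all_tags:
        return 0.0
    return len(set(all_tags)) / len(all_tags)


# Hypothetical genre lists for two recommendation slates of three titles each.
homogeneous = [["Sci-Fi", "Adventure"], ["Sci-Fi", "Adventure"], ["Sci-Fi", "Drama"]]
varied = [["Sci-Fi", "Adventure"], ["Comedy", "Family"], ["Documentary"]]

homogeneous_score = diversity_index(homogeneous)
varied_score = diversity_index(varied)
print(homogeneous_score, varied_score)
```

Under this definition a slate that repeats the same genres scores lower than one where every genre appears once. The same function applies to keyword lists.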
git clone https://github.com/arvinaiengineer/netflix.git
cd netflix
pip install -r requirements.txt
pandas
scikit-learn
streamlit
numpy
streamlit run app.py
This project uses a Netflix titles dataset with the following fields:
- title: movie/show name
- overview: plot description
- genres: comma-separated genre list
- keywords: associated tags/keywords
A compatible dataset can be found on Kaggle: Netflix Movies and TV Shows.
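Before running the app on a downloaded file, it may help to check that the expected columns are present and to split the comma-separated fields. The sketch below uses only the standard library. The `load_titles` and `split_tags` helpers and the inline sample row are hypothetical, and a real Kaggle export may use different column names.

```python
import csv
import io

REQUIRED_FIELDS = ("title", "overview", "genres", "keywords")


def split_tags(value):
    """Split a comma-separated string into a list of trimmed, non-empty tags."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_titles(handle):
    """Read rows from a CSV file object, validating the required columns."""
    reader = csv.DictReader(handle)
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"dataset is missing columns: {missing}")
    return [
        {
            "title": row["title"].strip(),
            "overview": row["overview"].strip(),
            "genres": split_tags(row["genres"]),
            "keywords": split_tags(row["keywords"]),
        }
        for row in reader
    ]


# Inline sample standing in for a real CSV file.
sample = io.StringIO(
    "title,overview,genres,keywords\n"
    'Space Quest,A crew explores a distant planet.,"Sci-Fi, Adventure","space, planet"\n'
)
titles = load_titles(sample)
print(titles[0]["genres"])
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the `StringIO` sample.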
- Relies solely on content metadata (no user interaction data)
- Static TF-IDF representations (no semantic embeddings)
- Evaluated on offline metrics only
- Neural embeddings (BERT / sentence transformers)
- Reinforcement learning for adaptive diversity weights
- A/B testing in live environments
- Multi-stakeholder optimization
- Abdollahpouri, H. (2022). User-centered evaluation of recommender systems beyond accuracy. ACM SIGIR Forum.
- Zhang, Y., Chen, X., & Wang, L. (2023). Diversity-aware re-ranking strategies for recommender systems. IEEE Transactions on Knowledge and Data Engineering.
MIT License — feel free to use, modify, and distribute.