🎬 Netflix Content Recommender with Diversity Re-Ranking

A content-based recommendation system for Netflix titles that tackles the similarity-diversity paradox — delivering recommendations that are both relevant and varied, avoiding the "echo chamber" effect common in traditional recommenders.


Overview

Traditional content-based recommenders rank candidates purely by cosine similarity to the seed title, so the top results tend to be near-duplicates of one another. This system adds a tunable re-ranking step that balances relevance with diversity, giving users more engaging and exploratory recommendations.

Based on the research: "Improving Recommendation Diversity with Tunable Re-Ranking Techniques" — Arvin Subramanian (2025)


How It Works

The pipeline processes ~45,000 Netflix movies and TV shows through the following stages:

Content Catalog → Feature Engineering → TF-IDF Vectorization → Cosine Similarity → Re-Ranking → Top-N Recommendations
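
The first stages of the pipeline (feature engineering through cosine similarity) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the repository's actual code: the toy catalog, the `build_text` and `similarity_to_seed` helpers, and the choice of English stop-word filtering are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog standing in for the real dataset (invented titles, illustration only).
catalog = [
    {"title": "Space Quest", "overview": "Astronauts explore a distant planet.",
     "genres": "Sci-Fi, Adventure", "keywords": "space, planet, astronaut"},
    {"title": "Star Voyage", "overview": "A crew of astronauts travels to a distant star.",
     "genres": "Sci-Fi, Drama", "keywords": "space, star, crew"},
    {"title": "Bake Off", "overview": "Amateur bakers compete in a kitchen.",
     "genres": "Reality", "keywords": "baking, competition, kitchen"},
]


def build_text(item):
    """Feature engineering: merge the text fields into one document per title."""
    return " ".join([item["overview"], item["genres"], item["keywords"]])


def similarity_to_seed(catalog, seed_index):
    """Return the cosine similarity of every title to the seed title."""
    docs = [build_text(item) for item in catalog]
    # Stop-word removal is an assumption; the project may configure TF-IDF differently.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # Slice (rather than index) so the seed stays a 2-D row vector.
    seed_row = tfidf[seed_index:seed_index + 1]
    return cosine_similarity(seed_row, tfidf).ravel()


scores = similarity_to_seed(catalog, seed_index=0)
```

Here `scores[0]` is the seed compared with itself, and the second sci-fi title scores higher than the unrelated baking show. These raw similarities are what the re-ranking stage then adjusts.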

Scoring Formula

Each candidate item is scored as:

Score(i) = Similarity(i, seed) - λ × Overlap_Penalty(i)
  • Similarity — cosine similarity between the candidate and seed item (TF-IDF vectors)
  • λ (diversity_weight) — tunable parameter controlling the relevance/diversity trade-off
  • Overlap_Penalty — penalizes genre and keyword overlap with already-selected items
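
One way to read the formula is as a greedy selection loop: pick the best-scoring item, then re-score the rest against what has been chosen so far. The sketch below takes that reading. The exact form of the overlap penalty is not specified above, so the mean Jaccard overlap used here is an assumption, and `jaccard` and `rerank` are hypothetical names rather than functions from the repository.

```python
def jaccard(a, b):
    """Jaccard overlap between two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(similarity, tags, top_n, diversity_weight):
    """Greedily pick top_n items by similarity minus a weighted overlap penalty.

    similarity: dict mapping item -> cosine similarity to the seed
    tags:       dict mapping item -> set of genres and keywords
    """
    selected = []
    remaining = set(similarity)
    while remaining and len(selected) < top_n:
        def score(item):
            if not selected:
                penalty = 0.0  # nothing chosen yet, so nothing to overlap with
            else:
                # Assumed penalty: mean Jaccard overlap with already-selected items.
                penalty = sum(jaccard(tags[item], tags[s]) for s in selected) / len(selected)
            return similarity[item] - diversity_weight * penalty

        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


similarity = {"A": 0.9, "B": 0.85, "C": 0.6}
tags = {"A": {"sci-fi", "space"}, "B": {"sci-fi", "space"}, "C": {"comedy", "heist"}}

plain = rerank(similarity, tags, top_n=2, diversity_weight=0.0)    # ['A', 'B']
diverse = rerank(similarity, tags, top_n=2, diversity_weight=0.5)  # ['A', 'C']
```

With `diversity_weight=0.0` the ranking reduces to plain similarity. At `0.5`, item B (identical tags to A) drops to 0.85 - 0.5 × 1.0 = 0.35 and loses the second slot to the less similar but non-overlapping C.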

Features

  • 📄 TF-IDF Vectorization of overview, genres, and keywords
  • 🎛️ Tunable diversity via a single diversity_weight parameter
  • 📊 Multi-faceted evaluation — cosine similarity, genre diversity index, keyword diversity index
  • 🖥️ Interactive Streamlit UI for real-time configuration
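
The diversity indices are not defined in this README. As an illustration only, the sketch below assumes a simple definition: the share of distinct tags among all tags appearing in a recommendation list. The `parse_tags` and `diversity_index` helpers are hypothetical, and the project's actual metric may differ.

```python
def parse_tags(field):
    """Split a comma-separated field such as 'Drama, Comedy' into clean tags."""
    return [t.strip().lower() for t in field.split(",") if t.strip()]


def diversity_index(tag_lists):
    """Share of distinct tags among all tags in a recommendation list (0 to 1).

    Assumed definition: 1.0 means no tag repeats across the list; values
    near 0 mean the same few tags dominate.
    """
    all_tags = [t for tags in tag_lists for t in tags]
    return len(set(all_tags)) / len(all_tags) if all_tags else 0.0


recommended_genres = [parse_tags("Drama, Comedy"), parse_tags("Drama, Thriller")]
genre_diversity = diversity_index(recommended_genres)  # 3 distinct / 4 total = 0.75
```

The same function can be applied to keyword lists to get a keyword diversity score under the same assumed definition.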

Installation

git clone https://github.com/arvinaiengineer/netflix.git
cd netflix
pip install -r requirements.txt

Requirements

pandas
scikit-learn
streamlit
numpy

Usage

Run the Streamlit App

streamlit run app.py

Dataset

This project uses a Netflix titles dataset with the following fields:

  • title — movie/show name
  • overview — plot description
  • genres — comma-separated genre list
  • keywords — associated tags/keywords

A compatible dataset can be found on Kaggle: Netflix Movies and TV Shows.
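
Before vectorizing, it can help to check that a dataset actually exposes the four fields listed above and to fill in missing text. The following is a minimal pandas sketch under assumptions: `prepare_catalog` and the `content` column are hypothetical names, and the CSV filename in the comment is a placeholder, not a file shipped with this repository.

```python
import pandas as pd

REQUIRED_COLUMNS = ["title", "overview", "genres", "keywords"]


def prepare_catalog(df):
    """Validate the expected columns and build one text field per title."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing columns: {missing}")

    # Rows without a title cannot be recommended, so drop them.
    df = df.dropna(subset=["title"]).copy()
    # Missing text becomes an empty string so concatenation never yields NaN.
    for col in ["overview", "genres", "keywords"]:
        df[col] = df[col].fillna("")
    df["content"] = (df["overview"] + " " + df["genres"] + " " + df["keywords"]).str.strip()
    return df.reset_index(drop=True)


# Typical use (placeholder filename):
# catalog = prepare_catalog(pd.read_csv("netflix_titles.csv"))
```

If a downloaded dataset uses different column names, renaming them to the four fields above before calling the function keeps the rest of the pipeline unchanged.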


Limitations

  • Relies solely on content metadata (no user interaction data)
  • Static TF-IDF representations (no semantic embeddings)
  • Evaluated on offline metrics only

Future Work

  • Neural embeddings (BERT / sentence transformers)
  • Reinforcement learning for adaptive diversity weights
  • A/B testing in live environments
  • Multi-stakeholder optimization

License

MIT License — feel free to use, modify, and distribute.
