
---

###  Project: Content-Based Movie Recommender System (with Streamlit + TMDB API)

This was an end-to-end project in which I built a content-based movie recommendation system in Python. It recommends movies based on **textual similarity of their metadata** rather than user behavior or ratings.

---

###  Problem Statement

The goal was to recommend movies that are similar in *content* to a movie the user selects, where content means attributes such as plot, genres, actors, and keywords.

---

###  Step 1: Data Collection & Preprocessing

I used two TMDB datasets:

* `movies.csv`: includes movie overviews, genres, keywords.
* `credits.csv`: includes cast and crew details.

I merged these on the movie title, then built the features:

* Extracted **genres, keywords, the top 3 cast members, and the director**.
* Cleaned the text (e.g., removed spaces in names so that "Tom Cruise" becomes "TomCruise" and is treated as a single token).
* Combined everything into a single text field called **`tags`**, which served as the input for NLP.
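
The preprocessing above can be sketched roughly as follows. This is a minimal illustration rather than the project's exact code: the column names (`movie_id`, `title`, `genres`, `keywords`, `cast`, `crew`), the list-of-dicts string encoding, and the helper names `names`, `director`, and `squash` are all assumptions, and the two tiny inline frames stand in for `movies.csv` and `credits.csv`.

```python
import ast

import pandas as pd

# Tiny stand-ins for movies.csv and credits.csv (assumed schema).
movies = pd.DataFrame({
    "movie_id": [1],
    "title": ["Example Movie"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'],
    "keywords": ['[{"id": 1, "name": "space war"}]'],
})
credits = pd.DataFrame({
    "title": ["Example Movie"],
    "cast": ['[{"name": "Tom Cruise"}, {"name": "Jane Doe"}, '
             '{"name": "John Roe"}, {"name": "Extra Person"}]'],
    "crew": ['[{"job": "Producer", "name": "Pat Smith"}, '
             '{"job": "Director", "name": "Alex Lee"}]'],
})

def names(raw, limit=None):
    """Parse a stringified list of dicts and return up to `limit` 'name' values."""
    return [item["name"] for item in ast.literal_eval(raw)][:limit]

def director(raw):
    """Return the director's name as a one-item list (empty if none is listed)."""
    return [m["name"] for m in ast.literal_eval(raw) if m.get("job") == "Director"][:1]

def squash(tokens):
    """Remove spaces inside each token so multi-word names become one word."""
    return [t.replace(" ", "") for t in tokens]

# Join the two tables on the title, then build one text field per movie.
df = movies.merge(credits, on="title")
df["tags"] = df.apply(
    lambda row: " ".join(
        squash(names(row["genres"]))
        + squash(names(row["keywords"]))
        + squash(names(row["cast"], limit=3))   # top 3 cast members only
        + squash(director(row["crew"]))
    ),
    axis=1,
)
print(df.loc[0, "tags"])
```

Squashing names matters because a bag-of-words model would otherwise count "Tom" and "Cruise" as unrelated tokens, so two different actors named Tom would look partially similar.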

---

###  Step 2: Vectorization and Similarity

I converted the `tags` field to numerical vectors using **CountVectorizer**, with:

* A max vocabulary size of 5000
* Removal of English stopwords

Then, I computed the **cosine similarity** between vectors to identify movies with similar content. This gave me a similarity matrix that I used to find the top recommendations for any given movie.
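
As a rough sketch of this step, the snippet below runs scikit-learn's `CountVectorizer` with the two settings listed above and then calls `cosine_similarity` on the result. The three `tags` strings are made-up examples, not rows from the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up tag strings standing in for the real `tags` column.
tags = [
    "action sciencefiction spacewar tomcruise",
    "action sciencefiction alien tomcruise",
    "romance comedy wedding janedoe",
]

# Bag-of-words counts: keep at most 5000 terms and drop English stopwords.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)   # sparse matrix, shape (n_movies, n_terms)

# Pairwise cosine similarity: entry [i, j] compares movie i with movie j.
similarity = cosine_similarity(vectors)    # dense array, shape (n_movies, n_movies)
print(similarity.round(2))
```

The first two movies share three of their four tokens, so their score is 3 / (2 × 2) = 0.75, while the third movie shares nothing with them and scores 0. Each movie scores 1 against itself, which is why the selected movie has to be excluded later.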

---

###  Step 3: Recommendation Logic

When a user selects a movie, I:

* Look up its index in the dataset
* Fetch its similarity scores with all other movies
* Sort them in descending order and return the 5 most similar movies (excluding the selected movie itself)
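
The three steps above can be sketched in plain Python. The function name `recommend`, its parameters, and the small hand-written similarity matrix are illustrative assumptions. In the real app the matrix would come from the vectorization step.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the `top_n` titles most similar to `title`, excluding `title` itself."""
    idx = titles.index(title)                     # 1. look up the movie's index
    scores = list(enumerate(similarity[idx]))     # 2. (movie index, score) pairs
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)  # 3. best first
    return [titles[i] for i, _ in ranked if i != idx][:top_n]

# Hypothetical data: four movies and a symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.9],
    [0.4, 0.3, 0.9, 1.0],
]
print(recommend("A", titles, similarity, top_n=2))  # ['B', 'D']
```

Filtering by index instead of simply dropping the first result keeps the selected movie out even if another movie ties with it at a score of 1.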

---

###  Step 4: Fetching Posters via TMDB API

To make the UI more engaging, I used the **TMDB API** to fetch high-resolution movie posters using the `movie_id`.

* The API returns a poster path, which I converted to a full image URL.
* This helps visually reinforce the recommendations.
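
A minimal sketch of the poster lookup, using only the standard library. It assumes TMDB's v3 movie details endpoint and image host, and the `w500` size is just one of the poster widths TMDB serves (`original` is another). The function names are hypothetical, and a real API key is needed to call `fetch_poster`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.themoviedb.org/3/movie"
IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # assumed poster size

def poster_url(poster_path):
    """Turn the relative `poster_path` from the API into a full image URL."""
    return f"{IMAGE_BASE}{poster_path}" if poster_path else None

def fetch_poster(movie_id, api_key, timeout=10):
    """Request the movie's details and return its poster URL, or None if it has no poster."""
    url = f"{API_BASE}/{movie_id}?{urlencode({'api_key': api_key})}"
    with urlopen(url, timeout=timeout) as response:
        details = json.load(response)
    return poster_url(details.get("poster_path"))

# URL construction works offline; the path here is a made-up example.
print(poster_url("/example.jpg"))
```

Keeping URL construction separate from the network call makes the conversion easy to check without an API key, and the `None` return lets the UI skip movies that have no poster.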

---

###  Step 5: Frontend with Streamlit

I used **Streamlit** to build the UI:

* Dropdown to select a movie
* On button click, it shows 5 recommended movies
* Posters and titles are displayed side-by-side using columns

I also serialized the similarity matrix and movie metadata using `pickle` for faster loading in the web app.
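
The sketch below shows one way the pickle step and the UI could fit together. It makes several assumptions: both objects are stored in a single file (the project may well have used separate files), `recommend` is a hypothetical callable returning `(title, poster_url)` pairs, and the widget labels are invented.

```python
import pickle
import tempfile
from pathlib import Path

def save_artifacts(titles, similarity, path):
    """Write the precomputed data to disk once, offline."""
    with open(path, "wb") as f:
        pickle.dump({"titles": titles, "similarity": similarity}, f)

def load_artifacts(path):
    """Read the precomputed data back when the app starts."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["titles"], data["similarity"]

def render_app(titles, recommend):
    """Draw the UI; `recommend(title)` returns a list of (title, poster_url) pairs."""
    import streamlit as st  # imported here so the helpers above run without Streamlit

    st.title("Movie Recommender")
    selected = st.selectbox("Select a movie", titles)
    if st.button("Recommend"):
        picks = recommend(selected)
        if picks:
            # One column per recommendation, with the poster under its title.
            for column, (title, poster) in zip(st.columns(len(picks)), picks):
                with column:
                    st.text(title)
                    if poster:
                        st.image(poster)

# Round trip through a temporary file to show the save/load pair working.
with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "artifacts.pkl"
    save_artifacts(["A", "B"], [[1.0, 0.5], [0.5, 1.0]], artifact_path)
    loaded_titles, loaded_similarity = load_artifacts(artifact_path)
print(loaded_titles)
```

In an app script, `render_app` would be called with the loaded titles and a recommendation function, and the script started with `streamlit run`. Loading the precomputed matrix avoids re-vectorizing every movie each time Streamlit reruns the script.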

---

###  Tech Stack Summary

* **NLP**: CountVectorizer + Cosine Similarity
* **Data**: Pandas, NumPy, CSVs
* **API**: TMDB for movie posters
* **UI**: Streamlit for interactive frontend
* **Serialization**: Pickle for storing precomputed data

---

###  Key Learnings

* Learned how to extract and engineer meaningful features from messy text and JSON data.
* Understood how to apply NLP techniques for similarity-based recommendations.
* Gained experience integrating external APIs and building user-facing web apps.
* Worked through the full development cycle, from data wrangling to frontend deployment.

---
