

> One of the key projects I built was a **content-based movie recommendation system** using Python. The goal was to recommend similar movies based purely on their descriptions and metadata, without relying on user ratings or collaborative signals.
>
> I started by merging two TMDB datasets: one containing movie details such as overviews, genres, and keywords, and the other containing cast and crew information. I then did extensive preprocessing: extracting the relevant fields, keeping the top 3 cast members, identifying the directors from the crew data, and removing the spaces inside multi-word names and phrases so that each one stays a single token during vectorization (for example, "Science Fiction" becomes "ScienceFiction").
>
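Below is a minimal pandas sketch of these preprocessing steps. It assumes the two files follow the layout of the Kaggle "TMDB 5000" movies and credits CSVs, with list-valued columns stored as JSON strings and the tables joined on `id` / `movie_id`. The sample rows and the helper names `names`, `directors`, and `collapse` are hypothetical.

```python
import json

import pandas as pd

# Hypothetical sample rows. Column names and the JSON-string encoding are
# assumptions modeled on the Kaggle "TMDB 5000" movies/credits CSV files.
movies = pd.DataFrame({
    "id": [1],
    "title": ["Example Film"],
    "overview": ["A marine explores an alien moon."],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'],
    "keywords": ['[{"id": 1, "name": "culture clash"}]'],
})
credits = pd.DataFrame({
    "movie_id": [1],
    "cast": ['[{"name": "Ann Lee"}, {"name": "Bo Chen"}, '
             '{"name": "Cy Diaz"}, {"name": "Di Evans"}]'],
    "crew": ['[{"job": "Director", "name": "Jane Doe"}, '
             '{"job": "Producer", "name": "John Roe"}]'],
})


def names(raw, limit=None):
    """Return the "name" field of each object in a JSON-encoded list (first `limit` only)."""
    return [item["name"] for item in json.loads(raw)[:limit]]


def directors(raw):
    """Return the names of crew members whose job is "Director"."""
    return [member["name"] for member in json.loads(raw) if member.get("job") == "Director"]


def collapse(tokens):
    """Remove internal spaces so each multi-word name stays one token."""
    return [token.replace(" ", "") for token in tokens]


# Join the two tables on the TMDB movie ID.
df = movies.merge(credits, left_on="id", right_on="movie_id")

df["genres"] = df["genres"].apply(names).apply(collapse)
df["keywords"] = df["keywords"].apply(names).apply(collapse)
df["cast"] = df["cast"].apply(lambda raw: names(raw, limit=3)).apply(collapse)  # top 3 cast
df["crew"] = df["crew"].apply(directors).apply(collapse)  # keep directors only
df["overview"] = df["overview"].fillna("").str.split()  # split the overview into words
```

Merging on the numeric ID rather than on the title avoids mismatches when two films share the same title.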
> Then, I consolidated all of this into a single text field called `tags`, which became each movie's content representation. I vectorized these tags with scikit-learn's **CountVectorizer**, limited to the 5,000 most frequent terms and with English stop words removed. This turned each movie into a vector of term counts over that 5,000-term vocabulary.
>
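Here is a small sketch of building `tags` and vectorizing it with `CountVectorizer(max_features=5000, stop_words="english")`. The three token-list rows are hypothetical stand-ins for the preprocessed data. Joining every token column with spaces is an assumption about how `tags` was assembled.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical preprocessed rows: every content column already holds a list of tokens.
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "overview": [["A", "marine", "explores", "an", "alien", "moon"],
                 ["A", "detective", "hunts", "a", "killer"],
                 ["Explorers", "travel", "through", "a", "wormhole"]],
    "genres": [["Action", "ScienceFiction"], ["Crime", "Thriller"], ["ScienceFiction", "Drama"]],
    "cast": [["AnnLee", "BoChen"], ["CyDiaz"], ["AnnLee"]],
    "crew": [["JaneDoe"], ["JohnRoe"], ["JaneDoe"]],
})

token_columns = ["overview", "genres", "cast", "crew"]

# Flatten all token lists of a row into one space-separated "tags" string.
df["tags"] = df[token_columns].apply(
    lambda row: " ".join(token for tokens in row for token in tokens), axis=1
)

# Keep the 5,000 most frequent terms and drop English stop words.
# CountVectorizer also lowercases text by default.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(df["tags"]).toarray()  # shape: (n_movies, n_terms)
```

Row *i* of `vectors` is movie *i*'s content vector, so the row order must stay aligned with the DataFrame for later lookups.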
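> Using **cosine similarity**, which compares the direction of two vectors rather than their length, I computed a similarity score for every pair of movies, producing an n × n similarity matrix. From there, I wrote a recommendation function that, given a movie, looks up its row in that matrix and returns the 5 other movies with the highest scores.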
>
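The similarity step and the recommender can be sketched with NumPy alone. Normalizing every row to unit length turns the full pairwise cosine matrix, cos(a, b) = a·b / (‖a‖‖b‖), into a single matrix product. `sklearn.metrics.pairwise.cosine_similarity` computes the same matrix. The toy count vectors and titles are hypothetical, and so are the function name `recommend` and its `k` parameter.

```python
import numpy as np

# Hypothetical titles and count vectors (rows = movies, columns = vocabulary terms).
titles = ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F", "Film G"]
vectors = np.array([
    [1, 1, 0, 0],
    [2, 2, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
], dtype=float)

# Scale each row to unit length; the dot product of unit vectors is their cosine.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = unit @ unit.T  # (n_movies, n_movies) pairwise cosine similarities


def recommend(title, k=5):
    """Return the k titles most similar to `title`, excluding the movie itself."""
    idx = titles.index(title)
    # Rank every movie by its similarity to the query, highest first.
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != idx][:k]
```

`recommend("Film A")` ranks "Film B" first: its vector points in exactly the same direction, even though its counts are doubled. This shows that cosine similarity ignores vector magnitude. Filtering by index, rather than slicing off the first result, also handles exact duplicates that tie with the query movie.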
> To make the system interactive, I built a **Streamlit web app** where users select a movie from a dropdown. When they click "Show Recommendation," the app displays the 5 most similar movies along with their posters. For the posters, I integrated the **TMDB API**, fetching each poster at runtime using the movie's TMDB ID.
>
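The following sketch implements the poster lookup using only the standard library. It assumes the TMDB v3 endpoint `GET https://api.themoviedb.org/3/movie/{movie_id}`, authenticated with an `api_key` query parameter, whose JSON response carries a relative `poster_path`. The `w500` image size, the placeholder key, and the helper names `poster_url` and `fetch_poster` are assumptions.

```python
import json
import urllib.request

TMDB_API_KEY = "YOUR_TMDB_API_KEY"  # placeholder: supply your own key
IMAGE_BASE_URL = "https://image.tmdb.org/t/p/w500"  # w500 is one of TMDB's poster sizes


def poster_url(poster_path):
    """Build a full image URL from a relative TMDB poster_path like "/abc.jpg"."""
    if not poster_path:  # some movies have no poster (poster_path is null)
        return None
    return IMAGE_BASE_URL + poster_path


def fetch_poster(movie_id, api_key=TMDB_API_KEY):
    """Fetch a movie's details from the TMDB v3 API and return its poster URL."""
    url = f"https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"
    with urllib.request.urlopen(url, timeout=10) as response:
        details = json.load(response)
    return poster_url(details.get("poster_path"))
```

In a Streamlit app, `st.selectbox` and `st.button("Show Recommendation")` can drive the dropdown and the button, and the returned URLs can be passed to `st.image`. For example, placing each image inside one of `st.columns(5)` shows the five recommendations side by side. Making one API call per recommended movie keeps the stored data small, at the cost of some network latency on each click.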
> To keep the app responsive, I precomputed the similarity matrix and movie metadata once and saved them with `pickle`, so the app loads them at startup instead of recomputing them on every run.
>
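A minimal sketch of this save-once, load-at-startup pattern with `pickle` follows. The tiny DataFrame and matrix, the temporary directory, and the file names `movies.pkl` and `similarity.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Hypothetical precomputed artifacts.
movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["Film A", "Film B", "Film C"]})
similarity = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.3],
                       [0.1, 0.3, 1.0]])

out_dir = tempfile.mkdtemp()
movies_path = os.path.join(out_dir, "movies.pkl")
similarity_path = os.path.join(out_dir, "similarity.pkl")

# Offline step: serialize the artifacts once, after they are computed.
with open(movies_path, "wb") as f:
    pickle.dump(movies, f)
with open(similarity_path, "wb") as f:
    pickle.dump(similarity, f)

# App startup: load the precomputed artifacts instead of recomputing them.
with open(movies_path, "rb") as f:
    loaded_movies = pickle.load(f)
with open(similarity_path, "rb") as f:
    loaded_similarity = pickle.load(f)
```

A crafted file can execute arbitrary code during `pickle.load`, so this pattern is only safe for files you produced yourself. Streamlit reruns the script on every widget interaction. Wrapping the loading code in a function decorated with `@st.cache_resource` may keep it from re-reading the files each time.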
> This project gave me solid hands-on experience with **feature engineering**, **NLP-based similarity modeling**, **API integration**, and **end-to-end deployment**. It was also a great way to explore how traditional content-based systems work and how they could potentially be enhanced later using embeddings or generative techniques.

---

