🎬 CineMatch

AI-Powered Movie & Series Recommendation System

Discover your next favourite watch — powered by TF-IDF & Cosine Similarity

📌 Table of Contents

About
Features
How It Works
Tech Stack
Dataset
Project Structure
Installation
Usage
Test Cases
Deployment

🎯 About

CineMatch is a content-based movie and web series recommendation system built with a Netflix-inspired dark UI. Search for any title and instantly get intelligent recommendations based on genre, theme, era, rating tier, director, and cast — not just popularity.

Built as a Data Mining Project using TF-IDF vectorization and Cosine Similarity on 20,000 IMDb titles.

✨ Features

Feature	Description
🔍 Smart Search	Substring matching with "Did you mean?" suggestions
🎬 Source Card	Full details of your searched title — rating, cast, director, IMDb link
🃏 Recommendation Cards	Top N results with genre tags, similarity %, rating bars
📊 Similarity Chart	Interactive Plotly bar chart showing cosine similarity scores
🎛️ Filters	Filter by content type — Movie, TV Series, Mini Series, TV Movie
🕐 Will Be Added Soon	Graceful screen when a title isn't in the library
🌑 Netflix Dark Theme	Full Netflix-style UI with red accents and dark backgrounds
⚡ Auto Model Build	Model builds automatically on first run — no manual setup needed

🧠 How It Works

CineMatch builds a weighted text representation (a "feature soup") for every title, vectorizes it with TF-IDF, and ranks candidates by cosine similarity:

Search Query
     │
     ▼
┌─────────────────────────────────────┐
│         Substring Matching          │
│   Finds all titles containing query │
└─────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────┐
│       Feature Soup (Weighted)       │
│                                     │
│  Inferred Subgenre  ████████ 4x     │
│  Genre Tags         ██████████ 5x   │
│  Content Type       ██████ 3x       │
│  Decade Bucket      ████ 2x         │
│  Rating Tier        ████ 2x         │
│  Director           ██ 1x           │
│  Lead Actor         ██ 1x           │
└─────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────┐
│    TF-IDF Vectorization             │
│    20,000 features, bigrams         │
└─────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────┐
│    Cosine Similarity                │
│    Ranked Top-N Results             │
└─────────────────────────────────────┘
     │
     ▼
   Results + Similarity Chart
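
The pipeline above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the code in movie_analysis.py: the field names (`subgenre`, `genres`, and so on), the `build_soup` and `recommend` helpers, and the idea of applying a weight by repeating a token are assumptions. Only the weights, the 20,000-feature cap, and the use of bigrams come from the diagram.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Weights copied from the diagram; the dictionary keys are hypothetical field names.
WEIGHTS = {
    "subgenre": 4,
    "genres": 5,
    "content_type": 3,
    "decade": 2,
    "rating_tier": 2,
    "director": 1,
    "lead_actor": 1,
}


def build_soup(title: dict) -> str:
    """Repeat each field's token `weight` times so TF-IDF counts it more heavily."""
    parts = []
    for field, weight in WEIGHTS.items():
        value = title.get(field) or []
        values = value if isinstance(value, list) else [value]
        for raw in values:
            # Normalize to one lowercase token, e.g. "TV Series" -> "tv_series".
            token = re.sub(r"[^a-z0-9]+", "_", str(raw).lower()).strip("_")
            if token:
                parts.extend([token] * weight)
    return " ".join(parts)


def recommend(titles: list, query_index: int, top_n: int = 5) -> list:
    """Return (name, similarity) pairs for the titles closest to the query title."""
    soups = [build_soup(t) for t in titles]
    vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
    matrix = vectorizer.fit_transform(soups)
    scores = cosine_similarity(matrix[query_index], matrix).ravel()
    ranked = sorted(
        (i for i in range(len(titles)) if i != query_index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [(titles[i]["name"], float(scores[i])) for i in ranked[:top_n]]
```

Repeating tokens is only one way to express a weight. Scaling separate TF-IDF matrices per field would be an alternative, at the cost of more bookkeeping.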

Why Subgenre Inference?

The IMDb dataset uses broad genre tags: the combination Action, Adventure, Drama is shared by Game of Thrones, Star Trek, and Mahabharat, so genre alone cannot tell them apart. To fix this, CineMatch infers thematic subgenres from keywords in the title text:

Subgenre Token	Trigger Keywords
`medieval_fantasy`	dragon, throne, knight, viking, witch, magic...
`scifi_space`	space, galaxy, alien, robot, future...
`crime_thriller`	heist, murder, detective, cop, mafia...
`superhero`	avenger, batman, spider, marvel...
`horror`	zombie, demon, haunted, vampire...

This steers Game of Thrones toward House of the Dragon and Vikings rather than unrelated action shows.
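
A minimal sketch of keyword-based inference follows. The keyword lists are copied from the table and stop where the table is truncated, so they are incomplete. The `infer_subgenres` name and the word-prefix matching rule are assumptions, not necessarily how the project matches keywords.

```python
import re

# Keywords taken from the table; the source lists end in "...", so these are partial.
SUBGENRE_KEYWORDS = {
    "medieval_fantasy": ["dragon", "throne", "knight", "viking", "witch", "magic"],
    "scifi_space": ["space", "galaxy", "alien", "robot", "future"],
    "crime_thriller": ["heist", "murder", "detective", "cop", "mafia"],
    "superhero": ["avenger", "batman", "spider", "marvel"],
    "horror": ["zombie", "demon", "haunted", "vampire"],
}


def infer_subgenres(title: str) -> list:
    """Return every subgenre token with a keyword that starts a word in the title."""
    words = re.findall(r"[a-z]+", title.lower())
    found = []
    for subgenre, keywords in SUBGENRE_KEYWORDS.items():
        # Prefix match so "Thrones" hits "throne" and "Vikings" hits "viking".
        if any(word.startswith(kw) for word in words for kw in keywords):
            found.append(subgenre)
    return found


print(infer_subgenres("Game of Thrones"))      # ['medieval_fantasy']
print(infer_subgenres("House of the Dragon"))  # ['medieval_fantasy']
print(infer_subgenres("Breaking Bad"))         # []
```

Prefix matching is deliberately loose, so a short keyword such as `cop` would also fire on "Copenhagen". A stricter variant could require whole-word matches plus simple plural handling.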

🛠️ Tech Stack

Component	Technology
Frontend	Streamlit + Custom CSS (Netflix Theme)
ML Model	scikit-learn TF-IDF + Cosine Similarity
Visualisation	Plotly interactive bar charts
Data Processing	Pandas, NumPy
Model Storage	Python Pickle
Language	Python 3.9+

📊 Dataset

Source: IMDb Top 20,000 Titles
Size: 20,000 titles
Fields: title, year, type, genre, rating, votes, director, cast, runtime, imdb_url

Content Type	Count
🎬 Movies	15,978
📺 TV Series	3,110
📽️ Mini Series	637
🎥 TV Movies	275
Total	20,000

Rating range: 1.0 ⭐ to 9.6 ⭐
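
The Decade Bucket and Rating Tier features from How It Works can be derived from the `year` and `rating` fields listed above. The sketch below shows one way to do it. The function names, the token format, and the tier thresholds (8.0 and 6.0) are hypothetical, since the README does not state them.

```python
def decade_bucket(year: int) -> str:
    """Map a release year to a decade token, e.g. 2011 -> "2010s"."""
    return f"{(year // 10) * 10}s"


def rating_tier(rating: float) -> str:
    """Map an IMDb rating to a coarse tier token (thresholds are assumed)."""
    if rating >= 8.0:
        return "rating_high"
    if rating >= 6.0:
        return "rating_mid"
    return "rating_low"


print(decade_bucket(2011))  # 2010s
print(rating_tier(9.6))     # rating_high
print(rating_tier(1.0))     # rating_low
```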

📁 Project Structure

CINEMATCH/
│
├── app.py                  # Streamlit UI — Netflix dark theme
├── movie_analysis.py       # Recommendation engine — TF-IDF model
├── imdb_dataset.csv        # IMDb dataset (20,000 titles)
├── requirements.txt        # Python dependencies
└── README.md               # You are here

recommendation_model.pkl is auto-generated on first run and not committed to the repo.
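
The build-on-first-run behaviour can be sketched as a load-or-build helper around Python's pickle module. `load_or_build` and its `build_fn` parameter are hypothetical names; the actual logic in app.py or movie_analysis.py may differ.

```python
import pickle
from pathlib import Path


def load_or_build(model_path, build_fn):
    """Load a pickled model if the file exists; otherwise build it and cache it."""
    path = Path(model_path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    model = build_fn()  # expensive step, runs only when no cached file is found
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

Unpickling executes code embedded in the file, so only load a recommendation_model.pkl that was generated locally or comes from a trusted source.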

⚙️ Installation

1. Clone the repository:

git clone https://github.com/Lakshya438/CINEMATCH.git
cd CINEMATCH

2. Install dependencies:

pip install -r requirements.txt

3. Build the model (optional, since the app also builds it automatically on first run):

python movie_analysis.py

4. Run the app:

streamlit run app.py

Open your browser at http://localhost:8501 🎉

🚀 Usage

1. Type any movie or series name in the search bar
2. Click a suggestion from the "Did you mean?" list if multiple matches appear
3. View the source card with full details of your searched title
4. Browse the recommendation cards below
5. Analyse the cosine similarity chart to understand match strength
6. Filter by content type using the sidebar
7. Adjust the number of recommendations (5–20) using the slider
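
The content-type filter and the result-count slider amount to a small post-processing step on the ranked list. The sketch below assumes each recommendation is a dict with a `type` key. `filter_and_limit` is a hypothetical helper, and the Streamlit widget wiring is left out.

```python
def filter_and_limit(recommendations: list, allowed_types: set, top_n: int) -> list:
    """Keep recommendations of the selected types, capped at 5 to 20 results."""
    top_n = max(5, min(20, top_n))  # clamp to the slider range described above
    if allowed_types:  # an empty selection means "no filter"
        recommendations = [r for r in recommendations if r["type"] in allowed_types]
    return recommendations[:top_n]
```

Filtering before truncating keeps the list full whenever enough titles of the selected type exist.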

Quick Search Examples

Search	What You Get
`Game of Thrones`	House of the Dragon, Vikings, The Last Kingdom
`Breaking Bad`	Better Call Saul, Ozark, Narcos
`Inception`	Interstellar, The Matrix, Tenet
`Stranger Things`	Dark, The OA, The Haunting of Hill House
`Parasite`	Memories of Murder, Oldboy, The Host

🧪 Test Cases

ID	Query	Expected Output
TC-01	`Game of Thrones`	Medieval fantasy series — House of the Dragon, Vikings
TC-02	`Inception`	Sci-Fi/Action movies — Mad Max, Pacific Rim
TC-03	`Breaking Bad`	Crime drama series — 9.5/10 rating shown
TC-04	`Asur`	"Did you mean?" → Asur, Asuran, Devasuram
TC-05	`Avengers`	MCU superhero cluster
TC-06	`xyznonexistent`	🕐 "Will Be Added Soon" screen
TC-07	`The Dark Knight` + Movie filter	Only movies — 96% match for The Dark Knight Rises
TC-08	`Stranger Things` + TV Series filter	Only TV series — horror/supernatural cluster
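
TC-04 and TC-06 describe the search behaviour: a partial query returns every title containing it, and an unknown query returns nothing, which triggers the "Will Be Added Soon" screen. A minimal sketch of such matching follows, assuming case-insensitive substring search over a plain list of titles. `find_matches` and its ordering rule are hypothetical.

```python
def find_matches(query: str, titles: list, limit: int = 10) -> list:
    """Case-insensitive substring search; an exact match is listed first."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [t for t in titles if needle in t.lower()]
    # Exact match first, then shorter titles, then alphabetical order.
    hits.sort(key=lambda t: (t.lower() != needle, len(t), t.lower()))
    return hits[:limit]


library = ["Devasuram", "Asuran", "Asur", "Breaking Bad"]
print(find_matches("Asur", library))            # ['Asur', 'Asuran', 'Devasuram']
print(find_matches("xyznonexistent", library))  # []
```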

🌐 Deployment

Deployed on Streamlit Community Cloud — free hosting.

To deploy your own:

1. Fork this repo
2. Go to Streamlit Community Cloud
3. Connect your GitHub account and select your fork
4. Set the main file to app.py
5. Click Deploy

👨‍💻 Author

Lakshya

GitHub: @Lakshya438

📄 License

This project was built for educational purposes as part of a Data Mining course.

Made with ❤️ and 🎬 | Data Mining Project 2026

⭐ Star this repo if you found it useful!
