# CineMatch: Intelligent Movie Recommendation System

**Module E:** AI Applications – Individual Open Project  
**Student Name:** Bhupesh Bhatia  
**Project Track:** Recommendation Systems (Content-Based + Hybrid Signals)  
**Tools & Technologies:** Python, Jupyter Notebook, Scikit-learn, Cosine Similarity, MinMaxScaler, Pandas, Streamlit, TMDB API

---


## 1. Problem Definition & Objective

### a. Selected Project Track
This project falls under the **Recommendation Systems** track (content-based with hybrid signals), applied to movies.

### b. Problem Statement
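Many basic movie recommenders rank results by a single signal, typically content similarity or popularity. This project introduces a **user-controlled dial** that lets the user decide how much weight similarity, popularity, and recency each receive when recommendations are ranked.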

### c. Real-World Relevance
Recommendation systems that adapt to user preferences are widely used on OTT platforms such as Netflix.

## 2. Data Understanding & Preparation

- **Dataset:** TMDB 5000 Movies Dataset (public)
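- The movies metadata and credits files were merged on `title`
- Rows with missing values in the selected columns were dropped
- Text features (overview, genres, keywords, top three cast members, director) were combined into a single `tags` field for similarity computation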

In [5]:
import numpy as np
import pandas as pd
import ast
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

In [7]:
# 1. LOAD DATA
movies = pd.read_csv(r"C:\Users\sange\OneDrive\New folder\Desktop\python project\ropar project\archive(4)\tmdb_5000_movies.csv")
credits = pd.read_csv(r"C:\Users\sange\OneDrive\New folder\Desktop\python project\ropar project\archive(4)\tmdb_5000_credits.csv")

# Merge datasets
movies = movies.merge(credits, on='title')
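
One thing to watch with a title-based merge: if two different films share a title, every matching pair of rows is combined, so the merged table can contain more rows than either input. The sketch below shows this with made-up rows. The `id` and `movie_id` columns in the toy frames are assumptions about how the two files identify a film and should be checked against the actual CSV headers before switching the join key.

```python
import pandas as pd

# Made-up rows: two different films that happen to share a title
toy_movies = pd.DataFrame({"id": [1, 2], "title": ["Batman", "Batman"]})
toy_credits = pd.DataFrame({"movie_id": [1, 2], "title": ["Batman", "Batman"]})

# Joining on title pairs every "Batman" with every other "Batman"
by_title = toy_movies.merge(toy_credits, on="title")

# Joining on an identifier keeps exactly one row per film
by_id = toy_movies.merge(toy_credits, left_on="id", right_on="movie_id")

print(len(by_title), len(by_id))
```

Comparing `len(movies)` before and after the merge is a quick way to see whether this affects the real data.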

In [9]:
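# 2. SELECT IMPORTANT FEATURES
movies = movies[['movie_id', 'title', 'overview', 'genres',
                 'keywords', 'cast', 'crew',
                 'popularity', 'release_date']]

# Drop rows with missing values, then reset the index so row labels
# match the positional indices used by the similarity matrix later on
movies = movies.dropna().reset_index(drop=True)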

In [11]:
# 3. HELPER FUNCTIONS FOR TEXT PROCESSING
def convert(text):
    """Extract names from list of dictionaries"""
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

def convert_cast(text):
    """Extract top 3 cast members"""
    L = []
    counter = 0
    for i in ast.literal_eval(text):
        if counter < 3:
            L.append(i['name'])
            counter += 1
    return L

def fetch_director(text):
    """Extract director name"""
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L

In [13]:
# 4. APPLY PROCESSING
movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)
movies['cast'] = movies['cast'].apply(convert_cast)
movies['crew'] = movies['crew'].apply(fetch_director)

# Remove spaces (important for vectorization)
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])
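
To illustrate what these helpers do, the sketch below runs the same `ast.literal_eval` pattern on a hand-written string shaped like one cell of the `genres` column. The sample string and the `extract_names` helper are made up for illustration; only the parsing and space-removal steps mirror the cells above.

```python
import ast

# Hypothetical sample shaped like one cell of the 'genres' column
sample = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'

def extract_names(text, limit=None):
    """Parse a stringified list of dicts and return the 'name' values."""
    names = [item["name"] for item in ast.literal_eval(text)]
    return names if limit is None else names[:limit]

names = extract_names(sample)

# Removing spaces keeps a multi-word name as a single token for the vectorizer
tokens = [name.replace(" ", "") for name in names]

print(names)
print(tokens)
```

Without the space removal, "Science Fiction" would be split into two unrelated tokens, "science" and "fiction", when the tags are vectorized.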

In [None]:
# 5. CREATE TAGS (CORE OF CONTENT-BASED SYSTEM)
movies['tags'] = (
    movies['overview'].apply(lambda x: x.split()) +
    movies['genres'] +
    movies['keywords'] +
    movies['cast'] +
    movies['crew']
)

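# Take an explicit copy so the assignments below modify new_df itself
# rather than a view of movies (avoids SettingWithCopyWarning)
new_df = movies[['movie_id', 'title', 'tags', 'popularity', 'release_date']].copy()

# Join the token list into one lowercase string per movie
new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x)).str.lower()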

## 3. Model / System Design

- Content-based filtering using Bag-of-Words
- Cosine similarity for relevance
- Popularity and recency normalization
- Weighted scoring mechanism
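
The weighted scoring mechanism can be written compactly as follows. The symbols are notation introduced here for illustration: $q$ is the query movie, $i$ a candidate movie, $\tilde{p}_i$ and $\tilde{r}_i$ the min-max normalized popularity and release year, and $w_s$, $w_p$, $w_r$ the three user-set weights (0.7, 0.2, and 0.1 by default in the code).

$$
\text{score}(q, i) = w_s \cdot \text{sim}(q, i) + w_p \cdot \tilde{p}_i + w_r \cdot \tilde{r}_i,
\qquad
\tilde{p}_i = \frac{p_i - p_{\min}}{p_{\max} - p_{\min}},
\qquad
\tilde{r}_i = \frac{r_i - r_{\min}}{r_{\max} - r_{\min}}
$$

Because all three terms lie in $[0, 1]$, the weights directly express how much each signal can contribute to the final ranking.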

In [None]:
# 6. VECTORIZE TEXT
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(new_df['tags']).toarray()

In [None]:
# 7. COSINE SIMILARITY
similarity = cosine_similarity(vectors)
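
As a small illustration of these two steps, the sketch below vectorizes three made-up tag strings and compares them. The strings are invented for the example and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up tag strings standing in for rows of new_df['tags']
toy_tags = [
    "space alien battle scifi",
    "space alien rescue scifi",
    "wedding romance comedy",
]

# Bag-of-Words: one column per distinct word, one row per movie
toy_vectors = CountVectorizer().fit_transform(toy_tags)

# Pairwise cosine similarity between all rows
toy_similarity = cosine_similarity(toy_vectors)

print(toy_similarity.round(2))
```

The first two strings share three of their four words, so their similarity is 0.75, while the third shares none and scores 0. `cosine_similarity` also accepts the sparse matrix returned by `fit_transform` directly, so the `.toarray()` conversion is optional if memory becomes a concern.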

In [None]:
# 8. ADD FEATURE-1 METADATA (POPULARITY + RECENCY)

# Extract release year
new_df['release_year'] = new_df['release_date'].apply(
    lambda x: int(x.split('-')[0]) if isinstance(x, str) else 0
)

# Normalize popularity & recency
scaler = MinMaxScaler()
new_df[['popularity_norm', 'recency_norm']] = scaler.fit_transform(
    new_df[['popularity', 'release_year']]
)
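
The effect of the normalization step can be seen on a toy table. The values below are made up; the sketch only shows that `MinMaxScaler` maps each column's minimum to 0 and its maximum to 1.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values standing in for the real popularity and release_year columns
toy = pd.DataFrame({
    "popularity": [10.0, 55.0, 100.0],
    "release_year": [1990, 2000, 2010],
})

# Each column is rescaled independently to the [0, 1] range
scaled = MinMaxScaler().fit_transform(toy[["popularity", "release_year"]])
toy["popularity_norm"] = scaled[:, 0]
toy["recency_norm"] = scaled[:, 1]

print(toy.round(2))
```

Min-max scaling is sensitive to extreme values: a single very popular title stretches the range and pushes most other movies toward 0, which is worth keeping in mind when choosing the popularity weight.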

In [None]:
# 9. ORIGINAL RECOMMEND FUNCTION (BASELINE)
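def recommend(movie):
    # Positional row of the query movie; similarity is indexed by position, not label
    index = np.flatnonzero(new_df['title'].values == movie)[0]
    distances = similarity[index]  # similarity of every movie to the query

    # Rank all other movies by similarity, excluding the query movie itself
    movies_list = sorted(
        [item for item in enumerate(distances) if item[0] != index],
        reverse=True,
        key=lambda x: x[1]
    )[:5]

    return [new_df.iloc[i[0]].title for i in movies_list]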

In [None]:
# 10. FEATURE-1: USER-CONTROLLED RECOMMENDATION FUNCTION
def recommend_with_control(
    movie,
    similarity_weight=0.7,
    popularity_weight=0.2,
    recency_weight=0.1
):
    """
    User-controlled recommendation dial:
    - similarity_weight
    - popularity_weight
    - recency_weight
    """

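    # Positional row of the query movie; similarity is indexed by position, not label
    index = np.flatnonzero(new_df['title'].values == movie)[0]

    scores = []

    for i, sim_score in enumerate(similarity[index]):
        if i == index:
            # Skip the query movie explicitly: once popularity and recency
            # contribute to the score, it is no longer guaranteed to rank first
            continue
        final_score = (
            similarity_weight * sim_score +
            popularity_weight * new_df.iloc[i]['popularity_norm'] +
            recency_weight * new_df.iloc[i]['recency_norm']
        )
        scores.append((i, final_score))

    # Keep the five highest-scoring movies
    scores = sorted(scores, key=lambda x: x[1], reverse=True)[:5]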

    recommendations = []
    explanations = []

    for i, score in scores:
        recommendations.append(new_df.iloc[i].title)
        explanations.append({
            "similarity": round(similarity[index][i], 3),
            "popularity": round(new_df.iloc[i]['popularity_norm'], 3),
            "recency": round(new_df.iloc[i]['recency_norm'], 3)
        })

    return recommendations, explanations
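
To see how the dial changes the outcome, the sketch below applies the same weighted sum to four made-up candidates. The titles and score values are invented for illustration, and `rank` is a hypothetical helper rather than part of the notebook.

```python
import numpy as np

# Made-up signals for four candidate movies relative to one query movie
titles = np.array(["A", "B", "C", "D"])
sim = np.array([0.9, 0.6, 0.3, 0.1])  # cosine similarity to the query
pop = np.array([0.1, 0.5, 1.0, 0.2])  # min-max normalized popularity
rec = np.array([0.2, 0.4, 0.9, 1.0])  # min-max normalized release year

def rank(w_sim, w_pop, w_rec):
    """Return titles ordered by the weighted score, best first."""
    score = w_sim * sim + w_pop * pop + w_rec * rec
    return titles[np.argsort(-score)].tolist()

print(rank(0.7, 0.2, 0.1))  # similarity-heavy dial
print(rank(0.2, 0.6, 0.2))  # popularity-heavy dial
```

With the similarity-heavy weights the order is A, B, C, D; with the popularity-heavy weights the popular title C moves to the top and A drops to last.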

In [None]:
# 11. SAVE FILES FOR STREAMLIT
# Context managers make sure both files are flushed and closed
with open('movie_dict.pkl', 'wb') as f:
    pickle.dump(new_df.to_dict(), f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)

print("Model preprocessing complete.")
print("Feature-1 (User-Controlled Dial) added.")
print("Ready for Streamlit integration.")
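
On the application side, the saved dictionary would presumably be loaded and turned back into a DataFrame. The Streamlit code is not part of this notebook, so the sketch below only demonstrates the round trip on a toy frame written to a temporary directory.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-in for new_df
toy_df = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movie_dict.pkl")

    # Save the frame as a plain dictionary, as in the cell above
    with open(path, "wb") as f:
        pickle.dump(toy_df.to_dict(), f)

    # Load it back and rebuild the DataFrame
    with open(path, "rb") as f:
        restored = pd.DataFrame(pickle.load(f))

print(restored)
```

`to_dict()` keeps the row labels, so the restored frame has the same index as the original. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.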


## 4. Evaluation & Analysis

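- Recommendations were evaluated qualitatively by inspecting the returned titles
- Adaptive behavior was observed when tuning the weights: changing the similarity, popularity, and recency weights changes which movies are recommended
- Limitations include the cold-start problem: a title that is not in the dataset cannot be used as a query, and there is no user history to personalize from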
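
One possible way to put a number on the effect of weight tuning is to measure how much two recommendation lists overlap. This is a suggestion rather than something the project currently does. `jaccard_overlap` is a hypothetical helper, and the two lists are placeholders standing in for the output of two calls to `recommend_with_control` with different weights.

```python
def jaccard_overlap(list_a, list_b):
    """Share of titles common to two recommendation lists (0 = disjoint, 1 = identical)."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)

# Placeholder lists standing in for two runs with different weights
similarity_heavy = ["A", "B", "C", "D", "E"]
popularity_heavy = ["A", "C", "F", "G", "H"]

print(jaccard_overlap(similarity_heavy, popularity_heavy))
```

Here the lists share two of eight distinct titles, giving 0.25. A value close to 1 would mean the dial barely changes the recommendations, while a value close to 0 would mean the lists are almost entirely different.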

## 5. Ethical Considerations & Responsible AI

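- Biases in the dataset, such as which films are included and how popular they are, may influence results
- Transparency is supported by exposing the weights as user controls and by returning the similarity, popularity, and recency values behind each recommendation
- The project is intended for educational use only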

## 6. Conclusion & Future Scope

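The project demonstrates a flexible recommendation engine in which the user controls how similarity, popularity, and recency are balanced.
Future improvements include adding collaborative filtering and deploying the application.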