
# Movie Recommendation System
This notebook implements a content-based movie recommendation system in Python using the Pandas and Scikit-learn libraries. The goal is to recommend similar movies based on textual features such as the overview, genres, keywords, cast, and director. Each movie's details are vectorized with a TF-IDF vectorizer, and recommendations are ranked by the cosine similarity between those vectors.


### Import Required Libraries

In [88]:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ast


### Load the Datasets

In [89]:
movies_df = pd.read_csv('./tmdb_5000_movies.csv')
credits_df = pd.read_csv('./tmdb_5000_credits.csv')


### Merge Movies and Credits Data on Movie ID

In [90]:

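credits_df.rename(columns={'movie_id': 'id'}, inplace=True)
movies_df['id'] = pd.to_numeric(movies_df['id'], errors='coerce')
# Join the two frames on 'id'; overlapping 'title' columns get the _x / _y suffixes
merged_df = movies_df.merge(credits_df, on='id')
merged_df.rename(columns={'title_x': 'title'}, inplace=True)
merged_df.dropna(subset=['overview'], inplace=True)  # drop movies without overview
merged_df.reset_index(drop=True, inplace=True)  # keep index labels equal to row positions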


### Parse and Extract Key Features from JSON-like Columns

In [91]:

def parse_cast(x):
    try:
        data = ast.literal_eval(x)
        return ' '.join([i['name'] for i in data[:5]])
    except (ValueError, SyntaxError):
        return ''

def parse_names(x):
    try:
        return ' '.join([i['name'] for i in ast.literal_eval(x)])
    except (ValueError, SyntaxError):
        return ''

def get_director(x):
    try:
        crew = ast.literal_eval(x)
        for member in crew:
            if member['job'] == 'Director':
                return member['name']
        return ''
    except (ValueError, SyntaxError):
        return ''


merged_df['cast'] = merged_df['cast'].apply(parse_cast)
merged_df['genres'] = merged_df['genres'].apply(parse_names)
merged_df['keywords'] = merged_df['keywords'].apply(parse_names)
merged_df['director'] = merged_df['crew'].apply(get_director)
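
To make the parsing step concrete, here is a minimal sketch of what `ast.literal_eval` does with one of these stringified lists. The sample string is hypothetical and only imitates the list-of-dicts layout that the helpers above expect; it is not taken from the dataset.

```python
import ast

# Hypothetical sample imitating a stringified list of dicts, as in the 'genres' column.
sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def parse_names(x):
    """Join the 'name' field of every entry, or return '' if the text cannot be parsed."""
    try:
        return ' '.join(item['name'] for item in ast.literal_eval(x))
    except (ValueError, SyntaxError):
        return ''

parsed = parse_names(sample)
# A truncated string is not a valid Python literal, so the fallback '' is returned.
malformed = parse_names('[{"id": 28, "name": ')

print(parsed)
print(repr(malformed))
```

Returning an empty string for unparseable values keeps the later string concatenation from failing on a single bad row.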



### Combine Features into a Single Text Field

In [92]:

merged_df['combined_features'] = merged_df['overview'] + ' ' + merged_df['genres'] + ' ' + merged_df['keywords'] + ' ' + merged_df['cast'] + ' ' + merged_df['director']


### Convert Text to Vectors using TF-IDF Vectorizer

In [93]:

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(merged_df['combined_features'])
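
As a rough illustration of what the vectorizer computes, the sketch below builds TF-IDF vectors by hand in plain Python. It assumes whitespace tokenization, no stop-word removal, the smoothed weighting `idf = ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row, which to my understanding matches the default settings of `TfidfVectorizer`; the helper `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, rows), where each row is an L2-normalized TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard against empty documents
        rows.append([v / norm for v in raw])
    return vocab, rows

# Toy corpus (hypothetical): 'space' and 'robots' each occur in two documents.
docs = ["space robots adventure", "romantic comedy paris", "robots fight space"]
vocab, rows = tfidf_vectors(docs)

for term, weight in zip(vocab, rows[0]):
    print(f"{term:10s} {weight:.3f}")
```

In the first document, `adventure` receives a higher weight than `space` because it occurs in fewer documents, which is the property that lets distinctive terms dominate the similarity score.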


### Compute Cosine Similarity Matrix

In [94]:

cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
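
For intuition, cosine similarity between two vectors is their dot product divided by the product of their lengths. The sketch below computes it for three small hand-made vectors; the helper `cosine` and the vectors are illustrative assumptions, not part of the notebook's pipeline.

```python
import math

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy term-weight vectors (hypothetical).
a = [1.0, 1.0, 0.0]
b = [1.0, 0.0, 1.0]  # shares one term with a
c = [0.0, 0.0, 2.0]  # shares no terms with a

sim_ab = cosine(a, b)  # about 0.5: partial overlap
sim_ac = cosine(a, c)  # 0.0: no overlap
sim_aa = cosine(a, a)  # about 1.0: identical direction

print(round(sim_ab, 3), round(sim_ac, 3), round(sim_aa, 3))
```

Because TF-IDF weights are non-negative, the scores in `cosine_sim` fall between 0 (no shared terms) and 1 (identical direction).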


### Define a Function to Recommend Similar Movies

In [97]:

def recommend(title, num_recommendations=5):
    # Look up the row position (not the index label) so it matches cosine_sim's rows
    matches = np.flatnonzero((merged_df['title'] == title).to_numpy())
    if len(matches) == 0:
        return "Movie not found in database."
    idx = matches[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort by similarity, highest first; the top entry is normally the movie itself, so skip it
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:num_recommendations+1]
    movie_indices = [i[0] for i in sim_scores]
    return merged_df['title'].iloc[movie_indices].tolist()
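
The ranking step can be tried in isolation on a single row of similarity scores. The sketch below is a variant rather than the function above: it excludes the query movie by position instead of slicing off the first sorted entry, which may be more robust when another row ties with it at the same score. The helper `top_k_similar` and the scores are hypothetical.

```python
def top_k_similar(sim_row, self_idx, k=5):
    """Return positions of the k highest scores in sim_row, excluding self_idx."""
    scored = [(i, s) for i, s in enumerate(sim_row) if i != self_idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return [i for i, _ in scored[:k]]

# Hypothetical similarity row for the movie at position 1 (its self-similarity is 1.0).
sim_row = [0.10, 1.00, 0.45, 0.80, 0.00]
top = top_k_similar(sim_row, self_idx=1, k=3)

print(top)  # [3, 2, 0]
```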


### Test the Recommender

In [None]:
recommend("The Indian in the Cupboard")
