# AI-Powered Movie Recommendation System

## Introduction
This project is an AI-powered Movie Recommendation System that suggests movies similar to a given movie based on content analysis. Using Natural Language Processing (NLP) techniques, the system analyzes movie metadata such as the overview, genres, keywords, cast, and crew. The text data is preprocessed and combined into a single "tags" feature, which is then converted into numerical vectors using scikit-learn's **CountVectorizer** and **TfidfVectorizer**.

The system computes similarity scores between movies using **cosine similarity** and allows users to receive recommendations based on movies they like. This approach helps in discovering movies that match a user's taste without relying on explicit ratings.
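
As a rough illustration of the idea, the sketch below builds bag-of-words count vectors for three tag strings and compares them with cosine similarity, using only the standard library. The titles and tag strings are hypothetical and the helper names (`count_vector`, `cosine`) are not part of the notebook, which relies on scikit-learn for both steps.

```python
import math
from collections import Counter

def count_vector(text):
    """Bag-of-words counts for a lowercased, whitespace-tokenized string."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical tag strings, not taken from the dataset.
tags = {
    "Space Saga": "space alien war action jamescameron",
    "Alien Dawn": "space alien horror action",
    "Tea Time": "romance drama england",
}
vectors = {title: count_vector(text) for title, text in tags.items()}

sim_close = cosine(vectors["Space Saga"], vectors["Alien Dawn"])
sim_far = cosine(vectors["Space Saga"], vectors["Tea Time"])
print(round(sim_close, 3), sim_far)
```

Two movies that share several tags score well above zero, while movies with no tags in common score exactly zero.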


In [None]:
import numpy as np 
import pandas as pd
import os
import ast

## Dataset Description

This project uses two datasets from TMDb (The Movie Database):

1. **tmdb_5000_movies.csv**  
   Contains detailed information on roughly 5,000 movies, including features such as `title`, `overview`, `genres`, `keywords`, `budget`, `popularity`, and more. This dataset provides the core information about each movie.

2. **tmdb_5000_credits.csv**  
   Contains information about the cast and crew of the movies, including the actors, directors, and other crew members. This dataset is used to extract important contributors to the movie.

Both datasets are merged on the `title` column to combine movie details with their respective cast and crew information, forming the foundation for building the recommendation system.


In [50]:
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')

In [51]:
credits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4803 non-null   int64 
 1   title     4803 non-null   object
 2   cast      4803 non-null   object
 3   crew      4803 non-null   object
dtypes: int64(1), object(3)
memory usage: 150.2+ KB


## Merging Movie and Credits Data

The `movies` dataset is merged with the `credits` dataset using the `title` column as the key:



In [52]:
movies = movies.merge(credits,on='title')
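
One caveat with a title-based join: if two different films share a title, every matching pair of rows is kept, so the merged frame can end up with more rows than either input. The `credits` frame has 4803 entries while the vector shape reported later has 4806 rows, which is consistent with a few duplicated titles. The sketch below uses toy frames with hypothetical values to show the effect, and assumes the movies file exposes an identifier column named `id` that matches `movie_id` in the credits file.

```python
import pandas as pd

# Toy frames with hypothetical values; two different films share a title.
movies_demo = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Hero", "Hero", "Quiet Lake"],
})
credits_demo = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Hero", "Hero", "Quiet Lake"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Joining on title pairs every "Hero" row with every other "Hero" row.
by_title = movies_demo.merge(credits_demo, on="title")

# Joining on the identifier keeps exactly one row per film.
by_id = movies_demo.merge(
    credits_demo.drop(columns="title"), left_on="id", right_on="movie_id"
)

print(len(by_title), len(by_id))
```

Whether an identifier join suits the real data depends on the columns actually present, so treat this as an option to check rather than a drop-in replacement.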

## Selecting Relevant Features

To focus on the most important information for recommendations, we select only the relevant columns from the merged dataset:



In [53]:
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

## Converting JSON Strings to Lists

The `convert_to_lis` function converts stringified JSON data in columns like `genres` or `keywords` into Python lists:

In [None]:
def convert_to_lis(text):
    # Parse the stringified list of dicts and collect each entry's 'name'
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L
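
To make the conversion concrete, here is a small standalone sketch that applies the same idea to a single hypothetical cell value shaped like the `genres` column. The helper name `names_from_json` and the sample string are illustrative only.

```python
import ast

def names_from_json(text):
    """Return the 'name' field of every dict in a stringified list."""
    return [item["name"] for item in ast.literal_eval(text)]

# Hypothetical cell value in the same shape as the `genres` column.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
genres = names_from_json(raw)
print(genres)  # ['Action', 'Adventure']
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for strings read from a file. `json.loads` would parse a string like this one as well.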

In [13]:
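# Drop rows with missing values, then reset the index so that index labels
# match row positions (recommend() later uses the label to index `similarity`)
movies = movies.dropna().reset_index(drop=True)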

In [48]:
movies['genres'] = movies['genres'].apply(convert_to_lis)
movies['keywords'] = movies['keywords'].apply(convert_to_lis)

NameError: name 'convert_to_lis' is not defined

In [15]:
def convert3(text):
    # Like convert_to_lis, but keep only the first three names
    L = []
    for i in ast.literal_eval(text)[:3]:
        L.append(i['name'])
    return L

In [16]:
# Parse the full cast list; it is trimmed to the first three names in the next cell
movies['cast'] = movies['cast'].apply(convert_to_lis)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[Christian Bale, Michael Caine, Gary Oldman, A...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [17]:
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

In [18]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [19]:
def fetch_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L 

In [20]:
movies['crew'] = movies['crew'].apply(fetch_director)
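
The director filter can be tried in isolation on a made-up crew string. The names below are hypothetical, and `directors` is an illustrative variant of `fetch_director` that uses `.get` so an entry without a `job` key is skipped rather than raising.

```python
import ast

def directors(text):
    """Names of crew members whose job is exactly 'Director'."""
    return [m["name"] for m in ast.literal_eval(text) if m.get("job") == "Director"]

# Hypothetical crew value in the same shape as the `crew` column.
raw_crew = (
    '[{"job": "Director", "name": "Jane Doe"}, '
    '{"job": "Editor", "name": "John Roe"}, '
    '{"job": "Director of Photography", "name": "Ann Poe"}]'
)
print(directors(raw_crew))  # ['Jane Doe']
```

The equality check matters: a substring test such as `"Director" in job` would also pick up roles like "Director of Photography".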

In [21]:
#movies['overview'] = movies['overview'].apply(lambda x:x.split())
movies.sample(5)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
2549,266856,The Theory of Everything,The Theory of Everything is the extraordinary ...,"[Drama, Romance]","[{""id"": 1157, ""name"": ""wife husband relationsh...","[Eddie Redmayne, Felicity Jones, Charlie Cox]",[James Marsh]
4596,157909,Show Me,When two squeegee kids descend upon Sarah and ...,"[Drama, Thriller]","[{""id"": 1930, ""name"": ""kidnapping""}]","[Michelle Nolden, Katharine Isabelle, Kett Tur...",[Cassandra Nicolaou]
1435,9093,The Four Feathers,"The story, set in 1875, follows a British offi...","[War, Adventure, Drama, Romance]","[{""id"": 187, ""name"": ""islam""}, {""id"": 572, ""na...","[Heath Ledger, Wes Bentley, Kate Hudson]",[Shekhar Kapur]
2120,9963,Premonition,A depressed housewife who learns her husband w...,"[Thriller, Drama, Mystery]","[{""id"": 563, ""name"": ""deja vu""}, {""id"": 3737, ...","[Sandra Bullock, Julian McMahon, Courtney Tayl...",[Mennan Yapo]
1455,49478,Warriors of Virtue,"A young man, Ryan, suffering from a disability...","[Fantasy, Family, Action]","[{""id"": 579, ""name"": ""american football""}, {""i...","[Angus Macfadyen, Mario Yedidia, Marley Shelton]",[Ronny Yu]


In [22]:
def collapse(L):
    # Strip spaces inside each entry so multi-word names become single tokens
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

In [23]:
movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
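
Removing the spaces keeps a full name together as one token, so two people who merely share a first name do not count as a match. The sketch below illustrates this with names that appear in the outputs above. It also shows a caveat: if `collapse` receives a plain string instead of a list, it iterates character by character, which matches the single-character `keywords` entries visible in the output that follows.

```python
def collapse(names):
    """Remove spaces inside each name so it becomes a single token."""
    return [name.replace(" ", "") for name in names]

cast_a = ["Sam Worthington", "Zoe Saldana"]
crew_b = ["Sam Mendes"]

# Without collapsing, both entries share the token "Sam".
shared_before = set(" ".join(cast_a).split()) & set(" ".join(crew_b).split())

# After collapsing, the full names no longer overlap.
shared_after = set(collapse(cast_a)) & set(collapse(crew_b))

# Caveat: a plain string is iterated one character at a time.
chars = collapse("ab c")

print(shared_before, shared_after, chars)
```

This means `collapse` should only be applied to a column after its JSON strings have been converted to lists.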

In [24]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, ScienceFiction]","[[, {, "", i, d, "", :, , 1, 4, 6, 3, ,, , "", n,...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[[, {, "", i, d, "", :, , 2, 7, 0, ,, , "", n, a,...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[[, {, "", i, d, "", :, , 4, 7, 0, ,, , "", n, a,...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[[, {, "", i, d, "", :, , 8, 4, 9, ,, , "", n, a,...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, ScienceFiction]","[[, {, "", i, d, "", :, , 8, 1, 8, ,, , "", n, a,...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]


In [25]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [26]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [27]:
new = movies.drop(columns=['overview','genres','keywords','cast','crew'])
#new.head()

In [28]:
new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


In [None]:
from sklearn.feature_extraction.text import CountVectorizer
# Keep the 5000 most frequent terms (matching the vector shape below) and drop English stop words
cv = CountVectorizer(max_features=5000,stop_words='english')

In [30]:
vector = cv.fit_transform(new['tags']).toarray()

In [31]:
vector.shape

(4806, 5000)

In [32]:
from sklearn.metrics.pairwise import cosine_similarity

In [33]:
similarity = cosine_similarity(vector)

In [34]:
similarity

array([[1.        , 0.15389675, 0.0860663 , ..., 0.        , 0.        ,
        0.        ],
       [0.15389675, 1.        , 0.08830216, ..., 0.03673592, 0.        ,
        0.        ],
       [0.0860663 , 0.08830216, 1.        , ..., 0.03081668, 0.        ,
        0.        ],
       ...,
       [0.        , 0.03673592, 0.03081668, ..., 1.        , 0.07897472,
        0.02746175],
       [0.        , 0.        , 0.        , ..., 0.07897472, 1.        ,
        0.05638839],
       [0.        , 0.        , 0.        , ..., 0.02746175, 0.05638839,
        1.        ]], shape=(4806, 4806))

In [35]:
new[new['title'] == 'The Lego Movie'].index[0]

np.int64(744)

In [36]:
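def recommend(movie):
    # Index of the requested title (raises IndexError if the title is absent)
    index = new[new['title'] == movie].index[0]
    # Rank all movies by similarity to it, highest first
    distances = sorted(list(enumerate(similarity[index])),reverse=True,key = lambda x: x[1])
    # Skip the first entry (expected to be the movie itself) and print the next five titles
    for i in distances[1:6]:
        print(new.iloc[i[0]].title)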

In [37]:
recommend('Gandhi')

Gandhi, My Father
A Passage to India
Ramanujan
Chariots of Fire
Mr. Turner
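
The ranking step inside `recommend` can be sketched on a tiny hand-written similarity matrix. The titles and scores below are made up, and the helper names are illustrative. This variant excludes the query movie by position instead of assuming it sorts first, and returns an empty list for an unknown title.

```python
def top_k_similar(scores, index, k=5):
    """Positions of the k highest-scoring items, excluding `index` itself."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [pos for pos, _ in ranked if pos != index][:k]

def recommend_titles(titles, similarity, movie, k=5):
    """Look up `movie` by position and return the titles of its k nearest rows."""
    if movie not in titles:
        return []  # unknown title: return nothing instead of raising
    index = titles.index(movie)
    return [titles[pos] for pos in top_k_similar(similarity[index], index, k)]

# Hypothetical titles and a symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.2, 0.9, 0.0],
    [0.2, 1.0, 0.1, 0.4],
    [0.9, 0.1, 1.0, 0.3],
    [0.0, 0.4, 0.3, 1.0],
]

print(recommend_titles(titles, similarity, "A", k=2))  # ['C', 'B']
```

Excluding by position is slightly more robust when two rows have identical tags, since a tie at similarity 1.0 could otherwise push the query movie out of first place.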


In [38]:
import pickle

In [39]:
pickle.dump(new,open('movie_list.pkl','wb'))
pickle.dump(similarity,open('similarity.pkl','wb'))
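
For reference, a minimal sketch of saving and reloading a pickle with context managers, so the file handles are closed deterministically. The helper names, the temporary path, and the toy payload are assumptions for illustration; the notebook's own files are `movie_list.pkl` and `similarity.pkl`.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Serialize `obj` to `path`, closing the file afterwards."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)

def load_pickle(path):
    """Load a previously pickled object from `path`."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Toy stand-in for the movie list and similarity matrix.
payload = {
    "titles": ["Avatar", "Spectre"],
    "similarity": [[1.0, 0.1], [0.1, 1.0]],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pkl")
    save_pickle(payload, path)
    restored = load_pickle(path)

print(restored == payload)  # True
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.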

In [40]:
import requests

def fetch_poster(movie_id):
    url = "https://api.themoviedb.org/3/movie/{}?api_key=8265bd1679663a7ea12ac168da84d2e8&language=en-US".format(movie_id)
    data = requests.get(url)
    data = data.json()
    poster_path = data['poster_path']
    full_path = "https://image.tmdb.org/t/p/w500/" + poster_path
    return full_path
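
`fetch_poster` concatenates the image base URL with `poster_path`, which raises a `TypeError` if the response carries no poster (a `None` value). The sketch below separates the URL-building step so it can handle that case without a network call. The helper names and the `TMDB_API_KEY` environment variable name are assumptions, and it assumes `poster_path` values begin with a slash.

```python
import os

IMAGE_BASE = "https://image.tmdb.org/t/p/w500/"

def poster_url(payload):
    """Build a poster URL from a decoded response dict, or None if absent."""
    poster_path = payload.get("poster_path")
    if not poster_path:
        return None
    # Strip a leading slash so the joined URL has a single separator.
    return IMAGE_BASE + poster_path.lstrip("/")

def api_key_from_env(name="TMDB_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    return os.environ.get(name, "")

print(poster_url({"poster_path": "/abc.jpg"}))
print(poster_url({"poster_path": None}))
```

Reading the key from the environment keeps it out of the notebook and out of version control.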

In [42]:
def recommend(movie):
    index = movies[movies['title'] == movie].index[0]
    distances = sorted(list(enumerate(similarity[index])), reverse=True, key=lambda x: x[1])
    recommended_movie_names = []
    recommended_movie_posters = []
    for i in distances[1:6]:
        # fetch the movie poster
        movie_id = movies.iloc[i[0]].movie_id
        recommended_movie_posters.append(fetch_poster(movie_id))
        recommended_movie_names.append(movies.iloc[i[0]].title)
    return recommended_movie_names,recommended_movie_posters

In [43]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tfidf = TfidfVectorizer(max_features=5000, stop_words='english')
vector = tfidf.fit_transform(new['tags']).toarray()
similarity_tf_idf = cosine_similarity(vector)
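
The difference from plain counts is that TF-IDF down-weights terms that appear in many documents. The standard-library sketch below uses a smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is my understanding of scikit-learn's default behavior; treat the exact formula as an assumption. The documents are hypothetical tag strings.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Smoothed TF-IDF with L2 normalization for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        weights = {t: count * idf[t] for t, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

# "action" occurs in every document, "space" and "alien" in only one.
docs = ["action space alien", "action drama", "action romance"]
vecs = tfidf_vectors(docs)
print(vecs[0])
```

In the first document, the rare terms `space` and `alien` receive a higher weight than `action`, which every document contains.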


In [None]:
# Save the TF-IDF similarity matrix (not the count-based `similarity`)
pickle.dump(similarity_tf_idf,open('similarity_tf_idf','wb'))