# Movie Recommendation System

I developed a content-based movie recommendation system using TMDB data and deployed it as a website using <b>Streamlit</b>. To create the model, I first preprocessed the data by removing duplicate entries, handling missing values, and converting text fields to lowercase. I then applied <b>Stemming</b> and a <b>Bag of Words</b> representation to a combined "tags" field built from each movie's overview, genres, keywords, cast, and director, so that different forms of the same word are counted as one feature.

The recommendation system suggests movies based on the similarity of their <b>overview, cast, crew, genres, and keywords</b> to those of the movie the user selects. To achieve this, I calculated the <b>cosine similarity</b> between the bag-of-words vectors of the combined tags and sorted the results to suggest the movies with the highest similarity scores.
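To make the similarity step concrete, here is a minimal from-scratch sketch of bag-of-words counts and cosine similarity using only the standard library. The notebook itself relies on scikit-learn for this step; the helper names `bag_of_words` and `cosine` and the three toy tag strings are illustrative assumptions, not project data.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase token occurs in the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy tag strings (made up for illustration)
tags = {
    "space_epic": "space war alien planet action",
    "space_drama": "space station alien drama",
    "rom_com": "wedding romance comedy",
}
vectors = {name: bag_of_words(text) for name, text in tags.items()}

# Score every other entry against the query and rank by similarity
query = "space_epic"
scores = {
    name: cosine(vectors[query], vec)
    for name, vec in vectors.items()
    if name != query
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Entries that share more tokens with the query score closer to 1, and entries with no tokens in common score 0.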

The Streamlit website lets users enter a movie title and get recommendations based on how similar other movies are to the input movie. The website also displays additional information about the recommended movies, such as the release date, vote average, and overview.
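The app code is not part of this notebook, so the following is only a rough sketch of what such a Streamlit front end might look like. It assumes the two pickle files written at the end of the notebook (`movies_dict.pkl` and `similarity.pkl`) sit next to the script. The file name `app.py`, the helper `top_matches`, and the widget labels are hypothetical. The extra details mentioned above (release date, vote average, overview) are not stored in those pickles, so this sketch lists titles only.

```python
import pickle
from pathlib import Path

MOVIES_PKL = Path("movies_dict.pkl")      # written by the notebook
SIMILARITY_PKL = Path("similarity.pkl")   # written by the notebook

def top_matches(title, titles, similarity, k=5):
    """Return the k titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)  # positional row in the similarity matrix
    ranked = sorted(enumerate(similarity[idx]), key=lambda p: p[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != idx][:k]

def main():
    import streamlit as st  # imported lazily so top_matches can be reused without it

    with MOVIES_PKL.open("rb") as f:
        movies_dict = pickle.load(f)
    with SIMILARITY_PKL.open("rb") as f:
        similarity = pickle.load(f)

    # DataFrame.to_dict() yields {column: {row_label: value}}; values keep row order
    titles = list(movies_dict["title"].values())

    st.title("Movie Recommendation System")
    selected = st.selectbox("Pick a movie", titles)
    if st.button("Recommend"):
        for name in top_matches(selected, titles, similarity):
            st.write(name)

# Only start the UI when the pickles exist (e.g. via `streamlit run app.py`)
if __name__ == "__main__" and MOVIES_PKL.exists() and SIMILARITY_PKL.exists():
    main()
```

Keeping the ranking logic in a plain function makes it possible to check it on a small hand-written matrix without starting the web app.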







In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the movies and credits datasets
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')

In [3]:
credits.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [4]:
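# Merge the two tables on title (titles are not guaranteed to be unique,
# so movies that share a title can produce extra rows)
movies = movies.merge(credits, on='title')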

In [5]:
movies.head() 

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,206647,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,49026,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,49529,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [6]:
# Keep only the columns that will be used to make movie recommendations
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

In [7]:
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [8]:
# Drop the rows with a missing overview (reassigning avoids modifying a slice in place)
movies = movies.dropna()

In [9]:
movies.iloc[0].genres


'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

The code below defines a function named `convert_genres` that takes a single column value as input. If the value is a string, the function parses it with `ast.literal_eval()` into a list of dictionaries and returns a list containing the `name` value of each one. Any non-string value is returned unchanged. The same function is later reused for the keywords column, which has the same structure.

In [10]:
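import ast

def convert_genres(text):
    """Turn a stringified list of dicts into a list of their 'name' values."""
    if isinstance(text, str):
        return [item['name'] for item in ast.literal_eval(text)]
    # Non-string values (e.g. NaN) are passed through unchanged
    return text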

In [12]:
movies['genres'] = movies['genres'].apply(convert_genres)
movies['genres']


0       [Action, Adventure, Fantasy, Science Fiction]
1                        [Adventure, Fantasy, Action]
2                          [Action, Adventure, Crime]
3                    [Action, Crime, Drama, Thriller]
4                [Action, Adventure, Science Fiction]
                            ...                      
4804                        [Action, Crime, Thriller]
4805                                [Comedy, Romance]
4806               [Comedy, Drama, Romance, TV Movie]
4807                                               []
4808                                    [Documentary]
Name: genres, Length: 4806, dtype: object

In [14]:
movies['keywords'] = movies['keywords'].apply(convert_genres)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [18]:
# Fetch the names of the first 3 cast members
def convert3(text):
    return [member['name'] for member in ast.literal_eval(text)[:3]]


In [None]:
movies['cast'] = movies['cast'].apply(convert3)

In [None]:
movies.head()

In [None]:
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

In [17]:
# Fetch only the director's name(s) from the "crew" column
def fetch_director(text):
    if isinstance(text, str):
        return [member['name'] for member in ast.literal_eval(text)
                if member['job'] == 'Director']
    # Return an empty list instead of None so later list operations still work
    return []

movies['crew'] = movies['crew'].apply(fetch_director)
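The parsing helpers above all follow the same pattern: parse the stringified list, then keep the `name` of each entry, optionally capped at a number of entries or filtered by job. As a sketch of one possible refactoring, they could be folded into a single function. The name `extract_names` and its `limit` and `job` parameters are hypothetical, and the sample strings are shortened stand-ins for the real columns.

```python
import ast

def extract_names(text, limit=None, job=None):
    """Parse a stringified list of dicts and return the 'name' values.

    limit: keep at most this many names (e.g. 3 for the top-billed cast).
    job:   keep only entries whose 'job' equals this value (e.g. 'Director').
    """
    if not isinstance(text, str):
        return []  # treat missing values as "no names"
    names = [
        entry["name"]
        for entry in ast.literal_eval(text)
        if job is None or entry.get("job") == job
    ]
    return names if limit is None else names[:limit]

# Shortened, made-up samples in the same shape as the TMDB columns
genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
cast = '[{"name": "A"}, {"name": "B"}, {"name": "C"}, {"name": "D"}]'
crew = '[{"job": "Editor", "name": "E"}, {"job": "Director", "name": "F"}]'

print(extract_names(genres))                # ['Action', 'Adventure']
print(extract_names(cast, limit=3))         # ['A', 'B', 'C']
print(extract_names(crew, job="Director"))  # ['F']
```

In the notebook this would be used as, for example, `movies['cast'].apply(lambda x: extract_names(x, limit=3))`.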

In [None]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [None]:
movies.head()

In [19]:
def collapse(L):
    # Remove the spaces inside each entry so multi-word names become single tokens
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

In [None]:
movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)
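Removing the spaces presumably matters because the later bag-of-words step treats each whitespace-separated word as its own token: without collapsing, two different people who share a first name would contribute the same token to unrelated movies. A small sketch of the effect, using illustrative cast lists:

```python
def collapse(names):
    """Remove inner spaces so each multi-word name becomes a single token."""
    return [name.replace(" ", "") for name in names]

# Illustrative name lists for two unrelated movies
movie_a = ["Sam Worthington", "Zoe Saldana"]
movie_b = ["Sam Mendes"]

# Without collapsing, both movies share the token "Sam"
raw_overlap = set(" ".join(movie_a).split()) & set(" ".join(movie_b).split())

# With collapsing, the names stay distinct
collapsed_overlap = set(collapse(movie_a)) & set(collapse(movie_b))

print(raw_overlap)        # {'Sam'}
print(collapsed_overlap)  # set()
```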

In [None]:
movies.head()

In [None]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [None]:
# Copy the selected columns so later column assignments do not act on a slice of movies
new_df = movies[['movie_id','title','tags']].copy()

In [None]:
new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))


In [None]:
new_df.head()

In [None]:
new_df['tags'][0]

In [None]:
new_df['tags'] = new_df['tags'].apply(lambda x:x.lower())

In [None]:
new_df.head()

In [None]:
#Stemming
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [None]:
def stem(text):
    # Reduce every word in the text to its stem
    y = []
    for i in text.split():
        y.append(ps.stem(i))

    return " ".join(y)

In [None]:
# Stem the tags before vectorizing so that the stemmed words are what get counted
new_df['tags'] = new_df['tags'].apply(stem)

In [None]:
#Bag of Words
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')

In [None]:
vectors = cv.fit_transform(new_df['tags']).toarray()

In [None]:
cv.get_feature_names_out()

In [None]:
# Use cosine similarity to measure how close the vectors of different movies are
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
similarity = cosine_similarity(vectors)

In [None]:
def recommend(movie):
    # Print the titles of the 5 movies most similar to the given title

    # Positional row of the input movie. Row labels have gaps after dropna(),
    # so they cannot be used to index the similarity matrix directly.
    matches = np.flatnonzero((new_df['title'] == movie).to_numpy())
    if len(matches) == 0:
        print(f"'{movie}' was not found in the dataset")
        return
    movie_index = matches[0]

    # Similarity scores between the input movie and every movie
    distances = similarity[movie_index]

    # Sort (position, score) pairs by score and skip the first entry,
    # which is normally the input movie itself
    movie_list = sorted(enumerate(distances), reverse=True, key=lambda x: x[1])[1:6]

    # Print the titles of the top 5 similar movies
    for i in movie_list:
        print(new_df.iloc[i[0]].title)


The `recommend` function above takes a movie title as input and prints the 5 movies most similar to it, ranked by cosine similarity. It first finds the row of the input movie, then looks up that movie's similarity scores against all other movies, sorts them in descending order, and prints the titles of the top 5.

In [None]:
recommend('Avatar')

The `pickle` module is used to save the movie DataFrame, converted to a dictionary, to a binary file called 'movies_dict.pkl', and the similarity matrix to 'similarity.pkl'.

In [None]:
import pickle
# The context manager closes the file once the dump is complete
with open('movies_dict.pkl', 'wb') as f:
    pickle.dump(new_df.to_dict(), f)

In [None]:
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
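For reference, `DataFrame.to_dict()` with default arguments produces a nested `{column: {row_label: value}}` mapping, which is the shape stored in 'movies_dict.pkl'. The sketch below round-trips a hand-written dictionary of that shape through pickle, using a temporary directory instead of the real files. The two rows are taken from the table shown earlier and are for illustration only.

```python
import pickle
import tempfile
from pathlib import Path

# Same layout as DataFrame.to_dict(): {column: {row_label: value}}
movies_dict = {
    "movie_id": {0: 19995, 1: 285},
    "title": {0: "Avatar", 1: "Pirates of the Caribbean: At World's End"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "movies_dict.pkl"

    # Write: the context manager closes the file even if dump() raises
    with path.open("wb") as f:
        pickle.dump(movies_dict, f)

    # Read back, as a consuming app would
    with path.open("rb") as f:
        loaded = pickle.load(f)

# The inner dict keeps row order, so the values line up with the matrix rows
titles = list(loaded["title"].values())
print(titles)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.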

The project was a great opportunity to learn and implement content-based recommendation systems using natural language processing techniques. By implementing it on a website, I was able to create a practical tool that can help users discover new movies to watch.