# Movie Recommendation System

## Introduction
A Movie Recommendation System is designed to help users find movies that match their preferences by analyzing movie-related data. With an ever-growing collection of films available on streaming platforms, it becomes challenging for users to select the right content.

This project leverages the TMDB dataset, which contains extensive information about movies, including cast, crew, genres, and revenue. By applying machine learning techniques such as content-based filtering and collaborative filtering, the system aims to provide accurate and personalized movie recommendations.

## Objective
The main goals of this Movie Recommendation System are:

- Data-Driven Suggestions: Utilize movie metadata (such as genres, cast, and crew) to recommend movies tailored to user preferences.
- Enhance User Experience: Help users discover new movies effortlessly and reduce decision fatigue.
- Implement Machine Learning Models: Apply recommendation algorithms, such as content-based filtering, to analyze movie similarities.
- Work with Real-World Data: Process the TMDB dataset efficiently and handle missing or inaccurate information.
- Scalability and Performance: Design a system that can provide fast and relevant recommendations across a vast movie database.

## Data Story

The dataset contains full credits for both the cast and the crew, rather than just the first three actors.

Actors and actresses are now listed in the order they appear in the credits. It's unclear what ordering the original dataset used; for the movies I spot-checked, it didn't line up with either the credits order or IMDB's stars order.

The revenues appear to be more current. For example, IMDB's figures for Avatar seem to be from 2010 and understate the film's global revenues by over $2 billion.

Some of the movies that we weren't able to port over (a couple of hundred) were just bad entries. For example, this IMDB entry has basically no accurate information at all. It lists Star Wars Episode VII as a documentary.

In [1]:
#Importing Necessary libraries
import pandas as pd
import ast
from sklearn.feature_extraction.text import CountVectorizer
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Fetching datasets from folder
movies_df = pd.read_csv('D:/Datas/tmdb_5000_movies.csv')
credits_df = pd.read_csv("D:/Datas/tmdb_5000_credits.csv")

In [3]:
movies_df

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",,9367,"[{""id"": 5616, ""name"": ""united states\u2013mexi...",es,El Mariachi,El Mariachi just wants to play his guitar and ...,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,"[{""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}]",Released,"He didn't come looking for trouble, but troubl...",El Mariachi,6.6,238
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",,72766,[],en,Newlyweds,A newlywed couple's honeymoon is upended by th...,0.642552,[],[],2011-12-26,0,85.0,[],Released,A newlywed couple's honeymoon is upended by th...,Newlyweds,5.9,5
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",http://www.hallmarkchannel.com/signedsealeddel...,231617,"[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...",en,"Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic...",1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,"Signed, Sealed, Delivered",7.0,6
4801,0,[],http://shanghaicalling.com/,126186,[],en,Shanghai Calling,When ambitious New York attorney Sam is sent t...,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,A New Yorker in Shanghai,Shanghai Calling,5.7,7


In [4]:
credits_df.head(2)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [5]:
movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [6]:
credits_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4803 non-null   int64 
 1   title     4803 non-null   object
 2   cast      4803 non-null   object
 3   crew      4803 non-null   object
dtypes: int64(1), object(3)
memory usage: 150.2+ KB


In [7]:
movies_df.isnull().sum()

budget                     0
genres                     0
homepage                3091
id                         0
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
dtype: int64

In [8]:
credits_df.isnull().sum()

movie_id    0
title       0
cast        0
crew        0
dtype: int64

In [9]:
print('Duplicate values in movies df:', movies_df.duplicated().sum(),
      '\nDuplicate values in credits df:', credits_df.duplicated().sum())

Duplicate values in movies df: 0 
Duplicate values in credits df: 0


In [10]:
movies_df=movies_df[['genres','keywords','production_companies','title','overview']]
movies_df

Unnamed: 0,genres,keywords,production_companies,title,overview
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...",Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",Spectre,A cryptic message from Bond’s past sends him o...
3,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",The Dark Knight Rises,Following the death of District Attorney Harve...
4,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",John Carter,"John Carter is a war-weary, former military ca..."
...,...,...,...,...,...
4798,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 5616, ""name"": ""united states\u2013mexi...","[{""name"": ""Columbia Pictures"", ""id"": 5}]",El Mariachi,El Mariachi just wants to play his guitar and ...
4799,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",[],[],Newlyweds,A newlywed couple's honeymoon is upended by th...
4800,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...","[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","[{""name"": ""Front Street Pictures"", ""id"": 3958}...","Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic..."
4801,[],[],[],Shanghai Calling,When ambitious New York attorney Sam is sent t...


In [11]:
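# Drop rows with missing values; .copy() gives an independent frame for the column assignments below
movies_df = movies_df.dropna().copy()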

In [12]:
def convert(obj):
    # ast.literal_eval parses the string into a list of dicts;
    # keep only the 'name' of each entry
    return [i['name'] for i in ast.literal_eval(obj)]
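
To make the parsing step concrete, here is a minimal sketch of what `convert` does to a single cell. The string `raw` is a hypothetical sample shaped like a TMDB `genres` value, not a row taken from the dataset. The `json.loads` line is shown only as a possible alternative for strictly JSON-formatted cells; the notebook itself uses `ast.literal_eval`.

```python
import ast
import json

# Hypothetical sample in the same shape as a TMDB 'genres' cell
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = ast.literal_eval(raw)               # str -> list of dicts
names = [entry["name"] for entry in parsed]  # keep only the names
print(names)

# Alternative (assumption: the cell is valid JSON)
names_json = [entry["name"] for entry in json.loads(raw)]
print(names_json == names)
```

Both approaches give the same list for this sample. They differ on JSON literals such as `null`, which `json.loads` accepts and `ast.literal_eval` rejects.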

In [13]:
movies_df['genres']=movies_df['genres'].apply(convert)


In [14]:
movies_df['genres'][0]

['Action', 'Adventure', 'Fantasy', 'Science Fiction']

In [15]:
movies_df['keywords'][0]

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

In [16]:
movies_df['keywords']=movies_df['keywords'].apply(convert)


In [17]:
movies_df['keywords'][0]

['culture clash',
 'future',
 'space war',
 'space colony',
 'society',
 'space travel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alien planet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'love affair',
 'anti war',
 'power relations',
 'mind and soul',
 '3d']

In [18]:
movies_df

Unnamed: 0,genres,keywords,production_companies,title,overview
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""name"": ""Ingenious Film Partners"", ""id"": 289...",Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",Spectre,A cryptic message from Bond’s past sends him o...
3,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",The Dark Knight Rises,Following the death of District Attorney Harve...
4,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",John Carter,"John Carter is a war-weary, former military ca..."
...,...,...,...,...,...
4798,"[Action, Crime, Thriller]","[united states–mexico barrier, legs, arms, pap...","[{""name"": ""Columbia Pictures"", ""id"": 5}]",El Mariachi,El Mariachi just wants to play his guitar and ...
4799,"[Comedy, Romance]",[],[],Newlyweds,A newlywed couple's honeymoon is upended by th...
4800,"[Comedy, Drama, Romance, TV Movie]","[date, love at first sight, narration, investi...","[{""name"": ""Front Street Pictures"", ""id"": 3958}...","Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic..."
4801,[],[],[],Shanghai Calling,When ambitious New York attorney Sam is sent t...


In [19]:
movies_df['production_companies'][0]

'[{"name": "Ingenious Film Partners", "id": 289}, {"name": "Twentieth Century Fox Film Corporation", "id": 306}, {"name": "Dune Entertainment", "id": 444}, {"name": "Lightstorm Entertainment", "id": 574}]'

In [20]:
movies_df['production_companies']=movies_df['production_companies'].apply(convert)


In [21]:
movies_df.head(2)

Unnamed: 0,genres,keywords,production_companies,title,overview
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Ingenious Film Partners, Twentieth Century Fo...",Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Walt Disney Pictures, Jerry Bruckheimer Films...",Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."


In [22]:
def convert2(obj):
    # Keep only the names of crew members whose job is "Director"
    return [i['name'] for i in ast.literal_eval(obj) if i['job'] == 'Director']

In [23]:
credits_df['crew']=credits_df['crew'].apply(convert2)

In [24]:
credits_df

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...",[Gore Verbinski]
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...",[Sam Mendes]
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...",[Christopher Nolan]
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...",[Andrew Stanton]
...,...,...,...,...
4798,9367,El Mariachi,"[{""cast_id"": 1, ""character"": ""El Mariachi"", ""c...",[Robert Rodriguez]
4799,72766,Newlyweds,"[{""cast_id"": 1, ""character"": ""Buzzy"", ""credit_...",[Edward Burns]
4800,231617,"Signed, Sealed, Delivered","[{""cast_id"": 8, ""character"": ""Oliver O\u2019To...",[Scott Smith]
4801,126186,Shanghai Calling,"[{""cast_id"": 3, ""character"": ""Sam"", ""credit_id...",[Daniel Hsia]


In [25]:
def convert3(obj):
    # Keep only the first three cast members, in the order listed
    return [i['name'] for i in ast.literal_eval(obj)[:3]]

In [26]:
credits_df['cast']=credits_df['cast'].apply(convert3)

In [27]:
credits_df

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,206647,Spectre,"[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,49026,The Dark Knight Rises,"[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,49529,John Carter,"[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]
...,...,...,...,...
4798,9367,El Mariachi,"[Carlos Gallardo, Jaime de Hoyos, Peter Marqua...",[Robert Rodriguez]
4799,72766,Newlyweds,"[Edward Burns, Kerry Bishé, Marsha Dietlein]",[Edward Burns]
4800,231617,"Signed, Sealed, Delivered","[Eric Mabius, Kristin Booth, Crystal Lowe]",[Scott Smith]
4801,126186,Shanghai Calling,"[Daniel Henney, Eliza Coupe, Bill Paxton]",[Daniel Hsia]


In [28]:
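# Join on title. Titles are not unique, so a few rows are duplicated:
# the merged frame has 4806 rows (see vectors.shape in In [41]).
new_df = movies_df.merge(credits_df, on='title')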

In [29]:
new_df

Unnamed: 0,genres,keywords,production_companies,title,overview,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Ingenious Film Partners, Twentieth Century Fo...",Avatar,"In the 22nd century, a paraplegic Marine is di...",19995,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,"[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Walt Disney Pictures, Jerry Bruckheimer Films...",Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",285,"[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Columbia Pictures, Danjaq, B24]",Spectre,A cryptic message from Bond’s past sends him o...,206647,"[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Legendary Pictures, Warner Bros., DC Entertai...",The Dark Knight Rises,Following the death of District Attorney Harve...,49026,"[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,"[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...",[Walt Disney Pictures],John Carter,"John Carter is a war-weary, former military ca...",49529,"[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]
...,...,...,...,...,...,...,...,...
4801,"[Action, Crime, Thriller]","[united states–mexico barrier, legs, arms, pap...",[Columbia Pictures],El Mariachi,El Mariachi just wants to play his guitar and ...,9367,"[Carlos Gallardo, Jaime de Hoyos, Peter Marqua...",[Robert Rodriguez]
4802,"[Comedy, Romance]",[],[],Newlyweds,A newlywed couple's honeymoon is upended by th...,72766,"[Edward Burns, Kerry Bishé, Marsha Dietlein]",[Edward Burns]
4803,"[Comedy, Drama, Romance, TV Movie]","[date, love at first sight, narration, investi...","[Front Street Pictures, Muse Entertainment Ent...","Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic...",231617,"[Eric Mabius, Kristin Booth, Crystal Lowe]",[Scott Smith]
4804,[],[],[],Shanghai Calling,When ambitious New York attorney Sam is sent t...,126186,"[Daniel Henney, Eliza Coupe, Bill Paxton]",[Daniel Hsia]
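
The effect of merging on a non-unique key can be sketched with toy data. The frames below are hypothetical and only illustrate the mechanism: two different movies share a title, so a merge on `title` pairs every match with every other match. Merging on the numeric identifier instead is one possible alternative; it assumes the `id` column is kept in `movies_df`, which this notebook does not do (it is dropped in In [10]).

```python
import pandas as pd

# Toy frames (hypothetical data): two different movies share one title
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Same Title", "Same Title", "Other Title"],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Same Title", "Same Title", "Other Title"],
    "crew": [["Director A"], ["Director B"], ["Director C"]],
})

# Merge on title: the shared title yields 2 x 2 = 4 rows, plus 1 for the other movie
by_title = movies.merge(credits, on="title")

# Merge on the identifier: exactly one row per movie
by_id = movies.merge(credits.drop(columns="title"),
                     left_on="id", right_on="movie_id")

print(len(by_title), len(by_id))
```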


In [30]:
# Remove spaces inside each name so multi-word names become single tokens
# (e.g. 'Sam Worthington' -> 'SamWorthington')
for col in ['genres', 'keywords', 'cast', 'crew', 'production_companies']:
    new_df[col] = new_df[col].apply(lambda x: [i.replace(' ', '') for i in x])

In [31]:
new_df

Unnamed: 0,genres,keywords,production_companies,title,overview,movie_id,cast,crew
0,"[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[IngeniousFilmPartners, TwentiethCenturyFoxFil...",Avatar,"In the 22nd century, a paraplegic Marine is di...",19995,"[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,"[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[WaltDisneyPictures, JerryBruckheimerFilms, Se...",Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",285,"[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[ColumbiaPictures, Danjaq, B24]",Spectre,A cryptic message from Bond’s past sends him o...,206647,"[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,"[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[LegendaryPictures, WarnerBros., DCEntertainme...",The Dark Knight Rises,Following the death of District Attorney Harve...,49026,"[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,"[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...",[WaltDisneyPictures],John Carter,"John Carter is a war-weary, former military ca...",49529,"[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]
...,...,...,...,...,...,...,...,...
4801,"[Action, Crime, Thriller]","[unitedstates–mexicobarrier, legs, arms, paper...",[ColumbiaPictures],El Mariachi,El Mariachi just wants to play his guitar and ...,9367,"[CarlosGallardo, JaimedeHoyos, PeterMarquardt]",[RobertRodriguez]
4802,"[Comedy, Romance]",[],[],Newlyweds,A newlywed couple's honeymoon is upended by th...,72766,"[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns]
4803,"[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[FrontStreetPictures, MuseEntertainmentEnterpr...","Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic...",231617,"[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith]
4804,[],[],[],Shanghai Calling,When ambitious New York attorney Sam is sent t...,126186,"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia]


In [32]:
new_df['overview']=new_df['overview'].apply(lambda x: x.split())

In [33]:
new_df['tags']= new_df['overview']+new_df['genres']+new_df['keywords']+new_df['cast']+new_df['crew']+new_df['production_companies']

In [34]:
new_df

Unnamed: 0,genres,keywords,production_companies,title,overview,movie_id,cast,crew,tags
0,"[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[IngeniousFilmPartners, TwentiethCenturyFoxFil...",Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...",19995,"[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,"[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[WaltDisneyPictures, JerryBruckheimerFilms, Se...",Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...",285,"[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[ColumbiaPictures, Danjaq, B24]",Spectre,"[A, cryptic, message, from, Bond’s, past, send...",206647,"[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
3,"[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[LegendaryPictures, WarnerBros., DCEntertainme...",The Dark Knight Rises,"[Following, the, death, of, District, Attorney...",49026,"[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[Following, the, death, of, District, Attorney..."
4,"[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...",[WaltDisneyPictures],John Carter,"[John, Carter, is, a, war-weary,, former, mili...",49529,"[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...,...,...,...,...,...,...
4801,"[Action, Crime, Thriller]","[unitedstates–mexicobarrier, legs, arms, paper...",[ColumbiaPictures],El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui...",9367,"[CarlosGallardo, JaimedeHoyos, PeterMarquardt]",[RobertRodriguez],"[El, Mariachi, just, wants, to, play, his, gui..."
4802,"[Comedy, Romance]",[],[],Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended...",72766,"[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],"[A, newlywed, couple's, honeymoon, is, upended..."
4803,"[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[FrontStreetPictures, MuseEntertainmentEnterpr...","Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,...",231617,"[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[""Signed,, Sealed,, Delivered"", introduces, a,..."
4804,[],[],[],Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is...",126186,"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],"[When, ambitious, New, York, attorney, Sam, is..."


In [35]:
new_df = new_df[['movie_id', 'title', 'tags']].copy()  # independent copy for the in-place edits below
new_df

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4801,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4802,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4803,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4804,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


In [36]:
new_df['tags']=new_df['tags'].apply(lambda x: ' '.join(x))


In [37]:
new_df['tags']=new_df['tags'].apply(lambda x:x.lower())


In [38]:
new_df

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a paraplegic marine is di..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believed to be dead, ha..."
2,206647,Spectre,a cryptic message from bond’s past sends him o...
3,49026,The Dark Knight Rises,following the death of district attorney harve...
4,49529,John Carter,"john carter is a war-weary, former military ca..."
...,...,...,...
4801,9367,El Mariachi,el mariachi just wants to play his guitar and ...
4802,72766,Newlyweds,a newlywed couple's honeymoon is upended by th...
4803,231617,"Signed, Sealed, Delivered","""signed, sealed, delivered"" introduces a dedic..."
4804,126186,Shanghai Calling,when ambitious new york attorney sam is sent t...


In [39]:
# CountVectorizer: bag-of-words counts over the 5000 most frequent terms, ignoring English stop words
cv = CountVectorizer(max_features=5000, stop_words='english')

In [40]:
cv.fit_transform(new_df['tags']).toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [41]:
vectors=cv.fit_transform(new_df['tags']).toarray()
vectors.shape

(4806, 5000)
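
As a small illustration of the vectorization step, the sketch below runs `CountVectorizer` on two toy tag strings (hypothetical, not dataset rows). It uses the name `cv_demo` so it does not overwrite the notebook's `cv`. Each row of the result is one document and each column is one vocabulary term.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy tag strings (hypothetical)
docs = ["the space war and the alien", "an alien planet"]

cv_demo = CountVectorizer(stop_words="english")
counts = cv_demo.fit_transform(docs).toarray()

# vocabulary_ maps each kept term to its column index
vocab = sorted(cv_demo.vocabulary_, key=cv_demo.vocabulary_.get)
print(vocab)
print(counts)
```

The stop words "the", "and", and "an" are dropped, so only four columns remain, ordered alphabetically.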

In [42]:
# PorterStemmer
ps = PorterStemmer()

In [43]:
def stem(text):
    y=[]
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)
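
A quick sketch of how the stemmer behaves on individual tokens. Because `stem` splits on whitespace only, punctuation stays attached to a word and can prevent stemming, which is why a token like `century,` appears unchanged in the stemmed tags. The helper `stem_clean` is a hypothetical variant that strips surrounding punctuation first; it is not part of the notebook.

```python
import string
from nltk.stem.porter import PorterStemmer

ps_demo = PorterStemmer()

# Tokens as they would come out of text.split()
for token in ["believed", "wants", "century,"]:
    print(repr(token), "->", repr(ps_demo.stem(token)))

def stem_clean(text):
    """Hypothetical variant: strip surrounding punctuation before stemming."""
    tokens = (t.strip(string.punctuation) for t in text.split())
    return " ".join(ps_demo.stem(t) for t in tokens if t)

print(stem_clean("believed, wants"))
```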

In [44]:
new_df['tags'] = new_df['tags'].apply(stem)


In [45]:
new_df

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believ to be dead, ha c..."
2,206647,Spectre,a cryptic messag from bond’ past send him on a...
3,49026,The Dark Knight Rises,follow the death of district attorney harvey d...
4,49529,John Carter,"john carter is a war-weary, former militari ca..."
...,...,...,...
4801,9367,El Mariachi,el mariachi just want to play hi guitar and ca...
4802,72766,Newlyweds,a newlyw couple' honeymoon is upend by the arr...
4803,231617,"Signed, Sealed, Delivered","""signed, sealed, delivered"" introduc a dedic q..."
4804,126186,Shanghai Calling,when ambiti new york attorney sam is sent to s...


In [46]:
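# cosine_similarity measures the similarity between every pair of row vectors.
# Note: `vectors` was built in In [41], before the stemming step in In [44],
# so these scores reflect the unstemmed tags.
cosine_similarity(vectors)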

array([[1.        , 0.08471737, 0.05647825, ..., 0.0244558 , 0.0270369 ,
        0.        ],
       [0.08471737, 1.        , 0.06060606, ..., 0.02624319, 0.        ,
        0.        ],
       [0.05647825, 0.06060606, 1.        , ..., 0.02624319, 0.        ,
        0.        ],
       ...,
       [0.0244558 , 0.02624319, 0.02624319, ..., 1.        , 0.07537784,
        0.0489116 ],
       [0.0270369 , 0.        , 0.        , ..., 0.07537784, 1.        ,
        0.05407381],
       [0.        , 0.        , 0.        , ..., 0.0489116 , 0.05407381,
        1.        ]])
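
For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand for two toy count vectors (hypothetical values); the function name `cosine` and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

a = [1, 0, 1, 1]   # e.g. counts for: alien, planet, space, war
b = [1, 1, 0, 0]

print(round(cosine(a, b), 4))  # one shared term out of 3 and 2 -> 1/sqrt(6)
print(round(cosine(a, a), 4))  # identical vectors -> 1.0
```

With count vectors, which are never negative, the score ranges from 0 (no shared terms) to 1 (same direction), which matches the diagonal of ones in the matrix above.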

In [47]:
cosine_similarity(vectors).shape

(4806, 4806)

In [48]:
similarity = cosine_similarity(vectors)

In [49]:
similarity[0]

array([1.        , 0.08471737, 0.05647825, ..., 0.0244558 , 0.0270369 ,
       0.        ])

In [50]:
sorted(list(enumerate(similarity[0])),reverse=True,key=lambda x:x[1])[1:6]

[(539, 0.27420424855354086),
 (1216, 0.26914524410099555),
 (507, 0.260132990857236),
 (260, 0.23693955110363693),
 (61, 0.23179316248638274)]

In [51]:
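def recommentation_system(movie):
    # Row position of the requested title (raises IndexError if the title is absent)
    movie_index = new_df[new_df['title'] == movie].index[0]
    # Similarity scores between this movie and every movie in new_df
    scores = similarity[movie_index]
    # Rank by score, highest first; position 0 is normally the movie itself, so take 1..5
    movie_lists = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])[1:6]

    for i in movie_lists:
        print(new_df.iloc[i[0]].title)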
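
The same lookup can be sketched as a function that returns a list instead of printing, and that tolerates unknown titles. This is a hypothetical helper (`recommend`) working on plain Python lists with toy data, not the notebook's implementation. It also excludes the query movie by index rather than assuming it is ranked first.

```python
def recommend(title, titles, similarity, k=5):
    """Return up to k titles most similar to `title` (hypothetical helper)."""
    if title not in titles:
        return []  # unknown title: no recommendations instead of an IndexError
    idx = titles.index(title)
    ranked = sorted(enumerate(similarity[idx]), key=lambda p: p[1], reverse=True)
    # Skip the query by index rather than assuming it is ranked first
    return [titles[i] for i, _ in ranked if i != idx][:k]

# Toy data (hypothetical): three movies and their pairwise similarities
titles = ["A", "B", "C"]
sim = [[1.0, 0.2, 0.9],
       [0.2, 1.0, 0.1],
       [0.9, 0.1, 1.0]]

print(recommend("A", titles, sim, k=2))
print(recommend("Missing", titles, sim))
```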

In [52]:
recommentation_system("Pirates of the Caribbean: At World's End")

Pirates of the Caribbean: Dead Man's Chest
Pirates of the Caribbean: On Stranger Tides
Pirates of the Caribbean: The Curse of the Black Pearl
20,000 Leagues Under the Sea
Puss in Boots


In [53]:
recommentation_system("Avatar")

Titan A.E.
Aliens vs Predator: Requiem
Independence Day
Ender's Game
Jupiter Ascending


In [54]:
recommentation_system("Inception")

Duplex
The Helix... Loaded
Star Trek II: The Wrath of Khan
Nancy Drew
Chicago Overcoat
