
# 🎬 Movie Recommendation System

This notebook builds a content-based movie recommendation system from each film's overview, genres, keywords, cast, and director. It uses two TMDB datasets: `credits.csv` and `movies.csv`.

---


In [1]:
# 📦 Importing required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# 📥 Reading data files (credits and movies)
data_1=pd.read_csv("credits.csv")
data_2=pd.read_csv("movies.csv")

In [3]:
# 👀 Displaying the first few rows to understand the data
data_1.head(2)

,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [4]:
data_1["crew"][0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

In [5]:
# 👀 Displaying the first few rows to understand the data
data_2.head(2)

,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


In [6]:
# 🔗 Merging the movies and credits data on movie ID
movies=pd.merge(data_1,data_2,left_on="movie_id",right_on="id")

In [7]:
movies.shape

(4803, 24)
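
Both input files carry a `title` column, so the merge keeps both copies and appends pandas' default `_x` and `_y` suffixes. That is where `title_x` and `title_y` come from. The sketch below reproduces this on two made-up frames (the data and the names `credits_toy` and `meta_toy` are illustrative assumptions, not the real files). The optional `validate` argument makes pandas raise an error if the key is repeated on either side.

```python
import pandas as pd

# Toy stand-ins for credits.csv and movies.csv (illustrative data only)
credits_toy = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"], "cast": ["[]", "[]"]})
meta_toy = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "genres": ["[]", "[]"]})

# validate="one_to_one" raises MergeError if a key repeats on either side
merged = pd.merge(credits_toy, meta_toy, left_on="movie_id", right_on="id",
                  validate="one_to_one")

print(list(merged.columns))
# ['movie_id', 'title_x', 'cast', 'id', 'title_y', 'genres']
```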

In [8]:
type(movies["genres"][1])

str

In [9]:
# 📦 Importing required libraries
import json

In [10]:
json.loads(movies["genres"][1])

[{'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 28, 'name': 'Action'}]

In [11]:
movies.columns

Index(['movie_id', 'title_x', 'cast', 'crew', 'budget', 'genres', 'homepage',
       'id', 'keywords', 'original_language', 'original_title', 'overview',
       'popularity', 'production_companies', 'production_countries',
       'release_date', 'revenue', 'runtime', 'spoken_languages', 'status',
       'tagline', 'title_y', 'vote_average', 'vote_count'],
      dtype='object')

In [12]:
movies=movies[["movie_id","title_x","genres","keywords","overview","cast","crew"]]

In [13]:
movies

,movie_id,title_x,genres,keywords,overview,cast,crew
0,19995,Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
...,...,...,...,...,...,...,...
4798,9367,El Mariachi,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 5616, ""name"": ""united states\u2013mexi...",El Mariachi just wants to play his guitar and ...,"[{""cast_id"": 1, ""character"": ""El Mariachi"", ""c...","[{""credit_id"": ""52fe44eec3a36847f80b280b"", ""de..."
4799,72766,Newlyweds,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",[],A newlywed couple's honeymoon is upended by th...,"[{""cast_id"": 1, ""character"": ""Buzzy"", ""credit_...","[{""credit_id"": ""52fe487dc3a368484e0fb013"", ""de..."
4800,231617,"Signed, Sealed, Delivered","[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...","[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","""Signed, Sealed, Delivered"" introduces a dedic...","[{""cast_id"": 8, ""character"": ""Oliver O\u2019To...","[{""credit_id"": ""52fe4df3c3a36847f8275ecf"", ""de..."
4801,126186,Shanghai Calling,[],[],When ambitious New York attorney Sam is sent t...,"[{""cast_id"": 3, ""character"": ""Sam"", ""credit_id...","[{""credit_id"": ""52fe4ad9c3a368484e16a36b"", ""de..."


In [14]:
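def ext_value(text):
    # Parse the JSON string and collect the "name" field of every entry
    # (the parameter is called `text` so it does not shadow the built-in `list`)
    values=[]
    for i in json.loads(text):
        values.append(i["name"])
    return values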
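
The helper above can be tried on a hand-written string before it is applied to a whole column. The sketch below is an equivalent list-comprehension version; the name `extract_names` and the sample string are illustrative assumptions.

```python
import json

def extract_names(json_text):
    """Return the 'name' field of every object in a JSON-encoded list."""
    return [item["name"] for item in json.loads(json_text)]

sample = '[{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}]'
print(extract_names(sample))   # ['Adventure', 'Fantasy']
print(extract_names("[]"))     # [] (movies with no genres or keywords)
```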

In [15]:
movies["genres"]=movies["genres"].apply(ext_value)

In [16]:
movies["genres"]

0       [Action, Adventure, Fantasy, Science Fiction]
1                        [Adventure, Fantasy, Action]
2                          [Action, Adventure, Crime]
3                    [Action, Crime, Drama, Thriller]
4                [Action, Adventure, Science Fiction]
                            ...                      
4798                        [Action, Crime, Thriller]
4799                                [Comedy, Romance]
4800               [Comedy, Drama, Romance, TV Movie]
4801                                               []
4802                                    [Documentary]
Name: genres, Length: 4803, dtype: object

In [17]:
movies["keywords"]=movies["keywords"].apply(ext_value)

In [18]:
movies["cast"][1]

'[{"cast_id": 4, "character": "Captain Jack Sparrow", "credit_id": "52fe4232c3a36847f800b50d", "gender": 2, "id": 85, "name": "Johnny Depp", "order": 0}, {"cast_id": 5, "character": "Will Turner", "credit_id": "52fe4232c3a36847f800b511", "gender": 2, "id": 114, "name": "Orlando Bloom", "order": 1}, {"cast_id": 6, "character": "Elizabeth Swann", "credit_id": "52fe4232c3a36847f800b515", "gender": 1, "id": 116, "name": "Keira Knightley", "order": 2}, {"cast_id": 12, "character": "William \\"Bootstrap Bill\\" Turner", "credit_id": "52fe4232c3a36847f800b52d", "gender": 2, "id": 1640, "name": "Stellan Skarsg\\u00e5rd", "order": 3}, {"cast_id": 10, "character": "Captain Sao Feng", "credit_id": "52fe4232c3a36847f800b525", "gender": 2, "id": 1619, "name": "Chow Yun-fat", "order": 4}, {"cast_id": 9, "character": "Captain Davy Jones", "credit_id": "52fe4232c3a36847f800b521", "gender": 2, "id": 2440, "name": "Bill Nighy", "order": 5}, {"cast_id": 7, "character": "Captain Hector Barbossa", "credit_

In [19]:
def ext_value2(text):
    # Same as ext_value, but keep only the first three names (top-billed cast)
    values=[]
    for i in json.loads(text):
        values.append(i["name"])
    return values[:3]

In [20]:
movies["cast"]=movies["cast"].apply(ext_value2)

In [21]:
movies["cast"]

0        [Sam Worthington, Zoe Saldana, Sigourney Weaver]
1           [Johnny Depp, Orlando Bloom, Keira Knightley]
2            [Daniel Craig, Christoph Waltz, Léa Seydoux]
3            [Christian Bale, Michael Caine, Gary Oldman]
4          [Taylor Kitsch, Lynn Collins, Samantha Morton]
                              ...                        
4798    [Carlos Gallardo, Jaime de Hoyos, Peter Marqua...
4799         [Edward Burns, Kerry Bishé, Marsha Dietlein]
4800           [Eric Mabius, Kristin Booth, Crystal Lowe]
4801            [Daniel Henney, Eliza Coupe, Bill Paxton]
4802    [Drew Barrymore, Brian Herzlinger, Corey Feldman]
Name: cast, Length: 4803, dtype: object

In [22]:
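def find_dir(text):
    # Return a one-element list with the first crew member whose job is "Director"
    # (an empty list if no director is listed)
    values=[]
    for i in json.loads(text):
        if i["job"]=="Director":
            values.append(i["name"])
            break
    return values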

In [23]:
movies["crew"]=movies["crew"].apply(find_dir)
movies["crew"]

0           [James Cameron]
1          [Gore Verbinski]
2              [Sam Mendes]
3       [Christopher Nolan]
4          [Andrew Stanton]
               ...         
4798     [Robert Rodriguez]
4799         [Edward Burns]
4800          [Scott Smith]
4801          [Daniel Hsia]
4802     [Brian Herzlinger]
Name: crew, Length: 4803, dtype: object

In [24]:
movies["overview"]=movies["overview"].str.split()

In [25]:
movies["overview"]

0       [In, the, 22nd, century,, a, paraplegic, Marin...
1       [Captain, Barbossa,, long, believed, to, be, d...
2       [A, cryptic, message, from, Bond’s, past, send...
3       [Following, the, death, of, District, Attorney...
4       [John, Carter, is, a, war-weary,, former, mili...
                              ...                        
4798    [El, Mariachi, just, wants, to, play, his, gui...
4799    [A, newlywed, couple's, honeymoon, is, upended...
4800    ["Signed,, Sealed,, Delivered", introduces, a,...
4801    [When, ambitious, New, York, attorney, Sam, is...
4802    [Ever, since, the, second, grade, when, he, fi...
Name: overview, Length: 4803, dtype: object

In [26]:
movies.columns

Index(['movie_id', 'title_x', 'genres', 'keywords', 'overview', 'cast',
       'crew'],
      dtype='object')

In [27]:
def collapse(lst):
    final_lst = []
    for i in lst:
        final_lst.append(i.replace(" ",""))
    return final_lst

In [28]:
movies["genres"]=movies["genres"].apply(collapse)
movies["keywords"]=movies["keywords"].apply(collapse)
movies["crew"]=movies["crew"].apply(collapse)
movies["cast"]=movies["cast"].apply(collapse)
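
Removing the spaces turns each multi-word name into a single token. Without this step, two films would look similar just because an actor in one and a director in the other share a first name. A small pure-Python sketch of the effect (the helper `tokens` is a hypothetical name used only here):

```python
def tokens(names, collapse_spaces):
    """Split a list of names into a set of whitespace-separated tokens."""
    if collapse_spaces:
        names = [n.replace(" ", "") for n in names]
    return set(" ".join(names).split())

film_a = ["Sam Worthington", "James Cameron"]
film_b = ["Sam Mendes", "Daniel Craig"]

print(tokens(film_a, False) & tokens(film_b, False))  # {'Sam'}  (spurious overlap)
print(tokens(film_a, True) & tokens(film_b, True))    # set()   (no overlap)
```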

In [29]:
movies

,movie_id,title_x,genres,keywords,overview,cast,crew
0,19995,Avatar,"[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[In, the, 22nd, century,, a, paraplegic, Marin...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[Captain, Barbossa,, long, believed, to, be, d...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[A, cryptic, message, from, Bond’s, past, send...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[Following, the, death, of, District, Attorney...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[John, Carter, is, a, war-weary,, former, mili...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]
...,...,...,...,...,...,...,...
4798,9367,El Mariachi,"[Action, Crime, Thriller]","[unitedstates–mexicobarrier, legs, arms, paper...","[El, Mariachi, just, wants, to, play, his, gui...","[CarlosGallardo, JaimedeHoyos, PeterMarquardt]",[RobertRodriguez]
4799,72766,Newlyweds,"[Comedy, Romance]",[],"[A, newlywed, couple's, honeymoon, is, upended...","[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns]
4800,231617,"Signed, Sealed, Delivered","[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[""Signed,, Sealed,, Delivered"", introduces, a,...","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith]
4801,126186,Shanghai Calling,[],[],"[When, ambitious, New, York, attorney, Sam, is...","[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia]


In [30]:
movies["tag"]=movies["overview"]+movies["genres"]+movies["keywords"]+movies["cast"]+movies["crew"]

In [31]:
movies["tag"][0]

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.',
 'Action',
 'Adventure',
 'Fantasy',
 'ScienceFiction',
 'cultureclash',
 'future',
 'spacewar',
 'spacecolony',
 'society',
 'spacetravel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alienplanet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'loveaffair',
 'antiwar',
 'powerrelations',
 'mindandsoul',
 '3d',
 'SamWorthington',
 'ZoeSaldana',
 'SigourneyWeaver',
 'JamesCameron']

In [32]:
new_data=movies[["movie_id","title_x","tag"]]

In [33]:
new_data["tag"][0]

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.',
 'Action',
 'Adventure',
 'Fantasy',
 'ScienceFiction',
 'cultureclash',
 'future',
 'spacewar',
 'spacecolony',
 'society',
 'spacetravel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alienplanet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'loveaffair',
 'antiwar',
 'powerrelations',
 'mindandsoul',
 '3d',
 'SamWorthington',
 'ZoeSaldana',
 'SigourneyWeaver',
 'JamesCameron']

In [34]:
# 🧹 Dropping rows with missing values
new_data.dropna(inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data.dropna(inplace=True)
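
This warning appears because `new_data` was created by selecting columns from `movies`, so pandas cannot tell whether an in-place change is meant to affect `movies` as well. One common way to avoid it is to take an explicit `.copy()` and reassign the result instead of using `inplace=True`. A minimal sketch on made-up data (`movies_toy` and `new_toy` are illustrative names):

```python
import pandas as pd

movies_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title_x": ["A", "B", "C"],
    "tag": [["x"], None, ["y", "z"]],
    "extra": [0, 0, 0],
})

# .copy() makes new_toy an independent frame, so later assignments are unambiguous
new_toy = movies_toy[["movie_id", "title_x", "tag"]].copy()
new_toy = new_toy.dropna()                        # reassign instead of inplace=True
new_toy["tag"] = new_toy["tag"].apply(" ".join)   # list of tokens -> single string

print(new_toy["tag"].tolist())  # ['x', 'y z']
print(len(movies_toy))          # 3 (the source frame is untouched)
```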


In [35]:
new_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4800 entries, 0 to 4802
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4800 non-null   int64 
 1   title_x   4800 non-null   object
 2   tag       4800 non-null   object
dtypes: int64(1), object(2)
memory usage: 150.0+ KB
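
The summary reports 4800 entries but an index running from 0 to 4802: `dropna` removes rows without renumbering the rest. For any row after a dropped one, the index label is therefore larger than the row position. This matters later because a NumPy similarity matrix is addressed by position. A toy sketch of the gap and one way to close it:

```python
import pandas as pd

df = pd.DataFrame({"title_x": ["A", "B", "C", "D"], "tag": ["a", None, "c", "d"]})
df = df.dropna()

label = df[df["title_x"] == "D"].index[0]   # index label kept from before dropna
position = df.index.get_loc(label)          # row position in the filtered frame
print(label, position)                      # 3 2

# reset_index(drop=True) renumbers rows so labels and positions agree again
df = df.reset_index(drop=True)
print(df[df["title_x"] == "D"].index[0])    # 2
```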


In [36]:
new_data["tag"]=new_data["tag"].apply(lambda x: " ".join(x))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data["tag"]=new_data["tag"].apply(lambda x: " ".join(x))


In [37]:
new_data["tag"][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

In [38]:
# 📦 Importing required libraries
from sklearn.feature_extraction.text import CountVectorizer

In [39]:
# 🔢 Creating a bag-of-words vectorizer: keep the 5000 most frequent terms and drop English stop words
cv=CountVectorizer(max_features=5000 , stop_words="english")

In [40]:
# 📦 Importing required libraries
import nltk

In [41]:
# 📦 Importing required libraries
from nltk.stem.porter import PorterStemmer

In [42]:
ps=PorterStemmer()

In [43]:
def stem(text):
    lit=[]
    for i in text.split():
        lit.append(ps.stem(i))
    return " ".join(lit)

In [44]:
new_data["tag"]=new_data["tag"].apply(stem)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data["tag"]=new_data["tag"].apply(stem)


In [45]:
ps.stem("loving")

'love'
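
Note that `stem` splits on whitespace only, so punctuation stays attached to the word it follows. The sketch below compares a clean word with the same word followed by a comma. This may be why both `activ` and `activities` show up in the feature names printed further down, since the vectorizer strips the comma only after stemming has already been skipped for that token.

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

# A trailing comma means no suffix rule matches, so the word is left unstemmed
for word in ["loving", "activities", "activities,"]:
    print(repr(word), "->", repr(ps.stem(word)))
```

If this matters for a given dataset, one option is to strip punctuation from each token before stemming.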

In [46]:
vector=cv.fit_transform(new_data["tag"]).toarray()
vector

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [47]:
vector[0].sum()

34

In [48]:
print(cv.get_stop_words())

frozenset({'if', 'my', 'fire', 'therefore', 'however', 'due', 'become', 'cry', 'afterwards', 'fifty', 'been', 'next', 'along', 'when', 'this', 'someone', 'because', 'are', 'amoungst', 'take', 'too', 'fill', 'nothing', 'myself', 'sometime', 'being', 'although', 'before', 'six', 'former', 'many', 'such', 'get', 'were', 'seeming', 'always', 'behind', 'no', 'side', 'still', 'between', 'had', 'then', 'ours', 'out', 'which', 'during', 'per', 'otherwise', 'an', 'empty', 'wherever', 'via', 'for', 'she', 'should', 'hers', 'formerly', 'why', 'may', 'what', 'until', 'put', 'in', 'the', 'onto', 'neither', 'very', 'a', 'though', 'one', 'whether', 'and', 'never', 'alone', 'enough', 'mill', 'do', 'indeed', 'almost', 'could', 'after', 'his', 'of', 'themselves', 'anything', 'sometimes', 'here', 'each', 'everything', 'three', 'with', 'would', 'perhaps', 'whatever', 'ten', 'further', 'is', 'front', 'toward', 'more', 'on', 'full', 'first', 'detail', 'hundred', 'both', 'at', 'hence', 'except', 'eight', 'ag

In [49]:
list(cv.get_feature_names_out())

['000',
 '007',
 '10',
 '100',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '17th',
 '18',
 '18th',
 '18thcenturi',
 '19',
 '1910',
 '1920',
 '1930',
 '1940',
 '1944',
 '1950',
 '1950s',
 '1960',
 '1960s',
 '1970',
 '1970s',
 '1971',
 '1974',
 '1976',
 '1980',
 '1985',
 '1990',
 '1999',
 '19th',
 '19thcenturi',
 '20',
 '200',
 '2003',
 '2009',
 '20th',
 '21st',
 '23',
 '24',
 '25',
 '30',
 '300',
 '3d',
 '40',
 '50',
 '500',
 '60',
 '70',
 '80',
 'aaron',
 'aaroneckhart',
 'abandon',
 'abduct',
 'abigailbreslin',
 'abil',
 'abl',
 'aboard',
 'abov',
 'abus',
 'academ',
 'academi',
 'accept',
 'access',
 'accid',
 'accident',
 'acclaim',
 'accompani',
 'accomplish',
 'account',
 'accus',
 'ace',
 'achiev',
 'acquaint',
 'act',
 'action',
 'actionhero',
 'activ',
 'activist',
 'activities',
 'actor',
 'actress',
 'actual',
 'ad',
 'adam',
 'adamsandl',
 'adamshankman',
 'adapt',
 'add',
 'addict',
 'adjust',
 'admir',
 'admit',
 'adolesc',
 'adopt',
 'ador',
 'adrienbrodi',
 'adult'

In [50]:
# 📦 Importing required libraries
from sklearn.metrics.pairwise import cosine_similarity

In [51]:
# 🔢 Computing the pairwise cosine similarity between all movie vectors
similarity=cosine_similarity(vector)

In [52]:
similarity

array([[1.        , 0.08346223, 0.0860309 , ..., 0.04499213, 0.        ,
        0.        ],
       [0.08346223, 1.        , 0.06063391, ..., 0.02378257, 0.        ,
        0.02615329],
       [0.0860309 , 0.06063391, 1.        , ..., 0.02451452, 0.        ,
        0.        ],
       ...,
       [0.04499213, 0.02378257, 0.02451452, ..., 1.        , 0.03962144,
        0.04229549],
       [0.        , 0.        , 0.        , ..., 0.03962144, 1.        ,
        0.08714204],
       [0.        , 0.02615329, 0.        , ..., 0.04229549, 0.08714204,
        1.        ]])
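
Entry `[i, j]` of this matrix is the cosine of the angle between the count vectors of movies `i` and `j`: their dot product divided by the product of their lengths. The diagonal is 1 because every movie is identical to itself, although floating-point rounding can make it print very slightly off. A NumPy sketch of the same formula on two small vectors (the helper `cosine` is a hypothetical name, not part of scikit-learn):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two count vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1, 1, 0, 1])
b = np.array([1, 1, 1, 0])

print(round(cosine(a, b), 4))  # 0.6667 (two shared terms out of three each)
print(round(cosine(a, a), 4))  # 1.0
```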

In [53]:
new_data[new_data["title_x"]=="John Carter"].index[0]

4

In [54]:
new_data.title_x

0                                         Avatar
1       Pirates of the Caribbean: At World's End
2                                        Spectre
3                          The Dark Knight Rises
4                                    John Carter
                          ...                   
4798                                 El Mariachi
4799                                   Newlyweds
4800                   Signed, Sealed, Delivered
4801                            Shanghai Calling
4802                           My Date with Drew
Name: title_x, Length: 4800, dtype: object

In [55]:
sorted(list(enumerate(similarity[4])),reverse=True,key=lambda x:x[1])

[(4, 1.0000000000000002),
 (1319, 0.2942868240199161),
 (3088, 0.29238044084454384),
 (3372, 0.2819871785071246),
 (610, 0.27397779068244577),
 (1254, 0.2728884114549076),
 (27, 0.2704840614027704),
 (2427, 0.2578553115646984),
 (939, 0.25617220560270193),
 (1937, 0.25264557631995566),
 (4377, 0.25166442474422296),
 (1008, 0.2511637713025725),
 (1217, 0.2475410991021104),
 (313, 0.24655683636076894),
 (2653, 0.24655683636076894),
 (4141, 0.24188972342344836),
 (193, 0.24085022689310479),
 (4442, 0.24068559325527497),
 (2977, 0.2398005688555842),
 (1044, 0.23968063858108835),
 (3808, 0.2394469177662798),
 (2955, 0.23941550505017686),
 (2550, 0.23657281649063716),
 (20, 0.23578313267801077),
 (1433, 0.23520023158188053),
 (3223, 0.23339691733197643),
 (697, 0.23272001922932353),
 (1523, 0.23272001922932353),
 (1140, 0.23175902399174644),
 (182, 0.2315917782932927),
 (1068, 0.23063280200722128),
 (2699, 0.23063280200722128),
 (778, 0.22921641521097713),
 (495, 0.22885182707624924),
 (1307

In [56]:
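# 🎯 Function to recommend similar movies based on cosine similarity
def recommender(movie_name):
    # Rows were dropped earlier, so the index label can differ from the row position.
    # The similarity matrix is positional, so convert the label to a position first.
    label=new_data[new_data["title_x"]==movie_name].index[0]
    movie_index=new_data.index.get_loc(label)
    distance=sorted(list(enumerate(similarity[movie_index])),reverse=True,key=lambda x:x[1])[1:6]
    for i in distance:
        print(new_data.iloc[i[0],1])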

In [57]:
recommender("King Kong")

Rockaway
The Bounty
20,000 Leagues Under the Sea
Master and Commander: The Far Side of the World
The Black Hole
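
As a possible variation, the lookup can return a list instead of printing, handle titles that are not in the data, and use `np.argsort` rather than sorting `(index, score)` pairs. The sketch below runs on made-up data; the function name `recommend`, its parameters, and the toy similarity matrix are assumptions for illustration, not part of the notebook above.

```python
import numpy as np
import pandas as pd

def recommend(title, frame, sim, k=5):
    """Return up to k titles most similar to `title`, or [] if it is unknown."""
    matches = np.flatnonzero((frame["title_x"] == title).to_numpy())
    if matches.size == 0:
        return []
    pos = matches[0]                      # row position, not index label
    order = np.argsort(sim[pos])[::-1]    # positions sorted by descending similarity
    order = order[order != pos][:k]       # drop the query movie itself
    return frame["title_x"].iloc[order].tolist()

# Toy data (illustrative only); the index has a gap, as it does after dropna
frame = pd.DataFrame({"title_x": ["A", "B", "C"]}, index=[0, 2, 3])
sim = np.array([[1.0, 0.2, 0.7],
                [0.2, 1.0, 0.1],
                [0.7, 0.1, 1.0]])

print(recommend("A", frame, sim, k=2))  # ['C', 'B']
print(recommend("Z", frame, sim))       # []
```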



---

## 📝 Final Report

**Objective:** Build a content-based movie recommendation system using metadata (cast, crew, keywords, etc.).

**Tools & Technologies:**
- Python, Pandas, Scikit-learn, NLTK, Seaborn

**Steps Taken:**
1. Loaded and merged `movies.csv` and `credits.csv`.
2. Extracted relevant features like genres, keywords, cast, and crew.
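3. Cleaned and preprocessed the data: split overviews into word tokens, removed spaces inside multi-word names, dropped rows with missing values, and stemmed words with `PorterStemmer`.
4. Created a unified `tag` column representing the content description.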
5. Applied `CountVectorizer` to convert tags into feature vectors.
6. Used `cosine_similarity` to compute movie similarity.
7. Built a function to return top 5 recommended movies based on input.

**Outcome:** A content-based recommender that suggests movies similar to the one entered.

**Next Steps:**
- Add collaborative filtering using user ratings
- Build a Streamlit or Flask-based UI
- Deploy as a web application

---
