

# NumPy
 - Fundamental package for scientific computing in Python.
 - Offers a powerful N-dimensional array object and tools for working with arrays.
 - Enables efficient operations on large datasets, including mathematical and statistical computations.
 - Widely used in data analysis, machine learning, and scientific research.

# Pandas
 - High-performance, easy-to-use data analysis library.
 - Provides data structures and operations for manipulating numerical tables and time series.
 - Offers data alignment and integrated handling of missing data.
 - Popular for data cleaning, data preparation, and data analysis tasks.
 - Commonly used in data science, finance, and social science applications.


In [1]:
import numpy as np
import pandas as pd

In [2]:
movies = pd.read_csv('/content/drive/MyDrive/Datasets/Movies/movies.csv')
credits = pd.read_csv('/content/drive/MyDrive/Datasets/Movies/credits.csv')

In [3]:
movies = movies.merge(credits, on='title')  # Merge the two datasets on their shared 'title' column
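
A small sketch of what merging on `title` does, using made-up rows (the frames `left` and `right` are hypothetical, not the real datasets). Titles are not guaranteed to be unique, and a merge keeps every matching pair of rows, so the merged frame can have more rows than either input.

```python
import pandas as pd

# Hypothetical toy data; two different films share the title "Batman"
left = pd.DataFrame({"title": ["Avatar", "Batman", "Batman"],
                     "budget": [237, 35, 1]})
right = pd.DataFrame({"title": ["Avatar", "Batman", "Batman"],
                      "movie_id": [1, 2, 3]})

# Inner join on the shared column: each "Batman" row on the left
# pairs with each "Batman" row on the right
merged = left.merge(right, on="title")
print(len(left), len(right), len(merged))
```

If the duplicated pairs are unwanted, merging on a unique identifier column (when both files provide one) is a common alternative.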

In [4]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4808 entries, 0 to 4807
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4808 non-null   int64  
 1   genres                4808 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4808 non-null   int64  
 4   keywords              4808 non-null   object 
 5   original_language     4808 non-null   object 
 6   original_title        4808 non-null   object 
 7   overview              4805 non-null   object 
 8   popularity            4808 non-null   float64
 9   production_companies  4808 non-null   object 
 10  production_countries  4808 non-null   object 
 11  release_date          4807 non-null   object 
 12  revenue               4808 non-null   int64  
 13  runtime               4806 non-null   float64
 14  spoken_languages      4808 non-null   object 
 15  status               

In [5]:
movies = movies[['movie_id', 'title', 'release_date', 'overview', 'genres', 'cast', 'crew', 'keywords']]

In [6]:
movies.head()

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na..."
2,206647,Spectre,26-10-2015,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name..."
3,49026,The Dark Knight Rises,16-07-2012,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,..."
4,49529,John Carter,07-03-2012,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":..."


In [7]:
movies.isnull().sum()  # Count the null values in each column

movie_id        0
title           0
release_date    1
overview        3
genres          0
cast            0
crew            0
keywords        0
dtype: int64

In [8]:
movies.dropna(inplace=True)  # Drop every row that contains a null value

In [9]:
movies.duplicated().sum()  # Count rows that are exact duplicates of an earlier row

0

 # AST (Abstract Syntax Tree)

 The ast module in Python provides tools for parsing Python source code into an abstract syntax tree (AST).
 The AST represents the structure of the Python code in a hierarchical manner, making it easier to analyze and manipulate the code.

 Here's a brief overview of the ast library:

 1. Parsing Source Code:
 The ast.parse() function takes Python source code as input and returns an AST object.
 This object represents the structure of the code, including the different elements such as functions, classes, statements, and expressions.

 2. Traversing the AST:
 The AST object can be traversed using the ast.NodeVisitor class.
 This class provides methods for visiting each node in the tree and performing operations on them.

 3. Modifying the AST:
 The ast.NodeTransformer class can be used to modify the AST.
 It provides methods for transforming nodes in the tree, such as adding, removing, or replacing nodes.

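 4. Compiling the AST:
 The built-in compile() function can compile an AST into a code object that exec() or eval() can run.
 To turn an AST back into Python source code, use ast.unparse() (Python 3.9+).
 This is useful for running or generating new Python code based on the modified AST.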

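 The ast library is a powerful tool for analyzing and manipulating Python source code.
 It is commonly used in code analysis tools, code generators, and other applications that require a deep understanding of Python code structure.
 This notebook needs only ast.literal_eval(), which safely evaluates a string containing a Python literal (here, a list of dictionaries) without executing arbitrary code.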
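
The four steps above can be sketched on a one-line program. This is a minimal illustration; the class names `ConstantCollector` and `AddToMult` are made up for the example.

```python
import ast

source = "x = 1 + 2"

# 1. Parse source code into a tree
tree = ast.parse(source)

# 2. Traverse: collect every constant in the tree
class ConstantCollector(ast.NodeVisitor):
    def __init__(self):
        self.values = []

    def visit_Constant(self, node):
        self.values.append(node.value)

collector = ConstantCollector()
collector.visit(tree)

# 3. Modify: replace every addition with a multiplication
class AddToMult(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform child nodes first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

new_tree = ast.fix_missing_locations(AddToMult().visit(tree))

# 4. Compile and run the modified tree with the built-in compile()
namespace = {}
exec(compile(new_tree, filename="<ast>", mode="exec"), namespace)
print(collector.values, namespace["x"])
```

The transformed program computes `1 * 2` instead of `1 + 2`, so `x` ends up as 2.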


In [10]:
import ast

In [11]:
def change(obj):
  # Parse the JSON-like string into a list of dicts and keep each entry's name
  return [i['name'] for i in ast.literal_eval(obj)]

In [12]:
movies['genres'] = movies['genres'].apply(change)
movies['keywords'] = movies['keywords'].apply(change)
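
To see what this conversion does to a single cell, here is a minimal sketch on a shortened sample string in the same shape as the `genres` column. It also shows that `ast.literal_eval` rejects anything that is not a plain literal.

```python
import ast

# A shortened sample in the same shape as the `genres` column
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = ast.literal_eval(raw)  # str -> list of dicts
names = [item["name"] for item in parsed]
print(names)

# literal_eval only accepts literals; a function call is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print("call rejected:", rejected)
```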

In [14]:
movies.head()

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,26-10-2015,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,16-07-2012,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,07-03-2012,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel..."


In [15]:
def convert(obj):
  # Keep only the names of the first three cast members
  return [i['name'] for i in ast.literal_eval(obj)[:3]]

In [16]:
movies['cast'] = movies['cast'].apply(convert)

In [17]:
movies.head()

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,26-10-2015,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,16-07-2012,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,07-03-2012,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel..."


In [18]:
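def fetchDir(obj):
  # Return the first crew member whose job is 'Director' as a one-element list,
  # or an empty list if no director is listed
  List = []
  for i in ast.literal_eval(obj):
    if i['job'] == 'Director':
      List.append(i['name'])
      break
  return List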

In [19]:
movies['crew'] = movies['crew'].apply(fetchDir)

In [20]:
movies.head()

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron],"[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski],"[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,26-10-2015,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes],"[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,16-07-2012,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan],"[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,07-03-2012,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton],"[based on novel, mars, medallion, space travel..."


In [21]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [22]:
movies

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron],"[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski],"[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,26-10-2015,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes],"[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,16-07-2012,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan],"[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,07-03-2012,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton],"[based on novel, mars, medallion, space travel..."
...,...,...,...,...,...,...,...,...
4803,9367,El Mariachi,04-09-1992,"[El, Mariachi, just, wants, to, play, his, gui...","[Action, Crime, Thriller]","[Carlos Gallardo, Jaime de Hoyos, Peter Marqua...",[Robert Rodriguez],"[united states–mexico barrier, legs, arms, pap..."
4804,72766,Newlyweds,26-12-2011,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]","[Edward Burns, Kerry Bishé, Marsha Dietlein]",[Edward Burns],[]
4805,231617,"Signed, Sealed, Delivered",13-10-2013,"[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TV Movie]","[Eric Mabius, Kristin Booth, Crystal Lowe]",[Scott Smith],"[date, love at first sight, narration, investi..."
4806,126186,Shanghai Calling,03-05-2012,"[When, ambitious, New, York, attorney, Sam, is...",[],"[Daniel Henney, Eliza Coupe, Bill Paxton]",[Daniel Hsia],[]


In [23]:
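# Remove spaces inside multi-word entries so that each one becomes a single token
# (e.g. "Sam Worthington" -> "SamWorthington") and is not split into separate words later
movies['genres'] = movies['genres'].apply(lambda x:[i.replace(" ","") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x:[i.replace(" ","") for i in x])
movies['cast'] = movies['cast'].apply(lambda x:[i.replace(" ","") for i in x])
movies['crew'] = movies['crew'].apply(lambda x:[i.replace(" ","") for i in x])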

In [24]:
movies

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords
0,19995,Avatar,10-12-2009,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[cultureclash, future, spacewar, spacecolony, ..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[ocean, drugabuse, exoticisland, eastindiatrad..."
2,206647,Spectre,26-10-2015,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[spy, basedonnovel, secretagent, sequel, mi6, ..."
3,49026,The Dark Knight Rises,16-07-2012,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[dccomics, crimefighter, terrorist, secretiden..."
4,49529,John Carter,07-03-2012,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[basedonnovel, mars, medallion, spacetravel, p..."
...,...,...,...,...,...,...,...,...
4803,9367,El Mariachi,04-09-1992,"[El, Mariachi, just, wants, to, play, his, gui...","[Action, Crime, Thriller]","[CarlosGallardo, JaimedeHoyos, PeterMarquardt]",[RobertRodriguez],"[unitedstates–mexicobarrier, legs, arms, paper..."
4804,72766,Newlyweds,26-12-2011,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]","[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],[]
4805,231617,"Signed, Sealed, Delivered",13-10-2013,"[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TVMovie]","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[date, loveatfirstsight, narration, investigat..."
4806,126186,Shanghai Calling,03-05-2012,"[When, ambitious, New, York, attorney, Sam, is...",[],"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],[]


In [25]:
movies['tag'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [26]:
movies

Unnamed: 0,movie_id,title,release_date,overview,genres,cast,crew,keywords,tag
0,19995,Avatar,10-12-2009,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[cultureclash, future, spacewar, spacecolony, ...","[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,19-05-2007,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[ocean, drugabuse, exoticisland, eastindiatrad...","[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,26-10-2015,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[spy, basedonnovel, secretagent, sequel, mi6, ...","[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,16-07-2012,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[dccomics, crimefighter, terrorist, secretiden...","[Following, the, death, of, District, Attorney..."
4,49529,John Carter,07-03-2012,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[basedonnovel, mars, medallion, spacetravel, p...","[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...,...,...,...,...,...,...
4803,9367,El Mariachi,04-09-1992,"[El, Mariachi, just, wants, to, play, his, gui...","[Action, Crime, Thriller]","[CarlosGallardo, JaimedeHoyos, PeterMarquardt]",[RobertRodriguez],"[unitedstates–mexicobarrier, legs, arms, paper...","[El, Mariachi, just, wants, to, play, his, gui..."
4804,72766,Newlyweds,26-12-2011,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]","[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],[],"[A, newlywed, couple's, honeymoon, is, upended..."
4805,231617,"Signed, Sealed, Delivered",13-10-2013,"[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TVMovie]","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[date, loveatfirstsight, narration, investigat...","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4806,126186,Shanghai Calling,03-05-2012,"[When, ambitious, New, York, attorney, Sam, is...",[],"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],[],"[When, ambitious, New, York, attorney, Sam, is..."


In [27]:
final = movies[['movie_id', 'title', 'tag']]

In [28]:
final

Unnamed: 0,movie_id,title,tag
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4803,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4804,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4805,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4806,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


In [29]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
final = final.assign(tag=final['tag'].apply(' '.join))


In [30]:
final

Unnamed: 0,movie_id,title,tag
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."
...,...,...,...
4803,9367,El Mariachi,El Mariachi just wants to play his guitar and ...
4804,72766,Newlyweds,A newlywed couple's honeymoon is upended by th...
4805,231617,"Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic..."
4806,126186,Shanghai Calling,When ambitious New York attorney Sam is sent t...


In [31]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
final = final.assign(tag=final['tag'].str.lower())


In [32]:
final

Unnamed: 0,movie_id,title,tag
0,19995,Avatar,"in the 22nd century, a paraplegic marine is di..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believed to be dead, ha..."
2,206647,Spectre,a cryptic message from bond’s past sends him o...
3,49026,The Dark Knight Rises,following the death of district attorney harve...
4,49529,John Carter,"john carter is a war-weary, former military ca..."
...,...,...,...
4803,9367,El Mariachi,el mariachi just wants to play his guitar and ...
4804,72766,Newlyweds,a newlywed couple's honeymoon is upended by th...
4805,231617,"Signed, Sealed, Delivered","""signed, sealed, delivered"" introduces a dedic..."
4806,126186,Shanghai Calling,when ambitious new york attorney sam is sent t...


# CountVectorizer

CountVectorizer, provided by the scikit-learn library, converts a collection of texts into a matrix of word counts. Each row corresponds to one text, each column to one word in the vocabulary, and each entry is the number of times that word occurs in that text. Here, every movie's tag string becomes one count vector, which can then be compared with the vectors of the other movies.
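
To make the counting idea concrete, here is a minimal pure-Python sketch of a bag-of-words count matrix. It is a simplified stand-in and not scikit-learn's implementation: it splits on whitespace only and has no equivalent of options such as `max_features` or `stop_words`. The sample `docs` are made up.

```python
from collections import Counter

docs = ["space war future", "space travel mars", "spy secret agent spy"]

# Vocabulary: every distinct word across all documents, in sorted order
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per vocabulary word
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

print(vocab)
for row in matrix:
    print(row)
```

The first two rows share a non-zero entry in the `space` column, which is what later makes their similarity greater than zero.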

In [33]:
from sklearn.feature_extraction.text import CountVectorizer
# Keep at most the 5000 most frequent words and ignore common English stop words
cv = CountVectorizer(max_features=5000, stop_words='english')

In [34]:
cv.fit_transform(final['tag']).toarray().shape

(4804, 5000)

In [35]:
vector = cv.fit_transform(final['tag']).toarray()

In [42]:
len(cv.get_feature_names_out())

5000

# NLTK (Natural Language Toolkit)
The Natural Language Toolkit (NLTK) is a Python library for building natural language processing (NLP) applications.

It includes modules for tokenization, parsing, classification, stemming, tagging, and semantic reasoning. It is also accompanied by a book that walks through common language processing tasks, along with demos and sample data sets.

In [43]:
import nltk

In [44]:
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [45]:
def stemming(obj):
  # Replace every whitespace-separated word with its Porter stem
  return ' '.join(ps.stem(i) for i in obj.split())

In [46]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
# Note: `vector` must be built (or rebuilt) after this step for the stemmed tags
# to affect the similarity scores.
final = final.assign(tag=final['tag'].apply(stemming))


In [47]:
from sklearn.metrics.pairwise import cosine_similarity

In [48]:
cosine_similarity(vector)

array([[1.        , 0.08858079, 0.05812382, ..., 0.02478408, 0.02739983,
        0.        ],
       [0.08858079, 1.        , 0.06350006, ..., 0.02707652, 0.        ,
        0.        ],
       [0.05812382, 0.06350006, 1.        , ..., 0.02665009, 0.        ,
        0.        ],
       ...,
       [0.02478408, 0.02707652, 0.02665009, ..., 1.        , 0.07537784,
        0.04828045],
       [0.02739983, 0.        , 0.        , ..., 0.07537784, 1.        ,
        0.05337605],
       [0.        , 0.        , 0.        , ..., 0.04828045, 0.05337605,
        1.        ]])

In [49]:
similar = cosine_similarity(vector)
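
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on the direction of the count vectors and not on their magnitude. A minimal pure-Python sketch of the formula follows; the helper `cosine` and the sample vectors are made up for illustration, and this is not scikit-learn's implementation.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

a = [1, 1, 0]
b = [1, 0, 1]
c = [0, 0, 2]

print(cosine(a, a))  # identical direction: approximately 1
print(cosine(a, b))  # one shared word out of two: approximately 0.5
print(cosine(a, c))  # no shared words: 0
```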

In [52]:
sorted(list(enumerate(similar[0])),reverse = True, key = lambda x:x[1])[1:6]

[(539, 0.2537477434955704),
 (1194, 0.25112360116696136),
 (507, 0.24609055357847553),
 (1216, 0.24260844762468367),
 (582, 0.2372894989381248)]
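
The ranking step can be sketched on a made-up similarity row. This variant excludes the query's own position explicitly instead of relying on it sorting first, which may matter when another movie has an identical vector; `top_k` is a hypothetical helper name.

```python
def top_k(scores, query_pos, k=5):
    """Return the k highest (position, score) pairs, excluding the query itself."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_pos]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up similarities for the movie at position 0
row = [1.0, 0.2, 0.9, 0.0, 0.5]
print(top_k(row, query_pos=0, k=3))
```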

In [53]:
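def recommendFor(movie):
  # Use the row position, not the index label: dropna() left gaps in the labels,
  # while `similar` is indexed by position
  matches = np.flatnonzero(final['title'].values == movie)
  if len(matches) == 0:
    print(f"'{movie}' not found")
    return
  distances = similar[matches[0]]
  # Skip the top-ranked entry, which is normally the movie itself
  movies_list = sorted(enumerate(distances), reverse=True, key=lambda x: x[1])[1:6]

  for i in movies_list:
    print(final.iloc[i[0]].title)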

In [57]:
recommendFor("Avatar")

Titan A.E.
Small Soldiers
Independence Day
Aliens vs Predator: Requiem
Battle: Los Angeles


In [65]:
recommendFor('Star Trek: Insurrection')

Star Trek: First Contact
Star Trek: Nemesis
Star Trek: Generations
Star Trek IV: The Voyage Home
Star Trek III: The Search for Spock
