<div style="text-align:center">
<h1> Content Based Filtering </h1>
</div>

Content-based filtering uses the content of a movie (e.g. plot, actors, director,
genre, tagline, crew) to measure how similar two movies are, and then suggests
movies based on that similarity.

<img src="https://image.ibb.co/f6mDXU/conten.png" />

For the current project we will use the plot of the movie to find similarities.

The plot is given in the "overview" column of the dataset.

In [46]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [47]:
df = pd.read_csv('../DATA/tmdb_5000_movies.csv')

In [48]:
df["overview"].head()

0    In the 22nd century, a paraplegic Marine is di...
1    Captain Barbossa, long believed to be dead, ha...
2    A cryptic message from Bond’s past sends him o...
3    Following the death of District Attorney Harve...
4    John Carter is a war-weary, former military ca...
Name: overview, dtype: object

First we have to perform text vectorization, i.e. convert the text
into a numerical representation.

We need a word vector for each overview; we'll compute it with Term Frequency-Inverse
Document Frequency (TF-IDF).

    TF (term frequency) = term instances / total instances

It is the relative frequency of a word within a single document.

    IDF (inverse document frequency) = log(number of documents / documents with term)

It grows as fewer documents contain the term, so rare words receive a larger weight.

##### So the overall importance of a word in each document is given by: TF * IDF

We'll compute a matrix in which each row represents a movie and each column represents
a word from the overviews; a word must appear in at least one overview to get a column.

TF-IDF reduces the weight of words that appear in many plots, which
improves the similarity scores.
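
To make the two formulas concrete, here is a minimal sketch that applies them by hand to a tiny, made-up corpus. The three overviews and the helper names `tf`, `idf` and `tf_idf` are hypothetical. scikit-learn's `TfidfVectorizer` uses raw counts for TF, a smoothed IDF, and L2-normalizes each row by default, so its numbers will differ from this textbook version.

```python
import math
from collections import Counter

# Hypothetical toy corpus, not taken from the dataset
docs = [
    "hero saves city",
    "hero fights villain",
    "villain robs city bank",
]
tokenized = [doc.split() for doc in docs]

def tf(term, tokens):
    # term instances / total instances in one document
    return Counter(tokens)[term] / len(tokens)

def idf(term, corpus):
    # log(number of documents / documents containing the term)
    # assumes the term occurs in at least one document
    n_containing = sum(term in tokens for tokens in corpus)
    return math.log(len(corpus) / n_containing)

def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)

# "hero" appears in 2 of 3 documents, "bank" in only 1,
# so "bank" gets the higher weight in the document that contains it
print(tf_idf("hero", tokenized[0], tokenized))
print(tf_idf("bank", tokenized[2], tokenized))
```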


In [49]:
# import TfidfVectorizer from sklearn
from sklearn.feature_extraction.text import TfidfVectorizer

# define TF-IDF and remove English stop words, e.g. "the", "a", "an"
tf_idf = TfidfVectorizer(stop_words='english')

# replace NaN with an empty string
df["overview"] = df["overview"].fillna('')

# fit and transform the data
tf_idf_matrix = tf_idf.fit_transform(df["overview"])

# shape: (number of movies, number of distinct terms)
tf_idf_matrix.shape

(4803, 20978)

In [50]:
tf_idf_matrix

<4803x20978 sparse matrix of type '<class 'numpy.float64'>'
	with 125840 stored elements in Compressed Sparse Row format>

A sparse matrix is used because it stores only the non-zero entries, which saves memory.

Right now we have over 20,000 different words across the 4,803 movies.
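
The summary printed above also shows how sparse the matrix is. A quick back-of-the-envelope check, using only the numbers from that output:

```python
# Numbers copied from the sparse-matrix summary above
n_rows, n_cols = 4803, 20978
stored = 125840

total_cells = n_rows * n_cols
density = stored / total_cells          # fraction of non-zero cells
avg_terms_per_movie = stored / n_rows   # non-zero terms per overview

print(f"{density:.4%} of cells are non-zero")
print(f"about {avg_terms_per_movie:.1f} distinct terms per overview")
```

Only around 0.12% of the cells hold a value, roughly 26 terms per overview, which is why a dense array would waste most of its memory on zeros.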


With this matrix in hand we can calculate a similarity score.
The similarity score measures how alike two movies are:
do both have a similar storyline?

We need a numeric quantity to define the similarity between two movies; for that we will use cosine similarity. Cosine similarity is independent of vector magnitude and is easy to
calculate.

It is given as:

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/0a4c9a778656537624a3303e646559a429868863" />
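
As a small worked example of the formula, the sketch below computes the cosine of the angle between plain Python lists. The vectors and the helper name `cosine` are made up for illustration; they show that scaling a vector does not change the score.

```python
import math

def cosine(a, b):
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

u = [1.0, 2.0, 0.0]
v = [2.0, 4.0, 0.0]   # same direction as u, twice the length
w = [0.0, 0.0, 3.0]   # shares no non-zero component with u

print(cosine(u, v))  # vectors pointing the same way
print(cosine(u, w))  # orthogonal vectors
```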

In [51]:
# TfidfVectorizer L2-normalizes each row by default, so the dot product of two
# rows is already their cosine similarity; linear_kernel computes it faster

from sklearn.metrics.pairwise import linear_kernel

# compute the cosine similarity matrix
cosine_similarity = linear_kernel(tf_idf_matrix,tf_idf_matrix)
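
The shortcut in the cell above relies on `TfidfVectorizer` scaling every row to unit length (its default `norm='l2'`). A small sketch of why that makes a plain dot product equal to the cosine; the vectors and helper names are made up:

```python
import math

def l2_normalize(vec):
    # scale a vector so that its Euclidean length is 1
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 4.0, 0.0]
b = [1.0, 0.0, 2.0]

# cosine computed the long way, on the raw vectors
cosine_raw = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# plain dot product on the unit-length versions
cosine_unit = dot(l2_normalize(a), l2_normalize(b))

print(cosine_raw, cosine_unit)  # the two values agree up to rounding
```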

In [52]:
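# map each movie title to its row index; keep the first row if a title repeats
# (Series.drop_duplicates() would compare the index values, not the titles)
indices = pd.Series(df.index, index=df["title"])
indices = indices[~indices.index.duplicated(keep="first")]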

In [53]:
indices.head()

title
Avatar                                      0
Pirates of the Caribbean: At World's End    1
Spectre                                     2
The Dark Knight Rises                       3
John Carter                                 4
dtype: int64
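
One caveat with a title-to-index lookup is that two movies can share a title, in which case a single title no longer maps to a single row. Whether that happens in this dataset is not shown here; the sketch below uses made-up titles and a plain dictionary to illustrate keeping the first occurrence.

```python
# Hypothetical titles; rows 1 and 3 share a name
titles = ["Alpha", "Beta", "Gamma", "Beta"]

title_to_index = {}
for row, title in enumerate(titles):
    # setdefault only stores the row the first time a title is seen
    title_to_index.setdefault(title, row)

print(title_to_index["Beta"])
```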

In [54]:
cosine_similarity

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.02160533, 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.01488159, 0.        ,
        0.        ],
       ...,
       [0.        , 0.02160533, 0.01488159, ..., 1.        , 0.01609091,
        0.00701914],
       [0.        , 0.        , 0.        , ..., 0.01609091, 1.        ,
        0.01171696],
       [0.        , 0.        , 0.        , ..., 0.00701914, 0.01171696,
        1.        ]])

Now we have the cosine similarity of each pair of movies, along with the title indices.
We will create a function that returns the most similar movies for a given title.

In [55]:
def get_recommendations(title_of_movie, cosine_similarity=cosine_similarity, indices=indices):
    '''
    Take a movie title as input and return the 10 most similar movies.
    '''
    # get the index of the movie that matches the title
    idx = indices[title_of_movie]

    # get the pairwise similarity scores of all movies with the given movie
    sim_scores = list(enumerate(cosine_similarity[idx]))

    # sort the movies by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # keep the top 10 most similar movies;
    # skip the first one, as a movie is most similar to itself
    sim_scores = sim_scores[1:11]

    # row positions of the recommended movies
    # (named differently so the `indices` argument is not overwritten)
    movie_indices = [x[0] for x in sim_scores]

    return df["title"].iloc[movie_indices]
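
Sorting every score is more work than needed when only the ten best are wanted. As an alternative sketch (not the approach used above), `heapq.nlargest` from the standard library picks the top N directly. The scores and the helper name `top_n_similar` are invented for illustration.

```python
import heapq

def top_n_similar(scores, own_index, n=3):
    # pair each score with its position, leaving out the query movie itself
    candidates = [(i, s) for i, s in enumerate(scores) if i != own_index]
    # keep only the n highest-scoring pairs, best first
    best = heapq.nlargest(n, candidates, key=lambda pair: pair[1])
    return [i for i, _ in best]

# Hypothetical similarity row for movie 0 (score 1.0 with itself)
row = [1.0, 0.10, 0.45, 0.00, 0.30, 0.25]
print(top_n_similar(row, own_index=0, n=3))
```

Excluding the query movie by its index, rather than dropping the first sorted entry, also covers the case where a movie does not rank first against itself, for example when an empty overview produces an all-zero row.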
    
    
    

In [56]:
get_recommendations("The Dark Knight Rises")

65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
Name: title, dtype: object

In [57]:
get_recommendations("The Avengers")

7               Avengers: Age of Ultron
3144                            Plastic
1715                            Timecop
4124                 This Thing of Ours
3311              Thank You for Smoking
3033                      The Corruptor
588     Wall Street: Money Never Sleeps
2136         Team America: World Police
1468                       The Fountain
1286                        Snowpiercer
Name: title, dtype: object

Our recommender does a decent job of suggesting movies based on the plot.
However, the quality of the suggestions can be improved if we consider other features as
well; e.g. a user who liked a Batman movie may also like other movies by Christopher Nolan.

Let's try to improve the quality of the recommender by adding other features.

## Credits, Genres and Keywords based recommendation 

The quality of the recommender can be improved with better metadata.
We will use the top actors, the director, the genres and the keywords of each movie.

In [58]:
df_credits = pd.read_csv('../DATA/tmdb_5000_credits.csv')
df_credits.head()

| | movie_id | title | cast | crew |
|---|---|---|---|---|
| 0 | 19995 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 206647 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 49026 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 49529 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [59]:
df = df.merge(df_credits,left_on="id",right_on="movie_id")

In [60]:
df.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title_x', 'vote_average',
       'vote_count', 'movie_id', 'title_y', 'cast', 'crew'],
      dtype='object')

In [61]:
df = df.drop(columns=["movie_id", "title_y"])

In [62]:
df = df.rename(columns=
    {"title_x":"title"}
)

In [63]:
df.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [64]:
# convert the stringified features into Python objects
# (literal_eval only parses literals, so it does not execute arbitrary code)
from ast import literal_eval

features = ["genres","keywords","cast", "crew"]

for feature in features:
    df[feature] = df[feature].apply(literal_eval)
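
To see what `literal_eval` does to a single cell, here is a small sketch on a shortened, hand-written value in the same shape as the `genres` column. The string is illustrative rather than a full row from the dataset.

```python
import json
from ast import literal_eval

# A shortened, hand-written cell value shaped like the genres column
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = literal_eval(raw)      # evaluates Python literals only
names = [item["name"] for item in parsed]
print(type(parsed).__name__, names)

# The same string is also valid JSON, so json.loads gives an equal result here
assert json.loads(raw) == parsed
```

The two parsers are not interchangeable in general: values such as `null` or `true` are valid JSON but not Python literals, so `literal_eval` would reject them.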

In [65]:
df.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew
0,237000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",http://www.avatarmovie.com/,19995,"[{'id': 1463, 'name': 'culture clash'}, {'id':...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,"[{'cast_id': 242, 'character': 'Jake Sully', '...","[{'credit_id': '52fe48009251416c750aca23', 'de..."
1,300000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",http://disney.go.com/disneypictures/pirates/,285,"[{'id': 270, 'name': 'ocean'}, {'id': 726, 'na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,"[{'cast_id': 4, 'character': 'Captain Jack Spa...","[{'credit_id': '52fe4232c3a36847f800b579', 'de..."
2,245000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{'id': 470, 'name': 'spy'}, {'id': 818, 'name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,"[{'cast_id': 1, 'character': 'James Bond', 'cr...","[{'credit_id': '54805967c3a36829b5002c41', 'de..."
3,250000000,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",http://www.thedarkknightrises.com/,49026,"[{'id': 849, 'name': 'dc comics'}, {'id': 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,"[{'cast_id': 2, 'character': 'Bruce Wayne / Ba...","[{'credit_id': '52fe4781c3a36847f81398c3', 'de..."
4,260000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",http://movies.disney.com/john-carter,49529,"[{'id': 818, 'name': 'based on novel'}, {'id':...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,"[{'cast_id': 5, 'character': 'John Carter', 'c...","[{'credit_id': '52fe479ac3a36847f813eaa3', 'de..."


In [66]:
df["cast"][1]

[{'cast_id': 4,
  'character': 'Captain Jack Sparrow',
  'credit_id': '52fe4232c3a36847f800b50d',
  'gender': 2,
  'id': 85,
  'name': 'Johnny Depp',
  'order': 0},
 {'cast_id': 5,
  'character': 'Will Turner',
  'credit_id': '52fe4232c3a36847f800b511',
  'gender': 2,
  'id': 114,
  'name': 'Orlando Bloom',
  'order': 1},
 {'cast_id': 6,
  'character': 'Elizabeth Swann',
  'credit_id': '52fe4232c3a36847f800b515',
  'gender': 1,
  'id': 116,
  'name': 'Keira Knightley',
  'order': 2},
 {'cast_id': 12,
  'character': 'William "Bootstrap Bill" Turner',
  'credit_id': '52fe4232c3a36847f800b52d',
  'gender': 2,
  'id': 1640,
  'name': 'Stellan Skarsgård',
  'order': 3},
 {'cast_id': 10,
  'character': 'Captain Sao Feng',
  'credit_id': '52fe4232c3a36847f800b525',
  'gender': 2,
  'id': 1619,
  'name': 'Chow Yun-fat',
  'order': 4},
 {'cast_id': 9,
  'character': 'Captain Davy Jones',
  'credit_id': '52fe4232c3a36847f800b521',
  'gender': 2,
  'id': 2440,
  'name': 'Bill Nighy',
  'order': 5

In [67]:
df["crew"][1]

[{'credit_id': '52fe4232c3a36847f800b579',
  'department': 'Camera',
  'gender': 2,
  'id': 120,
  'job': 'Director of Photography',
  'name': 'Dariusz Wolski'},
 {'credit_id': '52fe4232c3a36847f800b4fd',
  'department': 'Directing',
  'gender': 2,
  'id': 1704,
  'job': 'Director',
  'name': 'Gore Verbinski'},
 {'credit_id': '52fe4232c3a36847f800b54f',
  'department': 'Production',
  'gender': 2,
  'id': 770,
  'job': 'Producer',
  'name': 'Jerry Bruckheimer'},
 {'credit_id': '52fe4232c3a36847f800b503',
  'department': 'Writing',
  'gender': 2,
  'id': 1705,
  'job': 'Screenplay',
  'name': 'Ted Elliott'},
 {'credit_id': '52fe4232c3a36847f800b509',
  'department': 'Writing',
  'gender': 2,
  'id': 1706,
  'job': 'Screenplay',
  'name': 'Terry Rossio'},
 {'credit_id': '52fe4232c3a36847f800b57f',
  'department': 'Editing',
  'gender': 0,
  'id': 1721,
  'job': 'Editor',
  'name': 'Stephen E. Rivkin'},
 {'credit_id': '52fe4232c3a36847f800b585',
  'department': 'Editing',
  'gender': 2,
 

In [68]:
df["genres"][1]

[{'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 28, 'name': 'Action'}]

In [69]:
df["keywords"][1]

[{'id': 270, 'name': 'ocean'},
 {'id': 726, 'name': 'drug abuse'},
 {'id': 911, 'name': 'exotic island'},
 {'id': 1319, 'name': 'east india trading company'},
 {'id': 2038, 'name': "love of one's life"},
 {'id': 2052, 'name': 'traitor'},
 {'id': 2580, 'name': 'shipwreck'},
 {'id': 2660, 'name': 'strong woman'},
 {'id': 3799, 'name': 'ship'},
 {'id': 5740, 'name': 'alliance'},
 {'id': 5941, 'name': 'calypso'},
 {'id': 6155, 'name': 'afterlife'},
 {'id': 6211, 'name': 'fighter'},
 {'id': 12988, 'name': 'pirate'},
 {'id': 157186, 'name': 'swashbuckler'},
 {'id': 179430, 'name': 'aftercreditsstinger'}]

In [70]:
# get director's name from crew
def get_director(crew):
    for member in crew:
        if member['job'] == 'Director':
            return member['name']
    return np.nan
            

In [71]:
# get the first three names from a list of records (used for cast, keywords and genres)
def get_top_three(cast):
    if isinstance(cast, list):
        # slicing returns at most three names, fewer if the list is shorter
        return [member["name"] for member in cast[:3]]
    # if data is missing or in an incorrect format, return an empty list
    return []
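
A quick check of both helpers' logic on a hand-written record. The record is made up and much shorter than a real row; the two functions are re-declared in compact form so the snippet runs on its own, with `None` standing in for `np.nan`.

```python
def get_director(crew):
    # first crew member whose job is exactly "Director", else None
    return next((m["name"] for m in crew if m.get("job") == "Director"), None)

def get_top_three(items):
    # names of at most the first three entries; empty list for non-list input
    if isinstance(items, list):
        return [item["name"] for item in items[:3]]
    return []

# Hypothetical record
crew = [
    {"job": "Editor", "name": "Person A"},
    {"job": "Director", "name": "Person B"},
]
cast = [{"name": n} for n in ["Actor 1", "Actor 2", "Actor 3", "Actor 4"]]

print(get_director(crew))
print(get_top_three(cast))
print(get_top_three(float("nan")))  # missing value instead of a list
```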

In [72]:
df['director'] = df['crew'].apply(get_director)

In [73]:
features = ['cast', 'keywords', 'genres']

for feature in features:
    df[feature] = df[feature].apply(get_top_three)

In [74]:
df[['title', 'cast', 'keywords', 'genres', 'director']]

| | title | cast | keywords | genres | director |
|---|---|---|---|---|---|
| 0 | Avatar | [Sam Worthington, Zoe Saldana, Sigourney Weaver] | [culture clash, future, space war] | [Action, Adventure, Fantasy] | James Cameron |
| 1 | Pirates of the Caribbean: At World's End | [Johnny Depp, Orlando Bloom, Keira Knightley] | [ocean, drug abuse, exotic island] | [Adventure, Fantasy, Action] | Gore Verbinski |
| 2 | Spectre | [Daniel Craig, Christoph Waltz, Léa Seydoux] | [spy, based on novel, secret agent] | [Action, Adventure, Crime] | Sam Mendes |
| 3 | The Dark Knight Rises | [Christian Bale, Michael Caine, Gary Oldman] | [dc comics, crime fighter, terrorist] | [Action, Crime, Drama] | Christopher Nolan |
| 4 | John Carter | [Taylor Kitsch, Lynn Collins, Samantha Morton] | [based on novel, mars, medallion] | [Action, Adventure, Science Fiction] | Andrew Stanton |
| ... | ... | ... | ... | ... | ... |
| 4798 | El Mariachi | [Carlos Gallardo, Jaime de Hoyos, Peter Marqua... | [united states–mexico barrier, legs, arms] | [Action, Crime, Thriller] | Robert Rodriguez |
| 4799 | Newlyweds | [Edward Burns, Kerry Bishé, Marsha Dietlein] | [] | [Comedy, Romance] | Edward Burns |
| 4800 | Signed, Sealed, Delivered | [Eric Mabius, Kristin Booth, Crystal Lowe] | [date, love at first sight, narration] | [Comedy, Drama, Romance] | Scott Smith |
| 4801 | Shanghai Calling | [Daniel Henney, Eliza Coupe, Bill Paxton] | [] | [] | Daniel Hsia |


#### cleaning 

We'll create a vectorizer, but before that we have to strip all the whitespace inside
names and keywords and convert them to lower case, so that our vectorizer
doesn't treat the "Chris" in "Chris Evans", "Chris Pratt" and "Chris Hemsworth" as the same token.
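
The effect is easy to see on two of the names mentioned above. This is a minimal sketch that splits on whitespace as a stand-in for the vectorizer's tokenizer; the helper names `tokens` and `squashed_tokens` are made up.

```python
cast_a = ["Chris Evans"]
cast_b = ["Chris Pratt"]

def tokens(names):
    # lowercase and split on whitespace: one token per word
    return {tok for name in names for tok in name.lower().split()}

def squashed_tokens(names):
    # remove inner spaces first so each full name becomes one token
    return {name.replace(" ", "").lower() for name in names}

print(tokens(cast_a) & tokens(cast_b))                    # the shared first name
print(squashed_tokens(cast_a) & squashed_tokens(cast_b))  # no overlap left
```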

In [75]:
# function to convert strings to lowercase and remove the whitespace

def get_lower_and_strip(row):
    if isinstance(row, list):
        return [word.replace(" ", "").lower() for word in row]
    # the director is a single string; wrap it in a list for consistency
    if isinstance(row, str):
        return [row.replace(" ", "").lower()]
    # missing director (NaN): return an empty list
    return []

In [76]:
features = ['cast', 'keywords', 'director', 'genres']

In [77]:
#df['cast'].apply(get_lower_and_strip)

In [78]:
for feature in features:
    df[feature] = df[feature].apply(get_lower_and_strip)

In [79]:
df[features].head()

| | cast | keywords | director | genres |
|---|---|---|---|---|
| 0 | [samworthington, zoesaldana, sigourneyweaver] | [cultureclash, future, spacewar] | [jamescameron] | [action, adventure, fantasy] |
| 1 | [johnnydepp, orlandobloom, keiraknightley] | [ocean, drugabuse, exoticisland] | [goreverbinski] | [adventure, fantasy, action] |
| 2 | [danielcraig, christophwaltz, léaseydoux] | [spy, basedonnovel, secretagent] | [sammendes] | [action, adventure, crime] |
| 3 | [christianbale, michaelcaine, garyoldman] | [dccomics, crimefighter, terrorist] | [christophernolan] | [action, crime, drama] |
| 4 | [taylorkitsch, lynncollins, samanthamorton] | [basedonnovel, mars, medallion] | [andrewstanton] | [action, adventure, sciencefiction] |


##### Metadata soup

We'll create a string that contains all the metadata we want to feed to our
vectorizer: keywords, cast, director and genres.

In [80]:
def create_soup(row):
    return ' '.join(row['keywords']) + ' ' + ' '.join(row['cast']) + ' ' + ' '.join(row['director']) + ' ' +' '.join(row['genres'])

In [81]:
# df.iloc[1]
create_soup(df.iloc[1])

'ocean drugabuse exoticisland johnnydepp orlandobloom keiraknightley goreverbinski adventure fantasy action'

In [82]:
df['soup'] = df.apply(create_soup,axis=1)

In [83]:
df['soup'].head()

0    cultureclash future spacewar samworthington zo...
1    ocean drugabuse exoticisland johnnydepp orland...
2    spy basedonnovel secretagent danielcraig chris...
3    dccomics crimefighter terrorist christianbale ...
4    basedonnovel mars medallion taylorkitsch lynnc...
Name: soup, dtype: object

We have to convert our soup data into word vectors; for that we will use CountVectorizer().
We should not use TF-IDF here, as it would reduce the importance of actors and directors who
have worked on multiple movies, which is not what we want for this use case.
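
The down-weighting can be illustrated with the plain IDF formula from the first section. In this sketch the document frequencies and token names are invented: a director token that appears in many soups gets a smaller IDF than one that appears once, while a raw count treats both the same.

```python
import math

n_movies = 4803  # number of rows, from the matrix shape shown earlier

# Hypothetical document frequencies: in how many soups each token appears
doc_freq = {"prolificdirector": 20, "onetimedirector": 1}

# plain IDF, without scikit-learn's smoothing
idf = {token: math.log(n_movies / freq) for token, freq in doc_freq.items()}

for token, weight in idf.items():
    # a raw count would give both tokens the same value (1 per occurrence)
    print(f"{token}: count weight = 1, idf weight = {weight:.2f}")
```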

In [84]:
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(df['soup'])

In [85]:
count_matrix

<4803x11520 sparse matrix of type '<class 'numpy.int64'>'
	with 42935 stored elements in Compressed Sparse Row format>

In [86]:
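# compute the cosine similarity matrix based on the count_matrix
# (imported under an alias so it does not overwrite the `cosine_similarity`
# matrix computed earlier from the plot overviews)
from sklearn.metrics.pairwise import cosine_similarity as pairwise_cosine_similarity

soup_cosine_similarity = pairwise_cosine_similarity(count_matrix, count_matrix)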

In [87]:
df = df.reset_index()

In [88]:
indices = pd.Series(df.index,index=df['title'])

In [89]:
get_recommendations('The Dark Knight Rises',soup_cosine_similarity)

65               The Dark Knight
119                Batman Begins
4638    Amidst the Devil's Wings
1196                The Prestige
3073           Romeo Is Bleeding
3326              Black November
1503                      Takers
1986                      Faster
303                     Catwoman
747               Gangster Squad
Name: title, dtype: object

In [90]:
get_recommendations('The Avengers',soup_cosine_similarity)

7                  Avengers: Age of Ultron
26              Captain America: Civil War
79                              Iron Man 2
169     Captain America: The First Avenger
174                    The Incredible Hulk
85     Captain America: The Winter Soldier
31                              Iron Man 3
33                   X-Men: The Last Stand
68                                Iron Man
94                 Guardians of the Galaxy
Name: title, dtype: object
