<font size="3"> **Importing the Libraries** </font>

In [1]:
import numpy as np

In [2]:
import pandas as pd

In [3]:
import difflib

In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [5]:
from sklearn.metrics.pairwise import cosine_similarity 

<font size="3"> **Data Collection & Pre-Processing** </font>

**Loading the Data from the CSV File into a pandas DataFrame**

In [6]:
movies_data = pd.read_csv('/Users/aarsh/Desktop/Movie Recommendation System/Data Set/movies.csv')

**Checking the First 5 Rows of the Dataframe**

In [7]:
movies_data.head()

Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


**Checking Number of Rows & Columns in the Dataframe**

In [8]:
movies_data.shape

(4803, 24)

**Selecting Relevant Features (Columns) for Recommendation**

In [9]:
selected_features = ['genres', 'keywords', 'tagline', 'title', 'cast', 'director']

In [10]:
print(selected_features)

['genres', 'keywords', 'tagline', 'title', 'cast', 'director']


**Replacing the Null Values in the Selected Columns with an Empty String (FOR LOOP)**

In [11]:
for feature in selected_features:
    movies_data[feature] = movies_data[feature].fillna('')
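
The fill step matters because concatenating a string with a missing value yields a missing value, which would blank out the whole combined row for that movie. A minimal sketch on a toy frame (the two rows below are invented for illustration and are not taken from the dataset):

```python
import pandas as pd

# Toy data, invented for illustration: the second movie has no genres.
toy = pd.DataFrame({
    'genres': ['Action', None],
    'keywords': ['hero', 'sequel'],
})

# Without filling, the missing value propagates through the concatenation.
raw = toy['genres'] + ' ' + toy['keywords']

# After filling with an empty string, every row yields usable text.
filled = toy.fillna('')
clean = filled['genres'] + ' ' + filled['keywords']

print(raw.isna().tolist())   # which combined rows were lost
print(clean.tolist())        # combined text after filling
```

The second row survives only in the filled version, which is why the notebook fills before combining.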

**Combining (Concatenating) All 6 Selected Features (Columns)**

In [12]:
combined_features = movies_data['genres'] + ' ' + movies_data['keywords'] + ' ' + movies_data['tagline'] + ' ' + movies_data['title'] + ' ' + movies_data['cast'] + ' ' + movies_data['director']

In [13]:
print(combined_features)

0       Action Adventure Fantasy Science Fiction cultu...
1       Adventure Fantasy Action ocean drug abuse exot...
2       Action Adventure Crime spy based on novel secr...
3       Action Crime Drama Thriller dc comics crime fi...
4       Action Adventure Science Fiction based on nove...
                              ...                        
4798    Action Crime Thriller united states\u2013mexic...
4799    Comedy Romance  A newlywed couple's honeymoon ...
4800    Comedy Drama Romance TV Movie date love at fir...
4801      A New Yorker in Shanghai Shanghai Calling Da...
4802    Documentary obsession camcorder crush dream gi...
Length: 4803, dtype: object


**Converting Text Data to Numerical Data (Feature Vectors)**

In [14]:
vectorizer = TfidfVectorizer()

In [15]:
feature_vectors = vectorizer.fit_transform(combined_features)
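
To make the resulting weights less opaque, here is a rough pure-Python sketch of what `TfidfVectorizer` computes under what I understand to be its default settings: a raw term count multiplied by a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, followed by scaling each row to unit length. The whitespace tokenizer and the `tfidf_rows` helper are simplifications made up for this example. The real vectorizer tokenizes with a regular expression and returns a sparse matrix.

```python
import math
from collections import Counter

def tfidf_rows(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times the smoothed inverse document frequency.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Scale the row to unit length (L2 norm); empty documents stay empty.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows

# Toy documents, invented for illustration.
rows = tfidf_rows(['action hero armor', 'action spy', 'romance comedy'])
print(rows[0])
```

In the first toy document, `action` gets a lower weight than `hero` because it also appears in the second document. Terms shared by many movies contribute less to the similarity score for the same reason.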

**Text Data (Combined Features) Converted to Numerical Data (TF-IDF Feature Vectors)**

In [16]:
print(feature_vectors)

  (0, 2720)	0.16779665077750835
  (0, 8651)	0.10950771233518673
  (0, 14400)	0.18869504935935053
  (0, 11288)	0.15600564366704217
  (0, 9709)	0.22061174669983702
  (0, 16170)	0.1471845509560594
  (0, 18477)	0.19277176743945276
  (0, 15542)	0.2000852661458036
  (0, 14718)	0.21153518149440184
  (0, 19148)	0.19502634639381392
  (0, 18845)	0.22968831190527222
  (0, 14749)	0.14568185359096344
  (0, 1183)	0.2771429775697421
  (0, 12700)	0.2552737122112953
  (0, 12356)	0.0757123618230531
  (0, 18835)	0.12179929157015998
  (0, 16904)	0.05365726945306951
  (0, 5519)	0.22061174669983702
  (0, 15785)	0.2044948505609709
  (0, 3587)	0.24248101205828376
  (0, 18385)	0.12067236175146148
  (0, 15901)	0.3255851082321633
  (0, 6519)	0.15884357175977004
  (0, 3408)	0.21574818782392274
  (0, 4103)	0.20781904654682548
  :	:
  (4801, 7731)	0.25682086501772416
  (4801, 12869)	0.1918304797254379
  (4801, 1880)	0.1383742924803071
  (4801, 12062)	0.1190062927991386
  (4801, 8323)	0.094776382167548
  (4801, 4234

<font size="3"> **Cosine Similarity** </font>

**Getting Similarity Scores (Confidence Value) using Cosine Similarity**

In [17]:
similarity = cosine_similarity(feature_vectors)

In [18]:
print(similarity)

[[1.         0.07294698 0.03533251 ... 0.         0.         0.        ]
 [0.07294698 1.         0.02792771 ... 0.04419983 0.         0.        ]
 [0.03533251 0.02792771 1.         ... 0.         0.04636139 0.        ]
 ...
 [0.         0.04419983 0.         ... 1.         0.         0.05551043]
 [0.         0.         0.04636139 ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.05551043 0.         1.        ]]


In [19]:
print(similarity.shape)

(4803, 4803)
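
Each entry `similarity[i][j]` is the cosine of the angle between the feature vectors of movie `i` and movie `j`: their dot product divided by the product of their lengths. The sketch below reproduces that on three tiny made-up vectors using plain lists. The `cosine` helper is hypothetical and only meant to show the arithmetic, not how scikit-learn implements it.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.0, 2.0]]

# Pairwise matrix: symmetric, with (approximately) 1.0 on the diagonal.
matrix = [[cosine(u, v) for v in vectors] for u in vectors]
for row in matrix:
    print(row)
```

The matrix is symmetric and every movie scores about 1 against itself. Floating-point rounding can push a diagonal entry slightly above 1, as in the `1.0000000000000002` that appears later in the notebook output.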


<font size="3"> **User Input** </font>

**Asking the Movie Name from the User**

In [20]:
movie_name = input('Enter Name of the Movie: ')

Enter Name of the Movie: iron man 


**Creating a List of All Movie Names from our CSV File to Compare with the User Input Movie Name**

In [21]:
list_of_all_titles = movies_data['title'].tolist()

**All 4803 Movie Names we have in our Dataframe (CSV)**

In [22]:
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

**Finding the Closest Match for the User Input Movie**

In [23]:
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)

In [24]:
print(find_close_match)

['Iron Man 3', 'Iron Man 2', 'Iron Man']
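
`difflib.get_close_matches` returns at most `n` candidates (3 by default) whose similarity ratio is at least `cutoff` (0.6 by default), best match first, and it compares characters case-sensitively. That appears to be why the lowercase input with a trailing space ranks `Iron Man 3` ahead of `Iron Man` here. One possible way to make the lookup more forgiving is to normalize both sides before matching. The `closest_title` wrapper below is a hypothetical helper, not part of the notebook.

```python
import difflib

def closest_title(query, titles, cutoff=0.6):
    """Return the best-matching original title, or None (hypothetical helper)."""
    # Map lowercased titles back to their original spelling.
    # Assumption: if two titles differ only by case, the later one wins.
    lookup = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None

# A few titles from the dataset, used here as a small test list.
titles = ['Iron Man 3', 'Iron Man 2', 'Iron Man', 'The Avengers']
print(closest_title('iron man ', titles))
print(closest_title('qqqq', titles))
```

Returning `None` when nothing clears the cutoff also avoids the `IndexError` that indexing an empty match list would raise.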


**Getting just 1 Value out of all the Closest Matches found**

In [25]:
close_match = find_close_match[0]

In [26]:
print(close_match)

Iron Man 3


**Finding the Index of the User Input Movie Name from our List (CSV)**

In [27]:
index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]

In [28]:
print(index_of_the_movie)

31


**Getting a List of Similar Movies Based on the Index Number & Similarity Score**

In [29]:
similarity_score = list(enumerate(similarity[index_of_the_movie]))

**Comparison of the User Input Movie with All the Movies in the List (CSV), Each Paired with its Similarity Score**

In [30]:
print(similarity_score)

[(0, 0.05340354529079786), (1, 0.03491968024279245), (2, 0.023581977612045567), (3, 0.055647611162272966), (4, 0.042983757393603564), (5, 0.03562131495974751), (6, 0.0076125433428773336), (7, 0.19714182650430656), (8, 0.012810360893783405), (9, 0.04809760282674853), (10, 0.04220276502188202), (11, 0.011568209008581246), (12, 0.03226267858273752), (13, 0.02147233015890163), (14, 0.10748355263858196), (15, 0.020547923543034127), (16, 0.20790865838849198), (17, 0.026147968796938578), (18, 0.05042109920055349), (19, 0.03406498150598573), (20, 0.10333321349024224), (21, 0.010918386291460793), (22, 0.019258887047195482), (23, 0.01906361479096136), (24, 0.04343922852485216), (25, 0.009371918439671702), (26, 0.20310933498385425), (27, 0.035959494717843564), (28, 0.038939430424471316), (29, 0.02400483888109696), (30, 0.09632475579350715), (31, 1.0000000000000002), (32, 0.014697957369346138), (33, 0.13953656867539782), (34, 0.0), (35, 0.038072465274508534), (36, 0.03946331526278513), (37, 0.0127

In [31]:
len(similarity_score)

4803

**Sorting the Similarity Scores from High to Low, to Recommend Similar Movies**

In [32]:
sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True)

In [33]:
print(sorted_similar_movies)

[(31, 1.0000000000000002), (79, 0.5329843461479505), (68, 0.35885416360040373), (16, 0.20790865838849198), (26, 0.20310933498385425), (7, 0.19714182650430656), (182, 0.16861307886358345), (511, 0.1449239976849786), (33, 0.13953656867539782), (38, 0.13947983864256103), (356, 0.13643772916398803), (46, 0.13500467510859668), (64, 0.1335630857092072), (203, 0.1316404054308858), (174, 0.12925251916247507), (2625, 0.126233090221626), (607, 0.12586705737534967), (85, 0.12348870191625315), (2487, 0.12212602514492878), (466, 0.12209707158301437), (1439, 0.12131579653038137), (126, 0.11899117075444422), (2651, 0.11850108447902731), (1180, 0.11832552544327252), (788, 0.11628386950846692), (505, 0.11439076452077881), (977, 0.11366733238990834), (205, 0.11299324347965223), (169, 0.11278004219425282), (101, 0.11173085442489493), (2548, 0.11019800332996746), (232, 0.10908205057028116), (320, 0.10844531949775721), (2357, 0.1084002697200079), (14, 0.10748355263858196), (122, 0.107262203881333), (1931, 
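
Two observations about this step: the queried movie always comes first (it matches itself with a score of about 1), and only the first few entries of the sorted list are ever used. A sketch of an alternative that skips the query movie and selects just the top `k` entries with `heapq.nlargest` is shown below. The `top_similar` helper and the scores are made up for illustration.

```python
import heapq

def top_similar(scores, query_index, k):
    """Return the k highest (index, score) pairs, excluding the query itself."""
    candidates = (
        (i, score) for i, score in enumerate(scores) if i != query_index
    )
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Toy similarity row, invented for illustration; index 2 is the query movie.
scores = [0.05, 0.53, 1.0, 0.36, 0.0]
print(top_similar(scores, query_index=2, k=2))  # [(1, 0.53), (3, 0.36)]
```

For a list of 4803 scores the full sort is perfectly fast, so this is a matter of taste rather than a necessary optimization.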

**Getting the Names of Movies Similar to the User Input Movie**

In [34]:
print('Top 30 Suggested Movies : \n')

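# Only the first 30 entries are needed, so slice instead of looping over all 4803.
for i, movie in enumerate(sorted_similar_movies[:30], start=1):
    index = movie[0]
    # The enumerate position from the similarity row is the row position in the dataframe.
    title_from_index = movies_data['title'].iloc[index]
    print(i, '.', title_from_index)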

Top 30 Suggested Movies : 

1 . Iron Man 3
2 . Iron Man 2
3 . Iron Man
4 . The Avengers
5 . Captain America: Civil War
6 . Avengers: Age of Ultron
7 . Ant-Man
8 . X-Men
9 . X-Men: The Last Stand
10 . The Amazing Spider-Man 2
11 . Sherlock Holmes
12 . X-Men: Days of Future Past
13 . X-Men: Apocalypse
14 . X2
15 . The Incredible Hulk
16 . Kiss Kiss Bang Bang
17 . Sky Captain and the World of Tomorrow
18 . Captain America: The Winter Soldier
19 . Duets
20 . The Time Machine
21 . Lions for Lambs
22 . Thor: The Dark World
23 . The Good Night
24 . Shallow Hal
25 . Deadpool
26 . The League of Extraordinary Gentlemen
27 . The Iron Giant
28 . Sherlock Holmes: A Game of Shadows
29 . Captain America: The First Avenger
30 . X-Men: First Class


<font size="3"> **Movie Recommendation System** </font>

In [35]:
movie_name = input('Enter Name of the Movie: ')

list_of_all_titles = movies_data['title'].tolist()

find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)

close_match = find_close_match[0]

index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]

similarity_score = list(enumerate(similarity[index_of_the_movie]))

sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True)

print('Top 30 Suggested Movies : \n')

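# Only the first 30 entries are needed, so slice instead of looping over all 4803.
for i, movie in enumerate(sorted_similar_movies[:30], start=1):
    index = movie[0]
    # The enumerate position from the similarity row is the row position in the dataframe.
    title_from_index = movies_data['title'].iloc[index]
    print(i, '.', title_from_index)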

Enter Name of the Movie: karate kid
Top 30 Suggested Movies : 

1 . The Karate Kid
2 . Pat Garrett & Billy the Kid
3 . Joe Somebody
4 . Dragonball Evolution
5 . The Last Samurai
6 . 2016: Obama's America
7 . Royal Kill
8 . Double Impact
9 . Lone Star
10 . The Man from Snowy River
11 . March of the Penguins
12 . The Call of Cthulhu
13 . My Cousin Vinny
14 . Renaissance Man
15 . Coal Miner's Daughter
16 . The Aviator
17 . Talladega Nights: The Ballad of Ricky Bobby
18 . The Raven
19 . The Color of Money
20 . The Outsiders
21 . The Lost Medallion: The Adventures of Billy Stone
22 . Old Dogs
23 . The Ballad of Jack and Rose
24 . Lies in Plain Sight
25 . Rocky
26 . The Legend of the Lone Ranger
27 . Beer League
28 . Spy Kids
29 . Harry Potter and the Order of the Phoenix
30 . Heavenly Creatures
