Intuition:-

1. The user enters the name of a favourite movie
2. The system suggests movies with a similar genre and other shared attributes (keywords, tagline, cast, director)

Recommendation Systems
1. Content-based | recommends movies similar to what the user has watched (e.g. same genre)
2. Popularity-based | recommends movies that are popular overall
3. Collaborative | groups users by viewing pattern (e.g. "your friends watched ...")

**Workflow:-**
- Data -> Director, Genre
- Data preprocessing
1. Feature extraction
2. Similarity score
3. User input and suggestion with cosine similarity
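The workflow above can be condensed into a small, self-contained sketch. The four titles and their feature strings are made up for illustration (they are not rows from the notebook's CSV), and `recommend` is a hypothetical helper name. Unlike the notebook, the sketch skips the input movie itself and drops movies with zero similarity.

```python
import difflib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: title -> combined "genres keywords cast director" text
toy_movies = {
    "Alien": "Horror Science Fiction space alien crew Sigourney Weaver Ridley Scott",
    "Aliens": "Horror Action Science Fiction space alien marine Sigourney Weaver James Cameron",
    "Notting Hill": "Comedy Romance bookshop london celebrity Julia Roberts Hugh Grant Roger Michell",
    "Love Actually": "Comedy Romance london christmas Hugh Grant Richard Curtis",
}
titles = list(toy_movies)

# 1. Feature extraction: one TF-IDF vector per movie
vectors = TfidfVectorizer().fit_transform(list(toy_movies.values()))

# 2. Similarity score: pairwise cosine similarity between all movies
similarity = cosine_similarity(vectors)

# 3. User input -> closest title -> other movies ranked by similarity
def recommend(user_input, k=2):
    """Hypothetical helper: up to k other titles with non-zero similarity, best first."""
    match = difflib.get_close_matches(user_input, titles, n=1)
    if not match:
        return []
    row = titles.index(match[0])
    ranked = sorted(enumerate(similarity[row]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, score in ranked if i != row and score > 0][:k]

print(recommend("alien"))
print(recommend("love actualy"))
```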


Importing Dependencies

In [85]:
import numpy as np
import pandas as pd
import difflib                   # fuzzy-match the user's input against the movie titles
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Data Collection & Processing

In [86]:
# Loading the data

movies_data = pd.read_csv('/content/movies (2).csv')

In [87]:
movies_data.head()

,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


In [88]:
movies_data.shape

(4803, 24)

In [89]:
# Checking for null values

movies_data.isnull().sum()

index                      0
budget                     0
genres                    28
homepage                3091
id                         0
keywords                 412
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
cast                      43
crew                       0
director                  30
dtype: int64

In [90]:
# The system is content-based, so we select the text columns that describe each movie

selected_features = ['genres','keywords','tagline','cast','director']
print(selected_features)

['genres', 'keywords', 'tagline', 'cast', 'director']


In [91]:
# Replace missing (NaN) values with an empty string in the selected columns

for feature in selected_features:
  movies_data[feature] = movies_data[feature].fillna('')

In [92]:
# Concatenate the selected features into one space-separated string per movie

combined_features = movies_data['genres']+' '+movies_data['keywords']+' '+movies_data['tagline']+' '+movies_data['cast']+' '+movies_data['director']
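The `fillna('')` step matters because pandas string concatenation propagates missing values: if any one column is NaN, the whole combined string for that movie becomes NaN. The sketch below uses a hypothetical two-row frame to show the difference, plus an equivalent row-wise join with `apply`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second movie has no director recorded
df = pd.DataFrame({"genres": ["Action", "Drama"], "director": ["James Cameron", np.nan]})

# Without fillna: the NaN propagates, so row 1's combined string is NaN
raw = df["genres"] + " " + df["director"]

# With fillna(''): the missing part becomes an empty string and the row keeps its genre
clean = df["genres"].fillna("") + " " + df["director"].fillna("")

# Same result with a row-wise join over the selected columns
joined = df[["genres", "director"]].fillna("").apply(" ".join, axis=1)

print(raw.tolist())
print(clean.tolist())
```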

In [94]:
# Converting text data to feature vectors

vectorizer = TfidfVectorizer()

In [95]:
feature_vectors = vectorizer.fit_transform(combined_features)
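`TfidfVectorizer` weights each term by how often it occurs in a movie's text (TF) and how rare it is across all movies (IDF). With scikit-learn's defaults (`smooth_idf=True`, `norm='l2'`), the IDF is ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t; each row is then scaled to unit length. The pure-Python sketch below mirrors those defaults for illustration only; `tfidf` is a hypothetical helper, and it returns one dict per document instead of a sparse matrix.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # TfidfVectorizer's default token pattern (2+ word chars)

def tfidf(docs):
    """Return one {term: weight} dict per document, mirroring TfidfVectorizer's defaults."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]  # lowercase=True by default
    n = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in doc_freq.items()}
    rows = []
    for tokens in tokenized:
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))  # L2 norm of the row
        rows.append({term: w / norm for term, w in weights.items()} if norm else {})
    return rows

rows = tfidf(["Action Adventure Fantasy", "Action Crime Drama", "Action Adventure"])
print(rows[2])  # "adventure" outweighs "action", which appears in every document
```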

In [96]:
print(feature_vectors)

  (0, 2432)	0.17272411194153
  (0, 7755)	0.1128035714854756
  (0, 13024)	0.1942362060108871
  (0, 10229)	0.16058685400095302
  (0, 8756)	0.22709015857011816
  (0, 14608)	0.15150672398763912
  (0, 16668)	0.19843263965100372
  (0, 14064)	0.20596090415084142
  (0, 13319)	0.2177470539412484
  (0, 17290)	0.20197912553916567
  (0, 17007)	0.23643326319898797
  (0, 13349)	0.15021264094167086
  (0, 11503)	0.27211310056983656
  (0, 11192)	0.09049319826481456
  (0, 16998)	0.1282126322850579
  (0, 15261)	0.07095833561276566
  (0, 4945)	0.24025852494110758
  (0, 14271)	0.21392179219912877
  (0, 3225)	0.24960162956997736
  (0, 16587)	0.12549432354918996
  (0, 14378)	0.33962752210959823
  (0, 5836)	0.1646750903586285
  (0, 3065)	0.22208377802661425
  (0, 3678)	0.21392179219912877
  (0, 5437)	0.1036413987316636
  :	:
  (4801, 17266)	0.2886098184932947
  (4801, 4835)	0.24713765026963996
  (4801, 403)	0.17727585190343226
  (4801, 6935)	0.2886098184932947
  (4801, 11663)	0.21557500762727902
  (4801, 1672

Cosine Similarity

In [97]:
# Getting similarity score

similarity = cosine_similarity(feature_vectors)
print(similarity)

[[1.         0.07219487 0.037733   ... 0.         0.         0.        ]
 [0.07219487 1.         0.03281499 ... 0.03575545 0.         0.        ]
 [0.037733   0.03281499 1.         ... 0.         0.05389661 0.        ]
 ...
 [0.         0.03575545 0.         ... 1.         0.         0.02651502]
 [0.         0.         0.05389661 ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.02651502 0.         1.        ]]
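Cosine similarity compares the direction of two vectors: cos(a, b) = (a · b) / (‖a‖ ‖b‖). For non-negative TF-IDF vectors, 1 means proportional term weights and 0 means no shared terms. Because `TfidfVectorizer` already scales every row to unit length by default, the notebook's matrix could also be obtained as `feature_vectors @ feature_vectors.T` (as a sparse matrix). The NumPy sketch below, with a hypothetical `cosine_matrix` helper and a tiny made-up matrix, shows the normalise-then-dot-product view.

```python
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarity: scale each row to unit length, then take dot products."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1, norms)  # leave all-zero rows as zeros
    return unit @ unit.T

X = np.array([[1.0, 2.0, 0.0],   # movie 0
              [2.0, 4.0, 0.0],   # movie 1: same direction as movie 0
              [0.0, 0.0, 3.0]])  # movie 2: no shared terms with 0 or 1
S = cosine_matrix(X)
print(np.round(S, 3))
```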


In [98]:
similarity.shape

(4803, 4803)
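The full matrix holds 4803 × 4803 float64 values, roughly 185 MB, although each query only reads one row of it. A lighter option is to score just the query movie; with scikit-learn that would be `cosine_similarity(feature_vectors[movie_index], feature_vectors)`, which returns a 1 × 4803 array. The NumPy sketch below shows the same idea on a small random matrix, assuming the rows are already L2-normalised; `similarity_row` is a hypothetical helper.

```python
import numpy as np

# Memory for the full dense matrix: 4803 x 4803 float64 values at 8 bytes each
full_matrix_bytes = 4803 * 4803 * 8  # 184,550,472 bytes, roughly 185 MB

def similarity_row(unit_rows, i):
    """Cosine scores of row i against every row; assumes rows are already L2-normalised."""
    return unit_rows @ unit_rows[i]

rng = np.random.default_rng(0)
X = rng.random((5, 4))                               # small made-up feature matrix
unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalise each row
row = similarity_row(unit, 2)                        # one score per movie
print(row)
```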

In [99]:
# Taking user input

movie_name = input('Enter your favourite movie name :')

Enter your favourite movie name :alien


In [100]:
# Create a list of all movie titles to compare with the user's input

list_of_all_titles = movies_data['title'].tolist()
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

In [101]:
# Finding the close match for the movie name

find_best_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_best_match)

['Alien', 'Alien³', 'Aliens']
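`difflib.get_close_matches` ranks candidates by `SequenceMatcher.ratio()`, which is 2·M / (len(a) + len(b)), where M is the number of matching characters. The comparison is case-sensitive, so lowercase "alien" scores 0.8 against "Alien" because the other four letters match. The sketch below shows the scores behind such matches on a made-up four-title list.

```python
import difflib

titles = ["Alien", "Alien³", "Aliens", "Avatar"]  # small made-up candidate list
query = "alien"

# Same score get_close_matches uses: ratio() = 2 * M / (len(a) + len(b))
scores = {title: difflib.SequenceMatcher(None, title, query).ratio() for title in titles}
for title, score in scores.items():
    print(f"{title:8} {score:.3f}")

# Defaults are n=3 and cutoff=0.6, so "Avatar" (about 0.18) is filtered out
matches = difflib.get_close_matches(query, titles)
print(matches)
```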


In [102]:
# Keep only the closest match (the list is ordered best first; this raises IndexError if it is empty)

best_match = find_best_match[0]
print(best_match)

Alien
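`find_best_match[0]` raises an `IndexError` when no title clears the default 0.6 cutoff, and the scoring is case-sensitive. A hedged alternative is a small helper (the name `find_title` is hypothetical) that lowercases both sides, strips stray whitespace, and returns `None` when nothing matches.

```python
import difflib

def find_title(user_input, titles, cutoff=0.6):
    """Return the closest matching title, ignoring case, or None if nothing clears the cutoff."""
    # Map lower-cased titles back to their original spelling
    # (titles that differ only by case collapse to a single entry)
    by_lower = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(user_input.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None

print(find_title("  ALIEN ", ["Alien", "Aliens", "Avatar"]))
print(find_title("qwerty", ["Alien", "Aliens"]))
```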


In [103]:
# Find the row index of the matched title;
# this index is then used to read the movie's row of similarity scores

movie_index = movies_data[movies_data.title == best_match]['index'].values[0]
print(movie_index)

3158
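The lookup above scans the whole DataFrame for every query and relies on the CSV's `index` column matching each row's position, which is an assumption about the data file. A sketch of an alternative, using a hypothetical `build_title_index` helper, builds a title-to-row-position dictionary once and reuses it for every query.

```python
def build_title_index(titles):
    """Map each title to the row position of its first occurrence."""
    index = {}
    for position, title in enumerate(titles):
        index.setdefault(title, position)  # keep the first row if a title is duplicated
    return index

titles = ["Avatar", "Alien", "Spectre", "Alien"]  # made-up titles with one duplicate
title_index = build_title_index(titles)
print(title_index["Alien"])
```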


In [104]:
# Get the similarity of every movie to the user's movie
# (similar movies have a high score with respect to it)
# enumerate() pairs each score with its row index: (index, score)

similarity_score = list(enumerate(similarity[movie_index]))
print(similarity_score)

[(0, 0.24946766307532411), (1, 0.015779236647552503), (2, 0.03612759739989673), (3, 0.028635320615871873), (4, 0.16107142159805013), (5, 0.01872824835470054), (6, 0.0), (7, 0.02465761661549906), (8, 0.015596028639645541), (9, 0.005380521315099234), (10, 0.027476677492651404), (11, 0.009368460619628413), (12, 0.005570915405867712), (13, 0.005029762702786647), (14, 0.050933013972209076), (15, 0.0), (16, 0.02338066201972841), (17, 0.005329227319724689), (18, 0.032930875479148185), (19, 0.004731489194622565), (20, 0.020971727305445793), (21, 0.07793617020617961), (22, 0.0), (23, 0.0), (24, 0.005147269374451373), (25, 0.005124686680977586), (26, 0.025448804729144948), (27, 0.02688997293489526), (28, 0.029522017004874574), (29, 0.011033905786519933), (30, 0.02833822173273775), (31, 0.024007261524334984), (32, 0.01854065831910909), (33, 0.02961006794312044), (34, 0.009244467546758856), (35, 0.02572350124859141), (36, 0.023178927407014348), (37, 0.010555287569379172), (38, 0.03907902726665918)

In [105]:
len(similarity_score)

# Sort this based on high -> low similarity score

sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) # reverse = True => Descending
print(sorted_similar_movies)

[(3158, 1.0), (1531, 0.27695879799110396), (2403, 0.27282302322765406), (838, 0.2683980715052248), (278, 0.25265563528089535), (0, 0.24946766307532411), (239, 0.23611322481971045), (1053, 0.20583807316416527), (770, 0.2030780062429142), (2696, 0.20123563770177924), (1473, 0.19900612671101547), (1951, 0.19782478235215215), (2695, 0.1913894251697834), (1354, 0.18337533039871987), (365, 0.18313514669281056), (2015, 0.17358032878438012), (94, 0.1722944041751902), (4332, 0.1703857693604622), (4, 0.16107142159805013), (643, 0.15957089563446852), (1650, 0.15949014610922413), (3730, 0.15673062864649356), (4108, 0.1531649745674611), (740, 0.15095450235744454), (1914, 0.14348266800886922), (1990, 0.1422508425754015), (1318, 0.1416422341849889), (305, 0.13895550792252648), (4400, 0.1385858817745686), (4401, 0.13661287345904274), (4225, 0.13622758556148848), (461, 0.13392507429944336), (150, 0.12996594997292404), (3116, 0.12935178419209123), (1275, 0.12892640667155572), (541, 0.12885117675606775),
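Sorting all 4803 pairs is fine at this size, but only the top few are printed. `heapq.nlargest` keeps just the k best pairs and, per the Python documentation, is equivalent to `sorted(..., reverse=True)[:k]`. A minimal sketch with a hypothetical `top_k` helper and made-up scores:

```python
import heapq

def top_k(similarity_row, k):
    """Return the k (index, score) pairs with the highest scores, best first."""
    # nlargest avoids sorting the whole row when only k results are needed
    return heapq.nlargest(k, enumerate(similarity_row), key=lambda pair: pair[1])

scores = [0.25, 0.02, 1.0, 0.27, 0.0]  # made-up similarity row
print(top_k(scores, 3))
```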

In [106]:
# Print the titles of the 29 most similar movies (the first one is the input movie itself)

print('Movies suggested for you : \n')

for i, (index, _score) in enumerate(sorted_similar_movies[:29], start=1):
  title_from_index = movies_data['title'].iloc[index]  # row position -> title
  print(i, '.', title_from_index)

Movies suggested for you : 

1 . Alien
2 . Moonraker
3 . Aliens
4 . Alien³
5 . Planet of the Apes
6 . Avatar
7 . Gravity
8 . Galaxy Quest
9 . Event Horizon
10 . Jason X
11 . The Astronaut's Wife
12 . Space Dogs
13 . I Think I Love My Wife
14 . Space Chimps
15 . Contact
16 . Spaceballs
17 . Guardians of the Galaxy
18 . Silent Running
19 . John Carter
20 . Space Cowboys
21 . Wing Commander
22 . Cargo
23 . In the Shadow of the Moon
24 . Alien: Resurrection
25 . Lifeforce
26 . The Empire Strikes Back
27 . The Thing
28 . Treasure Planet
29 . Sparkler
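The first suggestion is always the input movie itself (here "Alien", score 1.0). One way to skip it, sketched with a hypothetical `suggest` helper and made-up scores, is to filter on the query's row index rather than on a score of 1.0, since another movie with identical feature text could also score 1.0.

```python
from itertools import islice

def suggest(sorted_pairs, query_row, titles, k=5):
    """Return the titles of the k most similar movies, skipping the query movie itself."""
    others = (titles[i] for i, _score in sorted_pairs if i != query_row)
    return list(islice(others, k))

titles = ["Avatar", "Alien", "Aliens", "Gravity"]
sorted_pairs = [(1, 1.0), (2, 0.31), (3, 0.22), (0, 0.2)]  # hypothetical sorted scores
print(suggest(sorted_pairs, query_row=1, titles=titles, k=2))
```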
