# Recommender systems
Recommender systems are algorithms that suggest relevant items to users. Depending on the industry, the items may be movies to watch, articles to read, products to buy, or anything else.

## 1. Importing Necessary Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
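###### Helper functions. Use them when needed. #######
def get_title_from_index(index):
    # Return the title of the row whose DataFrame index label equals `index`
    return df[df.index == index]["title"].values[0]

def get_index_from_title(title):
    # Return the value of the "index" column for the first row with this exact title
    return df[df.title == title]["index"].values[0]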

## 2. Read CSV File



In [6]:
df = pd.read_csv("movie_dataset.csv")

In [7]:
df.head()

Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


## 3. Select Features and Combine Them into a New Column

In [11]:

features = ['keywords', 'cast', 'genres', 'director']
# Replace missing values with empty strings so the columns can be concatenated
for feature in features:
    df[feature] = df[feature].fillna('')

def combine_features(row):
    # Join the selected text columns into one space-separated string
    return " ".join(row[feature] for feature in features)

df["combined_features"] = df.apply(combine_features, axis=1)

df["combined_features"].head()

0    culture clash future space war space colony so...
1    ocean drug abuse exotic island east india trad...
2    spy based on novel secret agent sequel mi6 Dan...
3    dc comics crime fighter terrorist secret ident...
4    based on novel mars medallion space travel pri...
Name: combined_features, dtype: object

## 3. Vectorization
To use textual data for predictive modeling, the text must first be split into individual words (tokens); this process is called tokenization. The tokens then need to be encoded as integers or floating-point values so they can serve as inputs to machine learning algorithms. This process is called feature extraction (or vectorization).

Scikit-learn’s CountVectorizer converts a collection of text documents into a matrix of term/token counts, with one row per document and one column per distinct token. It can also pre-process the text (for example, lowercasing it) before generating the vector representation, which makes it a flexible feature representation module for text.
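To make the counting concrete, here is a minimal sketch on two made-up strings. The names `toy_docs`, `toy_cv`, `toy_counts` and `vocab` are illustrative assumptions and do not come from the movie dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy "documents" (made up for illustration, not rows of movie_dataset.csv)
toy_docs = ["space war space colony", "space travel mars"]

toy_cv = CountVectorizer()
toy_counts = toy_cv.fit_transform(toy_docs)  # sparse matrix: one row per document

# Learned vocabulary, ordered by column position in the count matrix
vocab = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
print(vocab)
print(toy_counts.toarray())
```

Each column holds the count of one token, so the word "space", which appears twice in the first string, gets a count of 2 in the first row.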

In [34]:

cv = CountVectorizer()
count_matrix = cv.fit_transform(df["combined_features"])


In [35]:
count_matrix.shape

(4803, 14845)

In [36]:
count_matrix[1]

<1x14845 sparse matrix of type '<class 'numpy.int64'>'
	with 30 stored elements in Compressed Sparse Row format>

## 4. Cosine similarity
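Cosine similarity is the normalised dot product of two vectors. The name comes from geometry: the dot product equals the product of the two vectors' Euclidean magnitudes and the cosine of the angle between them, so dividing the dot product by both magnitudes leaves only the cosine. For count vectors, which have no negative entries, the score ranges from 0 (no shared tokens) to 1 (same direction).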
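The definition can be checked by hand on a small example. The sketch below uses two made-up count vectors and a hypothetical helper, `manual_cosine`, and compares the result with scikit-learn's `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def manual_cosine(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy count vectors (illustrative values only)
a = np.array([1, 0, 2, 0, 1])
b = np.array([0, 1, 1, 1, 0])

manual = manual_cosine(a, b)
# cosine_similarity expects 2-D inputs, so each vector becomes a 1-row matrix
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]
print(manual, library)
```

Here the dot product is 2 and the norms are √6 and √3, so both values come out at 2/√18, roughly 0.471.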

In [30]:
## Pairwise cosine similarity: cosine_sim[i][j] compares movie i with movie j
cosine_sim = cosine_similarity(count_matrix)


## 5. Getting Movies Similar to the One the User Likes

In [38]:
movie_user_likes = "The Time Machine"
# Get the index of this movie from its title
movie_index = get_index_from_title(movie_user_likes)
# Pair every movie index with its similarity score to the chosen movie
similar_movies = list(enumerate(cosine_sim[movie_index]))
similar_movies

[(0, 0.22075539284417395),
 (1, 0.0837707816583391),
 (2, 0.09567297464698797),
 (3, 0.04499212706658475),
 (4, 0.1835325870964494),
 (5, 0.0917662935482247),
 (6, 0.0),
 (7, 0.20942695414584772),
 (8, 0.04891159880445184),
 (9, 0.0936585811581694),
 (10, 0.1835325870964494),
 (11, 0.0917662935482247),
 (12, 0.09567297464698797),
 (13, 0.0917662935482247),
 (14, 0.17342199390482396),
 (15, 0.04499212706658475),
 (16, 0.20602141085758227),
 (17, 0.09782319760890368),
 (18, 0.29319773580418684),
 (19, 0.09782319760890368),
 (20, 0.0936585811581694),
 (21, 0.1404878717372541),
 (22, 0.05263157894736841),
 (23, 0.0512989176042577),
 (24, 0.08111071056538126),
 (25, 0.0),
 (26, 0.17342199390482396),
 (27, 0.1873171623163388),
 (28, 0.1873171623163388),
 (29, 0.09782319760890368),
 (30, 0.0899842541331695),
 (31, 0.2752988806446741),
 (32, 0.04335549847620599),
 (33, 0.17342199390482396),
 (34, 0.0),
 (35, 0.20025046972870353),
 (36, 0.2341464528954235),
 (37, 0.0512989176042577),
 (38, 0.08

In [41]:
#Get a list of similar movies in descending order of similarity score
sorted_similar_movies = sorted(similar_movies,key=lambda x:x[1],reverse=True)

In [42]:
## Print the top 51 titles: the chosen movie itself ranks first, followed by 50 similar movies
for movie_idx, _score in sorted_similar_movies[:51]:
    print(get_title_from_index(movie_idx))

The Time Machine
The Helix... Loaded
Men in Black 3
Red Planet
Lockout
Iron Man 3
In Time
Star Trek Into Darkness
Hot Tub Time Machine
Star Trek Beyond
Timecop
Captain America: The Winter Soldier
Star Trek
Resident Evil: Retribution
Resident Evil: Apocalypse
Babylon A.D.
Star Wars
G.I. Joe: The Rise of Cobra
Prometheus
The Fifth Element
U.F.O.
Mad Max 2: The Road Warrior
Transformers: Age of Extinction
Mad Max: Fury Road
Sherlock Holmes
A Sound of Thunder
Looper
The Empire Strikes Back
Divergent
Timeline
Congo
The Covenant
Project Almanac
Green Lantern
Sky Captain and the World of Tomorrow
9
Damnation Alley
Time Changer
Avatar
Terminator Genisys
Knowing
Bill & Ted's Bogus Journey
Beastmaster 2: Through the Portal of Time
The Matrix Revolutions
Sheena
The Lovers
X-Men: Days of Future Past
Return of the Jedi
Journey to Saturn
Six-String Samurai
Safety Not Guaranteed
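The steps above can be wrapped in a single function. The following is a sketch under stated assumptions, not the notebook's own code: the function name `recommend`, its parameters `top_n` and `text_column`, and the toy DataFrame are all hypothetical. It computes similarities for the chosen row only, which avoids building the full movie-by-movie matrix, and it leaves the chosen movie out of the results.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(df, title, top_n=5, text_column="combined_features"):
    """Return up to top_n titles most similar to `title`, excluding the title itself."""
    matches = np.flatnonzero(df["title"].to_numpy() == title)
    if matches.size == 0:
        raise KeyError(f"Title not found: {title!r}")
    pos = int(matches[0])  # row position of the first matching title

    counts = CountVectorizer().fit_transform(df[text_column])
    scores = cosine_similarity(counts[pos], counts).ravel()  # similarity to every row
    scores[pos] = -1.0  # push the query movie itself to the bottom of the ranking

    best = np.argsort(-scores, kind="stable")[:top_n]  # highest scores first
    return df["title"].iloc[best].tolist()

# Toy data, made up for illustration (not taken from movie_dataset.csv)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "combined_features": [
        "time travel future science fiction",
        "time travel comedy",
        "romance drama paris",
    ],
})
print(recommend(toy, "A", top_n=2))
```

With the notebook's data, a call such as `recommend(df, "The Time Machine", top_n=50)` would be the equivalent usage, assuming `df` already has the `combined_features` column built earlier.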
