### Project

- Review existing code for a recommender system. Understand each line and add some data visualisation. Try other features for the recommendation, and possibly replicate the same code on a different dataset from Kaggle.

# Recommender System

In [2]:
#Importing the libraries

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity



### Step 1 : Read csv file containing dataset

In [3]:
df = pd.read_csv("movie_dataset.csv")
print (df.columns) #Print all the columns of the dataset to get a feel of the data
#df.info() #Prints information about the dataset
#df.head()

Index(['index', 'budget', 'genres', 'homepage', 'id', 'keywords',
       'original_language', 'original_title', 'overview', 'popularity',
       'production_companies', 'production_countries', 'release_date',
       'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title',
       'vote_average', 'vote_count', 'cast', 'crew', 'director'],
      dtype='object')


### Step 2: Select Features

In [4]:
features = ['keywords','cast','genres','director']

##Step 3: Create a column in DF which combines all selected features
for feature in features:
	df[feature] = df[feature].fillna('')

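def combine_features(row):
	# Join the selected feature strings into one space-separated string
	try:
		return row['keywords'] +" "+row['cast']+" "+row["genres"]+" "+row["director"]
	except TypeError:
		# A non-string value slipped through: show the row and fall back to an empty string
		print ("Error:", row)
		return ""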

df["combined_features"] = df.apply(combine_features,axis=1)

df["combined_features"].head()


0    culture clash future space war space colony so...
1    ocean drug abuse exotic island east india trad...
2    spy based on novel secret agent sequel mi6 Dan...
3    dc comics crime fighter terrorist secret ident...
4    based on novel mars medallion space travel pri...
Name: combined_features, dtype: object

In [5]:
first_element = df.iloc[0]['combined_features']  # combined feature string of the first row
#print (first_element)


### Step 4: Create count matrix from this new combined column

In [6]:
cv = CountVectorizer()

count_matrix = cv.fit_transform(df["combined_features"])


`count_matrix` is a sparse matrix. To view it in human-readable form, call its `toarray()` method. Before printing `count_matrix`, let us first look at the feature list (the word list) that our `CountVectorizer()` object learned from the data.

In [16]:
feature_names = list(cv.vocabulary_.keys()) # Vocabulary terms (the feature list), in first-seen order rather than matrix column order
array = count_matrix.toarray() # Dense version of the count matrix

In [17]:
feature_names[0:10] #Print the first 10 features

['culture',
 'clash',
 'future',
 'space',
 'war',
 'colony',
 'society',
 'sam',
 'worthington',
 'zoe']

In [9]:
##Step 5: Compute the Cosine Similarity based on the count_matrix
cosine_sim = cosine_similarity(count_matrix) 
movie_user_likes = "Avatar"

## Step 6: Get index of this movie from its title
def get_title_from_index(index):
	return df[df.index == index]["title"].values[0]

def get_index_from_title(title):
	return df[df.title == title]["index"].values[0]

movie_index = get_index_from_title(movie_user_likes)

similar_movies =  list(enumerate(cosine_sim[movie_index]))

## Step 7: Get a list of similar movies in descending order of similarity score
sorted_similar_movies = sorted(similar_movies,key=lambda x:x[1],reverse=True)

## Step 8: Print the top 11 titles (the query movie itself, then its 10 most similar movies)
for element in sorted_similar_movies[:11]:
	print (get_title_from_index(element[0]))

Avatar
Guardians of the Galaxy
Aliens
Star Wars: Clone Wars: Volume 1
Star Trek Into Darkness
Star Trek Beyond
Alien
Lockout
Jason X
The Helix... Loaded
Moonraker
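
The scores behind this ranking are cosine similarities between rows of the count matrix. As a rough illustration, the sketch below computes the same quantity by hand, using only the standard library, for two made-up feature strings. The helper name `cosine_from_text` and both strings are hypothetical, and splitting on whitespace is a simplification of the tokenisation `CountVectorizer` performs.

```python
import math
from collections import Counter

def cosine_from_text(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product: only words present in both texts contribute
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    # Euclidean length of each count vector
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)

# Two of three words are shared, so the score is 2 / (sqrt(3) * sqrt(3)) = 2/3
score = cosine_from_text("space war colony", "space war travel")
print(round(score, 3))
```

A score near 1 means the two texts use almost the same words in the same proportions, while 0 means they share no words, which is why the query movie always ranks first against itself.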


In [13]:
a =  get_index_from_title("Aliens") #get index of movie with the title

In [14]:
movie_index = df[df.title == "Aliens"]["index"].values[0] # same lookup written inline, without the helper
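
Steps 5 to 8 can be wrapped into a single reusable function so that any title can be queried. The sketch below is one possible way to do that, not the notebook's own code: the function name `recommend`, its parameters, and the three-row `toy_df` are all hypothetical stand-ins, and it assumes titles are unique.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical miniature dataset standing in for movie_dataset.csv
toy_df = pd.DataFrame({
    "title": ["Alpha", "Beta", "Gamma"],
    "combined_features": ["space war alien", "space war robot", "romance paris"],
})
toy_counts = CountVectorizer().fit_transform(toy_df["combined_features"])
toy_sim = cosine_similarity(toy_counts)

def recommend(title, frame, sim, n=10):
    """Return up to n titles most similar to `title`, excluding the title itself."""
    matches = frame.index[frame["title"] == title]
    if len(matches) == 0:
        raise KeyError(f"No movie titled {title!r}")
    pos = frame.index.get_loc(matches[0])   # row position in the similarity matrix
    ranked = sim[pos].argsort()[::-1]       # positions sorted by descending score
    ranked = [p for p in ranked if p != pos][:n]  # drop the query movie, keep n
    return frame["title"].iloc[ranked].tolist()

print(recommend("Alpha", toy_df, toy_sim, n=2))
```

Unlike the loop in Step 8, this version leaves the queried movie out of its own results and raises a `KeyError` when the title is not found, instead of failing with an index error on an empty lookup.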
