# Movie Recommendation Engine

### Content-based recommendation engine

This type of recommendation system takes a movie that the user currently likes and analyzes its content (genre, cast, director, etc.) to find other movies with similar content. It then ranks those movies by their similarity scores and recommends the ones with relatively high scores to the user.
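
Before working with the real dataset, here is a minimal end-to-end sketch of that idea on three made-up movies. The titles, metadata strings, and the recommend helper are hypothetical and not part of this notebook; the sketch only assumes scikit-learn is installed. Each movie's content becomes a word-count vector, every pair of movies is scored with cosine similarity, and the other movies are ranked by their score against the query.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical movies: title -> "genres director cast" content string
movies = {
    "Movie A": "Romance Comedy DirectorX ActorOne ActorTwo",
    "Movie B": "Romance Drama DirectorX ActorOne",
    "Movie C": "Horror Thriller DirectorY ActorThree",
}
titles = list(movies)

# 1. Turn each movie's content into a vector of word counts
vectors = CountVectorizer().fit_transform(list(movies.values()))
# 2. Score every pair of movies by cosine similarity
scores = cosine_similarity(vectors)

def recommend(title, n=2):
    """Return up to n other titles, most similar first (hypothetical helper)."""
    i = titles.index(title)
    ranked = sorted(
        (j for j in range(len(titles)) if j != i),  # skip the query movie
        key=lambda j: scores[i, j],
        reverse=True,
    )
    return [titles[j] for j in ranked[:n]]

print(recommend("Movie A"))  # ['Movie B', 'Movie C']
```

Movie B shares three words with Movie A (romance, directorx, actorone) while Movie C shares none, so B ranks first.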

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
df = pd.read_csv("movies.csv")

Build the feature set from the columns that describe each movie's content.

In [4]:
features = ['genres', 'cast', 'director']

Define a function that combines the values of these columns into a single string.

In [5]:
def combine_features(row):
    # Join the three text columns with spaces; NaNs must already be replaced by ""
    return row['genres'] + " " + row['cast'] + " " + row['director']

Clean the data by replacing every NaN value in these columns with an empty string; otherwise concatenating a string with NaN (a float) raises a TypeError. Then apply the combine_features function to each row of the dataframe.

In [6]:
for feature in features:
    df[feature] = df[feature].fillna('')

df["combined_features"] = df.apply(combine_features, axis=1)
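
To see why the fillna step matters, here is a small sketch on a hypothetical two-row frame (toy and combine are illustrative names, not from the notebook) where one movie has no director. The row function fails before cleaning and succeeds afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical two-movie frame; the second movie has no director recorded
toy = pd.DataFrame({
    "genres": ["Romance Comedy", "Drama"],
    "cast": ["Actor One", "Actor Two"],
    "director": ["Director X", np.nan],
})

def combine(row):
    return row["genres"] + " " + row["cast"] + " " + row["director"]

# Before cleaning: str + NaN (a float) raises TypeError on the second row
raised_before = False
try:
    toy.apply(combine, axis=1)
except TypeError:
    raised_before = True
print("TypeError before fillna:", raised_before)

# After replacing NaN with "", every row concatenates cleanly
for col in ["genres", "cast", "director"]:
    toy[col] = toy[col].fillna("")
toy["combined"] = toy.apply(combine, axis=1)
print(toy["combined"].tolist())
```

The second combined string ends with a trailing space where the director would be; CountVectorizer ignores extra whitespace, so this is harmless.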

In [7]:
df.iloc[0].combined_features

'Romance Comedy  Anand Tucker Amy Adams Matthew Goode Adam Scott John Lithgow'

In [8]:
# Document-term matrix: one row per movie, one column per word, values are word counts
cv = CountVectorizer()
count_matrix = cv.fit_transform(df["combined_features"])

In [11]:
cosine_sim = cosine_similarity(count_matrix)
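
cosine_similarity returns a square matrix whose entry (i, j) is the cosine of the angle between the count vectors of movies i and j: cos(a, b) = (a · b) / (‖a‖ ‖b‖). Because word counts are non-negative, the scores lie between 0 (no shared words) and 1 (proportional vectors). The sketch below recomputes this with NumPy on a hypothetical 3 × 4 count matrix and checks it against scikit-learn. It assumes no row is all zeros, since the manual formula would divide by zero there.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical count vectors: three movies (rows) over a 4-word vocabulary
counts = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 2, 0],
])

# cos(a, b) = (a . b) / (||a|| * ||b||), computed for every pair of rows at once
norms = np.linalg.norm(counts, axis=1)
manual = (counts @ counts.T) / np.outer(norms, norms)

print(np.round(manual, 3))
print(np.allclose(manual, cosine_similarity(counts)))  # True
```

Movies 0 and 1 share two words, giving 2 / (√3 · √2) ≈ 0.816, while movie 2 shares nothing with either and scores 0.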

Next, we define two helper functions: one returns a movie's title given its row index, and the other returns the row index given a title.

In [12]:
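def get_title_from_index(index):
    # Row positions in df match row/column positions in cosine_sim
    return df.iloc[index]["title"]

def get_index_from_title(title):
    # Position of the first row whose title matches exactly
    return int(np.flatnonzero(df["title"].to_numpy() == title)[0])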
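
If titles are looked up many times, an alternative design is to build the title-to-position mapping once as a pandas Series keyed by title, which avoids scanning the whole title column on every call. This is a sketch on a hypothetical three-row frame; movies_df, positions, title_at, and position_of are illustrative names, and repeated titles keep only their first occurrence.

```python
import pandas as pd

# Hypothetical stand-in for df
movies_df = pd.DataFrame({"title": ["Leap Year", "Love, Rosie", "Letters to Juliet"]})

# Title -> row position, built once; drop later duplicates of the same title
positions = pd.Series(range(len(movies_df)), index=movies_df["title"])
positions = positions[~positions.index.duplicated(keep="first")]

def title_at(pos):
    # Row position -> title
    return movies_df["title"].iloc[pos]

def position_of(title):
    # Title -> row position (raises KeyError for unknown titles)
    return int(positions[title])

print(position_of("Love, Rosie"), title_at(2))  # 1 Letters to Juliet
```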

Let's try this out.

In [15]:
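movie = 'Leap Year'
movie_index = get_index_from_title(movie)
# Pair every movie's position with its similarity score to the chosen movie
similar_movies = list(enumerate(cosine_sim[movie_index]))
# Sort by score, highest first, and drop the top entry (assumed to be the movie itself, score 1.0)
sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]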
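
Slicing with [1:] assumes the chosen movie is ranked first. If another movie ties with it at a score of 1.0 (for example, identical metadata), the stable sort can place that other movie first, so the slice drops the wrong entry and keeps the query movie. A more defensive variant excludes the query's own position explicitly. This is a sketch; top_n_similar and the toy matrix are hypothetical.

```python
import numpy as np

def top_n_similar(sim_matrix, query_pos, n=5):
    """Return (position, score) pairs for the n movies most similar to
    query_pos, never including query_pos itself (hypothetical helper)."""
    scores = np.asarray(sim_matrix[query_pos], dtype=float)
    # Stable descending sort: tied scores keep their original row order
    order = np.argsort(-scores, kind="stable")
    order = order[order != query_pos]  # drop the query movie by position, not by rank
    return [(int(i), float(scores[i])) for i in order[:n]]

# Toy similarity matrix where movie 0 ties with movie 1 at score 1.0
sim = np.array([
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])
print(top_n_similar(sim, 1, n=2))  # [(0, 1.0), (2, 0.5)]
```

With the [1:] approach, querying movie 1 here would drop movie 0 and return movie 1 as its own top recommendation.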

In [18]:
# Print the five highest-scoring movies (the query movie was already dropped above)
print("Top 5 similar movies to " + movie + " are:\n")
for m in sorted_similar_movies[:5]:
    print(get_title_from_index(m[0]))

Top 5 similar movies to Leap Year are:

To all the boys I've loved before
Love, Rosie
Letters to Juliet
Me Before You
The Silence of the Lambs
