# Movie Recommendation

It turns out that there are (mostly) three ways to build a recommendation engine:

1. Popularity-based recommendation engine
2. Content-based recommendation engine
3. Collaborative-filtering-based recommendation engine

In this section, I'm going to implement a content-based recommendation system using the scikit-learn library.

### What is a content-based recommendation engine?

<img src="https://miro.medium.com/max/828/1*1b-yMSGZ1HfxvHiJCiPV7Q.png">

This type of recommendation system takes a movie that the user currently likes as input. It then analyzes the content of that movie (genre, cast, director, etc.) to find other movies with similar content. Finally, it ranks those movies by their similarity scores and recommends the most relevant ones to the user.

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('movie_dataset.csv')
df.head(3)

Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes


In [3]:
df.isnull().sum()

index                      0
budget                     0
genres                    28
homepage                3091
id                         0
keywords                 412
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
cast                      43
crew                       0
director                  30
dtype: int64

In [4]:
# drop rows with missing values in the features we will use, then reset the
# index so that row labels match the row positions of the count matrix
df = df.dropna(subset = ['keywords', 'cast', 'genres', 'director']).reset_index(drop = True)

# create a new feature that combines the selected features into one string
df['Criteria'] = df['keywords'].str.cat(df[['cast', 'genres', 'director']], sep = ' ')
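
To make the combining step concrete, here is a small sketch on a made-up three-row frame. The rows and the names `toy`, `cols`, `combined_raw`, `clean` and `combined` are illustrative assumptions, not part of `movie_dataset.csv`. It shows that `str.cat` produces a missing value whenever one of the parts is missing, which is why the incomplete rows are dropped first.

```python
import pandas as pd

# Hypothetical toy rows; the real data comes from movie_dataset.csv
toy = pd.DataFrame({
    'keywords': ['space war', 'pirate ship', None],
    'cast': ['Sam Worthington', 'Johnny Depp', 'Daniel Craig'],
    'genres': ['Action Fantasy', 'Adventure Fantasy', 'Action Crime'],
    'director': ['James Cameron', 'Gore Verbinski', 'Sam Mendes'],
})
cols = ['cast', 'genres', 'director']

# Without dropping, a missing keyword makes the whole combined string missing
combined_raw = toy['keywords'].str.cat(toy[cols], sep=' ')
print(combined_raw.isna().tolist())

# Dropping incomplete rows first gives one clean string per remaining movie
clean = toy.dropna(subset=['keywords'] + cols).reset_index(drop=True)
combined = clean['keywords'].str.cat(clean[cols], sep=' ')
print(combined.tolist())
```

Only the two complete rows survive, each as a single space-separated string.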

In [5]:
# count matrix
from sklearn.feature_extraction.text import CountVectorizer
model = CountVectorizer(tokenizer = lambda i: i.split(' '))
count_matrix = model.fit_transform(df['Criteria'])

In [6]:
# cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
score = cosine_similarity(count_matrix)
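
For intuition, the cosine similarity of two count vectors `a` and `b` is their dot product divided by the product of their lengths. The sketch below computes it by hand for two made-up vectors and checks the result against scikit-learn; the vectors and the names `manual` and `library` are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up count vectors over the same four-token vocabulary
a = np.array([1, 0, 1, 1])
b = np.array([2, 1, 1, 0])

# Dot product divided by the product of the vector lengths
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn expects 2-D input, one row per sample
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(round(float(manual), 4), round(float(library), 4))  # 0.7071 0.7071
```

A score of 1 means the two vectors point in the same direction, and 0 means they share no tokens at all.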

In [7]:
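# test model
movie_user_like = 'Avatar'
# row position of the movie in the filtered frame; the original 'index' column
# may no longer match row positions once rows have been dropped
userlike1 = np.flatnonzero((df['title'] == movie_user_like).to_numpy())[0]
userlike = [userlike1]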

In [8]:
# pair every movie's row position with its cosine similarity score
user_like = list(enumerate(score[userlike1]))

In [9]:
# sort the movies by similarity score, highest first
sort_movie = sorted(user_like,key = lambda i: i[1],reverse = True)

In [10]:
# Recommendation Top 5
print("Top 5 similar movies to "+movie_user_like+" are:")
# skip the liked movie itself, then keep the five highest-scoring titles
top_5 = [i for i, _ in sort_movie if i not in userlike][:5]
for i in top_5:
    print('-',df['title'].iloc[i])

print(' ')

Top 5 similar movies to Avatar are:
- Guardians of the Galaxy
- Aliens
- Star Wars: Clone Wars: Volume 1
- Star Trek Into Darkness
- Star Trek Beyond
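
The steps above can also be wrapped into one reusable function. This is only a minimal sketch: `recommend`, its parameters and the three-row `toy` frame are hypothetical names introduced here, and it assumes the frame already has a `title` column and a combined text column such as `Criteria`.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(frame, title, n=5, text_col='Criteria'):
    """Return the n titles most similar to `title` (hypothetical helper)."""
    frame = frame.reset_index(drop=True)  # labels now equal matrix row positions
    matches = frame.index[frame['title'] == title]
    if len(matches) == 0:
        raise ValueError(f'{title!r} not found')
    pos = matches[0]

    vectorizer = CountVectorizer(tokenizer=lambda s: s.split(' '))
    vectors = vectorizer.fit_transform(frame[text_col])
    # similarity of the chosen movie against every movie, as a flat array
    scores = cosine_similarity(vectors[pos], vectors).ravel()

    ranked = scores.argsort()[::-1]               # most similar first
    ranked = [i for i in ranked if i != pos][:n]  # leave out the movie itself
    return frame['title'].iloc[ranked].tolist()


# Made-up data for a quick check
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'Criteria': ['space war alien', 'space war robot', 'romance paris'],
})
print(recommend(toy, 'A', n=2))  # ['B', 'C']
```

Filtering the liked movie out before slicing avoids having to adjust the loop counter by hand.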
 


After seeing the output, I went one step further to compare it to other recommendation engines.

So, I searched Google for movies similar to “Avatar”, and here is what I got:

<img src="https://miro.medium.com/max/960/0*iLUcfpK2XT10HEn4">

The output of this simple movie recommendation engine is pretty good for a basic implementation. There is always room for improvement, and I am open to any constructive comments.