### Content-based recommendation engine

This type of recommendation system takes a movie that the user currently likes as input. It analyzes the movie's content (storyline, genre, cast, director, etc.) to find other movies with similar content, ranks those movies by their similarity scores, and recommends the most relevant ones to the user.


### Finding the similarity

Because our recommendation engine is content-based, we need to find the movies most similar to a given movie and then recommend them to the user.
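The similarity measure used later in this notebook is cosine similarity between word-count vectors. As a minimal sketch of the formula, here it is computed by hand with NumPy on two made-up count vectors (the helper name `cosine`, the vocabulary, and the vectors are illustrative assumptions, not taken from the dataset):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between a and b: (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical word counts for two movies over the vocabulary
# ["action", "car", "race", "space"]
movie_a = np.array([1, 1, 1, 0])
movie_b = np.array([1, 1, 0, 1])

score = cosine(movie_a, movie_b)
print(round(score, 3))   # 0.667 (two shared words, three words per movie)
```

A score of 1 means the two vectors point in the same direction (identical word proportions), and 0 means the movies share no words at all.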


In [25]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv("movie_dataset.csv")

In [26]:
df.head(2)
df.columns

Index(['index', 'budget', 'genres', 'homepage', 'id', 'keywords',
       'original_language', 'original_title', 'overview', 'popularity',
       'production_companies', 'production_countries', 'release_date',
       'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title',
       'vote_average', 'vote_count', 'cast', 'crew', 'director'],
      dtype='object')

In [27]:
df.isna().sum()
df=df.fillna("")
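Calling `fillna("")` on the whole frame also replaces missing values in numeric columns with empty strings, which can change those columns to a non-numeric dtype. A hedged alternative, sketched on a made-up three-column frame (`toy` and its values are assumptions for illustration), fills only the text columns that are combined later:

```python
import numpy as np
import pandas as pd

# Made-up frame with missing values in both text and numeric columns
toy = pd.DataFrame({
    "keywords": ["space war", np.nan],
    "director": [np.nan, "Jane Doe"],
    "runtime": [162.0, np.nan],
})

text_cols = ["keywords", "director"]
toy[text_cols] = toy[text_cols].fillna("")   # numeric columns are left untouched

print(toy)
```

For this notebook the difference does not affect the recommendations, since only the four text features feed the similarity computation.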


In [28]:
features = ['keywords','cast','genres','director']

In [29]:
## Combine the 'keywords', 'cast', 'genres' and 'director' columns into a single string
def combine_features(row):
    return row['keywords']+" "+row['cast']+" "+row['genres']+" "+row['director']


## Apply combine_features() to each row and store the result in the "combined_features" column
df["combined_features"] = df.apply(combine_features,axis=1) 
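The `features` list defined earlier can drive the same combination without naming each column inside a function. A small sketch on a one-row, made-up frame (`toy` and its values are assumptions), assuming every feature column already holds strings:

```python
import pandas as pd

features = ["keywords", "cast", "genres", "director"]

# Made-up single-movie frame; all feature columns are plain strings
toy = pd.DataFrame({
    "keywords": ["space war"],
    "cast": ["Actor One Actor Two"],
    "genres": ["Action Adventure"],
    "director": ["Jane Doe"],
})

# Join the selected columns row by row with a single space
toy["combined_features"] = toy[features].agg(" ".join, axis=1)
print(toy["combined_features"].iloc[0])
```

Adding or removing a feature then only requires editing the list.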

In [62]:
df["original_title"].head(100)

0                                       Avatar
1     Pirates of the Caribbean: At World's End
2                                      Spectre
3                        The Dark Knight Rises
4                                  John Carter
                        ...                   
95                                Interstellar
96                                   Inception
97                                      シン・ゴジラ
98           The Hobbit: An Unexpected Journey
99                    The Fast and the Furious
Name: original_title, Length: 100, dtype: object

In [32]:
## Turn the combined text of each movie into a vector of word counts
c_vector=CountVectorizer() 
count_matrix=c_vector.fit_transform(df["combined_features"])   # one row per movie, one column per word
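To make the count matrix concrete, here is a sketch of what `CountVectorizer` produces for three made-up feature strings (the strings are assumptions, not rows from `movie_dataset.csv`). With its default settings the vectorizer lowercases the text and splits it into word tokens of two or more characters, so a multi-word name becomes several separate tokens:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical combined feature strings for three movies
docs = [
    "space alien marine",
    "space alien future",
    "car race street",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)   # sparse matrix: one row per string

vocab = sorted(vectorizer.vocabulary_)    # learned words in column order
print(vocab)
print(matrix.toarray())                   # dense view, fine for a toy example
```

Each column corresponds to one vocabulary word, and each cell holds how often that word appears in the given string.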

In [33]:
## Compute the pairwise cosine similarity between all movies
cosine_sim=cosine_similarity(count_matrix)
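When `cosine_similarity` receives a single matrix, it compares every row with every other row and returns a square matrix. The sketch below shows this on three made-up strings (assumed for illustration): the result is symmetric, has ones on the diagonal, and contains zeros where two strings share no words.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined feature strings for three movies
docs = ["space alien marine", "space alien future", "car race street"]

counts = CountVectorizer().fit_transform(docs)
sim = cosine_similarity(counts)   # shape (3, 3): sim[i, j] compares docs i and j

print(np.round(sim, 2))
```

Because the matrix has one row and one column per movie, its size grows with the square of the number of movies.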

In [36]:
## Look up a movie title from its row index, and a row index from its title
## (assumes the "index" column matches the DataFrame's row index)

def title_from_index(index):
    return df[df.index == index]["title"].values[0]

def index_from_title(title):
    return df[df.title == title]["index"].values[0]
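`index_from_title` raises an `IndexError` when the title is not in the dataset or differs in capitalization. One possible, more forgiving lookup is sketched below on a made-up frame; the helper name `find_position` is hypothetical, and it returns the row position rather than the value of the "index" column:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"title": ["Avatar", "xXx", "Spectre"]})

def find_position(frame, title):
    """Return the row position of the first case-insensitive title match, or None."""
    mask = (frame["title"].str.lower() == title.lower()).to_numpy()
    positions = np.flatnonzero(mask)   # positions of all matching rows
    return int(positions[0]) if positions.size else None

print(find_position(toy, "avatar"))    # 0
print(find_position(toy, "Missing"))   # None
```

Returning `None` lets the caller decide how to report an unknown title instead of failing inside the lookup.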

In [65]:
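## Movie the user likes

movie_name = "xXx"

movie_index=index_from_title(movie_name)
similar_movies=list(enumerate(cosine_sim[movie_index]))   # (movie index, similarity score) pairs
# Sort by score, highest first; [1:] drops the top entry, which is expected to be the movie itself
sorted_similar_movies=sorted(similar_movies,key=lambda x:x[1],reverse=True)[1:]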


In [66]:
## Show the top 6 recommended movies

print("Recommended movies for "+movie_name+":\n")
for item in sorted_similar_movies[:6]:
    print(title_from_index(item[0]))

Recommended movies for xXx:

The Fast and the Furious
The Hunt for Red October
2 Fast 2 Furious
The Jackal
Furious 7
Daylight


In [46]:
## Same loop after re-running the input cell with movie_name = "Avatar"
print("Recommended movies for "+movie_name+":\n")
for item in sorted_similar_movies[:6]:
    print(title_from_index(item[0]))

Recommended movies for Avatar:

Aliens
Alien³
Moonraker
Planet of the Apes
Avatar
Gravity
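The ranking step drops the query movie with `[1:]`, which relies on that movie sorting first. A hedged alternative is to exclude it explicitly by index, so it can never appear in its own recommendations. The sketch below uses a made-up 4x4 similarity matrix; `top_similar` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def top_similar(sim_matrix, query_index, n=5):
    """Return indices of the n rows most similar to query_index, excluding itself."""
    scores = np.asarray(sim_matrix[query_index], dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    order = order[order != query_index]          # remove the query movie explicitly
    return order[:n].tolist()

# Toy similarity matrix with made-up values
toy_sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(top_similar(toy_sim, 0, n=2))   # [2, 3]
```

With the notebook's variables, the equivalent call would be `top_similar(cosine_sim, movie_index)`, with each returned index passed to `title_from_index`.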
