# <center>CONTENT-BASED RECOMMENDATION SYSTEM</center>

* Notebook by: AKSHAY BHAT
* Overview: A content-based recommendation system recommends items by comparing their content, which in our case is movie data.
This notebook bases its recommendations on the plot summary stored in the `overview` column: given a movie title, it returns the movies whose overviews are most similar.
* Dataset info: Pre-processed file from the TMDB 5000 Movie Dataset | https://www.kaggle.com/tmdb/tmdb-movie-metadata

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

In [2]:
data = pd.read_csv("movies1.csv")

In [3]:
data.shape

(4803, 22)

In [4]:
data.head(2)

Unnamed: 0,budget,genres,id,keywords,original_language,original_title,overview,popularity,production_companies,release_date,...,spoken_languages,tagline,vote_average,vote_count,cast,crew,weighted_average,normalized_weight,normalized_popularity,score
0,165000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 18, ""...",157336,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",en,Interstellar,Interstellar chronicles the adventures of a gr...,724.247784,"[{""name"": ""Paramount Pictures"", ""id"": 4}, {""na...",2014-11-05,...,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Mankind was born on Earth. It was never meant ...,8.1,10867,"[{""cast_id"": 9, ""character"": ""Joseph Cooper"", ...","[{""credit_id"": ""52fe4bbf9251416c910e4801"", ""de...",7.9981,0.906439,0.827162,0.866801
1,74000000,"[{""id"": 10751, ""name"": ""Family""}, {""id"": 16, ""...",211672,"[{""id"": 3487, ""name"": ""assistant""}, {""id"": 179...",en,Minions,"Minions Stuart, Kevin and Bob are recruited by...",875.581305,"[{""name"": ""Universal Pictures"", ""id"": 33}, {""n...",2015-06-17,...,"[{""iso_639_1"": ""en"", ""name"": ""English""}]","Before Gru, they had a history of bad bosses",6.4,4571,"[{""cast_id"": 22, ""character"": ""Scarlet Overkil...","[{""credit_id"": ""5431b2b10e0a2656e20026c7"", ""de...",6.365286,0.46063,1.0,0.730315


Let's have a look at the `overview` column.

In [5]:
data['overview'].head()

0    Interstellar chronicles the adventures of a gr...
1    Minions Stuart, Kevin and Bob are recruited by...
2    Light years from Earth, 26 years after being a...
3    Deadpool tells the origin story of former Spec...
4    An apocalyptic story set in the furthest reach...
Name: overview, dtype: object

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer

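# TF-IDF over unigrams, bigrams and trigrams; terms found in fewer than 3 overviews are ignored
tfv = TfidfVectorizer(
    min_df=3,
    max_features=None,
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    ngram_range=(1, 3),
    stop_words='english',
)

# Fill missing overviews with empty strings so the vectorizer can process every row
data['overview'] = data['overview'].fillna('')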

In [7]:
# Fitting the TF-IDF on the 'overview' text
tfv_matrix = tfv.fit_transform(data['overview'])

In [8]:
tfv_matrix.shape

(4803, 10417)
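
To make the effect of `min_df`, `ngram_range` and `stop_words` more concrete, here is a minimal sketch on a toy corpus. The three sentences and the names `toy_overviews`, `toy_tfv`, `toy_matrix` and `vocab` are made up for illustration and are not part of the dataset; `min_df` is lowered to 2 only because the toy corpus is so small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for data['overview']
toy_overviews = [
    "a spy saves the world",
    "a young spy saves the city",
    "a robot saves the world",
]

# Same kind of settings as above, scaled down for three documents
toy_tfv = TfidfVectorizer(
    min_df=2,                 # keep terms that occur in at least 2 documents
    ngram_range=(1, 2),       # unigrams and bigrams
    token_pattern=r'\w{1,}',
    stop_words='english',     # drops words such as "a" and "the"
)
toy_matrix = toy_tfv.fit_transform(toy_overviews)

# Terms that survived stop-word removal and the min_df filter
vocab = sorted(toy_tfv.vocabulary_)
print(vocab)
print(toy_matrix.shape)  # (number of documents, number of kept terms)
```

Terms that occur in only one document, such as "robot" or "young", are filtered out, which is the same mechanism that keeps the real vocabulary at a manageable size.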

In [9]:
from sklearn.metrics.pairwise import sigmoid_kernel

# Compute the pairwise sigmoid-kernel similarity between every pair of overviews
sig = sigmoid_kernel(tfv_matrix, tfv_matrix)
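
For reference, scikit-learn documents `sigmoid_kernel` as `tanh(gamma * <x, y> + coef0)`, with `gamma` defaulting to `1 / n_features` and `coef0` to `1`. The sketch below checks this on a small hand-made matrix (`X` is a hypothetical stand-in for `tfv_matrix`). Because `tanh` is monotonically increasing, the ranking should match the one given by the plain dot product.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, sigmoid_kernel

# Hypothetical toy matrix; rows have unit length, as TF-IDF rows do by default
X = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])

# Manual version of the documented formula with the default parameters
gamma = 1.0 / X.shape[1]
manual = np.tanh(gamma * (X @ X.T) + 1.0)

sk = sigmoid_kernel(X, X)
same_values = np.allclose(manual, sk)

# Rank the neighbours of row 0 under both similarity measures
rank_sigmoid = np.argsort(-sk[0], kind="stable")
rank_linear = np.argsort(-linear_kernel(X, X)[0], kind="stable")

print(same_values)
print(rank_sigmoid, rank_linear)
```

In other words, on unit-length TF-IDF rows the sigmoid kernel mostly rescales the cosine similarity; it does not change which movies come out on top.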

In [11]:
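# Reverse mapping from movie title to row index.
# Note: drop_duplicates() removes duplicate values (row indices), not duplicate
# titles, so it is a no-op here; a repeated title would still map to several rows.
indices = pd.Series(data.index, index=data['original_title']).drop_duplicates()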

In [12]:
indices

original_title
Interstellar                  0
Minions                       1
Guardians of the Galaxy       2
Deadpool                      3
Mad Max: Fury Road            4
                           ... 
Epic Movie                 4798
Batman & Robin             4799
The Boy Next Door          4800
Fantastic Four             4801
Dragonball Evolution       4802
Length: 4803, dtype: int64
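
The length of 4803 shows that `drop_duplicates()` removed nothing, which is expected because it compares the values (the row indices), not the titles. A minimal sketch of the difference, using a made-up three-row frame (`toy`, `toy_indices` and `unique_titles` are hypothetical names, and the titles are placeholders, not a claim about which titles repeat in the dataset):

```python
import pandas as pd

# Hypothetical frame in which one title appears twice
toy = pd.DataFrame({"original_title": ["Batman", "Heat", "Batman"]})
toy_indices = pd.Series(toy.index, index=toy["original_title"])

# Values 0, 1, 2 are all distinct, so nothing is removed
after_drop = toy_indices.drop_duplicates()
print(len(after_drop))

# Looking up a repeated title returns a Series, not a single integer
print(after_drop["Batman"])

# Deduplicating on the index (the titles) keeps one row per title
unique_titles = toy_indices[~toy_indices.index.duplicated(keep="first")]
print(unique_titles["Batman"])
```

If a title is repeated in the real data, `indices[title]` returns several row indices, so deduplicating on the index (or picking one match explicitly) is worth considering.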

In [13]:
def give_rec(title, sig=sig):
    # Get the row index corresponding to the given original_title
    idx = indices[title]

    # Get the pairwise similarity scores between this movie and every movie
    sig_scores = list(enumerate(sig[idx]))

    # Sort the movies by similarity score, highest first
    sig_scores = sorted(sig_scores, key=lambda x: x[1], reverse=True)

    # Keep the 10 most similar movies, skipping position 0 (expected to be the movie itself)
    sig_scores = sig_scores[1:11]

    # Movie indices
    movie_indices = [i[0] for i in sig_scores]

    # Titles of the 10 most similar movies
    return data['original_title'].iloc[movie_indices]
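
As a possible variation on the function above, the sketch below ranks with NumPy, returns the scores alongside the titles, excludes the query movie by index instead of assuming it sorts first, and raises a clear error for unknown titles. The name `recommend`, its parameters and the toy data are hypothetical; this is an illustration under those assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd

def recommend(title, titles, sim, top_n=10):
    """Return the top_n rows most similar to `title`.

    titles: pd.Series of titles, positionally aligned with the rows of `sim`.
    sim:    dense square similarity matrix (NumPy array).
    """
    matches = np.flatnonzero(titles.to_numpy() == title)
    if matches.size == 0:
        raise KeyError(f"Title not found: {title!r}")
    idx = matches[0]  # if the title is repeated, use its first occurrence

    # Sort all rows by descending similarity, then drop the query row itself
    order = np.argsort(-sim[idx], kind="stable")
    order = order[order != idx][:top_n]

    return pd.DataFrame(
        {"title": titles.to_numpy()[order], "score": sim[idx, order]},
        index=titles.index[order],
    )

# Hypothetical toy data standing in for data['original_title'] and sig
toy_titles = pd.Series(["A", "B", "C", "D"])
toy_sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
top2 = recommend("A", toy_titles, toy_sim, top_n=2)
print(top2)
```

With the notebook's objects, the equivalent call would be `recommend('Spy Kids', data['original_title'], sig)`.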

In [14]:
# Testing our content-based recommendation system 
give_rec('Spy Kids')

4606    Spy Kids 2: The Island of Lost Dreams
4764                  Spy Kids 3-D: Game Over
4719      Spy Kids: All the Time in the World
3491                               Go for It!
2830                              In Too Deep
4166                                 Mr. 3000
4116                Jimmy Neutron: Boy Genius
150                           The Incredibles
3554                     The Velocity of Gary
966                        Revolutionary Road
Name: original_title, dtype: object

In [15]:
give_rec('Avatar')

3614                Obitaemyy Ostrov
30                        The Matrix
4677                       Apollo 18
3958                    The American
4382                       Supernova
1367                Tears of the Sun
4471                         Beowulf
4611    The Adventures of Pluto Nash
4383                        Semi-Pro
470                 The Book of Life
Name: original_title, dtype: object

In [16]:
give_rec('Deadpool')

984              Underworld: Evolution
1614                     Mars Attacks!
614                       Spider-Man 2
813                            Bronson
3758                    Silent Trigger
4242                             Shaft
1996                      Ночной дозор
1219                      Spider-Man 3
572                           Superman
4732    Nutty Professor II: The Klumps
Name: original_title, dtype: object

In [17]:
give_rec('Minions')

180                           Despicable Me 2
4542                          Stuart Little 2
476                                Home Alone
2171                                  Freeway
1647                          Velvet Goldmine
4743                           Wild Wild West
3630                        Darling Companion
2015                               The Mighty
4377                           The Guilt Trip
4012    The League of Extraordinary Gentlemen
Name: original_title, dtype: object