## What Are Recommendation Systems?

Recommendation systems are a type of information filtering system: they improve the quality of search results by surfacing items that are more relevant to the search query or related to the user's search history.

## About This Notebook

This notebook attempts to explain how to build a simple content-based film recommendation engine.
A content-based technique finds similar items by comparing the content of the movies a user watches.
![image.png](attachment:image.png)

## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt

## Loading Datasets

In [4]:
movies=pd.read_csv('tmdb_5000_movies.csv')
credits=pd.read_csv('tmdb_5000_credits.csv')

-  The credits dataset (`tmdb_5000_credits.csv`) contains the following features:
    -  movie_id - A unique identifier for each movie.
    -  cast - The names of the lead and supporting actors.
    -  crew - The names of the director, editor, composer, writer, etc.
    -  title - The name of the movie.
-  The movies dataset (`tmdb_5000_movies.csv`) has the following features:
    -  budget - The budget with which the movie was made.
    -  genres - The genres of the movie: Action, Comedy, Thriller, etc.
    -  homepage - A link to the homepage of the movie.
    -  id - This is in fact the same identifier as movie_id in the credits dataset.
    -  keywords - The keywords or tags related to the movie.
    -  original_language - The language in which the movie was made.
    -  original_title - The title of the movie before translation or adaptation.
    -  overview - A brief description of the movie.
    -  popularity - A numeric quantity specifying the movie's popularity.
    -  production_companies - The production houses of the movie.
    -  production_countries - The countries in which it was produced.
    -  release_date - The date on which it was released.
    -  revenue - The worldwide revenue generated by the movie.
    -  runtime - The running time of the movie in minutes.
    -  spoken_languages - The languages spoken in the movie.
    -  status - The release status, e.g. "Released" or "Rumored".
    -  tagline - The movie's tagline.
    -  title - The title of the movie.
    -  vote_average - The average rating the movie received.
    -  vote_count - The number of votes received.
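
Several of these columns (for example genres, keywords, cast and crew) hold JSON-encoded strings rather than plain values, as the dataframe preview later in this notebook shows. The sketch below shows one possible way to pull the names out of such a cell using only the standard library. The helper `extract_names` and the sample string `raw_genres` are hypothetical and only for illustration.

```python
import json

# A genres cell in the style of the dataset (shortened for illustration).
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def extract_names(cell):
    """Parse a JSON-encoded list of objects and return their 'name' fields."""
    return [item["name"] for item in json.loads(cell)]

print(extract_names(raw_genres))  # ['Action', 'Adventure']
```

This notebook only relies on the overview column, so this parsing step is optional here.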



In [6]:
movies.shape

(4803, 20)

In [7]:
credits.shape

(4803, 4)

In [8]:
movies.isnull().sum()  # count the null values in each column

budget                     0
genres                     0
homepage                3091
id                         0
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
dtype: int64
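
The three missing `overview` values matter later, because text vectorizers generally expect strings rather than NaN. A minimal sketch on toy data, assuming that a missing overview can be treated as empty text (the `demo` frame and its values are made up):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing overview, mimicking the situation above.
demo = pd.DataFrame({
    "title": ["A", "B", "C"],
    "overview": ["A heist goes wrong.", np.nan, "Two friends travel."],
})

print(demo["overview"].isnull().sum())  # one missing value before filling

# Replace missing overviews with an empty string so every row is a str.
demo["overview"] = demo["overview"].fillna("")

print(demo["overview"].isnull().sum())  # no missing values remain
```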

In [9]:
credits.isnull().sum()

movie_id    0
title       0
cast        0
crew        0
dtype: int64

Neither dataframe has missing values in its id column, so we can join the two on the movie id.
First, though, we have to rename the "**movie_id**" column in the credits dataframe to "**id**" so that it matches the movies dataframe.

In [12]:
credits.rename(columns={'movie_id':'id'},inplace=True)  # rename the key column to match movies
movies=movies.merge(credits,on='id')  # merge returns a new dataframe, so assign the result back
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
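
Both frames contain a `title` column, which is why the merged preview shows `title_x` and `title_y`. The sketch below reproduces this on toy data (the `*_demo` frames and their values are made up) and shows one possible way to avoid the duplicate column. It is an illustration, not a step this notebook itself takes.

```python
import pandas as pd

# Toy stand-ins for the two TMDB frames (values are made up for illustration).
movies_demo = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "budget": [10, 20]})
credits_demo = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"], "cast": ["x", "y"]})

# Rename the key column, then merge on it.
credits_demo = credits_demo.rename(columns={"movie_id": "id"})
merged = movies_demo.merge(credits_demo, on="id")
print(merged.columns.tolist())  # 'title' shows up twice, as title_x and title_y

# Dropping 'title' from the credits side first keeps a single title column.
# validate="one_to_one" raises an error if an id is duplicated on either side.
merged_clean = movies_demo.merge(
    credits_demo.drop(columns="title"), on="id", validate="one_to_one"
)
print(merged_clean.columns.tolist())
```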


## What Now?

Now we are going to compare each overview with every other overview and keep a record of how similar they are. Suppose we have to suggest 10 movies based on a movie we watched: the recommender will return the 10 movies whose overviews are most similar to that particular movie.
For general-purpose content-based filtering we do not need any of the other columns (other than "**overview**"), but for a personalized recommender we could use them to tune the output.
This notebook will implement a model that suggests 10 movies based solely on content, **i.e.** it will not exclude movies on the basis of any **other variables** (rating, language, genre).
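
To make the lookup step concrete, here is a minimal sketch of picking the most similar movies once a pairwise similarity matrix exists. The function `top_n_similar` and the 4x4 matrix `sim` are hypothetical; how the similarity scores are actually computed is left open here.

```python
import numpy as np

def top_n_similar(sim, idx, n=10):
    """Return indices of the n items most similar to item idx, excluding idx."""
    scores = sim[idx].astype(float)   # astype returns a copy, so sim is untouched
    scores[idx] = -np.inf             # never recommend the query movie itself
    order = np.argsort(-scores, kind="stable")  # indices sorted by descending score
    return order[:n].tolist()

# Hypothetical similarity matrix for four movies (symmetric, ones on the diagonal).
sim = np.array([
    [1.0, 0.2, 0.8, 0.1],
    [0.2, 1.0, 0.3, 0.7],
    [0.8, 0.3, 1.0, 0.4],
    [0.1, 0.7, 0.4, 1.0],
])

print(top_n_similar(sim, 0, n=2))  # [2, 1]
```

With the real data, `n=10` and the row index of the watched movie would give the 10 suggestions described above.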

## Text to Vectors 

For the text-preprocessing step we are going to convert words to vectors.
We'll compute Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each overview.
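
As a rough illustration of what a TF-IDF weight is, the sketch below computes the plain textbook variant by hand: term frequency is the count divided by the document length, and inverse document frequency is `log(N / df)`. The function `tfidf_vectors` and the three sample overviews are made up for this example. Libraries such as scikit-learn apply smoothing and normalization on top of this, so their numbers will differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Plain TF-IDF: tf = count / doc length, idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up overviews, only for illustration.
overviews = [
    "a marine is sent to a distant moon",
    "a spy uncovers a secret organisation",
    "a former captain is sent to mars",
]
vectors = tfidf_vectors(overviews)

print(vectors[0]["a"])                            # 0.0, "a" occurs in every overview
print(vectors[0]["marine"] > vectors[0]["sent"])  # True, the rarer term weighs more
```

A word that appears in every overview carries no information about any single movie, so its weight drops to zero, while rarer words are weighted up.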

## Why not Bag of Words?