# Problem Statement

* Develop a content-based recommender system using the genres and/or descriptions.
* Identify the main content available on the streaming service.
* Perform exploratory data analysis to find interesting insights.

[You can download the dataset from here.](https://www.kaggle.com/datasets/victorsoeiro/netflix-tv-shows-and-movies)

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/netflix-tv-shows-and-movies/credits.csv
/kaggle/input/netflix-tv-shows-and-movies/titles.csv


In [2]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.tokenize import word_tokenize

In [3]:
credits = pd.read_csv('/kaggle/input/netflix-tv-shows-and-movies/credits.csv')
titles = pd.read_csv('/kaggle/input/netflix-tv-shows-and-movies/titles.csv')

In [4]:
credits.head()

Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR
2,7064,tm84618,Albert Brooks,Tom,ACTOR
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR


In [5]:
titles.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.6,
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.01,7.3
3,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,"['fantasy', 'action', 'comedy']",['GB'],,tt0071853,8.2,534486.0,15.461,7.811
4,tm120801,The Dirty Dozen,MOVIE,12 American military prisoners in World War II...,1967,,150,"['war', 'action']","['GB', 'US']",,tt0061578,7.7,72662.0,20.398,7.6


In [6]:
titles.shape, credits.shape

((5850, 15), (77801, 5))

### About the features

#### Features in titles.csv

* **id**: The title ID on JustWatch.
* **title**: The name of the title.
* **type**: TV show or movie (`SHOW` or `MOVIE`).
* **description**: A brief description.
* **release_year**: The release year.
* **age_certification**: The age certification.
* **runtime**: The length of the episode (SHOW) or movie, in minutes.
* **genres**: A list of genres.
* **production_countries**: A list of countries that produced the title.
* **seasons**: Number of seasons if it's a SHOW.
* **imdb_id**: The title ID on IMDB.
* **imdb_score**: Score on IMDB.
* **imdb_votes**: Votes on IMDB.
* **tmdb_popularity**: Popularity on TMDB.
* **tmdb_score**: Score on TMDB.

#### Features in credits.csv

* **person_id**: The person ID on JustWatch.
* **id**: The title ID on JustWatch.
* **name**: The actor or director's name.
* **character**: The character name.
* **role**: ACTOR or DIRECTOR.
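
Because `pd.read_csv` does not parse nested values, list-like columns such as `genres` and `production_countries` arrive as strings that merely look like Python lists. A minimal sketch of turning them into real lists, assuming the strings are valid Python literals, could look like this (the `toy` frame and the `genres_list` column name are illustrative, not part of the dataset):

```python
import ast
import pandas as pd

# Toy frame mimicking how the genres column looks right after read_csv.
toy = pd.DataFrame({'genres': ["['drama', 'crime']", "['documentation']", "[]"]})

# literal_eval safely converts "['drama', 'crime']" into ['drama', 'crime'].
toy['genres_list'] = toy['genres'].apply(ast.literal_eval)

# explode() gives one row per genre, which makes counting straightforward.
# Empty lists become NaN and are ignored by value_counts().
genre_counts = toy['genres_list'].explode().value_counts()
print(genre_counts)
```

The same pattern should apply to `titles['genres']` once the file is loaded, and the exploded form is convenient for the exploratory analysis of which genres dominate the catalog.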

In [7]:
df = pd.merge(credits,titles,on='id',how='left')
df.head()

Unnamed: 0,person_id,id,name,character,role,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
2,7064,tm84618,Albert Brooks,Tom,ACTOR,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
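
A left merge from `credits` to `titles` repeats each title's columns once per credited person, and titles without any credits do not appear in the result. The toy sketch below illustrates that behavior; the frames and names in it are made up, and `validate='many_to_one'` is an optional safeguard that raises an error if `id` is not unique on the titles side:

```python
import pandas as pd

# Hypothetical miniature versions of credits.csv and titles.csv.
toy_credits = pd.DataFrame({
    'id': ['t1', 't1', 't2'],
    'name': ['Actor A', 'Actor B', 'Actor C'],
})
toy_titles = pd.DataFrame({
    'id': ['t1', 't2', 't3'],
    'title': ['First', 'Second', 'Third'],
})

# One output row per credit; title columns are duplicated for each person.
toy_merged = pd.merge(toy_credits, toy_titles, on='id', how='left',
                      validate='many_to_one')
print(toy_merged)
```

Here `Third` is absent from `toy_merged` because no credit refers to `t3`, which is worth keeping in mind when counting titles from the merged frame.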


In [8]:
required_columns = ['person_id','name','character','title','description','genres','imdb_score','imdb_votes']
df = df[required_columns]
df.head()

Unnamed: 0,person_id,name,character,title,description,genres,imdb_score,imdb_votes
0,3748,Robert De Niro,Travis Bickle,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']",8.2,808582.0
1,14658,Jodie Foster,Iris Steensma,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']",8.2,808582.0
2,7064,Albert Brooks,Tom,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']",8.2,808582.0
3,3739,Harvey Keitel,Matthew 'Sport' Higgins,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']",8.2,808582.0
4,48933,Cybill Shepherd,Betsy,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']",8.2,808582.0


In [9]:
titles.shape

(5850, 15)

In [10]:
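# Assign the results back instead of calling fillna(inplace=True) on a column,
# which can act on a temporary object and leave `titles` unchanged.
titles['imdb_votes'] = titles['imdb_votes'].fillna(0)
titles['description'] = titles['description'].fillna(' ')
# Fill every remaining missing value (numeric and text columns alike) with 0.
titles = titles.fillna(0)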

# Search Based Recommendation System

In [11]:
tfidf = TfidfVectorizer(stop_words='english')
matrix = tfidf.fit_transform(titles['description'])
cosine_sim = cosine_similarity(matrix, matrix)

In [12]:
cosine_sim.shape

(5850, 5850)
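
To make the similarity matrix more concrete, here is a small sketch of the same TF-IDF and cosine similarity steps on three invented descriptions (the sentences and the `toy_` names are illustrative only). Each row of the result holds one description's similarity to every other description, which is why the full matrix above is square:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three made-up descriptions: the first two share vocabulary, the third does not.
toy_descriptions = [
    "A young airbender must master the four elements",
    "A teenage avatar learns to master the elements",
    "A stand-up comedian performs in New York",
]

# Same settings as the notebook: English stop words are dropped before weighting.
toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_descriptions)

# Entry [i, j] is the cosine similarity between description i and description j.
toy_sim = cosine_similarity(toy_matrix, toy_matrix)
print(toy_sim.round(2))
```

The first two descriptions score higher with each other than with the third because they share the words "master" and "elements", while the diagonal is 1 since every description is identical to itself.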

In [13]:
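def search(string):
    """Return the 11 most similar titles for each exact match of `string`."""
    # Row positions of every title that matches the query exactly.
    index = titles[titles['title'] == string].index
    all_movies = []

    for i in index:
        # Pair each row position with its similarity to title i.
        scores = list(enumerate(cosine_sim[i]))
        # Keep the 11 highest scores; the top one is normally the query itself.
        scores = sorted(scores, key=lambda x: x[1], reverse=True)[:11]
        movies = [titles.iloc[n]['title'] for n, _ in scores]
        all_movies.extend(movies)
    return all_movies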

In [14]:
search('Avatar: The Last Airbender')

['Avatar: The Last Airbender',
 'The Legend of Korra',
 'Blood and Bone',
 'The Dragon Prince',
 'Vivo',
 'Five Came Back',
 'Shadow and Bone',
 'Violet Evergarden: The Movie',
 'The Worthy',
 'The Giver',
 'The Liberator']

Here we go! Avatar: The Last Airbender is an animated series. The Legend of Korra is its sequel, and Violet Evergarden: The Movie and The Dragon Prince are also animated titles. Let's try some more.

In [15]:
search("Monty Python's Flying Circus")

["Monty Python's Flying Circus",
 'Standup and Away! with Brian Regan',
 'Monty Python Conquers America',
 'Parrot Sketch Not Included: Twenty Years of Monty Python',
 'I Think You Should Leave with Tim Robinson',
 'The Who Was? Show',
 'Shor in the City',
 'Plastic Cup Boyz: Laughing My Mask Off!',
 'Hot Date',
 'Horrid Henry',
 'All That']

In [16]:
search('Violet Evergarden: The Movie')

['Violet Evergarden: The Movie',
 'Violet Evergarden: Eternity and the Auto Memories Doll',
 'Gunjan Saxena: The Kargil Girl',
 'Violet Evergarden',
 'Nappily Ever After',
 'The End',
 'Five Came Back',
 'Going for Gold',
 'Avatar: The Last Airbender',
 'Becoming',
 'The Haunting in Connecticut 2: Ghosts of Georgia']

These are some of the recommendations. If needed, I can set minimum thresholds on `imdb_score` and `imdb_votes` so that only the best-rated titles are shown.
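
The thresholding idea could be sketched roughly as follows. The helper `filter_by_rating`, its default cutoffs (`min_score=7.0`, `min_votes=10000`), and the toy catalog are all assumptions made for illustration; they are not part of the notebook or the dataset:

```python
import pandas as pd

def filter_by_rating(recommended_titles, catalog, min_score=7.0, min_votes=10000):
    """Keep only recommended titles that meet both rating thresholds."""
    subset = catalog[catalog['title'].isin(recommended_titles)]
    mask = (subset['imdb_score'] >= min_score) & (subset['imdb_votes'] >= min_votes)
    kept = set(subset.loc[mask, 'title'])
    # Preserve the original recommendation order.
    return [t for t in recommended_titles if t in kept]

# Hypothetical catalog with the same column names as titles.csv.
toy_catalog = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'imdb_score': [8.1, 6.0, 7.5],
    'imdb_votes': [50000, 90000, 500],
})

filtered = filter_by_rating(['C', 'A', 'B'], toy_catalog)
print(filtered)
```

With the real data this would be called as `filter_by_rating(search('Avatar: The Last Airbender'), titles)`. Since missing scores and votes were filled with 0 earlier, unrated titles would simply fall below any positive threshold.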

# Keywords, Character, Actor and Genre based Recommendation System