## Film Finder

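This notebook is the starting point for a movie recommendation application.
The steps shown here:
1. Load the TMDB movie dataset from a CSV file into a pandas DataFrame.
1. Explore the data, select the columns of interest, and fill in missing values.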


In [4]:
import numpy as np
import pandas as pd
import difflib  # Standard-library fuzzy string matching, e.g. finding the closest movie title to a user's input
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
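
The imports above suggest a content-based approach: fuzzy-match the title a user types, turn text features into TF-IDF vectors, and rank movies by cosine similarity. The following is only a minimal sketch on toy data. The `toy_titles` and `toy_genres` lists and the `recommend` helper are hypothetical and are not part of this notebook.

```python
import difflib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data, standing in for columns of the real dataset.
toy_titles = ["Inception", "Interstellar", "The Dark Knight", "Toy Story"]
toy_genres = [
    "action science fiction adventure",
    "adventure drama science fiction",
    "drama action crime thriller",
    "animation comedy family",
]

# One TF-IDF vector per movie, then all pairwise cosine similarities.
tfidf_matrix = TfidfVectorizer().fit_transform(toy_genres)
similarity = cosine_similarity(tfidf_matrix)


def recommend(user_title, top_n=2):
    """Return up to top_n titles most similar to the closest title match."""
    matches = difflib.get_close_matches(user_title, toy_titles, n=1)
    if not matches:
        return []
    index = toy_titles.index(matches[0])
    # Rank every other movie by its similarity to the matched one.
    ranked = sorted(
        (i for i in range(len(toy_titles)) if i != index),
        key=lambda i: similarity[index][i],
        reverse=True,
    )
    return [toy_titles[i] for i in ranked[:top_n]]


print(recommend("Incepton"))  # the misspelled title still resolves to "Inception"
```

Here `difflib.get_close_matches` tolerates the typo in the query, and the ranking comes purely from overlapping genre words. A real version would probably combine several text columns before vectorizing.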

#### Reading the comma-separated values (CSV) input data into a pandas DataFrame (df)

In [5]:
csv_path = "./dataset/TMDB_movie_dataset_v11.csv"
movies_df = pd.read_csv(csv_path)
movies_df.head()

| | id | title | vote_average | vote_count | status | release_date | revenue | runtime | adult | backdrop_path | ... | original_title | overview | popularity | poster_path | tagline | genres | production_companies | production_countries | spoken_languages | keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27205 | Inception | 8.364 | 34495 | Released | 2010-07-15 | 825532764 | 148 | False | /8ZTVqvKDQ8emSGUEMjsS4yHAwrp.jpg | ... | Inception | Cobb, a skilled thief who commits corporate es... | 83.952 | /oYuLEt3zVCKq57qu2F8dT7NIa6f.jpg | Your mind is the scene of the crime. | Action, Science Fiction, Adventure | Legendary Pictures, Syncopy, Warner Bros. Pict... | United Kingdom, United States of America | English, French, Japanese, Swahili | rescue, mission, dream, airplane, paris, franc... |
| 1 | 157336 | Interstellar | 8.417 | 32571 | Released | 2014-11-05 | 701729206 | 169 | False | /pbrkL804c8yAv3zBZR4QPEafpAR.jpg | ... | Interstellar | The adventures of a group of explorers who mak... | 140.241 | /gEU2QniE6E77NI6lCU6MxlNBvIx.jpg | Mankind was born on Earth. It was never meant ... | Adventure, Drama, Science Fiction | Legendary Pictures, Syncopy, Lynda Obst Produc... | United Kingdom, United States of America | English | rescue, future, spacecraft, race against time,... |
| 2 | 155 | The Dark Knight | 8.512 | 30619 | Released | 2008-07-16 | 1004558444 | 152 | False | /nMKdUUepR0i5zn0y1T4CsSB5chy.jpg | ... | The Dark Knight | Batman raises the stakes in his war on crime. ... | 130.643 | /qJ2tW6WMUDux911r6m7haRef0WH.jpg | Welcome to a world without rules. | Drama, Action, Crime, Thriller | DC Comics, Legendary Pictures, Syncopy, Isobel... | United Kingdom, United States of America | English, Mandarin | joker, sadism, chaos, secret identity, crime f... |
| 3 | 19995 | Avatar | 7.573 | 29815 | Released | 2009-12-15 | 2923706026 | 162 | False | /vL5LR6WdxWPjLPFRLe133jXWsh5.jpg | ... | Avatar | In the 22nd century, a paraplegic Marine is di... | 79.932 | /kyeqWdyUXW608qlYkRqosgbbJyK.jpg | Enter the world of Pandora. | Action, Adventure, Fantasy, Science Fiction | Dune Entertainment, Lightstorm Entertainment, ... | United States of America, United Kingdom | English, Spanish | future, society, culture clash, space travel, ... |
| 4 | 24428 | The Avengers | 7.71 | 29166 | Released | 2012-04-25 | 1518815515 | 143 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | ... | The Avengers | When an unexpected enemy emerges and threatens... | 98.082 | /RYMX2wcKCBAr24UyPD7xwmjaTn.jpg | Some assembly required. | Science Fiction, Action, Adventure | Marvel Studios | United States of America | English, Hindi, Russian | new york city, superhero, shield, based on com... |
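
For a large CSV, parsing every column can use more memory than the analysis needs. One option, sketched here on a small in-memory CSV rather than the real file, is to pass `usecols` to `pd.read_csv` so that only the listed columns are parsed. The sample rows and the `wanted_columns` name are illustrative assumptions.

```python
import io

import pandas as pd

# Small in-memory stand-in for the real CSV file (illustrative rows only).
sample_csv = io.StringIO(
    "id,title,runtime,genres,popularity,homepage\n"
    "27205,Inception,148,\"Action, Science Fiction, Adventure\",83.952,\n"
    "155,The Dark Knight,152,\"Drama, Action, Crime, Thriller\",130.643,\n"
)

wanted_columns = ["id", "title", "runtime", "genres", "popularity"]

# usecols makes read_csv parse only the listed columns; homepage is skipped.
sample_df = pd.read_csv(sample_csv, usecols=wanted_columns)
print(sample_df.columns.tolist())
```

Whether this is worthwhile depends on how many of the original columns later steps actually need.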


#### Performing Exploratory Data Analysis (EDA)

In [6]:
movies_df.shape

(1124758, 24)

In [7]:
movies_df.columns

Index(['id', 'title', 'vote_average', 'vote_count', 'status', 'release_date',
       'revenue', 'runtime', 'adult', 'backdrop_path', 'budget', 'homepage',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'tagline', 'genres',
       'production_companies', 'production_countries', 'spoken_languages',
       'keywords'],
      dtype='object')

In [19]:
selected_columns = ['id', 'title', 'runtime', 'production_companies', 'genres', 'popularity']
selected_movies_df = movies_df[selected_columns]
selected_movies_df.head()

| | id | title | runtime | production_companies | genres | popularity |
|---|---|---|---|---|---|---|
| 0 | 27205 | Inception | 148 | Legendary Pictures, Syncopy, Warner Bros. Pict... | Action, Science Fiction, Adventure | 83.952 |
| 1 | 157336 | Interstellar | 169 | Legendary Pictures, Syncopy, Lynda Obst Produc... | Adventure, Drama, Science Fiction | 140.241 |
| 2 | 155 | The Dark Knight | 152 | DC Comics, Legendary Pictures, Syncopy, Isobel... | Drama, Action, Crime, Thriller | 130.643 |
| 3 | 19995 | Avatar | 162 | Dune Entertainment, Lightstorm Entertainment, ... | Action, Adventure, Fantasy, Science Fiction | 79.932 |
| 4 | 24428 | The Avengers | 143 | Marvel Studios | Science Fiction, Action, Adventure | 98.082 |


In [9]:
selected_movies_df.shape

(1124758, 6)

In [10]:
selected_movies_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1124758 entries, 0 to 1124757
Data columns (total 6 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   id                    1124758 non-null  int64  
 1   title                 1124745 non-null  object 
 2   runtime               1124758 non-null  int64  
 3   production_companies  508773 non-null   object 
 4   genres                675874 non-null   object 
 5   popularity            1124758 non-null  float64
dtypes: float64(1), int64(2), object(3)
memory usage: 51.5+ MB
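
The non-null counts above imply that `production_companies` is missing for 615,985 rows (more than half of the dataset), `genres` for 448,884 rows (about 40%), and `title` for 13 rows. Counting nulls per column makes this explicit. The sketch below uses a hypothetical toy frame (`toy_df`) rather than the real data.

```python
import pandas as pd

# Hypothetical toy frame with gaps similar in kind to the real data.
toy_df = pd.DataFrame(
    {
        "title": ["Inception", "Avatar", None, "Interstellar"],
        "genres": ["Action", None, None, "Drama"],
        "runtime": [148, 162, 90, 169],
    }
)

# Count missing values per column, and express them as a fraction of rows.
missing_counts = toy_df.isna().sum()
missing_share = toy_df.isna().mean()
print(missing_counts)
print(missing_share)
```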


In [20]:
# Fill missing text values with empty strings; the numeric columns have no nulls.
# Reassigning the result of fillna avoids pandas' SettingWithCopyWarning.
text_columns = ['title', 'production_companies', 'genres']
selected_movies_df = selected_movies_df.fillna({column: '' for column in text_columns})
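
Assigning into a DataFrame that was created by selecting columns from another DataFrame can trigger pandas' `SettingWithCopyWarning`, because pandas cannot tell whether the write is meant for the selection or for the original. One common way to make the intent explicit is to take a `.copy()` when selecting. This is a minimal sketch on a hypothetical toy frame; the names `toy_movies` and `toy_selected` are not part of this notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for movies_df.
toy_movies = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "title": ["Inception", None, "Avatar"],
        "genres": ["Action", "Drama", None],
        "budget": [160, 0, 237],
    }
)

# .copy() makes the selection an independent DataFrame, so later
# assignments are not treated as writes to a slice of toy_movies.
toy_selected = toy_movies[["id", "title", "genres"]].copy()

# Fill only the text columns; the numeric id column is left untouched.
for column in ["title", "genres"]:
    toy_selected[column] = toy_selected[column].fillna("")

print(toy_selected.isna().sum().sum())  # total remaining nulls
```

The original `toy_movies` frame keeps its missing values, which is usually what you want when the selection is a working copy.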


In [17]:
type(movies_df['id'])

pandas.core.series.Series

In [13]:
type(selected_columns)

list

In [18]:
type(selected_movies_df)

pandas.core.frame.DataFrame
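
The type checks reflect a general pandas rule: indexing a DataFrame with a single column name returns a Series, while indexing with a list of names returns a DataFrame, even if the list has only one entry. That is why `movies_df['id']` is a Series, whereas a frame built with `movies_df[selected_columns]` is a DataFrame. A small illustration on a hypothetical toy frame:

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"id": [27205, 157336], "title": ["Inception", "Interstellar"]}
)

single_column = toy_df["id"]    # string key: one column as a Series
column_subset = toy_df[["id"]]  # list of keys: a DataFrame, even for one column

print(type(single_column).__name__)
print(type(column_subset).__name__)
```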

In [21]:
selected_movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1124758 entries, 0 to 1124757
Data columns (total 6 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   id                    1124758 non-null  int64  
 1   title                 1124758 non-null  object 
 2   runtime               1124758 non-null  int64  
 3   production_companies  1124758 non-null  object 
 4   genres                1124758 non-null  object 
 5   popularity            1124758 non-null  float64
dtypes: float64(1), int64(2), object(3)
memory usage: 51.5+ MB
