# Purpose

Students may complete this project with pandas, SQL, or a mixture of the two; any of these approaches makes a fine Phase 1 project. This notebook is a resource for students who want to set up a SQL database for their Phase 1 project.

**To create the `movies.db` database, run the code cells below.**
> The Entity Relationship Diagram (ERD) is shown below.

In [1]:
from src.make_db import create_movies_db

In [2]:
create_movies_db()

imdb_title_principals table created successfully....
imdb_name_basic table created successfully....
imdb_title_crew table created successfully....
imdb_title_ratings table created successfully....
imdb_title_basics table created successfully....
imdb_title_akas table created successfully....
tn_movie_budgets table created successfully....
tmdb_movies table created successfully....
bom_movie_gross table created successfully....
rotten_tomatoes_critic_reviews table created successfully....
rotten_tomatoes_movies table created successfully....
Inserting data into the imdb_title_crew table....
Inserting data into the tmdb_movies table....
Inserting data into the imdb_title_akas table....
Inserting data into the imdb_title_ratings table....
Inserting data into the imdb_name_basics table....
Inserting data into the rotten_tomatoes_movies table....
Inserting data into the rotten_tomatoes_critic_reviews table....
Inserting data into the imdb_title_basics table....
Inserting data into the tn_mo

![movies.db schema](images/movies_db_schema.png)

In [3]:
import os
import sqlite3
import pandas as pd
# Open up a connection
conn = sqlite3.connect('data/movies.db')
# Initialize a cursor
cursor = conn.cursor()
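
Once a connection is open, it can help to confirm which tables the database actually contains before querying them. The sketch below is one way to do that; the helper name `list_tables` is hypothetical, and the demo builds a throwaway in-memory database (with two table names borrowed from the log above) so it runs without `data/movies.db`.

```python
import sqlite3
import pandas as pd


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    return pd.read_sql(query, conn)["name"].tolist()


# Demo on an in-memory database (an assumption for illustration only).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tmdb_movies (id INTEGER, title TEXT)")
demo_conn.execute("CREATE TABLE tn_movie_budgets (id INTEGER, movie TEXT)")

tables = list_tables(demo_conn)
print(tables)
demo_conn.close()
```

With the real database, passing the notebook's own `conn` to the helper should list the tables created by `create_movies_db()`.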

In [4]:
budgets = pd.read_csv('data/zippedData/tn.movie_budgets.csv.gz')
budgets.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"


In [5]:
budgets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 5782 non-null   int64 
 1   release_date       5782 non-null   object
 2   movie              5782 non-null   object
 3   production_budget  5782 non-null   object
 4   domestic_gross     5782 non-null   object
 5   worldwide_gross    5782 non-null   object
dtypes: int64(1), object(5)
memory usage: 271.2+ KB


In [6]:
# Strip "$" and "," from the money columns, then convert them to floats.
# The raw string r'\$' avoids Python's invalid-escape-sequence warning.
money_cols = ['production_budget', 'domestic_gross', 'worldwide_gross']
for col in money_cols:
    budgets[col] = budgets[col].replace({r'\$': '', ',': ''}, regex=True).astype(float)

In [7]:
budgets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 5782 non-null   int64  
 1   release_date       5782 non-null   object 
 2   movie              5782 non-null   object 
 3   production_budget  5782 non-null   float64
 4   domestic_gross     5782 non-null   float64
 5   worldwide_gross    5782 non-null   float64
dtypes: float64(3), int64(1), object(2)
memory usage: 271.2+ KB
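
After the currency columns are converted, `release_date` is still an `object` column. If the analysis needs release years or date comparisons, a conversion along the following lines may be useful. This is a minimal sketch on three rows copied from the output above; the column name `release_year` is an assumption introduced for illustration.

```python
import pandas as pd

# Three rows copied from the budgets preview above.
sample = pd.DataFrame({
    "release_date": ["Dec 18, 2009", "May 20, 2011", "Jun 7, 2019"],
    "movie": [
        "Avatar",
        "Pirates of the Caribbean: On Stranger Tides",
        "Dark Phoenix",
    ],
})

# Parse strings such as "Dec 18, 2009" into real timestamps.
sample["release_date"] = pd.to_datetime(sample["release_date"], format="%b %d, %Y")

# Hypothetical helper column: the calendar year of release.
sample["release_year"] = sample["release_date"].dt.year
print(sample.dtypes)
```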


In [8]:
pop_query = '''
    SELECT *
    FROM tmdb_movies
    ORDER BY `popularity` + 0 DESC;
'''
pd.read_sql(pop_query, conn)

Unnamed: 0,idx,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,23811,"[12, 28, 14]",299536,en,Avengers: Infinity War,80.773,2018-04-27,Avengers: Infinity War,8.3,13948
1,11019,"[28, 53]",245891,en,John Wick,78.123,2014-10-24,John Wick,7.2,10081
2,23812,"[28, 12, 16, 878, 35]",324857,en,Spider-Man: Into the Spider-Verse,60.534,2018-12-14,Spider-Man: Into the Spider-Verse,8.4,4048
3,11020,"[28, 12, 14]",122917,en,The Hobbit: The Battle of the Five Armies,53.783,2014-12-17,The Hobbit: The Battle of the Five Armies,7.3,8392
4,5179,"[878, 28, 12]",24428,en,The Avengers,50.289,2012-05-04,The Avengers,7.6,19673
...,...,...,...,...,...,...,...,...,...,...
26512,26512,"[27, 18]",488143,en,Laboratory Conditions,0.600,2018-10-13,Laboratory Conditions,0.0,1
26513,26513,"[18, 53]",485975,en,_EXHIBIT_84xxx_,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1
26514,26514,"[14, 28, 12]",381231,en,The Last One,0.600,2018-10-01,The Last One,0.0,1
26515,26515,"[10751, 12, 28]",366854,en,Trailer Made,0.600,2018-06-22,Trailer Made,0.0,1


In [9]:
va_query = '''
    SELECT *
    FROM tmdb_movies
    ORDER BY `vote_average` + 0 DESC;
'''
pd.read_sql(va_query, conn)

Unnamed: 0,idx,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,770,"[28, 80, 18, 53]",51488,en,Full Love,2.288,2010-01-01,Full Love,10.0,1
1,1154,[16],130974,en,A Cloudy Lesson,1.374,2010-04-01,A Cloudy Lesson,10.0,1
2,1230,[],371702,en,All That Glitters,1.241,2010-09-25,All That Glitters,10.0,1
3,1277,[18],62503,en,Almost Kings,1.154,2010-11-11,Almost Kings,10.0,2
4,1296,[35],140489,en,The Mother Of Invention,1.124,2010-06-25,The Mother Of Invention,10.0,1
...,...,...,...,...,...,...,...,...,...,...
26512,26512,"[27, 18]",488143,en,Laboratory Conditions,0.600,2018-10-13,Laboratory Conditions,0.0,1
26513,26513,"[18, 53]",485975,en,_EXHIBIT_84xxx_,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1
26514,26514,"[14, 28, 12]",381231,en,The Last One,0.600,2018-10-01,The Last One,0.0,1
26515,26515,"[10751, 12, 28]",366854,en,Trailer Made,0.600,2018-06-22,Trailer Made,0.0,1


In [10]:
table_name_query = """
    SELECT *
    FROM tn_movie_budgets
    ORDER BY `production_budget` + 0 DESC;
"""

pd.read_sql(table_name_query, conn)

Unnamed: 0,idx,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"
...,...,...,...,...,...,...,...
5777,5777,78,"Dec 31, 2018",Red 11,"$7,000",$0,$0
5778,5778,79,"Apr 2, 1999",Following,"$6,000","$48,482","$240,495"
5779,5779,80,"Jul 13, 2005",Return to the Land of Wonders,"$5,000","$1,338","$1,338"
5780,5780,81,"Sep 29, 2015",A Plague So Pleasant,"$1,400",$0,$0
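
The `production_budget` values in this table are text such as `$425,000,000`. In SQLite, text that does not begin with a digit converts to 0 under `+ 0`, so the `ORDER BY` above likely treats every row as a tie and returns the rows in their stored order. One possible way to sort by the real amount is to strip the symbols inside the query. The sketch below assumes an in-memory table with the same two column names and three rows taken from the output above; the alias `budget_usd` is hypothetical.

```python
import sqlite3
import pandas as pd

# Throwaway in-memory table (an assumption, so the sketch runs on its own).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE tn_movie_budgets (movie TEXT, production_budget TEXT)"
)
demo_conn.executemany(
    "INSERT INTO tn_movie_budgets VALUES (?, ?)",
    [
        ("Following", "$6,000"),
        ("Avatar", "$425,000,000"),
        ("Dark Phoenix", "$350,000,000"),
    ],
)

# Remove "$" and "," and cast to an integer before ordering.
budget_query = """
SELECT movie,
       CAST(REPLACE(REPLACE(production_budget, '$', ''), ',', '') AS INTEGER)
           AS budget_usd
FROM tn_movie_budgets
ORDER BY budget_usd DESC;
"""
ranked = pd.read_sql(budget_query, demo_conn)
print(ranked)
demo_conn.close()
```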


In [11]:
budgets['end_gross'] = budgets['domestic_gross'] + budgets['worldwide_gross'] - budgets['production_budget'] 
budgets.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2776345000.0,3111853000.0
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,410600000.0,241063875.0,1045664000.0,876127800.0
2,3,"Jun 7, 2019",Dark Phoenix,350000000.0,42762350.0,149762400.0,-157475300.0
3,4,"May 1, 2015",Avengers: Age of Ultron,330600000.0,459005868.0,1403014000.0,1531420000.0
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,317000000.0,620181382.0,1316722000.0,1619903000.0
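
Note that `end_gross` adds `domestic_gross` to `worldwide_gross`. If worldwide gross already includes domestic receipts, which is how such figures are usually reported but is an assumption worth checking against the data source, the domestic amount is counted twice. A hedged alternative is sketched below on two rows from the output above; the column names `worldwide_profit` and `roi` are hypothetical.

```python
import pandas as pd

# Two rows copied from the budgets preview above.
sample = pd.DataFrame({
    "movie": ["Avatar", "Dark Phoenix"],
    "production_budget": [425_000_000.0, 350_000_000.0],
    "domestic_gross": [760_507_625.0, 42_762_350.0],
    "worldwide_gross": [2_776_345_279.0, 149_762_350.0],
})

# Assumption: worldwide_gross already contains domestic_gross,
# so profit is worldwide revenue minus the production budget.
sample["worldwide_profit"] = sample["worldwide_gross"] - sample["production_budget"]

# Return on investment relative to the production budget.
sample["roi"] = sample["worldwide_profit"] / sample["production_budget"]
print(sample[["movie", "worldwide_profit", "roi"]])
```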


In [12]:
budgets = budgets.sort_values(by='end_gross', ascending=False)
budgets

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
5,6,"Dec 18, 2015",Star Wars Ep. VII: The Force Awakens,306000000.0,936662225.0,2.053311e+09,2.683973e+09
42,43,"Dec 19, 1997",Titanic,200000000.0,659363944.0,2.208208e+09,2.667572e+09
6,7,"Apr 27, 2018",Avengers: Infinity War,300000000.0,678815482.0,2.048134e+09,2.426950e+09
33,34,"Jun 12, 2015",Jurassic World,215000000.0,652270625.0,1.648855e+09,2.086125e+09
...,...,...,...,...,...,...,...
480,81,"Dec 31, 2019",Army of the Dead,90000000.0,0.0,0.000000e+00,-9.000000e+07
479,80,"Dec 13, 2017",Bright,90000000.0,0.0,0.000000e+00,-9.000000e+07
341,42,"Jun 14, 2019",Men in Black: International,110000000.0,3100000.0,3.100000e+06,-1.038000e+08
194,95,"Dec 31, 2020",Moonfall,150000000.0,0.0,0.000000e+00,-1.500000e+08


In [13]:
budgets = budgets.sort_values(by='id')
budgets

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
5200,1,"May 14, 2003",Dracula: Pages from a Virgin's Diary,1100000.0,39659.0,8.137100e+04,-9.789700e+05
2300,1,"Dec 7, 2005",The World's Fastest Indian,25000000.0,5128124.0,1.899129e+07,-8.805880e+05
1500,1,"Mar 9, 2012",A Thousand Words,40000000.0,18450127.0,2.079049e+07,-7.593870e+05
5400,1,"Dec 14, 2018",That Way Madness Lies,650000.0,1447.0,1.447000e+03,-6.471060e+05
...,...,...,...,...,...,...,...
4199,100,"Mar 7, 1965",The Train,5800000.0,6800000.0,6.800000e+06,7.800000e+06
3899,100,"Jun 12, 2015",Me and Earl and the Dying Girl,8000000.0,6758416.0,9.266180e+06,8.024596e+06
4399,100,"Oct 27, 2015",Running Forever,5000000.0,0.0,0.000000e+00,-5.000000e+06
5699,100,"Aug 30, 1972",The Last House on the Left,87000.0,3100000.0,3.100000e+06,6.113000e+06


In [14]:
type(budgets)

pandas.core.frame.DataFrame

In [15]:
budgets.to_sql

<bound method NDFrame.to_sql of        id  release_date                                 movie  \
0       1  Dec 18, 2009                                Avatar   
5200    1  May 14, 2003  Dracula: Pages from a Virgin's Diary   
2300    1   Dec 7, 2005            The World's Fastest Indian   
1500    1   Mar 9, 2012                      A Thousand Words   
5400    1  Dec 14, 2018                 That Way Madness Lies   
...   ...           ...                                   ...   
4199  100   Mar 7, 1965                             The Train   
3899  100  Jun 12, 2015        Me and Earl and the Dying Girl   
4399  100  Oct 27, 2015                       Running Forever   
5699  100  Aug 30, 1972            The Last House on the Left   
3099  100  Dec 18, 1985                                Brazil   

      production_budget  domestic_gross  worldwide_gross     end_gross  
0           425000000.0     760507625.0     2.776345e+09  3.111853e+09  
5200          1100000.0         39659.0  
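
Cell 15 references `budgets.to_sql` without calling it, so the notebook prints the bound method rather than writing anything. If the goal is to store the cleaned DataFrame in the database, a call might look like the sketch below. The table name `budgets_clean` is hypothetical, and the demo writes to an in-memory database instead of `data/movies.db`.

```python
import sqlite3
import pandas as pd

# Two cleaned rows copied from the output above.
sample = pd.DataFrame({
    "movie": ["Avatar", "Dark Phoenix"],
    "production_budget": [425_000_000.0, 350_000_000.0],
})

demo_conn = sqlite3.connect(":memory:")

# if_exists="replace" overwrites an existing table of the same name;
# index=False keeps the DataFrame index out of the table.
sample.to_sql("budgets_clean", demo_conn, if_exists="replace", index=False)

# Read the table back to confirm the write.
round_trip = pd.read_sql("SELECT * FROM budgets_clean", demo_conn)
print(round_trip)
demo_conn.close()
```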

In [27]:
stream_platform = pd.read_csv('data/MoviesOnStreamingPlatforms_updated.csv')
stream_platform

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
0,0,1,Inception,2010,13+,8.8,87%,1,0,0,0,0,Christopher Nolan,"Action,Adventure,Sci-Fi,Thriller","United States,United Kingdom","English,Japanese,French",148.0
1,1,2,The Matrix,1999,18+,8.7,87%,1,0,0,0,0,"Lana Wachowski,Lilly Wachowski","Action,Sci-Fi",United States,English,136.0
2,2,3,Avengers: Infinity War,2018,13+,8.5,84%,1,0,0,0,0,"Anthony Russo,Joe Russo","Action,Adventure,Sci-Fi",United States,English,149.0
3,3,4,Back to the Future,1985,7+,8.5,96%,1,0,0,0,0,Robert Zemeckis,"Adventure,Comedy,Sci-Fi",United States,English,116.0
4,4,5,"The Good, the Bad and the Ugly",1966,18+,8.8,97%,1,0,1,0,0,Sergio Leone,Western,"Italy,Spain,West Germany",Italian,161.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16739,16739,16740,The Ghosts of Buxley Hall,1980,,6.2,,0,0,0,1,0,Bruce Bilson,"Comedy,Family,Fantasy,Horror",United States,English,120.0
16740,16740,16741,The Poof Point,2001,7+,4.7,,0,0,0,1,0,Neal Israel,"Comedy,Family,Sci-Fi",United States,English,90.0
16741,16741,16742,Sharks of Lost Island,2013,,5.7,,0,0,0,1,0,Neil Gelinas,Documentary,United States,English,
16742,16742,16743,Man Among Cheetahs,2017,,6.6,,0,0,0,1,0,Richard Slater-Jones,Documentary,United States,English,


In [29]:
stream_platform = stream_platform.dropna(subset=['Rotten Tomatoes'])
stream_platform

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
0,0,1,Inception,2010,13+,8.8,87%,1,0,0,0,0,Christopher Nolan,"Action,Adventure,Sci-Fi,Thriller","United States,United Kingdom","English,Japanese,French",148.0
1,1,2,The Matrix,1999,18+,8.7,87%,1,0,0,0,0,"Lana Wachowski,Lilly Wachowski","Action,Sci-Fi",United States,English,136.0
2,2,3,Avengers: Infinity War,2018,13+,8.5,84%,1,0,0,0,0,"Anthony Russo,Joe Russo","Action,Adventure,Sci-Fi",United States,English,149.0
3,3,4,Back to the Future,1985,7+,8.5,96%,1,0,0,0,0,Robert Zemeckis,"Adventure,Comedy,Sci-Fi",United States,English,116.0
4,4,5,"The Good, the Bad and the Ugly",1966,18+,8.8,97%,1,0,1,0,0,Sergio Leone,Western,"Italy,Spain,West Germany",Italian,161.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16671,16671,16672,George of the Jungle 2,2003,7+,3.3,17%,0,0,0,1,0,David Grossman,"Adventure,Comedy,Family","United States,Australia",English,87.0
16677,16677,16678,That Darn Cat,1997,7+,4.7,13%,0,0,0,1,0,Robert Stevenson,"Comedy,Crime,Family,Thriller",United States,"English,French",116.0
16687,16687,16688,Kazaam,1996,7+,3.0,6%,0,0,0,1,0,Paul Michael Glaser,"Comedy,Family,Fantasy,Musical",United States,English,93.0
16705,16705,16706,Meet the Deedles,1998,7+,4.1,7%,0,0,0,1,0,Steve Boyum,"Comedy,Family",United States,English,93.0


In [30]:
stream_platform.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5158 entries, 0 to 16719
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       5158 non-null   int64  
 1   ID               5158 non-null   int64  
 2   Title            5158 non-null   object 
 3   Year             5158 non-null   int64  
 4   Age              3371 non-null   object 
 5   IMDb             5156 non-null   float64
 6   Rotten Tomatoes  5158 non-null   object 
 7   Netflix          5158 non-null   int64  
 8   Hulu             5158 non-null   int64  
 9   Prime Video      5158 non-null   int64  
 10  Disney+          5158 non-null   int64  
 11  Type             5158 non-null   int64  
 12  Directors        5052 non-null   object 
 13  Genres           5153 non-null   object 
 14  Country          5139 non-null   object 
 15  Language         5111 non-null   object 
 16  Runtime          5122 non-null   float64
dtypes: float64(2)

In [31]:
stream_platform['Title']

0                             Inception
1                            The Matrix
2                Avengers: Infinity War
3                    Back to the Future
4        The Good, the Bad and the Ugly
                      ...              
16671            George of the Jungle 2
16677                     That Darn Cat
16687                            Kazaam
16705                  Meet the Deedles
16719                        Pocahontas
Name: Title, Length: 5158, dtype: object

In [32]:
stream_title = stream_platform['Title']

In [33]:
budgets['movie']

0                                     Avatar
5200    Dracula: Pages from a Virgin's Diary
2300              The World's Fastest Indian
1500                        A Thousand Words
5400                   That Way Madness Lies
                        ...                 
4199                               The Train
3899          Me and Earl and the Dying Girl
4399                         Running Forever
5699              The Last House on the Left
3099                                  Brazil
Name: movie, Length: 5782, dtype: object

In [34]:
data_title = budgets['movie']

In [35]:
stream_platform[stream_platform['Title'].isin(data_title)]


Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
0,0,1,Inception,2010,13+,8.8,87%,1,0,0,0,0,Christopher Nolan,"Action,Adventure,Sci-Fi,Thriller","United States,United Kingdom","English,Japanese,French",148.0
1,1,2,The Matrix,1999,18+,8.7,87%,1,0,0,0,0,"Lana Wachowski,Lilly Wachowski","Action,Sci-Fi",United States,English,136.0
2,2,3,Avengers: Infinity War,2018,13+,8.5,84%,1,0,0,0,0,"Anthony Russo,Joe Russo","Action,Adventure,Sci-Fi",United States,English,149.0
3,3,4,Back to the Future,1985,7+,8.5,96%,1,0,0,0,0,Robert Zemeckis,"Adventure,Comedy,Sci-Fi",United States,English,116.0
6,6,7,The Pianist,2002,18+,8.5,95%,1,0,1,0,0,Roman Polanski,"Biography,Drama,Music,War","United Kingdom,France,Poland,Germany","English,German,Russian",150.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16552,16552,16553,Confessions of a Teenage Drama Queen,2004,7+,4.6,14%,0,0,0,1,0,Sara Sugarman,"Comedy,Family,Music,Romance","Germany,United States",English,89.0
16633,16633,16634,The Country Bears,2002,all,4.1,29%,0,0,0,1,0,Peter Hastings,"Comedy,Family,Music,Musical",United States,English,88.0
16643,16643,16644,Doug's 1st Movie,1999,all,5.0,26%,0,0,0,1,0,Maurice Joyce,"Animation,Adventure,Comedy,Drama,Family,Fantas...",United States,English,77.0
16705,16705,16706,Meet the Deedles,1998,7+,4.1,7%,0,0,0,1,0,Steve Boyum,"Comedy,Family",United States,English,93.0


In [36]:
streammoviedata = stream_platform[stream_platform['Title'].isin(data_title)]

In [37]:
streammovie_titles = streammoviedata['Title']
streammovie_titles

0                                   Inception
1                                  The Matrix
2                      Avengers: Infinity War
3                          Back to the Future
6                                 The Pianist
                         ...                 
16552    Confessions of a Teenage Drama Queen
16633                       The Country Bears
16643                        Doug's 1st Movie
16705                        Meet the Deedles
16719                              Pocahontas
Name: Title, Length: 988, dtype: object
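
The `isin` filters match on title alone, which gives a yes/no answer per row but drops the streaming columns. A left merge with `indicator=True` is one alternative that keeps both sides in a single table. The sketch below uses a toy subset of titles that appear in the outputs of this notebook; the column name `on_streaming` is hypothetical.

```python
import pandas as pd

# Toy subset: titles taken from the budgets and streaming outputs.
budget_rows = pd.DataFrame({"movie": ["Avatar", "Up", "The Train"]})
stream_rows = pd.DataFrame({
    "Title": ["Avatar", "Up", "The Pianist"],
    "Year": [2009, 2009, 2002],
})

# Keep every budget row; _merge records whether a streaming match exists.
matched = budget_rows.merge(
    stream_rows, how="left", left_on="movie", right_on="Title", indicator=True
)
matched["on_streaming"] = matched["_merge"] == "both"
print(matched[["movie", "Year", "on_streaming"]])
```

Because different films can share a title, a merge on title alone can duplicate or mislabel rows. Adding the release year to the join keys may reduce that risk.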

In [38]:
stream_platform[~stream_platform['Title'].isin(data_title)]


Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
4,4,5,"The Good, the Bad and the Ugly",1966,18+,8.8,97%,1,0,1,0,0,Sergio Leone,Western,"Italy,Spain,West Germany",Italian,161.0
5,5,6,Spider-Man: Into the Spider-Verse,2018,7+,8.4,97%,1,0,0,0,0,"Bob Persichetti,Peter Ramsey,Rodney Rothman","Animation,Action,Adventure,Family,Sci-Fi",United States,"English,Spanish",117.0
11,11,12,3 Idiots,2009,13+,8.4,100%,1,0,1,0,0,Rajkumar Hirani,"Comedy,Drama",India,"Hindi,English",170.0
12,12,13,Pan's Labyrinth,2006,18+,8.2,95%,1,0,0,0,0,Guillermo del Toro,"Drama,Fantasy,War","Mexico,Spain",Spanish,118.0
18,18,19,The King's Speech,2010,18+,8.0,95%,1,0,0,0,0,Tom Hooper,"Biography,Drama,History","United Kingdom,United States,Australia",English,118.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16637,16637,16638,Inspector Gadget 2,2003,all,3.4,40%,0,0,0,1,0,Alex Zamm,"Action,Comedy,Crime,Family,Sci-Fi",United States,English,89.0
16657,16657,16658,A Kid in King Arthur's Court,1995,7+,4.7,5%,0,0,0,1,0,Michael Gottlieb,"Adventure,Comedy,Family,Fantasy,Romance","United States,Hungary,United Kingdom",English,89.0
16671,16671,16672,George of the Jungle 2,2003,7+,3.3,17%,0,0,0,1,0,David Grossman,"Adventure,Comedy,Family","United States,Australia",English,87.0
16677,16677,16678,That Darn Cat,1997,7+,4.7,13%,0,0,0,1,0,Robert Stevenson,"Comedy,Crime,Family,Thriller",United States,"English,French",116.0


In [39]:
budgets[budgets['movie'].isin(streammovie_titles)]

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
4300,1,"Apr 10, 1981",Nighthawks,5000000.0,14600000.0,1.960000e+07,2.920000e+07
800,1,"Nov 21, 2012",Red Dawn,65000000.0,44806783.0,4.816415e+07,2.797093e+07
100,1,"May 29, 2009",Up,175000000.0,293004164.0,7.314634e+08,8.494675e+08
5700,1,"May 30, 2008",The Foot Fist Way,79000.0,234286.0,2.342860e+05,3.895720e+05
...,...,...,...,...,...,...,...
4699,100,"Nov 20, 1987",Teen Wolf Too,3000000.0,7888000.0,7.888000e+06,1.277600e+07
899,100,"Dec 25, 2018",Vice,60000000.0,47836282.0,7.088317e+07,5.871945e+07
3699,100,"Oct 4, 2013",Parkland,10000000.0,641439.0,1.616353e+06,-7.742208e+06
3499,100,"Jul 6, 2016",Sultan,11000000.0,5599781.0,7.298978e+07,6.758956e+07


In [40]:
budgets[~budgets['movie'].isin(streammovie_titles)]

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
5200,1,"May 14, 2003",Dracula: Pages from a Virgin's Diary,1100000.0,39659.0,81371.0,-978970.0
2300,1,"Dec 7, 2005",The World's Fastest Indian,25000000.0,5128124.0,18991288.0,-880588.0
1500,1,"Mar 9, 2012",A Thousand Words,40000000.0,18450127.0,20790486.0,-759387.0
5400,1,"Dec 14, 2018",That Way Madness Lies,650000.0,1447.0,1447.0,-647106.0
5600,1,"Feb 24, 2015",Give Me Shelter,250000.0,0.0,0.0,-250000.0
...,...,...,...,...,...,...,...
4199,100,"Mar 7, 1965",The Train,5800000.0,6800000.0,6800000.0,7800000.0
3899,100,"Jun 12, 2015",Me and Earl and the Dying Girl,8000000.0,6758416.0,9266180.0,8024596.0
4399,100,"Oct 27, 2015",Running Forever,5000000.0,0.0,0.0,-5000000.0
5699,100,"Aug 30, 1972",The Last House on the Left,87000.0,3100000.0,3100000.0,6113000.0


In [41]:
df = budgets.loc[(budgets['domestic_gross'] > 0.0) & (budgets['worldwide_gross'] > 0.0)]
df

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
5200,1,"May 14, 2003",Dracula: Pages from a Virgin's Diary,1100000.0,39659.0,8.137100e+04,-9.789700e+05
2300,1,"Dec 7, 2005",The World's Fastest Indian,25000000.0,5128124.0,1.899129e+07,-8.805880e+05
1500,1,"Mar 9, 2012",A Thousand Words,40000000.0,18450127.0,2.079049e+07,-7.593870e+05
5400,1,"Dec 14, 2018",That Way Madness Lies,650000.0,1447.0,1.447000e+03,-6.471060e+05
...,...,...,...,...,...,...,...
1199,100,"Jan 30, 2004",The Big Bounce,50000000.0,6471394.0,6.626115e+06,-3.690249e+07
4199,100,"Mar 7, 1965",The Train,5800000.0,6800000.0,6.800000e+06,7.800000e+06
3899,100,"Jun 12, 2015",Me and Earl and the Dying Girl,8000000.0,6758416.0,9.266180e+06,8.024596e+06
5699,100,"Aug 30, 1972",The Last House on the Left,87000.0,3100000.0,3.100000e+06,6.113000e+06


In [42]:
nonstream_eg = df[~df['movie'].isin(streammovie_titles)]
nonstream_eg

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
5200,1,"May 14, 2003",Dracula: Pages from a Virgin's Diary,1100000.0,39659.0,81371.0,-978970.0
2300,1,"Dec 7, 2005",The World's Fastest Indian,25000000.0,5128124.0,18991288.0,-880588.0
1500,1,"Mar 9, 2012",A Thousand Words,40000000.0,18450127.0,20790486.0,-759387.0
5400,1,"Dec 14, 2018",That Way Madness Lies,650000.0,1447.0,1447.0,-647106.0
2800,1,"Nov 14, 1980",Raging Bull,18000000.0,23380203.0,23380203.0,28760406.0
...,...,...,...,...,...,...,...
1199,100,"Jan 30, 2004",The Big Bounce,50000000.0,6471394.0,6626115.0,-36902491.0
4199,100,"Mar 7, 1965",The Train,5800000.0,6800000.0,6800000.0,7800000.0
3899,100,"Jun 12, 2015",Me and Earl and the Dying Girl,8000000.0,6758416.0,9266180.0,8024596.0
5699,100,"Aug 30, 1972",The Last House on the Left,87000.0,3100000.0,3100000.0,6113000.0


In [43]:
nonstream_eg.describe()

Unnamed: 0,id,production_budget,domestic_gross,worldwide_gross,end_gross
count,4291.0,4291.0,4291.0,4291.0,4291.0
mean,50.186204,32435480.0,43447630.0,94255130.0,105267300.0
std,28.83666,40054390.0,65010990.0,166124700.0,201709500.0
min,1.0,1100.0,401.0,401.0,-157475300.0
25%,25.0,6000000.0,4000000.0,7833752.0,1243676.0
50%,50.0,19400000.0,20966640.0,33462010.0,32500040.0
75%,75.0,41000000.0,55071090.0,104496600.0,120095800.0
max,100.0,350000000.0,936662200.0,2208208000.0,2683973000.0


In [44]:
onstream_eg = df[df['movie'].isin(streammovie_titles)]
onstream_eg

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
4300,1,"Apr 10, 1981",Nighthawks,5000000.0,14600000.0,1.960000e+07,2.920000e+07
800,1,"Nov 21, 2012",Red Dawn,65000000.0,44806783.0,4.816415e+07,2.797093e+07
100,1,"May 29, 2009",Up,175000000.0,293004164.0,7.314634e+08,8.494675e+08
5700,1,"May 30, 2008",The Foot Fist Way,79000.0,234286.0,2.342860e+05,3.895720e+05
...,...,...,...,...,...,...,...
4699,100,"Nov 20, 1987",Teen Wolf Too,3000000.0,7888000.0,7.888000e+06,1.277600e+07
899,100,"Dec 25, 2018",Vice,60000000.0,47836282.0,7.088317e+07,5.871945e+07
3699,100,"Oct 4, 2013",Parkland,10000000.0,641439.0,1.616353e+06,-7.742208e+06
3499,100,"Jul 6, 2016",Sultan,11000000.0,5599781.0,7.298978e+07,6.758956e+07


In [45]:
onstream_eg.describe()

Unnamed: 0,id,production_budget,domestic_gross,worldwide_gross,end_gross
count,943.0,943.0,943.0,943.0,943.0
mean,51.076352,41304980.0,59043260.0,130367900.0,148106200.0
std,28.907321,53681280.0,89466310.0,235558800.0,280393100.0
min,1.0,30000.0,388.0,703.0,-89057480.0
25%,26.0,7150000.0,5978240.0,10745360.0,5436333.0
50%,51.0,20000000.0,28087160.0,47158650.0,47449920.0
75%,76.5,50000000.0,72164030.0,140839000.0,162755000.0
max,100.0,425000000.0,760507600.0,2776345000.0,3111853000.0
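
The two `describe()` tables can be hard to compare by eye. One option is to flag each row and summarise both groups in a single table. This is a minimal sketch on four rows whose `end_gross` values come from the outputs above; the names `df_demo`, `group`, and `summary` are hypothetical.

```python
import pandas as pd

# Four rows taken from the outputs above.
df_demo = pd.DataFrame({
    "movie": [
        "Nighthawks",
        "Teen Wolf Too",
        "The Train",
        "The Last House on the Left",
    ],
    "end_gross": [29_200_000.0, 12_776_000.0, 7_800_000.0, 6_113_000.0],
})
stream_titles = pd.Series(["Nighthawks", "Teen Wolf Too"])

# Label each row by whether its title appears in the streaming list.
labels = {True: "streaming", False: "not streaming"}
df_demo["group"] = df_demo["movie"].isin(stream_titles).map(labels)

# One row per group, one column per statistic.
summary = df_demo.groupby("group")["end_gross"].agg(["count", "mean", "median"])
print(summary)
```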


In [46]:
df_title = df['movie']

In [47]:
df_stream = stream_platform[stream_platform['Title'].isin(df_title)]
df_stream

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
0,0,1,Inception,2010,13+,8.8,87%,1,0,0,0,0,Christopher Nolan,"Action,Adventure,Sci-Fi,Thriller","United States,United Kingdom","English,Japanese,French",148.0
1,1,2,The Matrix,1999,18+,8.7,87%,1,0,0,0,0,"Lana Wachowski,Lilly Wachowski","Action,Sci-Fi",United States,English,136.0
2,2,3,Avengers: Infinity War,2018,13+,8.5,84%,1,0,0,0,0,"Anthony Russo,Joe Russo","Action,Adventure,Sci-Fi",United States,English,149.0
3,3,4,Back to the Future,1985,7+,8.5,96%,1,0,0,0,0,Robert Zemeckis,"Adventure,Comedy,Sci-Fi",United States,English,116.0
6,6,7,The Pianist,2002,18+,8.5,95%,1,0,1,0,0,Roman Polanski,"Biography,Drama,Music,War","United Kingdom,France,Poland,Germany","English,German,Russian",150.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16552,16552,16553,Confessions of a Teenage Drama Queen,2004,7+,4.6,14%,0,0,0,1,0,Sara Sugarman,"Comedy,Family,Music,Romance","Germany,United States",English,89.0
16633,16633,16634,The Country Bears,2002,all,4.1,29%,0,0,0,1,0,Peter Hastings,"Comedy,Family,Music,Musical",United States,English,88.0
16643,16643,16644,Doug's 1st Movie,1999,all,5.0,26%,0,0,0,1,0,Maurice Joyce,"Animation,Adventure,Comedy,Drama,Family,Fantas...",United States,English,77.0
16705,16705,16706,Meet the Deedles,1998,7+,4.1,7%,0,0,0,1,0,Steve Boyum,"Comedy,Family",United States,English,93.0


In [50]:
df_count = df_stream.sort_values('Rotten Tomatoes', ascending=False).head(500)
df_count

Unnamed: 0.1,Unnamed: 0,ID,Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+,Type,Directors,Genres,Country,Language,Runtime
4447,4447,4448,Lady Bird,2017,18+,7.4,99%,0,0,1,0,0,Greta Gerwig,"Comedy,Drama",United States,"English,Spanish",94.0
16220,16220,16221,Finding Nemo,2003,all,8.1,99%,0,0,0,1,0,"Andrew Stanton,Lee Unkrich","Animation,Adventure,Comedy,Family","United States,Australia",English,100.0
4474,4474,4475,Eighth Grade,2018,18+,7.4,99%,0,0,1,0,0,Bo Burnham,"Comedy,Drama",United States,English,93.0
3739,3739,3740,Gloria,2013,18+,6.8,99%,0,1,0,0,0,Edward Zwick,"Biography,Drama,History,War",United States,English,122.0
4534,4534,4535,I Am Not Your Negro,2017,18+,7.8,99%,0,0,1,0,0,Raoul Peck,Documentary,"Switzerland,France,Belgium,United States","English,French",93.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
729,729,730,Kevin Hart: Let Me Explain,2013,18+,6.8,61%,1,0,0,0,0,"Leslie Small,Tim Story","Documentary,Comedy",United States,English,75.0
687,687,688,The Signal,2014,13+,6.1,61%,1,0,0,0,0,William Eubank,"Drama,Mystery,Sci-Fi,Thriller",United States,English,97.0
7297,7297,7298,Boynton Beach Club,2005,18+,6.4,60%,0,0,1,0,0,Susan Seidelman,"Comedy,Romance",United States,English,105.0
16377,16377,16378,Tuck Everlasting,2002,7+,6.6,60%,0,0,0,1,0,Jay Russell,"Drama,Family,Fantasy,Romance",United States,"English,French",90.0
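
`Rotten Tomatoes` is stored as text (for example `87%`), so `sort_values` orders it character by character: `99%` sorts above `100%`, and `7%` sorts above `60%`. The top 500 selected above may therefore not be the 500 highest-rated titles. The sketch below shows the difference on four scores taken from the outputs in this notebook; the column name `rt_score` is hypothetical.

```python
import pandas as pd

# Four title/score pairs taken from the streaming outputs.
scores = pd.DataFrame({
    "Title": ["3 Idiots", "Lady Bird", "Tuck Everlasting", "Meet the Deedles"],
    "Rotten Tomatoes": ["100%", "99%", "60%", "7%"],
})

# Sorting the raw strings compares characters, not numbers.
text_order = scores.sort_values("Rotten Tomatoes", ascending=False)["Title"].tolist()

# Strip the trailing "%" and convert to integers before sorting.
scores["rt_score"] = scores["Rotten Tomatoes"].str.rstrip("%").astype(int)
numeric_order = scores.sort_values("rt_score", ascending=False)["Title"].tolist()

print(text_order)
print(numeric_order)
```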


In [57]:
df_count_list = df_count['Title']
df_count_list

4447                      Lady Bird
16220                  Finding Nemo
4474                   Eighth Grade
3739                         Gloria
4534            I Am Not Your Negro
                    ...            
729      Kevin Hart: Let Me Explain
687                      The Signal
7297             Boynton Beach Club
16377              Tuck Everlasting
5893                       Twilight
Name: Title, Length: 500, dtype: object

In [61]:
dfbudgetfilter = df[df['movie'].isin(df_count_list)]
dfbudgetfilter

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
0,1,"Dec 18, 2009",Avatar,425000000.0,760507625.0,2.776345e+09,3.111853e+09
4300,1,"Apr 10, 1981",Nighthawks,5000000.0,14600000.0,1.960000e+07,2.920000e+07
100,1,"May 29, 2009",Up,175000000.0,293004164.0,7.314634e+08,8.494675e+08
4800,1,"Feb 7, 1974",Blazing Saddles,2600000.0,119500000.0,1.195000e+08,2.364000e+08
4100,1,"Nov 13, 2009",The Messenger,6500000.0,1109660.0,1.744952e+06,-3.645388e+06
...,...,...,...,...,...,...,...
1099,100,"Dec 14, 2018",The Mule,50000000.0,103804407.0,1.708577e+08,2.246621e+08
4699,100,"Nov 20, 1987",Teen Wolf Too,3000000.0,7888000.0,7.888000e+06,1.277600e+07
899,100,"Dec 25, 2018",Vice,60000000.0,47836282.0,7.088317e+07,5.871945e+07
3499,100,"Jul 6, 2016",Sultan,11000000.0,5599781.0,7.298978e+07,6.758956e+07


In [62]:
dfbudgetfilter.describe()

Unnamed: 0,id,production_budget,domestic_gross,worldwide_gross,end_gross
count,514.0,514.0,514.0,514.0,514.0
mean,51.394942,44041600.0,73849870.0,169685300.0,199493500.0
std,28.924747,59875040.0,109895100.0,290486500.0,347501400.0
min,1.0,30000.0,388.0,6870.0,-32465470.0
25%,27.0,6000000.0,5917394.0,10780800.0,9468858.0
50%,51.0,18000000.0,30195520.0,53885130.0,62082270.0
75%,76.0,55000000.0,98687440.0,195578600.0,236031200.0
max,100.0,425000000.0,760507600.0,2776345000.0,3111853000.0


In [63]:
dfnon = df[~df['movie'].isin(df_count_list)]
dfnon

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,end_gross
5200,1,"May 14, 2003",Dracula: Pages from a Virgin's Diary,1100000.0,39659.0,81371.0,-978970.0
2300,1,"Dec 7, 2005",The World's Fastest Indian,25000000.0,5128124.0,18991288.0,-880588.0
1500,1,"Mar 9, 2012",A Thousand Words,40000000.0,18450127.0,20790486.0,-759387.0
5400,1,"Dec 14, 2018",That Way Madness Lies,650000.0,1447.0,1447.0,-647106.0
2800,1,"Nov 14, 1980",Raging Bull,18000000.0,23380203.0,23380203.0,28760406.0
...,...,...,...,...,...,...,...
1199,100,"Jan 30, 2004",The Big Bounce,50000000.0,6471394.0,6626115.0,-36902491.0
4199,100,"Mar 7, 1965",The Train,5800000.0,6800000.0,6800000.0,7800000.0
3899,100,"Jun 12, 2015",Me and Earl and the Dying Girl,8000000.0,6758416.0,9266180.0,8024596.0
5699,100,"Aug 30, 1972",The Last House on the Left,87000.0,3100000.0,3100000.0,6113000.0


In [64]:
dfnon.describe()

Unnamed: 0,id,production_budget,domestic_gross,worldwide_gross,end_gross
count,4720.0,4720.0,4720.0,4720.0,4720.0
mean,50.232415,32943610.0,43252700.0,93255820.0,103564900.0
std,28.841141,40556190.0,63835810.0,163192100.0,197360900.0
min,1.0,1100.0,401.0,401.0,-157475300.0
25%,25.0,6500000.0,4059578.0,8000000.0,1133236.0
50%,50.0,20000000.0,21297510.0,34094720.0,33019500.0
75%,75.0,42000000.0,55127170.0,104281600.0,119625400.0
max,100.0,410600000.0,936662200.0,2208208000.0,2683973000.0


In [66]:
df_count['Netflix'].value_counts()

0    317
1    183
Name: Netflix, dtype: int64

In [67]:
df_count['Hulu'].value_counts()

0    418
1     82
Name: Hulu, dtype: int64

In [68]:
df_count['Prime Video'].value_counts()

0    293
1    207
Name: Prime Video, dtype: int64

In [69]:
df_count['Disney+'].value_counts()

0    408
1     92
Name: Disney+, dtype: int64
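
The four `value_counts()` calls can be condensed into one step by summing the 0/1 platform columns. The sketch below uses four rows whose platform flags are copied from the outputs above; the names `demo` and `platform_counts` are hypothetical.

```python
import pandas as pd

platforms = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# Platform flags copied from rows shown earlier in the notebook.
demo = pd.DataFrame({
    "Title": ["Inception", "The Pianist", "Lady Bird", "Finding Nemo"],
    "Netflix": [1, 1, 0, 0],
    "Hulu": [0, 0, 0, 0],
    "Prime Video": [0, 1, 1, 0],
    "Disney+": [0, 0, 0, 1],
})

# Each column holds 0 or 1, so the column sum is the number of titles
# available on that platform.
platform_counts = demo[platforms].sum()
print(platform_counts)
```

A title can be available on more than one platform, which is why the four counts reported above (183 + 82 + 207 + 92 = 564) add up to more than the 500 titles in `df_count`.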