# DSPT-04 Phase 1 Project Submission

Please fill out:
* Student name: ANGEL LINAH ATUNGIRE 
* Student pace: Part Time 
* Instructor name: MARYAN MWIKALI

### Overview
In this project, I use exploratory data analysis to generate insights for a business stakeholder.

### Business Problem
Microsoft sees all the big companies creating original video content and wants to get in on the fun. It has decided to create a new movie studio, but it doesn’t know anything about creating movies. I was charged with exploring what types of films are currently doing the best at the box office, and then translating those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

We can explore the following data questions:

1. *Which film genres have the highest box office performance?*
   - By analyzing the revenue generated by different film genres, I can identify the genres that are most successful in terms of box office performance. I can create visualizations such as bar charts or pie charts to showcase the revenue distribution across genres.
   

2. *What is the relationship between budget and box office success?*
   - I can examine the relationship between the budget allocated to a film and its box office performance. This analysis can help us determine if higher-budget films tend to generate higher revenue, or if there is a particular budget range that yields the best results. Visualizing this relationship using scatter plots or regression analysis can provide insights.
   

3. *How does the release period impact box office performance?*
   - I can explore the impact of the release period (e.g., month, season) on a film's box office success. By analyzing revenue trends over time, I can identify if certain months or seasons are more favorable for film releases. Line graphs or box plots can be useful visualizations to showcase revenue patterns across different release periods.
   

4. *Do audience ratings correlate with box office performance?*
   - I can examine the relationship between audience ratings (e.g., IMDb ratings) and box office success. Analyzing the correlation between these variables can help us understand if films with higher ratings tend to perform better financially. Visualizing this relationship using scatter plots or heatmaps can provide insights into audience preferences.

### Data Understanding
The data used for this project includes relevant information about films, such as genre, budget, release date, revenue, and audience ratings. It comes from several files: `title.basics.csv`, `title.ratings.csv`, `rt.movie_info.tsv`, `rt.reviews.tsv`, `bom.movie_gross.csv`, `tmdb.movies.csv`, and `tn.movie_budgets.csv`. I perform exploratory data analysis on these tables, applying statistical techniques and data visualization methods to uncover insights. These insights are then translated into actionable recommendations for Microsoft's movie studio, helping it decide which types of films to create.

In [1]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# Explore each dataset after importing it.

In [3]:
imdb_title_df = pd.read_csv('title.basics.csv')
imdb_title_df.head()

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama"
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama"
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama"
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy"


In [28]:
imdb_title_df.tail()

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres
146139,tt9916538,Kuambil Lagi Hatiku,Kuambil Lagi Hatiku,2019,123.0,Drama
146140,tt9916622,Rodolpho Teóphilo - O Legado de um Pioneiro,Rodolpho Teóphilo - O Legado de um Pioneiro,2015,,Documentary
146141,tt9916706,Dankyavar Danka,Dankyavar Danka,2013,,Comedy
146142,tt9916730,6 Gunn,6 Gunn,2017,116.0,
146143,tt9916754,Chico Albuquerque - Revelações,Chico Albuquerque - Revelações,2013,,Documentary


In [4]:
imdb_ratings_df = pd.read_csv('title.ratings.csv')
imdb_ratings_df.head()

Unnamed: 0,tconst,averagerating,numvotes
0,tt10356526,8.3,31
1,tt10384606,8.9,559
2,tt1042974,6.4,20
3,tt1043726,4.2,50352
4,tt1060240,6.5,21


In [25]:
imdbdf = pd.merge(imdb_title_df, imdb_ratings_df, on='tconst')
imdbdf.head()

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres,averagerating,numvotes
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama",7.0,77
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama",7.2,43
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama,6.9,4517
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama",6.1,13
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy",6.5,119
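
`pd.merge` defaults to an inner join, so titles without a rating are dropped silently (the merged table has 73,856 rows against 146,144 titles). A minimal sketch of how the join could be audited with `indicator=True`, using toy frames with made-up IDs rather than the real files:

```python
import pandas as pd

# Toy stand-ins for the title and ratings tables (IDs and values are made up)
titles = pd.DataFrame({"tconst": ["tt1", "tt2", "tt3"],
                       "primary_title": ["A", "B", "C"]})
ratings = pd.DataFrame({"tconst": ["tt1", "tt3", "tt9"],
                        "averagerating": [7.0, 6.5, 8.1]})

# Default behaviour: how="inner" keeps only IDs present in both tables
inner = pd.merge(titles, ratings, on="tconst")

# An outer join with indicator=True labels where each row came from
audit = pd.merge(titles, ratings, on="tconst", how="outer", indicator=True)

print(len(inner))
print(audit["_merge"].value_counts())
```

Rows marked `left_only` would be titles with no rating, and `right_only` would be ratings with no matching title.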


In [77]:
votes_df = imdbdf[['genres', 'primary_title', 'numvotes']].sort_values(by='numvotes', ascending=False)

In [79]:
votes_df.head(20)

Unnamed: 0,genres,primary_title,numvotes
2387,"Action,Adventure,Sci-Fi",Inception,1841066
2241,"Action,Thriller",The Dark Knight Rises,1387769
280,"Adventure,Drama,Sci-Fi",Interstellar,1299334
12072,"Drama,Western",Django Unchained,1211405
325,"Action,Adventure,Sci-Fi",The Avengers,1183655
507,"Biography,Crime,Drama",The Wolf of Wall Street,1035358
1091,"Mystery,Thriller",Shutter Island,1005960
15327,"Action,Adventure,Comedy",Guardians of the Galaxy,948394
2831,"Action,Adventure,Comedy",Deadpool,820847
2523,"Action,Adventure,Sci-Fi",The Hunger Games,795227
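
The `genres` column packs several genres into one comma-separated string, so grouping on it directly would treat "Action,Adventure,Sci-Fi" as a single genre. One possible way to get per-genre totals is to split and explode the column first. This is a sketch on a toy frame with made-up vote counts; `long_df` and `votes_by_genre` are illustrative names:

```python
import pandas as pd

# Toy frame mimicking the shape of the merged IMDb table (values are made up)
toy = pd.DataFrame({
    "primary_title": ["A", "B", "C"],
    "genres": ["Action,Adventure,Sci-Fi", "Action,Thriller", None],
    "numvotes": [100, 50, 10],
})

# Drop rows with no genre, split the string into a list, then one row per genre
long_df = (
    toy.dropna(subset=["genres"])
       .assign(genre=lambda d: d["genres"].str.split(","))
       .explode("genre")
)

# Total votes per individual genre, largest first
votes_by_genre = (
    long_df.groupby("genre")["numvotes"].sum().sort_values(ascending=False)
)
print(votes_by_genre)
```

A film with three genres contributes its votes to all three, which is a design choice worth keeping in mind when comparing genre totals.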


In [38]:
ratings_df = imdbdf[['genres', 'primary_title', 'averagerating']].sort_values(by='averagerating', ascending=False)
ratings_df

Unnamed: 0,genres,primary_title,averagerating
51109,Documentary,Fly High: Story of the Disc Dog,10.0
65944,"Adventure,Comedy",Calamity Kevin,10.0
71577,Documentary,Pick It Up! - Ska in the '90s,10.0
73616,Documentary,Renegade,10.0
65755,"Documentary,History",Ellis Island: The Making of a Master Race in A...,10.0
...,...,...,...
53923,Horror,Tachiiri kinshi Haittara shinu? Norowareta 5 hen,1.0
63359,Thriller,Between the Walls,1.0
16918,"Documentary,Music",Transgender Trouble,1.0
13053,"Drama,Music",Kikkake wa You!,1.0


In [68]:
ratings_df.head(20)

Unnamed: 0,genres,primary_title,averagerating
51109,Documentary,Fly High: Story of the Disc Dog,10.0
65944,"Adventure,Comedy",Calamity Kevin,10.0
71577,Documentary,Pick It Up! - Ska in the '90s,10.0
73616,Documentary,Renegade,10.0
65755,"Documentary,History",Ellis Island: The Making of a Master Race in A...,10.0
878,"Comedy,Drama",The Dark Knight: The Ballad of the N Word,10.0
64646,Documentary,A Dedicated Life: Phoebe Brand Beyond the Group,10.0
9745,"Crime,Documentary",Freeing Bernie Baran,10.0
702,Documentary,Exteriores: Mulheres Brasileiras na Diplomacia,10.0
49925,Drama,Dog Days in the Heartland,10.0
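
Sorting by `averagerating` alone brings little-known titles with perfect scores to the top, and a perfect score may rest on very few votes. A hedged sketch of filtering on a minimum vote count before ranking follows; the cutoff `MIN_VOTES` is a hypothetical value, and the toy rows are made up:

```python
import pandas as pd

# Toy frame with made-up titles, ratings and vote counts
toy = pd.DataFrame({
    "primary_title": ["Obscure Doc", "Big Hit", "Mid Film", "Tiny Short"],
    "averagerating": [10.0, 8.8, 6.4, 9.9],
    "numvotes": [5, 1_800_000, 50_000, 12],
})

MIN_VOTES = 1_000  # hypothetical cutoff; tune it to the data

# Keep only titles with enough votes, then rank by rating
reliable = (
    toy[toy["numvotes"] >= MIN_VOTES]
    .sort_values("averagerating", ascending=False)
)
print(reliable)
```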


In [26]:
imdbdf.count()

tconst             73856
primary_title      73856
original_title     73856
start_year         73856
runtime_minutes    66236
genres             73052
averagerating      73856
numvotes           73856
dtype: int64
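
The counts above show that `runtime_minutes` and `genres` have missing values. A small helper can make the gaps easier to compare across tables. `missing_summary` is a hypothetical name, and the toy frame below is made up for illustration:

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return non-null counts and percentage missing for each column."""
    summary = pd.DataFrame({
        "non_null": df.count(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    return summary.sort_values("pct_missing", ascending=False)

# Toy frame standing in for one of the project's tables (values are made up)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "runtime_minutes": [90.0, None, 120.0, None],
    "genres": ["Drama", "Comedy", None, "Action"],
})

summary = missing_summary(toy)
print(summary)
```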

In [5]:
rt_info_df = pd.read_csv('rt.movie_info.tsv', sep='\t')
rt_info_df.head()

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,1,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104 minutes,
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000.0,108 minutes,Entertainment One
2,5,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116 minutes,
3,6,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128 minutes,
4,7,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200 minutes,


In [24]:
rt_info_df.tail()

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
1555,1996,Forget terrorists or hijackers -- there's a ha...,R,Action and Adventure|Horror|Mystery and Suspense,,,"Aug 18, 2006","Jan 2, 2007",$,33886034.0,106 minutes,New Line Cinema
1556,1997,The popular Saturday Night Live sketch was exp...,PG,Comedy|Science Fiction and Fantasy,Steve Barron,Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner,"Jul 23, 1993","Apr 17, 2001",,,88 minutes,Paramount Vantage
1557,1998,"Based on a novel by Richard Powell, when the l...",G,Classics|Comedy|Drama|Musical and Performing Arts,Gordon Douglas,,"Jan 1, 1962","May 11, 2004",,,111 minutes,
1558,1999,The Sandlot is a coming-of-age story about a g...,PG,Comedy|Drama|Kids and Family|Sports and Fitness,David Mickey Evans,David Mickey Evans|Robert Gunter,"Apr 1, 1993","Jan 29, 2002",,,101 minutes,
1559,2000,"Suspended from the force, Paris cop Hubert is ...",R,Action and Adventure|Art House and Internation...,,Luc Besson,"Sep 27, 2001","Feb 11, 2003",,,94 minutes,Columbia Pictures


In [6]:
bom_df = pd.read_csv('bom.movie_gross.csv')
bom_df.head()

Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000,2010
3,Inception,WB,292600000.0,535700000,2010
4,Shrek Forever After,P/DW,238700000.0,513900000,2010


In [69]:
bom_df.dtypes

title              object
studio             object
domestic_gross    float64
foreign_gross      object
year                int64
dtype: object
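
`foreign_gross` is stored as `object` rather than a numeric type, so it cannot be summed or sorted numerically as is. A minimal sketch of converting it, assuming the column holds numeric strings that may include thousands separators (the toy values below are made up to show the behaviour):

```python
import pandas as pd

# Toy column: a plain number, a value with a comma, a missing value, and junk
toy = pd.DataFrame({"foreign_gross": ["652000000", "1,131.6", None, "n/a"]})

# Strip commas literally, then coerce anything unparseable to NaN
cleaned = toy["foreign_gross"].str.replace(",", "", regex=False)
toy["foreign_gross_num"] = pd.to_numeric(cleaned, errors="coerce")

print(toy)
```

With `errors="coerce"`, unparseable entries become `NaN` instead of raising, so it is worth checking how many values were lost afterwards.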

In [73]:
grossxtitle_df = bom_df[['title', 'domestic_gross']].sort_values(by='domestic_gross', ascending=False)
grossxtitle_df.head(20)

Unnamed: 0,title,domestic_gross
1872,Star Wars: The Force Awakens,936700000.0
3080,Black Panther,700100000.0
3079,Avengers: Infinity War,678800000.0
1873,Jurassic World,652300000.0
727,Marvel's The Avengers,623400000.0
2758,Star Wars: The Last Jedi,620200000.0
3082,Incredibles 2,608600000.0
2323,Rogue One: A Star Wars Story,532200000.0
2759,Beauty and the Beast (2017),504000000.0
2324,Finding Dory,486300000.0


In [75]:
grossxtitle_df.dropna()

Unnamed: 0,title,domestic_gross
1872,Star Wars: The Force Awakens,936700000.0
3080,Black Panther,700100000.0
3079,Avengers: Infinity War,678800000.0
1873,Jurassic World,652300000.0
727,Marvel's The Avengers,623400000.0
...,...,...
3078,2:22,400.0
2321,The Chambermaid,300.0
2757,Satanic,300.0
2756,News From Planet Mars,300.0


In [7]:
rt_reviews_df = pd.read_csv('rt.reviews.tsv', sep='\t', encoding = 'unicode_escape')
rt_reviews_df.head()

Unnamed: 0,id,review,rating,fresh,critic,top_critic,publisher,date
0,3,A distinctly gallows take on contemporary fina...,3/5,fresh,PJ Nabarro,0,Patrick Nabarro,"November 10, 2018"
1,3,It's an allegory in search of a meaning that n...,,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
2,3,... life lived in a bubble in financial dealin...,,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
3,3,Continuing along a line introduced in last yea...,,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
4,3,... a perverse twist on neorealism...,,fresh,,0,Cinema Scope,"October 12, 2017"


In [23]:
rt_reviews_df.tail()

Unnamed: 0,id,review,rating,fresh,critic,top_critic,publisher,date
54427,2000,The real charm of this trifle is the deadpan c...,,fresh,Laura Sinagra,1,Village Voice,"September 24, 2002"
54428,2000,,1/5,rotten,Michael Szymanski,0,Zap2it.com,"September 21, 2005"
54429,2000,,2/5,rotten,Emanuel Levy,0,EmanuelLevy.Com,"July 17, 2005"
54430,2000,,2.5/5,rotten,Christopher Null,0,Filmcritic.com,"September 7, 2003"
54431,2000,,3/5,fresh,Nicolas Lacroix,0,Showbizz.net,"November 12, 2002"


In [8]:
tm_df = pd.read_csv('tmdb.movies.csv')
tm_df.head()

Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


In [9]:
tm_df = tm_df.drop('Unnamed: 0', axis = 1)

In [10]:
tm_df.head()

Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


In [114]:
pop_df = tm_df[['title', 'popularity']].sort_values(by='popularity', ascending=False)
pop_df.head(20)

Unnamed: 0,title,popularity
23811,Avengers: Infinity War,80.773
11019,John Wick,78.123
23812,Spider-Man: Into the Spider-Verse,60.534
11020,The Hobbit: The Battle of the Five Armies,53.783
5179,The Avengers,50.289
11021,Guardians of the Galaxy,49.606
20617,Blade Runner 2049,48.571
23813,Blade Runner 2049,48.571
23814,Fantastic Beasts: The Crimes of Grindelwald,48.508
23815,Ralph Breaks the Internet,48.057
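
Blade Runner 2049 appears twice in the output above with the same popularity, which suggests the TMDB table contains duplicated rows. A sketch of removing them, assuming the `id` column identifies a film uniquely (an assumption that has not been verified here); the `id` values in the toy frame are made up:

```python
import pandas as pd

# Toy frame with one repeated film (id values are made up)
toy = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "title": ["John Wick", "Blade Runner 2049",
              "Blade Runner 2049", "Ralph Breaks the Internet"],
    "popularity": [78.123, 48.571, 48.571, 48.057],
})

# Keep the first occurrence of each id and discard the rest
deduped = toy.drop_duplicates(subset=["id"], keep="first")
print(deduped)
```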


In [116]:
pop_df.rename(columns = {'title' : 'movie'}, inplace = True)

In [117]:
pop_df.head()

Unnamed: 0,movie,popularity
23811,Avengers: Infinity War,80.773
11019,John Wick,78.123
23812,Spider-Man: Into the Spider-Verse,60.534
11020,The Hobbit: The Battle of the Five Armies,53.783
5179,The Avengers,50.289


In [90]:
budget_df = pd.read_csv('tn.movie_budgets.csv')
budget_df.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"


In [91]:
budget_df.dtypes

id                    int64
release_date         object
movie                object
production_budget    object
domestic_gross       object
worldwide_gross      object
dtype: object

In [92]:
# Remove '$' and commas from the currency columns.
# regex=False makes '$' a literal character instead of a regex end-of-string anchor.
budget_df['domestic_gross'] = budget_df['domestic_gross'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
budget_df['worldwide_gross'] = budget_df['worldwide_gross'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
budget_df['production_budget'] = budget_df['production_budget'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)

# Print the updated DataFrame
budget_df.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,425000000,760507625,2776345279
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,410600000,241063875,1045663875
2,3,"Jun 7, 2019",Dark Phoenix,350000000,42762350,149762350
3,4,"May 1, 2015",Avengers: Age of Ultron,330600000,459005868,1403013963
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,317000000,620181382,1316721747
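
The same cleaning step is repeated for three columns, so it could be wrapped in a small helper that also does the integer conversion. `currency_to_int` is a hypothetical name, and the sample strings are taken from the first rows of the budget table:

```python
import pandas as pd

def currency_to_int(col: pd.Series) -> pd.Series:
    """Turn strings like '$425,000,000' into 64-bit integers."""
    return (
        col.str.replace("$", "", regex=False)   # literal '$', not a regex anchor
           .str.replace(",", "", regex=False)
           .astype("int64")                      # int64 avoids overflow past ~2.1 billion
    )

toy = pd.DataFrame({
    "production_budget": ["$425,000,000", "$410,600,000"],
    "worldwide_gross": ["$2,776,345,279", "$1,045,663,875"],
})

for name in ["production_budget", "worldwide_gross"]:
    toy[name] = currency_to_int(toy[name])

print(toy.dtypes)
```

Using `int64` explicitly matters here: plain `'int'` can resolve to 32-bit on some platforms, and Avatar's worldwide gross does not fit in 32 bits.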


In [93]:
budget_df['domestic_gross'] = budget_df['domestic_gross'].astype('int')

In [94]:
budget_df['worldwide_gross'] = budget_df['worldwide_gross'].astype('int64')

In [95]:
budget_df['production_budget'] = budget_df['production_budget'].astype('int')

In [96]:
budget_df.dtypes

id                    int64
release_date         object
movie                object
production_budget     int32
domestic_gross        int32
worldwide_gross       int64
dtype: object

In [97]:
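# Calculate profit using production_budget, domestic_gross, and worldwide_gross.
# Caution: worldwide_gross normally already includes domestic_gross, so this sum counts domestic revenue twice.
budget_df['profit'] = budget_df['domestic_gross'] + budget_df['worldwide_gross'] - budget_df['production_budget']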

# Print the updated DataFrame
budget_df.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,profit
0,1,"Dec 18, 2009",Avatar,425000000,760507625,2776345279,3111852904
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,410600000,241063875,1045663875,876127750
2,3,"Jun 7, 2019",Dark Phoenix,350000000,42762350,149762350,-157475300
3,4,"May 1, 2015",Avengers: Age of Ultron,330600000,459005868,1403013963,1531419831
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,317000000,620181382,1316721747,1619903129
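
The `profit` column above adds domestic and worldwide gross together. If worldwide gross already includes domestic takings, as is usually the case for this kind of data, an alternative is to subtract the budget from worldwide gross alone. A sketch using the first three rows of the table; `profit_ww` and `roi` are illustrative column names, not part of the original analysis:

```python
import pandas as pd

# Figures copied from the first three rows of the budget table
toy = pd.DataFrame({
    "movie": ["Avatar", "Pirates of the Caribbean: On Stranger Tides", "Dark Phoenix"],
    "production_budget": [425_000_000, 410_600_000, 350_000_000],
    "worldwide_gross": [2_776_345_279, 1_045_663_875, 149_762_350],
})

# Profit on worldwide gross only, assuming it already contains domestic gross
toy["profit_ww"] = toy["worldwide_gross"] - toy["production_budget"]

# Return on investment: profit per dollar of budget
toy["roi"] = toy["profit_ww"] / toy["production_budget"]

print(toy[["movie", "profit_ww", "roi"]])
```

ROI puts small and large productions on the same scale, which raw profit does not.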


In [98]:
budget_df['release_date'] = pd.to_datetime(budget_df['release_date']).dt.strftime('%Y-%m-%d')

In [99]:
budget_df.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,profit
0,1,2009-12-18,Avatar,425000000,760507625,2776345279,3111852904
1,2,2011-05-20,Pirates of the Caribbean: On Stranger Tides,410600000,241063875,1045663875,876127750
2,3,2019-06-07,Dark Phoenix,350000000,42762350,149762350,-157475300
3,4,2015-05-01,Avengers: Age of Ultron,330600000,459005868,1403013963,1531419831
4,5,2017-12-15,Star Wars Ep. VIII: The Last Jedi,317000000,620181382,1316721747,1619903129


In [102]:
budget_df.dtypes

id                    int64
release_date         object
movie                object
production_budget     int32
domestic_gross        int32
worldwide_gross       int64
profit                int64
dtype: object

In [106]:
budget_df['release_date'] = pd.to_datetime(budget_df['release_date'], errors='coerce')

In [107]:
budget_df.dtypes

id                            int64
release_date         datetime64[ns]
movie                        object
production_budget             int32
domestic_gross                int32
worldwide_gross               int64
profit                        int64
dtype: object

In [108]:
budget_df['month'] = budget_df['release_date'].dt.month
budget_df.head()

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross,profit,month
0,1,2009-12-18,Avatar,425000000,760507625,2776345279,3111852904,12
1,2,2011-05-20,Pirates of the Caribbean: On Stranger Tides,410600000,241063875,1045663875,876127750,5
2,3,2019-06-07,Dark Phoenix,350000000,42762350,149762350,-157475300,6
3,4,2015-05-01,Avengers: Age of Ultron,330600000,459005868,1403013963,1531419831,5
4,5,2017-12-15,Star Wars Ep. VIII: The Last Jedi,317000000,620181382,1316721747,1619903129,12


In [111]:
avg_prof_date = budget_df.groupby(budget_df['month'].apply(lambda x: 'Jan-Jun' if x <= 6 else 'Jul-Dec'))['profit'].mean()
avg_prof_date_df = pd.DataFrame(avg_prof_date).reset_index()
avg_prof_date_df

Unnamed: 0,month,profit
0,Jan-Jun,106588400.0
1,Jul-Dec,97976440.0


In [112]:
avg_prof_date_df['profit'] = avg_prof_date_df['profit'].astype('int64')

In [113]:
avg_prof_date_df

Unnamed: 0,month,profit
0,Jan-Jun,106588400
1,Jul-Dec,97976439
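
Splitting the year into two halves may hide month-level patterns, and a mean can be pulled up by a few blockbusters. A sketch of a finer view that groups by release month and reports the median and count alongside the mean; the profit values in the toy frame are made up:

```python
import pandas as pd

# Toy frame: real release dates from the table, made-up profit values
toy = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["2009-12-18", "2011-05-20", "2019-06-07", "2015-05-01", "2017-12-15"]
    ),
    "profit": [10, 4, -2, 6, 8],
})

# One row per calendar month, with three summary statistics
monthly = (
    toy.groupby(toy["release_date"].dt.month)["profit"]
       .agg(["mean", "median", "count"])
)
print(monthly)
```

The `count` column shows how many films sit behind each average, which helps judge how much weight a month's figure deserves.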


In [81]:
profxdate_df = budget_df[['release_date', 'movie', 'profit']].sort_values(by='profit', ascending=False)
profxdate_df.head(20)

Unnamed: 0,release_date,movie,profit
0,"Dec 18, 2009",Avatar,3111852904
5,"Dec 18, 2015",Star Wars Ep. VII: The Force Awakens,2683973445
42,"Dec 19, 1997",Titanic,2667572339
6,"Apr 27, 2018",Avengers: Infinity War,2426949682
33,"Jun 12, 2015",Jurassic World,2086125489
26,"May 4, 2012",The Avengers,1916215444
41,"Feb 16, 2018",Black Panther,1848317790
66,"Apr 3, 2015",Furious 7,1681729814
43,"Jun 15, 2018",Incredibles 2,1651102455
4,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,1619903129


In [119]:
popxprof = pd.merge(pop_df, profxdate_df, on='movie')
popxprof.head(20)

Unnamed: 0,movie,popularity,release_date,profit
0,Avengers: Infinity War,80.773,"Apr 27, 2018",2426949682
1,John Wick,78.123,"Oct 24, 2014",89272836
2,The Hobbit: The Battle of the Five Armies,53.783,"Dec 17, 2014",950697409
3,The Avengers,50.289,"May 4, 2012",1916215444
4,The Avengers,50.289,"Aug 14, 1998",11970832
5,Guardians of the Galaxy,49.606,"Aug 1, 2014",934039628
6,Blade Runner 2049,48.571,"Oct 6, 2017",166411567
7,Blade Runner 2049,48.571,"Oct 6, 2017",166411567
8,Fantastic Beasts: The Crimes of Grindelwald,48.508,"Nov 16, 2018",611775987
9,Spider-Man: Homecoming,46.775,"Jul 7, 2017",1039367490
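
Merging on title alone pairs the popularity of The Avengers with both the 2012 and the 1998 film of that name. One possible fix is to merge on title and release year together. This is a sketch: the TMDB-style date in the toy frame is illustrative, and it assumes both tables carry a usable release date:

```python
import pandas as pd

# Toy popularity table (the date here is illustrative, in YYYY-MM-DD form)
pop = pd.DataFrame({
    "movie": ["The Avengers"],
    "release_date": ["2012-05-04"],
    "popularity": [50.289],
})

# Toy budget table with the two same-titled films from the output above
budget = pd.DataFrame({
    "movie": ["The Avengers", "The Avengers"],
    "release_date": ["May 4, 2012", "Aug 14, 1998"],
    "profit": [1916215444, 11970832],
})

# Derive a year column in each table, giving an explicit format for each
pop["year"] = pd.to_datetime(pop["release_date"], format="%Y-%m-%d").dt.year
budget["year"] = pd.to_datetime(budget["release_date"], format="%b %d, %Y").dt.year

# Match on both keys so same-titled films from other years are excluded
merged = pd.merge(pop[["movie", "year", "popularity"]],
                  budget[["movie", "year", "profit"]],
                  on=["movie", "year"])
print(merged)
```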


In [120]:
popxprof.dtypes

movie            object
popularity      float64
release_date     object
profit            int64
dtype: object

In [121]:
# Calculate the Pearson correlation (the default for Series.corr) between popularity and profit
corr1 = popxprof['popularity'].corr(popxprof['profit'])
corr1

0.5353726817334188
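
Pearson correlation measures linear association and is sensitive to extreme values, which film profits tend to have. A rank-based (Spearman) correlation can serve as a complementary check. The sketch below computes it as the Pearson correlation of the ranks, on made-up numbers chosen to show the difference:

```python
import pandas as pd

# Made-up data: profit rises with popularity, but the last value is extreme
toy = pd.DataFrame({
    "popularity": [1, 2, 3, 4, 5],
    "profit": [1, 4, 9, 16, 1000],
})

# Pearson: linear association, pulled around by the outlier
pearson = toy["popularity"].corr(toy["profit"])

# Spearman: Pearson correlation applied to the ranks of each column
spearman = toy["popularity"].rank().corr(toy["profit"].rank())

print(pearson, spearman)
```

Because the relationship in the toy data is perfectly monotonic, the rank-based figure reaches 1 while the Pearson figure stays below it.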

In [82]:
profxdate_df.tail(20)

Unnamed: 0,release_date,movie,profit
952,"Nov 8, 2019",Midway,-59500000
951,"Dec 11, 2015",The Ridiculous 6,-60000000
435,"Dec 22, 1995",Cutthroat Island,-63465356
669,"Feb 23, 2001",Monkeybone,-64180966
820,"Oct 26, 2018",Air Strike,-64483721
478,"Nov 24, 2010",The Nutcracker in 3D,-69338525
477,"Apr 21, 2017",The Promise,-71224295
607,"Sep 2, 2005",A Sound of Thunder,-71799098
670,"Aug 30, 2019",PLAYMOBIL,-75000000
671,"Dec 31, 2019",355,-75000000


In [65]:
prodxprof = budget_df[['movie', 'production_budget', 'profit']].sort_values(by='profit', ascending=False)
prodxprof.head(10)

Unnamed: 0,movie,production_budget,profit
0,Avatar,425000000,3111852904
5,Star Wars Ep. VII: The Force Awakens,306000000,2683973445
42,Titanic,200000000,2667572339
6,Avengers: Infinity War,300000000,2426949682
33,Jurassic World,215000000,2086125489
26,The Avengers,225000000,1916215444
41,Black Panther,200000000,1848317790
66,Furious 7,190000000,1681729814
43,Incredibles 2,200000000,1651102455
4,Star Wars Ep. VIII: The Last Jedi,317000000,1619903129


In [122]:
prodxprof.tail(20)

Unnamed: 0,movie,production_budget,profit
952,Midway,59500000,-59500000
951,The Ridiculous 6,60000000,-60000000
435,Cutthroat Island,92000000,-63465356
669,Monkeybone,75000000,-64180966
820,Air Strike,65000000,-64483721
478,The Nutcracker in 3D,90000000,-69338525
477,The Promise,90000000,-71224295
607,A Sound of Thunder,80000000,-71799098
670,PLAYMOBIL,75000000,-75000000
671,355,75000000,-75000000


In [66]:
world = budget_df[['movie', 'worldwide_gross']].sort_values(by='worldwide_gross', ascending=False)
world.head(10)

Unnamed: 0,movie,worldwide_gross
0,Avatar,2776345279
42,Titanic,2208208395
5,Star Wars Ep. VII: The Force Awakens,2053311220
6,Avengers: Infinity War,2048134200
33,Jurassic World,1648854864
66,Furious 7,1518722794
26,The Avengers,1517935897
3,Avengers: Age of Ultron,1403013963
41,Black Panther,1348258224
260,Harry Potter and the Deathly Hallows: Part II,1341693157


In [63]:
profxdate_df.tail(10)

Unnamed: 0,release_date,movie,profit
619,"Jan 22, 2019",Renegades,-75978328
535,"Feb 21, 2020",Call of the Wild,-82000000
352,"Apr 27, 2001",Town & Country,-87922780
404,"Aug 16, 2002",The Adventures of Pluto Nash,-88493903
193,"Mar 11, 2011",Mars Needs Moms,-89057484
480,"Dec 31, 2019",Army of the Dead,-90000000
479,"Dec 13, 2017",Bright,-90000000
341,"Jun 14, 2019",Men in Black: International,-103800000
194,"Dec 31, 2020",Moonfall,-150000000
2,"Jun 7, 2019",Dark Phoenix,-157475300


In [16]:
budget_df.count()

id                   5782
release_date         5782
movie                5782
production_budget    5782
domestic_gross       5782
worldwide_gross      5782
dtype: int64

In [17]:
tm_df.count()

genre_ids            26517
id                   26517
original_language    26517
original_title       26517
popularity           26517
release_date         26517
title                26517
vote_average         26517
vote_count           26517
dtype: int64

In [18]:
rt_reviews_df.count()

id            54432
review        48869
rating        40915
fresh         54432
critic        51710
top_critic    54432
publisher     54123
date          54432
dtype: int64

In [19]:
bom_df.count()

title             3387
studio            3382
domestic_gross    3359
foreign_gross     2037
year              3387
dtype: int64

In [20]:
rt_info_df.count()

id              1560
synopsis        1498
rating          1557
genre           1552
director        1361
writer          1111
theater_date    1201
dvd_date        1201
currency         340
box_office       340
runtime         1530
studio           494
dtype: int64

In [21]:
imdb_ratings_df.count()

tconst           73856
averagerating    73856
numvotes         73856
dtype: int64

In [22]:
imdb_title_df.count()

tconst             146144
primary_title      146144
original_title     146123
start_year         146144
runtime_minutes    114405
genres             140736
dtype: int64