# Top Earners in the Movie Industry

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

> I chose the IMDB movie dataset because I wanted to know how much different movie genres, directors, and production companies have grossed over time.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
df = pd.read_csv('imdb-movies.csv')
df.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,tagline,...,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,...,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/15,5562,6.5,2015,137999900.0,1392446000.0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,...,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/15,6185,7.1,2015,137999900.0,348161300.0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,...,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/15,2480,6.3,2015,101200000.0,271619000.0
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,...,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/15,5292,7.5,2015,183999900.0,1902723000.0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,...,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/15,2947,7.3,2015,174799900.0,1385749000.0


In [6]:
df.dtypes

id                        int64
imdb_id                  object
popularity              float64
budget                    int64
revenue                   int64
original_title           object
cast                     object
homepage                 object
director                 object
tagline                  object
keywords                 object
overview                 object
runtime                   int64
genres                   object
production_companies     object
release_date             object
vote_count                int64
vote_average            float64
release_year              int64
budget_adj              float64
revenue_adj             float64
dtype: object

### Data Cleaning

In [3]:
# Drop columns without necessary information and remove all records with no financial information
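
The comment above describes a filtering step that the notebook does not actually run: the `df` listing further down still contains a row with a `budget` of 0. A minimal sketch of that step might look like the following. It assumes that a value of 0 in `budget` or `revenue` means "not reported"; the helper name `keep_financial_rows` and the sample data are hypothetical.

```python
import pandas as pd


def keep_financial_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that report both a budget and a revenue."""
    # Assumption: 0 stands for "not reported" rather than a genuine zero.
    has_money = (frame['budget'] > 0) & (frame['revenue'] > 0)
    return frame.loc[has_money].copy()


# Tiny hypothetical sample standing in for the real dataset.
sample = pd.DataFrame({
    'original_title': ['A', 'B', 'C'],
    'budget': [100, 0, 50],
    'revenue': [300, 20, 0],
})
cleaned = keep_financial_rows(sample)
print(cleaned)
```

Returning a copy rather than filtering in place keeps the original frame available for comparison.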


In [8]:
df.columns

Index(['id', 'imdb_id', 'popularity', 'budget', 'revenue', 'original_title',
       'cast', 'homepage', 'director', 'tagline', 'keywords', 'overview',
       'runtime', 'genres', 'production_companies', 'release_date',
       'vote_count', 'vote_average', 'release_year', 'budget_adj',
       'revenue_adj'],
      dtype='object')

In [9]:
df.drop(['id', 'imdb_id', 'popularity', 'vote_count', 'vote_average', 'keywords'], axis=1, inplace=True)
df.columns

Index(['budget', 'revenue', 'original_title', 'cast', 'homepage', 'director',
       'tagline', 'overview', 'runtime', 'genres', 'production_companies',
       'release_date', 'release_year', 'budget_adj', 'revenue_adj'],
      dtype='object')

In [14]:
df.isna().sum()


budget                  0
revenue                 0
original_title          0
cast                    0
homepage                0
director                0
tagline                 0
overview                0
runtime                 0
genres                  0
production_companies    0
release_date            0
release_year            0
budget_adj              0
revenue_adj             0
dtype: int64

In [15]:
df.dropna(inplace=True)

In [16]:
df

Unnamed: 0,budget,revenue,original_title,cast,homepage,director,tagline,overview,runtime,genres,production_companies,release_date,release_year,budget_adj,revenue_adj
0,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/15,2015,1.379999e+08,1.392446e+09
1,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/15,2015,1.379999e+08,3.481613e+08
2,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/15,2015,1.012000e+08,2.716190e+08
3,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/15,2015,1.839999e+08,1.902723e+09
4,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/15,2015,1.747999e+08,1.385749e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10724,7000000,81974493,On Her Majesty's Secret Service,George Lazenby|Diana Rigg|Telly Savalas|Gabrie...,http://www.mgm.com/view/movie/1411/On-Her-Maje...,Peter R. Hunt,Far up! Far out! Far more! James Bond 007 is b...,James Bond tracks archnemesis Ernst Blofeld to...,142,Adventure|Action|Thriller,Eon Productions|Metro-Goldwyn-Mayer (MGM)|Danjaq,12/12/69,1969,4.160985e+07,4.872780e+08
10759,300000,70000000,Halloween,Donald Pleasence|Jamie Lee Curtis|P.J. Soles|N...,http://www.theofficialjohncarpenter.com/hallow...,John Carpenter,The Night HE Came Home!,"A psychotic murderer, institutionalized since ...",91,Horror|Thriller,Compass International Pictures|Falcon Internat...,10/25/78,1978,1.002810e+06,2.339890e+08
10760,2700000,141000000,Animal House,John Belushi|Tim Matheson|John Vernon|Verna Bl...,http://www.animalhouse.com/,John Landis,It was the Deltas against the rules... the rul...,"At a 1962 College, Dean Vernon Wormer is deter...",109,Comedy,Universal Pictures|Oregon Film Factory|Stage I...,7/27/78,1978,9.025292e+06,4.713208e+08
10817,0,321952,The Last Waltz,Robbie Robertson|Rick Danko|Levon Helm|Richard...,http://www.mgm.com/#/our-titles/1092/The-Last-...,Martin Scorsese,It Started as a Concert. It Became a Celebration.,Martin Scorsese's rockumentary intertwines foo...,117,Documentary|Music,FM Productions|Last Waltz Inc.,5/1/78,1978,0.000000e+00,1.076189e+06


#### If I created one record for each of the `production_companies` a movie was released under and one record for each of its `genres`,<br>calculations wouldn't work, because for many records the numbers of `production_companies`<br>and `genres` differ. So I'll create 2 dataframes: one without a `production_companies` column and one without a `genres` column.

#### One `production_companies` per record

In [21]:
df_prod = df.drop(['genres'],axis=1)

In [22]:
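# Split the pipe-delimited companies: the first one stays in `production_companies`,
# the rest go to prod1..prod4 (this only works if the split yields exactly 5 columns).
df_prod[['production_companies', 'prod1', 'prod2', 'prod3', 'prod4']] = df_prod['production_companies'].str.split('|', expand=True)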


In [23]:
df_prod.head()


Unnamed: 0,budget,revenue,original_title,cast,homepage,director,tagline,overview,runtime,production_companies,release_date,release_year,budget_adj,revenue_adj,prod1,prod2,prod3,prod4
0,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,Twenty-two years after the events of Jurassic ...,124,Universal Studios,6/9/15,2015,137999900.0,1392446000.0,Amblin Entertainment,Legendary Pictures,Fuji Television Network,Dentsu
1,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,An apocalyptic story set in the furthest reach...,120,Village Roadshow Pictures,5/13/15,2015,137999900.0,348161300.0,Kennedy Miller Productions,,,
2,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,Beatrice Prior must confront her inner demons ...,119,Summit Entertainment,3/18/15,2015,101200000.0,271619000.0,Mandeville Films,Red Wagon Entertainment,NeoReel,
3,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,Thirty years after defeating the Galactic Empi...,136,Lucasfilm,12/15/15,2015,183999900.0,1902723000.0,Truenorth Productions,Bad Robot,,
4,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,Deckard Shaw seeks revenge against Dominic Tor...,137,Universal Pictures,4/1/15,2015,174799900.0,1385749000.0,Original Film,Media Rights Capital,Dentsu,One Race Films


In [1]:
# GENRES
# For every record, split the string of genres into a list.
# This way we should be able to query by whichever genre we want.

#### One `genres` per record
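
This step is left unimplemented in the notebook. One possible way to get one genre per record is to split the pipe-delimited string into a list and then use `DataFrame.explode`. The sketch below assumes the `genres` column holds strings such as `Action|Thriller`; the helper name `one_value_per_record` and the sample data are hypothetical.

```python
import pandas as pd


def one_value_per_record(frame: pd.DataFrame, column: str, sep: str = '|') -> pd.DataFrame:
    """Return a copy of `frame` with one row per value in a delimited column."""
    out = frame.copy()
    # Turn 'Action|Thriller' into ['Action', 'Thriller'].
    out[column] = out[column].str.split(sep)
    # explode() repeats the other columns once for every list element.
    return out.explode(column).reset_index(drop=True)


# Tiny hypothetical sample standing in for the real dataset.
sample = pd.DataFrame({
    'original_title': ['X', 'Y'],
    'genres': ['Action|Thriller', 'Comedy'],
    'revenue': [100, 40],
})
df_genre_sample = one_value_per_record(sample, 'genres')
revenue_by_genre = (
    df_genre_sample.groupby('genres')['revenue'].sum().sort_values(ascending=False)
)
print(revenue_by_genre)
```

Note that a movie's full revenue is credited to each of its genres, so the per-genre totals add up to more than the overall revenue.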

<a id='eda'></a>
## Exploratory Data Analysis

### Which production companies released the most movies since 2012? Display the top 10 production companies.

In [24]:
releases = df.query('release_year > 2011')

In [25]:
releases['production_companies'].value_counts().head(10)


Marvel Studios                                      7
The Asylum                                          4
Walt Disney Pictures                                4
Lionsgate|Color Force                               3
DreamWorks Animation                                3
Columbia Pictures                                   3
Legendary Pictures                                  2
HBO Films                                           2
Home Box Office (HBO)                               2
Blue Sky Studios|Twentieth Century Fox Animation    2
Name: production_companies, dtype: int64
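
The count above treats each pipe-delimited string as a single company, so `Lionsgate|Color Force` is tallied separately from any other Lionsgate release, and the query keeps only releases after 2011. A hedged sketch of counting each company individually over a window of years follows. The window is measured back from the latest `release_year` in the data, which is an assumption; the function name `top_companies_by_releases` and its parameters are hypothetical.

```python
import pandas as pd


def top_companies_by_releases(frame: pd.DataFrame, years: int = 10, top: int = 10) -> pd.Series:
    """Count releases per individual company over the most recent `years` years."""
    # Assumption: "recent" is measured from the latest release year in the data.
    latest = frame['release_year'].max()
    recent = frame[frame['release_year'] > latest - years]
    # One entry per (movie, company) pair.
    companies = recent['production_companies'].str.split('|').explode()
    return companies.value_counts().head(top)


# Tiny hypothetical sample standing in for the real dataset.
sample = pd.DataFrame({
    'release_year': [2015, 2014, 2001],
    'production_companies': ['Studio A|Studio B', 'Studio A', 'Studio C'],
})
counts = top_companies_by_releases(sample)
print(counts)
```

A co-produced movie counts once for every company listed on it.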

### Which 5 movie genres grossed the most of all time?

In [27]:
df.groupby('genres').revenue.sum().sort_values(ascending = False).head()


genres
Adventure|Fantasy|Action            7428942514
Comedy                              7408356696
Drama                               6809971541
Adventure|Action|Thriller           5188992154
Adventure|Action|Science Fiction    4839902747
Name: revenue, dtype: int64

### Who are the top 10 grossing directors?

In [28]:
df.groupby('director').revenue.sum().sort_values(ascending = False).head(10)

director
Peter Jackson        5943642820
Steven Spielberg     5257606337
James Cameron        4721952106
David Yates          4154295625
Christopher Nolan    4053833672
George Lucas         3201550893
Francis Lawrence     3179979588
Michael Bay          3164378459
Sam Raimi            3079057712
Gore Verbinski       3016685549
Name: revenue, dtype: int64

### Compare the revenue of the highest grossing movies of all time.

In [29]:
df.groupby('original_title').revenue.sum().sort_values(ascending = False).head()


original_title
Avatar                          2781505847
Star Wars: The Force Awakens    2068178225
Titanic                         1845034188
The Avengers                    1519557910
Jurassic World                  1513528810
Name: revenue, dtype: int64
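
Grouping by `original_title` adds together the revenue of different movies that share a title, such as an original and its remake. A minimal sketch that ranks individual records instead, and that can also rank by the inflation-adjusted `revenue_adj` column, is shown below. The helper name `top_movies` and the sample data are hypothetical.

```python
import pandas as pd


def top_movies(frame: pd.DataFrame, column: str = 'revenue', n: int = 5) -> pd.DataFrame:
    """Return the n rows with the largest value in `column`, one row per movie."""
    cols = ['original_title', 'release_year', column]
    return frame.nlargest(n, column)[cols].reset_index(drop=True)


# Tiny hypothetical sample: two different movies share the title 'Remake'.
sample = pd.DataFrame({
    'original_title': ['Remake', 'Remake', 'Solo'],
    'release_year': [1980, 2010, 2000],
    'revenue': [60, 70, 100],
    'revenue_adj': [180.0, 75.0, 130.0],
})
by_revenue = top_movies(sample, 'revenue', n=2)
by_adjusted = top_movies(sample, 'revenue_adj', n=2)
print(by_revenue)
print(by_adjusted)
```

In this sample, grouping by title would rank `Remake` first with a combined 130, while ranking by record puts `Solo` first.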

<a id='conclusions'></a>
## Conclusions

* Avatar is the highest-grossing movie of all time.

* Peter Jackson is the highest-grossing director in this dataset, followed by Steven Spielberg.

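* The `Adventure|Fantasy|Action` combination grossed the most, narrowly ahead of `Comedy`. Because genres were not split into one per record, this ranks genre combinations rather than individual genres.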

* By number of releases after 2011, Marvel Studios leads with 7 movies, and Walt Disney Pictures is tied for second with The Asylum at 4 each. This analysis counts releases, not gross revenue.