# Top Earners in the Movie Industry

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

> I chose the IMDB movie dataset because I wanted to know how much different movie genres, directors, and production companies have grossed over time.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('imdb-movies.csv')

In [3]:
df.head()

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,cast,homepage,director,tagline,...,overview,runtime,genres,production_companies,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...,http://www.jurassicworld.com/,Colin Trevorrow,The park is open.,...,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,6/9/15,5562,6.5,2015,137999900.0,1392446000.0
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...,http://www.madmaxmovie.com/,George Miller,What a Lovely Day.,...,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,5/13/15,6185,7.1,2015,137999900.0,348161300.0
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Shailene Woodley|Theo James|Kate Winslet|Ansel...,http://www.thedivergentseries.movie/#insurgent,Robert Schwentke,One Choice Can Destroy You,...,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,3/18/15,2480,6.3,2015,101200000.0,271619000.0
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...,http://www.starwars.com/films/star-wars-episod...,J.J. Abrams,Every generation has a story.,...,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,12/15/15,5292,7.5,2015,183999900.0,1902723000.0
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,Vin Diesel|Paul Walker|Jason Statham|Michelle ...,http://www.furious7.com/,James Wan,Vengeance Hits Home,...,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,4/1/15,2947,7.3,2015,174799900.0,1385749000.0


### Data Cleaning

In [4]:
# Drop columns that are not needed for this analysis:
# id, imdb_id, cast, homepage, tagline, keywords, vote_count, vote_average, budget_adj, revenue_adj, release_date
# Note: records with no financial information (budget or revenue of 0) are not removed by this cell.
df.drop(['id', 'imdb_id', 'cast', 'homepage', 'tagline', 'keywords', 'vote_count', 'vote_average', 'budget_adj', 'revenue_adj', 'release_date'], axis='columns', inplace=True)
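
The cell above only drops columns; rows whose `budget` or `revenue` is 0 stay in the frame. A minimal sketch of filtering them out, assuming a value of 0 means the figure was not reported (the toy frame and the names `movies` and `has_financials` are illustrative, not part of the notebook):

```python
import pandas as pd

# Toy frame standing in for the movie data; values are illustrative.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [150_000_000, 0, 2_500_000, 0],
    "revenue": [378_436_354, 0, 0, 850_994],
})

# Assumption: a 0 in budget or revenue means "not reported".
has_financials = (movies["budget"] > 0) & (movies["revenue"] > 0)
movies_with_financials = movies[has_financials].reset_index(drop=True)

print(movies_with_financials)
```

Whether to require both columns depends on the question. For revenue totals alone, filtering on `revenue > 0` would keep more rows.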

In [5]:
df

Unnamed: 0,popularity,budget,revenue,original_title,director,overview,runtime,genres,production_companies,release_year
0,32.985763,150000000,1513528810,Jurassic World,Colin Trevorrow,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,2015
1,28.419936,150000000,378436354,Mad Max: Fury Road,George Miller,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,2015
2,13.112507,110000000,295238201,Insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015
3,11.173104,200000000,2068178225,Star Wars: The Force Awakens,J.J. Abrams,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,2015
4,9.335014,190000000,1506249360,Furious 7,James Wan,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,2015
...,...,...,...,...,...,...,...,...,...,...
10861,0.080598,0,0,The Endless Summer,Bruce Brown,"The Endless Summer, by Bruce Brown, is one of ...",95,Documentary,Bruce Brown Films,1966
10862,0.065543,0,0,Grand Prix,John Frankenheimer,Grand Prix driver Pete Aron is fired by his te...,176,Action|Adventure|Drama,Cherokee Productions|Joel Productions|Douglas ...,1966
10863,0.065141,0,0,Beregis Avtomobilya,Eldar Ryazanov,An insurance agent who moonlights as a carthie...,94,Mystery|Comedy,Mosfilm,1966
10864,0.064317,0,0,"What's Up, Tiger Lily?",Woody Allen,"In comic Woody Allen's film debut, he took the...",80,Action|Comedy,Benedict Pictures Corp.,1966


In [6]:
df.isna().sum()


popularity                 0
budget                     0
revenue                    0
original_title             0
director                  44
overview                   4
runtime                    0
genres                    23
production_companies    1030
release_year               0
dtype: int64

In [7]:
df.dtypes

popularity              float64
budget                    int64
revenue                   int64
original_title           object
director                 object
overview                 object
runtime                   int64
genres                   object
production_companies     object
release_year              int64
dtype: object

In [8]:
df.dropna(inplace=True)

In [9]:
df.drop_duplicates(inplace=True)

In [14]:
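# str.contains treats '|' as a regex alternation by default, which matches every row; regex=False matches a literal pipe:
# multiple_production_companies = df[df.production_companies.str.contains('|', regex=False)]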
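
`Series.str.contains` interprets its pattern as a regular expression by default, and a bare `|` is an alternation between two empty patterns, so it matches every string. A small sketch of the difference, using two company strings taken from the table output (the variable names are illustrative):

```python
import pandas as pd

companies = pd.Series(["Mosfilm", "Lucasfilm|Truenorth Productions|Bad Robot"])

# Default regex=True: "|" means "empty OR empty", which matches every string.
as_regex = companies.str.contains("|")

# regex=False looks for a literal pipe character instead.
as_literal = companies.str.contains("|", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

This would explain why a filter written with the default settings returns single-company movies such as the Mosfilm record as well.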

In [11]:
df.to_csv('df_clean.csv', index=False)

In [12]:
df = pd.read_csv('df_clean.csv')

In [16]:
multiple_production_companies

Unnamed: 0,popularity,budget,revenue,original_title,director,overview,runtime,genres,production_companies,release_year
0,32.985763,150000000,1513528810,Jurassic World,Colin Trevorrow,Twenty-two years after the events of Jurassic ...,124,Action|Adventure|Science Fiction|Thriller,Universal Studios|Amblin Entertainment|Legenda...,2015
1,28.419936,150000000,378436354,Mad Max: Fury Road,George Miller,An apocalyptic story set in the furthest reach...,120,Action|Adventure|Science Fiction|Thriller,Village Roadshow Pictures|Kennedy Miller Produ...,2015
2,13.112507,110000000,295238201,Insurgent,Robert Schwentke,Beatrice Prior must confront her inner demons ...,119,Adventure|Science Fiction|Thriller,Summit Entertainment|Mandeville Films|Red Wago...,2015
3,11.173104,200000000,2068178225,Star Wars: The Force Awakens,J.J. Abrams,Thirty years after defeating the Galactic Empi...,136,Action|Adventure|Science Fiction|Fantasy,Lucasfilm|Truenorth Productions|Bad Robot,2015
4,9.335014,190000000,1506249360,Furious 7,James Wan,Deckard Shaw seeks revenge against Dominic Tor...,137,Action|Crime|Thriller,Universal Pictures|Original Film|Media Rights ...,2015
...,...,...,...,...,...,...,...,...,...,...
9801,0.080598,0,0,The Endless Summer,Bruce Brown,"The Endless Summer, by Bruce Brown, is one of ...",95,Documentary,Bruce Brown Films,1966
9802,0.065543,0,0,Grand Prix,John Frankenheimer,Grand Prix driver Pete Aron is fired by his te...,176,Action|Adventure|Drama,Cherokee Productions|Joel Productions|Douglas ...,1966
9803,0.065141,0,0,Beregis Avtomobilya,Eldar Ryazanov,An insurance agent who moonlights as a carthie...,94,Mystery|Comedy,Mosfilm,1966
9804,0.064317,0,0,"What's Up, Tiger Lily?",Woody Allen,"In comic Woody Allen's film debut, he took the...",80,Action|Comedy,Benedict Pictures Corp.,1966


#### If I created one record for each of the `production_companies` a movie was released under and one record for each of its `genres`,<br>the calculations wouldn't work, because for many records the number of `production_companies`<br>and the number of `genres` aren't the same. So I'll create 2 dataframes: one without a `production_companies` column and one without a `genres` column.
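
To illustrate the problem described above, here is a minimal sketch on a single made-up movie (the title, studio names, and revenue are hypothetical). Exploding both columns one after the other produces every genre/company pair, so any sum over `revenue` is multiplied accordingly:

```python
import pandas as pd

movies = pd.DataFrame({
    "original_title": ["Example Movie"],
    "revenue": [100],
    "genres": ["Action|Crime|Thriller"],
    "production_companies": ["Studio A|Studio B"],
})

# Turn both pipe-delimited strings into lists.
split = movies.assign(
    genres=movies["genres"].str.split("|"),
    production_companies=movies["production_companies"].str.split("|"),
)

# Exploding one column after the other yields every genre/company pair.
both = split.explode("genres").explode("production_companies")
print(len(both))              # 3 genres x 2 companies
print(both["revenue"].sum())  # revenue is counted once per pair

# Dropping one column before exploding keeps one row per company.
by_company = split.drop(columns="genres").explode("production_companies")
print(len(by_company))
```

Keeping a separate frame per exploded column, as the notebook does, avoids the pairwise blow-up, though each movie's revenue is still repeated once per company or genre within that frame.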

#### One `production_companies` per record

In [17]:
multiple_production_companies = df.assign(production_companies=df.production_companies.str.split("|")).explode('production_companies')

In [21]:
multiple_production_companies.drop(['genres'], axis='columns', inplace=True)

In [23]:
pc = multiple_production_companies

In [29]:
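# Note: 'release_year' on its own is not a boolean condition, so query() appears to use the column's values
# as row labels (hence the 2015 ... 1966 index in the output). A filter needs a comparison, e.g. pc.query('release_year > 2005').
pc.query('release_year')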

Unnamed: 0,popularity,budget,revenue,original_title,director,overview,runtime,production_companies,release_year
2015,0.289469,2500000,0,Lake Placid 3,G.E. Furst,"A game warden moves his family to Lake Placid,...",93,Stage 6 Films,2010
2015,0.289469,2500000,0,Lake Placid 3,G.E. Furst,"A game warden moves his family to Lake Placid,...",93,Curmudgeon Films,2010
2015,0.289469,2500000,0,Lake Placid 3,G.E. Furst,"A game warden moves his family to Lake Placid,...",93,RCR Media Group,2010
2015,0.289469,2500000,0,Lake Placid 3,G.E. Furst,"A game warden moves his family to Lake Placid,...",93,UFO Films,2010
2015,0.289469,2500000,0,Lake Placid 3,G.E. Furst,"A game warden moves his family to Lake Placid,...",93,Stage 6 Films,2010
...,...,...,...,...,...,...,...,...,...
1966,0.395670,0,850994,Mr. Nice,Bernard Rose,Biopic about 1970s British marijuana trafficke...,121,Independent,2010
1966,0.395670,0,850994,Mr. Nice,Bernard Rose,Biopic about 1970s British marijuana trafficke...,121,Kanzaman,2010
1966,0.395670,0,850994,Mr. Nice,Bernard Rose,Biopic about 1970s British marijuana trafficke...,121,Prescience,2010
1966,0.395670,0,850994,Mr. Nice,Bernard Rose,Biopic about 1970s British marijuana trafficke...,121,Lipsync Productions,2010


In [31]:
pc[pc.release_year > 2005].production_companies.value_counts().head(10)

Universal Pictures                        149
Warner Bros.                              131
Relativity Media                          108
Columbia Pictures                         100
Paramount Pictures                         92
Twentieth Century Fox Film Corporation     79
Walt Disney Pictures                       79
New Line Cinema                            71
BBC Films                                  67
Lionsgate                                  58
Name: production_companies, dtype: int64

In [13]:
# GENRES
# For every record, split the pipe-delimited genres string into a list, then explode it into one genre per row.
# This way we should be able to query or group by any single genre.

In [32]:
multiple_genres = df.assign(genres=df.genres.str.split("|")).explode('genres')

In [54]:
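# Caution: multiple_genres has one row per movie and genre, so summing revenue by title counts a movie once per genre.
# For example, Jurassic World (4 genres) shows 4 x 1,513,528,810 = 6,054,115,240 below.
# Grouping the one-row-per-movie frame avoids this:
# df.groupby('original_title').revenue.sum().sort_values(ascending=False).head(10)
multiple_genres.groupby('original_title').revenue.sum().sort_values(ascending=False).head(10)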

original_title
Avatar                          11126023388
Star Wars: The Force Awakens     8272712900
Jurassic World                   6054115240
Titanic                          5535102564
The Net                          5531398290
Minions                          4626923848
The Avengers                     4607196562
Shrek 2                          4599193790
Furious 7                        4518748080
The Dark Knight Rises            4324165148
Name: revenue, dtype: int64

#### One `genres` per record

<a id='eda'></a>
## Exploratory Data Analysis

### Which production companies released the most movies in the last 10 years? Display the top 10 production companies.

In [None]:
# Universal Pictures                        149
# Warner Bros.                              131
# Relativity Media                          108
# Columbia Pictures                         100
# Paramount Pictures                         92
# Twentieth Century Fox Film Corporation     79
# Walt Disney Pictures                       79
# New Line Cinema                            71
# BBC Films                                  67
# Lionsgate                                  58

### Which 5 movie genres grossed the most of all time?

In [None]:
# Action       173417346979
# Adventure    166317625752
# Comedy       142141376544
# Drama        138895805395
# Thriller     121188594087
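
The cell that produced these genre totals is not shown. One way to compute such a ranking, sketched on made-up data (titles and revenues are hypothetical), is to explode the genres and sum revenue per genre:

```python
import pandas as pd

movies = pd.DataFrame({
    "original_title": ["M1", "M2", "M3"],
    "genres": ["Action|Thriller", "Comedy", "Action|Comedy"],
    "revenue": [100, 40, 60],
})

# One row per movie and genre.
per_genre = movies.assign(genres=movies["genres"].str.split("|")).explode("genres")

# A movie's full revenue is credited to each of its genres.
top_genres = per_genre.groupby("genres")["revenue"].sum().nlargest(5)
print(top_genres)
```

Because a multi-genre movie is credited to every genre it lists, the per-genre totals overlap and do not add up to the total revenue of all movies.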

### Who are the top 10 grossing directors?

In [None]:
# Steven Spielberg     24663086098
# James Cameron        20132327500
# Peter Jackson        17530037629
# Michael Bay          16297866403
# Christopher Nolan    16196885522
# David Yates          13401099613
# Robert Zemeckis      13064895818
# J.J. Abrams          12805687973
# Tim Burton           11811526116
# Francis Lawrence     11443904006
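
The code behind these director totals is not shown either. A minimal sketch, assuming the frame has exactly one row per movie (director names and revenues below are made up):

```python
import pandas as pd

movies = pd.DataFrame({
    "original_title": ["M1", "M2", "M3", "M4"],
    "director": ["Director A", "Director B", "Director A", "Director C"],
    "revenue": [500, 300, 250, 100],
})

# Sum revenue per director and keep the ten largest totals.
top_directors = (
    movies.groupby("director")["revenue"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_directors)
```

If the same aggregation is run on a frame exploded by genre or production company, each movie is counted several times and the totals come out larger than the directors' actual grosses.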

### Compare the revenue of the highest grossing movies of all time.

In [None]:
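# Note: these totals were summed over the genre-exploded frame, so each is the movie's revenue times its number of genres.
# Avatar                          11126023388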
# Star Wars: The Force Awakens     8272712900
# Jurassic World                   6054115240
# Titanic                          5535102564
# The Net                          5531398290
# Minions                          4626923848
# The Avengers                     4607196562
# Shrek 2                          4599193790
# Furious 7                        4518748080
# The Dark Knight Rises            4324165148
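
The figures above match each movie's revenue multiplied by its number of genres, which suggests they were summed over the genre-exploded frame. A short sketch using two rows from the `df.head()` output shows the effect and a per-movie alternative (the variable names are illustrative):

```python
import pandas as pd

# Revenue and genres for two movies, as shown in the df.head() output.
movies = pd.DataFrame({
    "original_title": ["Jurassic World", "Furious 7"],
    "genres": ["Action|Adventure|Science Fiction|Thriller", "Action|Crime|Thriller"],
    "revenue": [1513528810, 1506249360],
})
per_genre = movies.assign(genres=movies["genres"].str.split("|")).explode("genres")

# Summing the exploded frame counts each movie once per genre.
inflated = per_genre.groupby("original_title")["revenue"].sum()

# Summing the one-row-per-movie frame gives the actual revenue.
per_movie = movies.groupby("original_title")["revenue"].sum().sort_values(ascending=False)

print(inflated)
print(per_movie)
```

The ranking can change once the multiplier is removed, since movies with more genres are favored by the inflated totals.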

<a id='conclusions'></a>
## Conclusions

* Avatar is the highest-grossing movie of all time in this dataset.

* Steven Spielberg is the highest-grossing director of all time in this dataset.

* Action movies (not to my surprise) are the highest-grossing movies.

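* Walt Disney Pictures is not one of the top 5 production companies by number of movies released during the last 10 years (release years after 2005); with 79 releases it ties for sixth with Twentieth Century Fox Film Corporation.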