# Top Earners in Movie Industry

## Table of Contents

<ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#eda">Exploratory Data Analysis</a></li>
    <li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id="intro"></a>
## Introduction

> This analysis project uses the IMDb movie data. When the analysis is complete, you should be able to identify the top 5 highest-grossing directors and the top 5 highest-grossing movie genres of all time, compare the revenue of the highest-grossing movies, and determine which companies released the most movies.

> There are 10 columns that will not be needed for the analysis. Use pandas to drop these columns. HINT: Only the columns pertaining to revenue will be needed.

> To get you started, I've already placed the needed code for getting the packages and datafile that you will be using for the project. 

In [11]:
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

In [12]:
df = pd.read_csv('./imdb-movies.csv')

### Drop columns without necessary information and remove all records with no financial information -- Pay close attention to columns that don't tell you anything about financial data

In [13]:
# Two working copies: df_genre keeps `genres`, df_prod keeps `production_companies`
df_genre = df.drop(['homepage', 'cast', 'tagline', 'overview', 'production_companies'], axis=1)
df_prod = df.drop(['homepage', 'cast', 'tagline', 'overview', 'genres'], axis=1)


### Data Cleaning

In [14]:
# Drop every row that contains a missing (NaN) value

df_genre = df_genre.dropna()
df_prod = df_prod.dropna()
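
Note that `dropna()` only removes rows with missing values. The table output further down still shows rows whose `budget` and `revenue` are `0`, so "no financial information" may need a separate filter. The following is a minimal sketch of one way to do that, run on a small made-up frame (`toy`, its titles, and its figures are hypothetical and are not taken from the data file):

```python
import pandas as pd

# Hypothetical stand-in for the movie data; values are invented for illustration.
toy = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [100, 0, 50, 0],
    "revenue": [300, 0, 0, 20],
})

# Keep only rows that report some revenue.
has_revenue = toy[toy["revenue"] > 0]

# Stricter variant: require both budget and revenue to be non-zero.
has_both = toy[(toy["budget"] > 0) & (toy["revenue"] > 0)]

print(has_revenue["original_title"].tolist())
print(has_both["original_title"].tolist())
```

Which variant is appropriate depends on whether a missing budget should disqualify a film from a revenue ranking. Applying either filter to `df_genre` and `df_prod` would change the totals printed in the cells below.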


#### Here's a helpful hint from my own analysis when I ran this the first time. This may help shed light on what your data set should look like.

#### If I created one record for each of the `production_companies` a movie was released under and one record for each of its `genres`<br>and tried to run calculations, it wouldn't work because, for many records, the number of `production_companies`<br>and `genres` isn't the same, so I'll create 2 dataframes: one w/o a `production_companies` column and one w/o a `genres` column
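
To illustrate the problem described in the hint, here is a small sketch using one invented record (the title, studios, and revenue are hypothetical). Splitting both pipe-separated columns into one row per value produces every genre and company combination, so the film's revenue is counted several times:

```python
import pandas as pd

# One made-up movie with 3 genres and 2 production companies.
toy = pd.DataFrame({
    "original_title": ["Movie X"],
    "revenue": [100],
    "genres": ["Action|Comedy|Drama"],
    "production_companies": ["Studio One|Studio Two"],
})

# Turn each pipe-separated string into a list, then expand one row per item.
both = toy.assign(
    genres=toy["genres"].str.split("|"),
    production_companies=toy["production_companies"].str.split("|"),
)
both = both.explode("genres").explode("production_companies")

print(len(both))              # one row per (genre, company) pair
print(both["revenue"].sum())  # revenue is repeated on every one of those rows
```

With 3 genres and 2 companies the single record becomes 6 rows, and a naive revenue sum is 6 times too large. Keeping `genres` and `production_companies` in separate dataframes avoids this.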

<a id="eda"></a>
## Exploratory Data Analysis

> Use Matplotlib to display your data analysis

### Which production companies released the most movies in the last 10 years? Display the top 5 production companies.

In [23]:
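# Count how many movies each production company is credited on
dicter = dict()
for i in df_prod['production_companies']:
    # A movie can list several companies separated by '|'
    for company in i.split('|'):
        dicter[company] = dicter.get(company, 0) + 1

# Report the company (or companies, if tied) with the highest count
most = max(dicter.values())
print(f'{[k for k, v in dicter.items() if v == most]} is the company with {most} published movies.')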


['Warner Bros.'] is the company with 495 published movies.
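
The cell above reports only the single top company across all years, while the question asks for the top 5 over the last 10 years. A possible sketch of that calculation follows, on a made-up frame (`toy` and the studio names are hypothetical). It assumes "the last 10 years" means the 10 most recent values of `release_year` present in the data, which is an interpretation rather than something the assignment states:

```python
import pandas as pd

# Hypothetical data; only the column names match the notebook.
toy = pd.DataFrame({
    "release_year": [2015, 2014, 2010, 2006, 2005, 1999],
    "production_companies": [
        "Studio A|Studio B",
        "Studio A",
        "Studio B|Studio C",
        "Studio A|Studio C",
        "Studio D",
        "Studio D|Studio A",
    ],
})

# Assumption: the 10-year window ends at the latest release year in the data.
latest = toy["release_year"].max()
recent = toy[toy["release_year"] > latest - 10]

# One row per credited company, then count occurrences.
counts = (
    recent["production_companies"]
    .str.split("|")
    .explode()
    .value_counts()
)
top5 = counts.head(5)
print(top5)
```

On the real data the same steps would run on `df_prod` in place of `toy`.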


### What 5 movie genres grossed the highest all-time?

In [24]:
df_genre

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,director,keywords,runtime,genres,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Colin Trevorrow,monster|dna|tyrannosaurus rex|velociraptor|island,124,Action|Adventure|Science Fiction|Thriller,6/9/15,5562,6.5,2015,1.379999e+08,1.392446e+09
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,George Miller,future|chase|post-apocalyptic|dystopia|australia,120,Action|Adventure|Science Fiction|Thriller,5/13/15,6185,7.1,2015,1.379999e+08,3.481613e+08
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Robert Schwentke,based on novel|revolution|dystopia|sequel|dyst...,119,Adventure|Science Fiction|Thriller,3/18/15,2480,6.3,2015,1.012000e+08,2.716190e+08
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,J.J. Abrams,android|spaceship|jedi|space opera|3d,136,Action|Adventure|Science Fiction|Fantasy,12/15/15,5292,7.5,2015,1.839999e+08,1.902723e+09
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,James Wan,car race|speed|revenge|suspense|car,137,Action|Crime|Thriller,4/1/15,2947,7.3,2015,1.747999e+08,1.385749e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10861,21,tt0060371,0.080598,0,0,The Endless Summer,Bruce Brown,surfer|surfboard|surfing,95,Documentary,6/15/66,11,7.4,1966,0.000000e+00,0.000000e+00
10862,20379,tt0060472,0.065543,0,0,Grand Prix,John Frankenheimer,car race|racing|formula 1,176,Action|Adventure|Drama,12/21/66,20,5.7,1966,0.000000e+00,0.000000e+00
10863,39768,tt0060161,0.065141,0,0,Beregis Avtomobilya,Eldar Ryazanov,car|trolley|stealing car,94,Mystery|Comedy,1/1/66,11,6.5,1966,0.000000e+00,0.000000e+00
10864,21449,tt0061177,0.064317,0,0,"What's Up, Tiger Lily?",Woody Allen,spoof,80,Action|Comedy,11/2/66,22,5.4,1966,0.000000e+00,0.000000e+00


In [35]:
# Total revenue per genre
dicter = dict()

# Look up revenue by row position, since dropna() can leave gaps in df_genre's index
revenue = dict()
for idx, i in enumerate(df_genre['revenue']):
    revenue[idx] = i


for idx, i in enumerate(df_genre['genres']):
    # Credit the movie's full revenue to each genre it lists
    for genre in i.split('|'):
        dicter[genre] = dicter.get(genre, 0) + revenue[idx]


# Report the genre (or genres, if tied) with the highest total
top = max(dicter.values())
print(f'{[k for k, v in dicter.items() if v == top]} is the genre with ${top} revenue.')


['Action'] is the genre with $171956974038 revenue.
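
The loop above finds the single highest-grossing genre. To list the top 5, one option is to expand the genres into rows and aggregate with `groupby`. This is a sketch on invented numbers (`toy` is hypothetical), and like the loop it credits a film's full revenue to every genre it lists:

```python
import pandas as pd

# Hypothetical data; only the column names match the notebook.
toy = pd.DataFrame({
    "revenue": [100, 50, 30],
    "genres": ["Action|Adventure", "Comedy", "Action|Comedy"],
})

# One row per (movie, genre) pair.
per_genre = toy.assign(genre=toy["genres"].str.split("|")).explode("genre")

# Sum revenue within each genre and keep the five largest totals.
genre_revenue = (
    per_genre.groupby("genre")["revenue"]
    .sum()
    .sort_values(ascending=False)
)
top5 = genre_revenue.head(5)
print(top5)
```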


### Who are the top 5 grossing directors?

In [37]:
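# Total revenue per director
dicter = dict()

# Look up revenue by row position, since dropna() can leave gaps in df_genre's index
revenue = dict()
for idx, i in enumerate(df_genre['revenue']):
    revenue[idx] = i


for idx, i in enumerate(df_genre['director']):
    # Add this movie's revenue to its director's running total
    dicter[i] = dicter.get(i, 0) + revenue[idx]


# Report the director (or directors, if tied) with the highest total
top = max(dicter.values())
print(f'{[k for k, v in dicter.items() if v == top]} is the director with ${top} of revenue.')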


['Steven Spielberg'] is the director with $9018563772 of revenue.
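
Since the question asks for the top 5 directors rather than one, a `groupby` followed by `nlargest` is one compact way to get the full ranking. The sketch below uses made-up names and figures (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical data; only the column names match the notebook.
toy = pd.DataFrame({
    "director": ["Dir A", "Dir B", "Dir A", "Dir C"],
    "revenue": [100, 250, 200, 50],
})

# Sum revenue per director, then take the five largest totals.
top5 = toy.groupby("director")["revenue"].sum().nlargest(5)
print(top5)
```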


### Compare the revenue of the highest grossing movies of all time.

In [38]:
df_genre

Unnamed: 0,id,imdb_id,popularity,budget,revenue,original_title,director,keywords,runtime,genres,release_date,vote_count,vote_average,release_year,budget_adj,revenue_adj
0,135397,tt0369610,32.985763,150000000,1513528810,Jurassic World,Colin Trevorrow,monster|dna|tyrannosaurus rex|velociraptor|island,124,Action|Adventure|Science Fiction|Thriller,6/9/15,5562,6.5,2015,1.379999e+08,1.392446e+09
1,76341,tt1392190,28.419936,150000000,378436354,Mad Max: Fury Road,George Miller,future|chase|post-apocalyptic|dystopia|australia,120,Action|Adventure|Science Fiction|Thriller,5/13/15,6185,7.1,2015,1.379999e+08,3.481613e+08
2,262500,tt2908446,13.112507,110000000,295238201,Insurgent,Robert Schwentke,based on novel|revolution|dystopia|sequel|dyst...,119,Adventure|Science Fiction|Thriller,3/18/15,2480,6.3,2015,1.012000e+08,2.716190e+08
3,140607,tt2488496,11.173104,200000000,2068178225,Star Wars: The Force Awakens,J.J. Abrams,android|spaceship|jedi|space opera|3d,136,Action|Adventure|Science Fiction|Fantasy,12/15/15,5292,7.5,2015,1.839999e+08,1.902723e+09
4,168259,tt2820852,9.335014,190000000,1506249360,Furious 7,James Wan,car race|speed|revenge|suspense|car,137,Action|Crime|Thriller,4/1/15,2947,7.3,2015,1.747999e+08,1.385749e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10861,21,tt0060371,0.080598,0,0,The Endless Summer,Bruce Brown,surfer|surfboard|surfing,95,Documentary,6/15/66,11,7.4,1966,0.000000e+00,0.000000e+00
10862,20379,tt0060472,0.065543,0,0,Grand Prix,John Frankenheimer,car race|racing|formula 1,176,Action|Adventure|Drama,12/21/66,20,5.7,1966,0.000000e+00,0.000000e+00
10863,39768,tt0060161,0.065141,0,0,Beregis Avtomobilya,Eldar Ryazanov,car|trolley|stealing car,94,Mystery|Comedy,1/1/66,11,6.5,1966,0.000000e+00,0.000000e+00
10864,21449,tt0061177,0.064317,0,0,"What's Up, Tiger Lily?",Woody Allen,spoof,80,Action|Comedy,11/2/66,22,5.4,1966,0.000000e+00,0.000000e+00


In [39]:
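# Total revenue per movie title
dicter = dict()

# Look up revenue by row position, since dropna() can leave gaps in df_genre's index
revenue = dict()
for idx, i in enumerate(df_genre['revenue']):
    revenue[idx] = i


for idx, i in enumerate(df_genre['original_title']):
    # Titles that appear more than once have their revenues added together
    dicter[i] = dicter.get(i, 0) + revenue[idx]


# Report the title (or titles, if tied) with the highest total
top = max(dicter.values())
print(f'{[k for k, v in dicter.items() if v == top]} is the movie with ${top} of revenue.')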


['Avatar'] is the movie with $2781505847 of revenue.
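
To actually compare the highest-grossing movies, and to follow the instruction to display results with Matplotlib, a horizontal bar chart is one reasonable choice. The sketch below uses invented titles and revenues (`toy` is hypothetical). Unlike the loop above, it treats each row as a separate film and does not merge rows that share a title:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; only the column names match the notebook.
toy = pd.DataFrame({
    "original_title": ["Film A", "Film B", "Film C", "Film D"],
    "revenue": [500, 1200, 800, 100],
})

# Pick the rows with the largest revenue, highest first.
top = toy.nlargest(3, "revenue").set_index("original_title")["revenue"]

# Horizontal bars keep long movie titles readable.
fig, ax = plt.subplots()
ax.barh(top.index, top.values)
ax.invert_yaxis()  # put the highest-grossing film at the top
ax.set_xlabel("Revenue (USD)")
ax.set_title("Highest-grossing films (toy data)")
```

In the notebook, `%matplotlib inline` renders the figure under the cell. On the real data, `df_genre.nlargest(5, "revenue")` would take the place of the toy frame.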


<a id="conclusions"></a>
## Conclusions

> Using the cell below, write a brief conclusion of what you have found from the analysis of the data. The cell below will allow you to write plain text instead of code.

Warner Bros. making an action movie named Avatar, directed by Steven Spielberg, will likely gross the most revenue (intentionally joking, don't fail me hahaha).