<h1 align="center">Explanatory Data Analysis & Data Presentation (Movies Dataset)</h1>

---

## 0. Load the required libraries

In [113]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [114]:
#Set the options
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 1000)

------

## 1. Load the dataset

In [115]:
#Load the dataset
df = pd.read_csv('movies_complete.csv')

------

## 2. Overview of the dataset

In [116]:
df.head()

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
0,862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.554033,Pixar Animation Studios,United States of America,5415.0,7.7,21.946943,81.0,"Led by Woody, Andy's toys live happi...",English,<img src='http://image.tmdb.org/t/p/...,Tom Hanks|Tim Allen|Don Rickles|Jim ...,13,106,John Lasseter
1,8844,Jumanji,Roll the dice and unleash the excite...,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.797249,TriStar Pictures|Teitler Film|Inters...,United States of America,2413.0,6.9,17.015539,104.0,When siblings Judy and Peter discove...,English|Français,<img src='http://image.tmdb.org/t/p/...,Robin Williams|Jonathan Hyde|Kirsten...,26,16,Joe Johnston
2,15602,Grumpier Old Men,Still Yelling. Still Fighting. Still...,1995-12-22,Romance|Comedy,Grumpy Old Men Collection,en,,,Warner Bros.|Lancaster Gate,United States of America,92.0,6.5,11.7129,101.0,A family wedding reignites the ancie...,English,<img src='http://image.tmdb.org/t/p/...,Walter Matthau|Jack Lemmon|Ann-Margr...,7,4,Howard Deutch
3,31357,Waiting to Exhale,Friends are the people who let you b...,1995-12-22,Comedy|Drama|Romance,,en,16.0,81.452156,Twentieth Century Fox Film Corporation,United States of America,34.0,6.1,3.859495,127.0,"Cheated on, mistreated and stepped o...",English,<img src='http://image.tmdb.org/t/p/...,Whitney Houston|Angela Bassett|Loret...,10,10,Forest Whitaker
4,11862,Father of the Bride Part II,Just When His World Is Back To Norma...,1995-02-10,Comedy,Father of the Bride Collection,en,,76.578911,Sandollar Productions|Touchstone Pic...,United States of America,173.0,5.7,8.387519,106.0,Just when George Banks has recovered...,English,<img src='http://image.tmdb.org/t/p/...,Steve Martin|Diane Keaton|Martin Sho...,12,7,Charles Shyer


__Data Dictionary__:

* **id:** The ID of the movie (clear/unique identifier).
* **title:** The Official Title of the movie.
* **tagline:** The tagline of the movie.
* **release_date:** Theatrical Release Date of the movie.
* **genres:** Genres associated with the movie.
* **belongs_to_collection:** Gives information on the movie series/franchise the particular film belongs to.
* **original_language:** The language in which the movie was originally shot in.
* **budget_musd:** The budget of the movie in million dollars.
* **revenue_musd:** The total revenue of the movie in million dollars.
* **production_companies:** Production companies involved with the making of the movie.
* **production_countries:** Countries where the movie was shot/produced in.
* **vote_count:** The number of votes by users, as counted by TMDB.
* **vote_average:** The average rating of the movie.
* **popularity:** The Popularity Score assigned by TMDB.
* **runtime:** The runtime of the movie in minutes.
* **overview:** A brief blurb of the movie.
* **spoken_languages:** Spoken languages in the film.
* **poster_path:** The URL of the poster image.
* **cast:** (Main) Actors appearing in the movie.
* **cast_size:** number of Actors appearing in the movie.
* **director:** Director of the movie.
* **crew_size:** Size of the film crew (incl. director, excl. actors).

In [117]:
#Let's see the shape of the dataset. It gives us the number of rows and columns in the datasetb
df.shape

(44691, 22)

In [118]:
#Information on the columns and null values in the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44691 entries, 0 to 44690
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     44691 non-null  int64  
 1   title                  44691 non-null  object 
 2   tagline                20284 non-null  object 
 3   release_date           44657 non-null  object 
 4   genres                 42586 non-null  object 
 5   belongs_to_collection  4463 non-null   object 
 6   original_language      44681 non-null  object 
 7   budget_musd            8854 non-null   float64
 8   revenue_musd           7385 non-null   float64
 9   production_companies   33356 non-null  object 
 10  production_countries   38835 non-null  object 
 11  vote_count             44691 non-null  float64
 12  vote_average           42077 non-null  float64
 13  popularity             44691 non-null  float64
 14  runtime                43179 non-null  float64
 15  ov

In [119]:
#Percentage null values in the dataset
((df.isnull().sum()/len(df))*100).sort_values(ascending=False)

belongs_to_collection    90.013649
revenue_musd             83.475420
budget_musd              80.188405
tagline                  54.612786
production_companies     25.363048
production_countries     13.103309
spoken_languages          8.048600
vote_average              5.849052
cast                      4.898078
genres                    4.710121
runtime                   3.383232
overview                  2.127945
director                  1.635676
poster_path               0.501219
release_date              0.076078
original_language         0.022376
crew_size                 0.000000
vote_count                0.000000
popularity                0.000000
cast_size                 0.000000
title                     0.000000
id                        0.000000
dtype: float64

In [120]:
#Total number of duplicate rows
df.duplicated().sum()

0

## 3. The best and the worst movies...

__Filter__ the Dataset and __find the best/worst n Movies__ with the

- Highest Revenue
- Highest Budget
- Highest Profit (=Revenue - Budget)
- Lowest Profit (=Revenue - Budget)
- Highest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10) 
- Lowest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10)
- Highest number of Votes
- Highest Rating (only movies with 10 or more Ratings)
- Lowest Rating (only movies with 10 or more Ratings)
- Highest Popularity

__Define__ an appropriate __user-defined function__ to reuse code.

__Movies Top 5 - Highest Revenue__

In [121]:
from IPython.display import HTML

In [122]:
#Top 5 Movies with Highest Revenue
df[['title','revenue_musd']].sort_values(by=['revenue_musd'], ascending=False).iloc[0:5,][['title', 'revenue_musd']]

Unnamed: 0,title,revenue_musd
14448,Avatar,2787.965087
26265,Star Wars: The Force Awakens,2068.223624
1620,Titanic,1845.034188
17669,The Avengers,1519.55791
24812,Jurassic World,1513.52881


__Movies Top 5 - Highest Budget__

In [123]:
#Create a custom function to fetch movies based on highest budget, profit and other criteria
def get_movies(df, title, criteria, sort_type, min_budget, min_votes):
    subset = df.loc[(df['budget_musd'] > min_budget) & (df['vote_count'] > min_votes), :]
    return subset[[title,criteria]].sort_values(by=[criteria], ascending=sort_type).iloc[0:5,][[title, criteria]]

In [124]:
#Top 5 Movies with Highest Budget
get_movies(df, 'title', 'budget_musd', False, 0,0)

Unnamed: 0,title,budget_musd
16986,Pirates of the Caribbean: On Strange...,380.0
11743,Pirates of the Caribbean: At World's...,300.0
26268,Avengers: Age of Ultron,280.0
10985,Superman Returns,270.0
18517,John Carter,260.0


__Movies Top 5 - Highest Profit__

In [125]:
#Create a new column for profit
df["profit_musd"] = df["revenue_musd"] - df["budget_musd"]

In [126]:
#Fetch movies with highest profit
get_movies(df, 'title', 'profit_musd', False,0,0)

Unnamed: 0,title,profit_musd
14448,Avatar,2550.965087
26265,Star Wars: The Force Awakens,1823.223624
1620,Titanic,1645.034188
24812,Jurassic World,1363.52881
28501,Furious 7,1316.24936


__Movies Top 5 - Lowest Profit__

In [127]:
#Fetch movies with lowest profit
get_movies(df, 'title', 'profit_musd', True,0,0)

Unnamed: 0,title,profit_musd
20959,The Lone Ranger,-165.71009
7164,The Alamo,-119.180039
16659,Mars Needs Moms,-111.007242
43611,Valerian and the City of a Thousand ...,-107.447384
2684,The 13th Warrior,-98.301101


__Movies Top 5 - Highest ROI__

In [128]:
#Create a new column for ROI
df["roi"] = round(df["revenue_musd"] / df["budget_musd"],2)

In [129]:
#Fetch movies with highest ROI and minimum budget of 50
get_movies(df, 'title', 'roi', False,50,0)

Unnamed: 0,title,roi
30330,Minions,15.63
474,Jurassic Park,14.6
26272,Deadpool,13.5
20946,Despicable Me 2,12.77
43294,Despicable Me 3,12.75


__Movies Top 5 - Most Votes__

In [130]:
#Fetch movies with most votes
get_movies(df, 'title', 'vote_count', False,0,0)

Unnamed: 0,title,vote_count
15368,Inception,14075.0
12396,The Dark Knight,12269.0
14448,Avatar,12114.0
17669,The Avengers,12000.0
26272,Deadpool,11444.0


__Movies Top 5 - Highest Rating__

In [131]:
#Fetch movies with highest rating
get_movies(df, 'title', 'vote_average', False,0,50)

Unnamed: 0,title,vote_average
10233,Dilwale Dulhania Le Jayenge,9.1
32987,Human,8.6
826,The Godfather,8.5
313,The Shawshank Redemption,8.5
1166,Psycho,8.3


__Movies Top 5 - Lowest Rating__

In [132]:
#Fetch movies with lowest rating
get_movies(df, 'title', 'vote_average', True,0,100)

Unnamed: 0,title,vote_average
6665,House of the Dead,2.8
13476,Dragonball Evolution,2.9
3439,Battlefield Earth,3.0
12824,Disaster Movie,3.1
9578,Alone in the Dark,3.1


__Movies Top 5 - Most Popular__

In [133]:
#Fetch most popular movies
get_movies(df, 'title', 'popularity', False,20,20)

Unnamed: 0,title,popularity
30330,Minions,547.488298
32927,Wonder Woman,294.337037
41556,Beauty and the Beast,287.253654
42940,Baby Driver,228.032744
24187,Big Hero 6,213.849907


## 4. Find your next Movie

__Filter__ the Dataset for movies that meet the following conditions:

__Search 1: Science Fiction Action Movie with Bruce Willis (sorted from high to low Rating)__

__Search 2: Movies with Uma Thurman and directed by Quentin Tarantino (sorted from short to long runtime)__

__Search 3: Most Successful Pixar Studio Movies between 2010 and 2015 (sorted from high to low Revenue)__

__Search 4: Action or Thriller Movie with original language English and minimum Rating of 7.5 (most recent movies first)__

In [134]:
#Search 1: Science Fiction Action Movie with Bruce Willis (sorted from high to low Rating)

#Create selectors
genres_sel = df.genres.str.contains("Action") & df.genres.str.contains("Science Fiction")
actor_sel = df.cast.str.contains("Bruce Willis")

#Create a combined bool_selector
bool_selector = genres_sel & actor_sel

#Select the matching rows from the dataframe
df.loc[bool_selector,:].sort_values(by=["vote_average"], ascending=False)

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit_musd,roi
1448,18,The Fifth Element,There is no future without it.,1997-05-07,Adventure|Fantasy|Action|Thriller|Sc...,,en,90.0,263.92018,Columbia Pictures|Gaumont,France,3962.0,7.3,24.30526,126.0,"In 2257, a taxi driver is unintentio...",English|svenska|Deutsch,<img src='http://image.tmdb.org/t/p/...,Bruce Willis|Gary Oldman|Ian Holm|Mi...,114,134,Luc Besson,173.92018,2.93
19218,59967,Looper,"Hunted By Your Future, Haunted By Yo...",2012-09-26,Action|Thriller|Science Fiction,,en,30.0,47.042,Endgame Entertainment|FilmDistrict|D...,China|United States of America,4777.0,6.6,12.727269,118.0,In the futuristic action thriller Lo...,English,<img src='http://image.tmdb.org/t/p/...,Joseph Gordon-Levitt|Bruce Willis|Em...,34,42,Rian Johnson,17.042,1.57
1786,95,Armageddon,The Earth's Darkest Day Will Be Man'...,1998-07-01,Action|Thriller|Science Fiction|Adve...,,en,140.0,553.799566,Jerry Bruckheimer Films|Touchstone P...,United States of America,2540.0,6.5,13.235112,151.0,When an asteroid threatens to collid...,English|Pусский,<img src='http://image.tmdb.org/t/p/...,Bruce Willis|Billy Bob Thornton|Ben ...,67,108,Michael Bay,413.799566,3.96
14135,19959,Surrogates,How do you save humanity when the on...,2009-09-24,Action|Science Fiction|Thriller,,en,80.0,122.444772,Touchstone Pictures|Mandeville Films...,United States of America,1219.0,5.9,16.211937,89.0,Set in a futuristic world where huma...,English|Français,<img src='http://image.tmdb.org/t/p/...,Bruce Willis|Radha Mitchell|Rosamund...,44,25,Jonathan Mostow,42.444772,1.53
20333,72559,G.I. Joe: Retaliation,,2013-03-26,Adventure|Action|Science Fiction|Thr...,G.I. Joe (Live-Action) Collection,en,130.0,371.876278,Paramount Pictures|Di Bonaventura Pi...,United States of America,3045.0,5.4,10.560608,110.0,Framed for crimes against the countr...,English,<img src='http://image.tmdb.org/t/p/...,Dwayne Johnson|D.J. Cotrona|Adrianne...,20,28,Jon M. Chu,241.876278,2.86
27619,307663,Vice,Where the future is your past.,2015-01-16,Thriller|Science Fiction|Action|Adve...,,en,10.0,,Grindstone Entertainment Group|K5 In...,United States of America,245.0,4.1,19.236571,96.0,Julian Michaels has designed the ult...,English,<img src='http://image.tmdb.org/t/p/...,Ambyr Childers|Thomas Jane|Bryan Gre...,51,56,Brian A Miller,,


In [135]:
#Search 2: Movies with Uma Thurman and directed by Quentin Tarantino (sorted from short to long runtime)

#Create selectors
director_sel = df.director.str.contains("Quentin Tarantino")
actor_sel = df.cast.str.contains("Uma Thurman")

#Create a combined bool_selector
bool_selector = director_sel & actor_sel

#Select the matching rows from the dataframe
df.loc[bool_selector,:].sort_values(by=["runtime"])

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit_musd,roi
6667,24,Kill Bill: Vol. 1,Go for the kill.,2003-10-10,Action|Crime,Kill Bill Collection,en,30.0,180.949,Miramax Films|A Band Apart|Super Coo...,United States of America,5091.0,7.7,25.261865,111.0,An assassin is shot at the altar by ...,English|日本語|Français,<img src='http://image.tmdb.org/t/p/...,Uma Thurman|Lucy Liu|Vivica A. Fox|D...,36,161,Quentin Tarantino,150.949,6.03
7208,393,Kill Bill: Vol. 2,The bride is back for the final cut.,2004-04-16,Action|Crime|Thriller,Kill Bill Collection,en,30.0,152.159461,Miramax Films|A Band Apart|Super Coo...,United States of America,4061.0,7.7,21.533072,136.0,The Bride unwaveringly continues on ...,English|普通话|Español|广州话 / 廣州話,<img src='http://image.tmdb.org/t/p/...,Uma Thurman|David Carradine|Daryl Ha...,27,130,Quentin Tarantino,122.159461,5.07
291,680,Pulp Fiction,Just because you are a character doe...,1994-09-10,Thriller|Crime,,en,8.0,213.928762,Miramax Films|A Band Apart|Jersey Films,United States of America,8670.0,8.3,140.950236,154.0,"A burger-loving hit man, his philoso...",English|Español|Français,<img src='http://image.tmdb.org/t/p/...,John Travolta|Samuel L. Jackson|Uma ...,54,87,Quentin Tarantino,205.928762,26.74


In [146]:
#Convert release_date to Pandas datetime
df["release_date"] = df[["release_date"]].apply(pd.to_datetime)

In [147]:
#Check release_date column data type
df['release_date'].dtype

dtype('<M8[ns]')

In [148]:
#Search 3: Most Successful Pixar Studio Movies between 2010 and 2015 (sorted from high to low Revenue)

#Create selectors
studio_sel = df.production_companies.str.contains("Pixar")
year_sel = (pd.DatetimeIndex(df['release_date']).year > 2009) & (pd.DatetimeIndex(df['release_date']).year < 2016)

#Create a combined bool_selector
bool_selector = studio_sel & year_sel

#Select the matching rows from the dataframe
df.loc[bool_selector,:].sort_values(by=["revenue_musd"], ascending=False)

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit_musd,roi
15236,10193,Toy Story 3,No toy gets left behind.,2010-06-16,Animation|Family|Comedy,Toy Story Collection,en,200.0,1066.969703,Walt Disney Pictures|Pixar Animation...,United States of America,4710.0,7.6,16.96647,103.0,"Woody, Buzz, and the rest of Andy's ...",English|Español,<img src='http://image.tmdb.org/t/p/...,Tom Hanks|Tim Allen|Ned Beatty|Joan ...,45,38,Lee Unkrich,866.969703,5.33
29957,150540,Inside Out,Meet the little voices inside your h...,2015-06-09,Drama|Comedy|Animation|Family,,en,175.0,857.611174,Walt Disney Pictures|Pixar Animation...,United States of America,6737.0,7.9,23.985587,94.0,"Growing up can be a bumpy road, and ...",English,<img src='http://image.tmdb.org/t/p/...,Amy Poehler|Phyllis Smith|Richard Ki...,65,50,Pete Docter,682.611174,4.9
20888,62211,Monsters University,School never looked this scary.,2013-06-20,Animation|Family,"Monsters, Inc. Collection",en,200.0,743.559607,Walt Disney Pictures|Pixar Animation...,United States of America,3622.0,7.0,16.267502,104.0,A look at the relationship between M...,English,<img src='http://image.tmdb.org/t/p/...,Billy Crystal|John Goodman|Steve Bus...,24,13,Dan Scanlon,543.559607,3.72
17220,49013,Cars 2,Ka-ciao!,2011-06-11,Animation|Family|Adventure|Comedy,Cars Collection,en,200.0,559.852396,Walt Disney Pictures|Pixar Animation...,United States of America,2088.0,5.8,13.693002,106.0,Star race car Lightning McQueen and ...,English|日本語|Italiano|Français,<img src='http://image.tmdb.org/t/p/...,Owen Wilson|Larry the Cable Guy|Mich...,47,40,John Lasseter,359.852396,2.8
18900,62177,Brave,Change your fate.,2012-06-21,Animation|Adventure|Comedy|Family|Ac...,,en,185.0,538.983207,Walt Disney Pictures|Pixar Animation...,United States of America,4760.0,6.7,15.876341,93.0,Brave is set in the mystical Scottis...,English,<img src='http://image.tmdb.org/t/p/...,Kelly Macdonald|Billy Connolly|Emma ...,15,44,Brenda Chapman,353.983207,2.91
30388,105864,The Good Dinosaur,Little Arms With Big Attitude,2015-11-14,Adventure|Animation|Family,,en,175.0,331.926147,Walt Disney Pictures|Pixar Animation...,United States of America,1782.0,6.6,12.319595,93.0,An epic journey into the world of di...,English,<img src='http://image.tmdb.org/t/p/...,Raymond Ochoa|Jack Bright|Jeffrey Wr...,19,11,Peter Sohn,156.926147,1.9
16392,40619,Day & Night,,2010-06-17,Animation|Family,,en,,,Walt Disney Pictures|Pixar Animation...,United States of America,272.0,7.6,6.345512,6.0,"When Day, a sunny fellow, encounters...",,<img src='http://image.tmdb.org/t/p/...,Wayne Dyer,1,1,Teddy Newton,,
21694,200481,The Blue Umbrella,,2013-02-12,Animation|Romance,,en,,,Pixar Animation Studios,United States of America,183.0,7.8,6.568023,7.0,It is just another evening commute u...,No Language,<img src='http://image.tmdb.org/t/p/...,Sarah Jaffe,1,1,Saschka Unseld,,
21697,213121,Toy Story of Terror!,One toy gets left behind!,2013-10-15,Animation|Comedy|Family,,en,,,Walt Disney Pictures|Pixar Animation...,United States of America,246.0,7.3,0.512025,22.0,What starts out as a fun road trip f...,English,<img src='http://image.tmdb.org/t/p/...,Tom Hanks|Tim Allen|Kristen Schaal|C...,8,8,Angus MacLane,,
22489,83564,La luna,A young boy discovers his family's m...,2011-01-01,Animation|Family,,en,,,Pixar Animation Studios,United States of America,257.0,8.0,7.331398,7.0,A young boy comes of age in the most...,English,<img src='http://image.tmdb.org/t/p/...,Krista Sheffler|Tony Fucile|Phil She...,3,9,Enrico Casarosa,,


In [151]:
#Search 4: Action or Thriller Movie with original language English and minimum Rating of 7.5 (most recent movies first)

#Create selectors
genres_sel = df.genres.str.contains("Action") | df.genres.str.contains("Thriller")
lang_sel = df["original_language"] == "en"
rating_sel = df["vote_average"] >= 7.5

#Create a combined bool_selector
bool_selector = genres_sel & lang_sel & rating_sel

#Select the matching rows from the dataframe
df.loc[bool_selector,:].sort_values(by=["release_date"])

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director,profit_musd,roi
34547,128899,Bardelys the Magnificent,The screen's great lover in a dashin...,1926-09-30,Action|Drama|Romance,,en,,,Metro-Goldwyn-Mayer (MGM),United States of America,1.0,8.0,0.978542,90.0,Rafael Sabatini's story of the swash...,English,<img src='http://image.tmdb.org/t/p/...,John Gilbert|Eleanor Boardman|Roy D'...,6,11,King Vidor,,
2879,961,The General,"Buster drives ""The General"" to train...",1926-12-31,Action|Adventure|Comedy|Drama,,en,0.75,,Buster Keaton Productions|Joseph M. ...,United States of America,240.0,8.0,8.002953,79.0,During America’s Civil War Union spi...,English,<img src='http://image.tmdb.org/t/p/...,Buster Keaton|Marion Mack|Glen Caven...,22,25,Buster Keaton,,
8255,25768,"Steamboat Bill, Jr.",The Laugh Special of the Age. See It.,1928-02-14,Action|Comedy,,en,,,Buster Keaton Productions,United States of America,66.0,7.9,7.518657,70.0,The just out of college effete son o...,,<img src='http://image.tmdb.org/t/p/...,Buster Keaton|Ernest Torrence|Tom Mc...,6,12,Buster Keaton,,
8268,877,Scarface,The rise and fall of a power hungry ...,1932-04-09,Action|Adventure|Crime|Drama|Thriller,,en,,0.600000,United Artists|The Caddo Company,United States of America,88.0,7.5,4.854436,90.0,"Big Louis Costillo, last of the old-...",English|Italiano,<img src='http://image.tmdb.org/t/p/...,Paul Muni|Ann Dvorak|Karen Morley|Os...,41,22,Howard Hawks,,
11135,44892,The Music Box,Mr. Laurel and Mr. Hardy decided to ...,1932-04-16,Action|Comedy,,en,,,Hal Roach Studios,United States of America,39.0,7.5,2.186467,29.0,The Laurel &amp; Hardy Moving Co. ha...,English,<img src='http://image.tmdb.org/t/p/...,Stan Laurel|Oliver Hardy|Dinah|Glady...,9,9,James Parrott,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43467,416445,Revengeance,Revenge is a dish best served animated,2017-04-05,Comedy|Action|Animation,,en,,,Plymptoons,United States of America,2.0,8.0,1.095080,71.0,A low-rent bounty hunter named Rod R...,English,<img src='http://image.tmdb.org/t/p/...,Charley Rossman|Robert LuJane,2,2,Bill Plympton,,
26273,283995,Guardians of the Galaxy Vol. 2,Obviously.,2017-04-19,Action|Adventure|Comedy|Science Fiction,Guardians of the Galaxy Collection,en,200.00,863.416141,Walt Disney Pictures|Marvel Studios,United States of America,4858.0,7.6,185.330992,137.0,The Guardians must fight to keep the...,English,<img src='http://image.tmdb.org/t/p/...,Chris Pratt|Zoe Saldana|Dave Bautist...,63,131,James Gunn,663.416141,4.32
42624,382614,The Book of Henry,Never leave things undone.,2017-06-16,Thriller|Drama|Crime,,en,10.00,4.219536,Sidney Kimmel Entertainment|Double N...,United States of America,84.0,7.6,24.553725,105.0,"Naomi Watts stars as Susan, a single...",English,<img src='http://image.tmdb.org/t/p/...,Naomi Watts|Jaeden Lieberher|Jacob T...,27,27,Colin Trevorrow,-5.780464,0.42
43941,374720,Dunkirk,The event that shaped our world,2017-07-19,Action|Drama|History|Thriller|War,,en,100.00,519.876949,Canal+|Studio Canal|Warner Bros.|Syn...,Netherlands|France|United Kingdom|Un...,2712.0,7.5,30.938854,107.0,The miraculous evacuation of Allied ...,English|Français|Deutsch,<img src='http://image.tmdb.org/t/p/...,Fionn Whitehead|Tom Glynn-Carney|Jac...,66,214,Christopher Nolan,419.876949,5.20


## 5. Are Franchises more successful?

__Analyze__ the Dataset and __find out whether Franchises (Movies that belong to a collection) are more successful than stand-alone movies__ in terms of:

- mean revenue
- median Return on Investment
- mean budget raised
- mean popularity
- mean rating

hint: use groupby()

__Franchise vs. Stand-alone: Average Revenue__

In [160]:
df["Franchise"] = df.belongs_to_collection.notna()

In [161]:
df.groupby("Franchise").revenue_musd.mean()

Franchise
False     44.742814
True     165.708193
Name: revenue_musd, dtype: float64

__Franchise vs. Stand-alone: Return on Investment / Profitability (median)__

In [171]:
df.groupby("Franchise").roi.median()

Franchise
False    1.62
True     3.71
Name: roi, dtype: float64

__Franchise vs. Stand-alone: Average Budget__

In [163]:
df.groupby("Franchise").budget_musd.mean()

Franchise
False    18.047741
True     38.319847
Name: budget_musd, dtype: float64

__Franchise vs. Stand-alone: Average Popularity__

In [165]:
df.groupby("Franchise").popularity.mean()

Franchise
False    2.592726
True     6.245051
Name: popularity, dtype: float64

__Franchise vs. Stand-alone: Average Rating__

In [166]:
df.groupby("Franchise").vote_average.mean()

Franchise
False    6.008787
True     5.956806
Name: vote_average, dtype: float64

## 6. Most Successful Franchises

__Find__ the __most successful Franchises__ in terms of

- __total number of movies__
- __total & mean budget__
- __total & mean revenue__
- __mean rating__

In [173]:
#Most successful franchises by total number of movies
df["belongs_to_collection"].value_counts(ascending=False, dropna = True)[0:5]

The Bowery Boys                  29
Totò Collection                  27
Zatôichi: The Blind Swordsman    26
James Bond Collection            26
The Carry On Collection          25
Name: belongs_to_collection, dtype: int64

In [186]:
#Most successful franchises by mean budget
df.groupby('belongs_to_collection')['revenue_musd'].mean().dropna().sort_values(ascending = False)[0:5,]

belongs_to_collection
Avatar Collection          2787.965087
The Avengers Collection    1462.480802
Frozen Collection          1274.219009
Finding Nemo Collection     984.453213
The Hobbit Collection       978.507785
Name: revenue_musd, dtype: float64

In [188]:
#Most successful franchises by total budget
df.groupby('belongs_to_collection')['revenue_musd'].sum().dropna().sort_values(ascending = False)[0:5,]

belongs_to_collection
Harry Potter Collection                7707.367425
Star Wars Collection                   7434.494790
James Bond Collection                  7106.970239
The Fast and the Furious Collection    5125.098793
Pirates of the Caribbean Collection    4521.576826
Name: revenue_musd, dtype: float64

## 7. Most Successful Directors

__Find__ the __most successful Directors__ in terms of

- __total number of movies__
- __total revenue__
- __mean rating__

In [190]:
#Most successful Director by total number of movies
df["director"].value_counts()[0:5,]

John Ford           66
Michael Curtiz      65
Werner Herzog       54
Alfred Hitchcock    53
Georges Méliès      49
Name: director, dtype: int64

In [194]:
#Most successful Director by revenue
df.groupby('director')['revenue_musd'].sum().nlargest(20)

director
Steven Spielberg     9256.621422
Peter Jackson        6528.244659
Michael Bay          6437.466781
James Cameron        5900.610310
David Yates          5334.563196
Christopher Nolan    4747.408665
Robert Zemeckis      4138.233542
Tim Burton           4032.916124
Ridley Scott         3917.529240
Chris Columbus       3866.836869
Roland Emmerich      3798.402596
Ron Howard           3714.152341
J.J. Abrams          3579.215315
Gore Verbinski       3575.339236
George Lucas         3341.550893
Sam Raimi            3193.788606
Francis Lawrence     3183.341910
Clint Eastwood       3100.675162
Bill Condon          3017.298095
Joss Whedon          2963.831071
Name: revenue_musd, dtype: float64

In [198]:
#Most successful Director by mean_rating
directors = df.groupby("director").agg({"title": "count", "vote_average" :"mean", "vote_count": "sum"})

In [199]:
#Select specific directors based on vote count and title
directors[(directors.vote_count >= 10000) & (directors.title >= 10)].nlargest(20, "vote_average")

Unnamed: 0_level_0,title,vote_average,vote_count
director,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Hayao Miyazaki,14,7.7,14700.0
Christopher Nolan,11,7.618182,67344.0
Quentin Tarantino,10,7.49,45910.0
Wes Anderson,10,7.37,11743.0
David Fincher,10,7.35,37588.0
Martin Scorsese,39,7.218421,35541.0
Peter Jackson,13,7.138462,47571.0
Joel Coen,17,7.023529,18139.0
James Cameron,11,6.927273,33736.0
Stanley Kubrick,16,6.9125,18214.0


---
# End of sheet