# Project 1: Explanatory Data Analysis & Data Presentation (Movies Dataset)

Here you'll have the opportunity to code major parts of Project 1 on your own. If you need any help or inspiration, have a look at the videos.

Keep in mind that it's all about __getting the right results/conclusions__, not about reproducing identical code. Things can be coded in many different ways: even if you reach the same conclusions, it's very unlikely that your code and ours are identical.

## Data Import and first Inspection

1. __Import__ the movies dataset from the CSV file "movies_complete.csv". __Inspect__ the data.

__Some additional information on Features/Columns__:

* **id:** The ID of the movie (clear/unique identifier).
* **title:** The official title of the movie.
* **tagline:** The tagline of the movie.
* **release_date:** The theatrical release date of the movie.
* **genres:** Genres associated with the movie.
* **belongs_to_collection:** Gives information on the movie series/franchise the particular film belongs to.
* **original_language:** The language in which the movie was originally shot.
* **budget_musd:** The budget of the movie in million dollars.
* **revenue_musd:** The total revenue of the movie in million dollars.
* **production_companies:** Production companies involved with the making of the movie.
* **production_countries:** Countries where the movie was shot/produced.
* **vote_count:** The number of votes by users, as counted by TMDB.
* **vote_average:** The average rating of the movie.
* **popularity:** The Popularity Score assigned by TMDB.
* **runtime:** The runtime of the movie in minutes.
* **overview:** A brief blurb of the movie.
* **spoken_languages:** Spoken languages in the film.
* **poster_path:** An HTML `<img>` tag containing the URL of the poster image.
* **cast:** (Main) Actors appearing in the movie.
* **cast_size:** The number of actors appearing in the movie.
* **director:** Director of the movie.
* **crew_size:** Size of the film crew (incl. director, excl. actors).

## The best and the worst movies...

2. __Filter__ the Dataset and __find the best/worst n Movies__ with the

- Highest Revenue
- Highest Budget
- Highest Profit (=Revenue - Budget)
- Lowest Profit (=Revenue - Budget)
- Highest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10, i.e. at least 10 million USD)
- Lowest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10, i.e. at least 10 million USD)
- Highest number of Votes
- Highest Rating (only movies with 10 or more Ratings)
- Lowest Rating (only movies with 10 or more Ratings)
- Highest Popularity

__Define__ an appropriate __user-defined function__ to reuse code.

__Movies Top 5 - Highest Revenue__

In [73]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_columns = 30
pd.options.display.float_format = '{:.2f}'.format
from IPython.display import HTML

In [74]:
df = pd.read_csv('movies_complete.csv')

In [75]:
df.head()

|  | id | title | tagline | release_date | genres | belongs_to_collection | original_language | budget_musd | revenue_musd | production_companies | production_countries | vote_count | vote_average | popularity | runtime | overview | spoken_languages | poster_path | cast | cast_size | crew_size | director |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 862 | Toy Story | NaN | 1995-10-30 | Animation\|Comedy\|Family | Toy Story Collection | en | 30.0 | 373.55 | Pixar Animation Studios | United States of America | 5415.0 | 7.7 | 21.95 | 81.0 | Led by Woody, Andy's toys live happily in his ... | English | `<img src='http://image.tmdb.org/t/p/w185//uXDf...` | Tom Hanks\|Tim Allen\|Don Rickles\|Jim Varney\|Wal... | 13 | 106 | John Lasseter |
| 1 | 8844 | Jumanji | Roll the dice and unleash the excitement! | 1995-12-15 | Adventure\|Fantasy\|Family | NaN | en | 65.0 | 262.8 | TriStar Pictures\|Teitler Film\|Interscope Commu... | United States of America | 2413.0 | 6.9 | 17.02 | 104.0 | When siblings Judy and Peter discover an encha... | English\|Français | `<img src='http://image.tmdb.org/t/p/w185//vgpX...` | Robin Williams\|Jonathan Hyde\|Kirsten Dunst\|Bra... | 26 | 16 | Joe Johnston |
| 2 | 15602 | Grumpier Old Men | Still Yelling. Still Fighting. Still Ready for... | 1995-12-22 | Romance\|Comedy | Grumpy Old Men Collection | en | NaN | NaN | Warner Bros.\|Lancaster Gate | United States of America | 92.0 | 6.5 | 11.71 | 101.0 | A family wedding reignites the ancient feud be... | English | `<img src='http://image.tmdb.org/t/p/w185//1FSX...` | Walter Matthau\|Jack Lemmon\|Ann-Margret\|Sophia ... | 7 | 4 | Howard Deutch |
| 3 | 31357 | Waiting to Exhale | Friends are the people who let you be yourself... | 1995-12-22 | Comedy\|Drama\|Romance | NaN | en | 16.0 | 81.45 | Twentieth Century Fox Film Corporation | United States of America | 34.0 | 6.1 | 3.86 | 127.0 | Cheated on, mistreated and stepped on, the wom... | English | `<img src='http://image.tmdb.org/t/p/w185//4wjG...` | Whitney Houston\|Angela Bassett\|Loretta Devine\|... | 10 | 10 | Forest Whitaker |
| 4 | 11862 | Father of the Bride Part II | Just When His World Is Back To Normal... He's ... | 1995-02-10 | Comedy | Father of the Bride Collection | en | NaN | 76.58 | Sandollar Productions\|Touchstone Pictures | United States of America | 173.0 | 5.7 | 8.39 | 106.0 | Just when George Banks has recovered from his ... | English | `<img src='http://image.tmdb.org/t/p/w185//lf9R...` | Steve Martin\|Diane Keaton\|Martin Short\|Kimberl... | 12 | 7 | Charles Shyer |


In [76]:
df_best = df[['poster_path', 'title', 'budget_musd', 'revenue_musd', 'vote_count', 'vote_average', 'popularity']].copy()

In [77]:
df_best

|  | poster_path | title | budget_musd | revenue_musd | vote_count | vote_average | popularity |
|---|---|---|---|---|---|---|---|
| 0 | `<img src='http://image.tmdb.org/t/p/w185//uXDf...` | Toy Story | 30.00 | 373.55 | 5415.00 | 7.70 | 21.95 |
| 1 | `<img src='http://image.tmdb.org/t/p/w185//vgpX...` | Jumanji | 65.00 | 262.80 | 2413.00 | 6.90 | 17.02 |
| 2 | `<img src='http://image.tmdb.org/t/p/w185//1FSX...` | Grumpier Old Men | NaN | NaN | 92.00 | 6.50 | 11.71 |
| 3 | `<img src='http://image.tmdb.org/t/p/w185//4wjG...` | Waiting to Exhale | 16.00 | 81.45 | 34.00 | 6.10 | 3.86 |
| 4 | `<img src='http://image.tmdb.org/t/p/w185//lf9R...` | Father of the Bride Part II | NaN | 76.58 | 173.00 | 5.70 | 8.39 |
| 5 | `<img src='http://image.tmdb.org/t/p/w185//lbf2...` | Heat | 60.00 | 187.44 | 1886.00 | 7.70 | 17.92 |
| 6 | `<img src='http://image.tmdb.org/t/p/w185//z1oN...` | Sabrina | 58.00 | NaN | 141.00 | 6.20 | 6.68 |
| 7 | `<img src='http://image.tmdb.org/t/p/w185//6yox...` | Tom and Huck | NaN | NaN | 45.00 | 5.40 | 2.56 |
| 8 | `<img src='http://image.tmdb.org/t/p/w185//gV1V...` | Sudden Death | 35.00 | 64.35 | 174.00 | 5.50 | 5.23 |
| 9 | `<img src='http://image.tmdb.org/t/p/w185//z0lj...` | GoldenEye | 58.00 | 352.19 | 1194.00 | 6.60 | 14.69 |


In [78]:
df_best['profit_musd'] = df.revenue_musd.sub(df.budget_musd)
df_best['return'] = df.revenue_musd.div(df.budget_musd)

In [79]:
df_best.columns = ['', 'Title', 'Budget', 'Revenue', 'Voters', 'Average Rating', 'Popularity', 'Profit', 'ROI']
df_best.set_index('Title', inplace = True)

In [80]:
df_best

| Title |  | Budget | Revenue | Voters | Average Rating | Popularity | Profit | ROI |
|---|---|---|---|---|---|---|---|---|
| Toy Story | `<img src='http://image.tmdb.org/t/p/w185//uXDf...` | 30.00 | 373.55 | 5415.00 | 7.70 | 21.95 | 343.55 | 12.45 |
| Jumanji | `<img src='http://image.tmdb.org/t/p/w185//vgpX...` | 65.00 | 262.80 | 2413.00 | 6.90 | 17.02 | 197.80 | 4.04 |
| Grumpier Old Men | `<img src='http://image.tmdb.org/t/p/w185//1FSX...` | NaN | NaN | 92.00 | 6.50 | 11.71 | NaN | NaN |
| Waiting to Exhale | `<img src='http://image.tmdb.org/t/p/w185//4wjG...` | 16.00 | 81.45 | 34.00 | 6.10 | 3.86 | 65.45 | 5.09 |
| Father of the Bride Part II | `<img src='http://image.tmdb.org/t/p/w185//lf9R...` | NaN | 76.58 | 173.00 | 5.70 | 8.39 | NaN | NaN |
| Heat | `<img src='http://image.tmdb.org/t/p/w185//lbf2...` | 60.00 | 187.44 | 1886.00 | 7.70 | 17.92 | 127.44 | 3.12 |
| Sabrina | `<img src='http://image.tmdb.org/t/p/w185//z1oN...` | 58.00 | NaN | 141.00 | 6.20 | 6.68 | NaN | NaN |
| Tom and Huck | `<img src='http://image.tmdb.org/t/p/w185//6yox...` | NaN | NaN | 45.00 | 5.40 | 2.56 | NaN | NaN |
| Sudden Death | `<img src='http://image.tmdb.org/t/p/w185//gV1V...` | 35.00 | 64.35 | 174.00 | 5.50 | 5.23 | 29.35 | 1.84 |
| GoldenEye | `<img src='http://image.tmdb.org/t/p/w185//z0lj...` | 58.00 | 352.19 | 1194.00 | 6.60 | 14.69 | 294.19 | 6.07 |


In [81]:
df_best.iloc[0,0]

"<img src='http://image.tmdb.org/t/p/w185//uXDfjJbdP4ijW5hWSBrPrlKpxab.jpg' style='height:100px;'>"

In [82]:
subset = df_best.iloc[:5, :2]

In [83]:
subset

| Title |  | Budget |
|---|---|---|
| Toy Story | `<img src='http://image.tmdb.org/t/p/w185//uXDf...` | 30.0 |
| Jumanji | `<img src='http://image.tmdb.org/t/p/w185//vgpX...` | 65.0 |
| Grumpier Old Men | `<img src='http://image.tmdb.org/t/p/w185//1FSX...` | NaN |
| Waiting to Exhale | `<img src='http://image.tmdb.org/t/p/w185//4wjG...` | 16.0 |
| Father of the Bride Part II | `<img src='http://image.tmdb.org/t/p/w185//lf9R...` | NaN |


In [84]:
HTML(subset.to_html(escape = False))

(Rendered HTML output: the same five titles and budgets as above, with the poster column displayed as images instead of raw `<img>` tags.)
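`to_html(escape=False)` is what makes the poster column render as images. By default `to_html()` escapes HTML special characters, so `<` becomes `&lt;` and the tag is displayed as plain text. A small check on a hypothetical one-row frame (the poster URL is a placeholder, not taken from the dataset):

```python
import pandas as pd

# Hypothetical one-row frame with an HTML <img> tag in the unnamed poster column
demo = pd.DataFrame(
    {"": ["<img src='https://example.com/poster.jpg' style='height:100px;'>"],
     "Budget": [30.0]},
    index=pd.Index(["Toy Story"], name="Title"),
)

escaped = demo.to_html()            # default escape=True: '<' is written as '&lt;'
raw = demo.to_html(escape=False)    # tag passed through, so a notebook renders the image

print("&lt;img" in escaped)  # True
print("<img src=" in raw)    # True
```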


In [85]:
df_best.sort_values(by = 'Average Rating', ascending = False)

| Title |  | Budget | Revenue | Voters | Average Rating | Popularity | Profit | ROI |
|---|---|---|---|---|---|---|---|---|
| Portrait of a Young Man in Three Movements | NaN | NaN | NaN | 1.00 | 10.00 | 0.04 | NaN | NaN |
| Brave Revolutionary | `<img src='http://image.tmdb.org/t/p/w185//zAb2...` | NaN | NaN | 1.00 | 10.00 | 0.32 | NaN | NaN |
| Other Voices Other Rooms | `<img src='http://image.tmdb.org/t/p/w185//4ifP...` | NaN | NaN | 1.00 | 10.00 | 0.04 | NaN | NaN |
| The Lion of Thebes | `<img src='http://image.tmdb.org/t/p/w185//tdOc...` | NaN | NaN | 1.00 | 10.00 | 1.78 | NaN | NaN |
| Katt Williams: Priceless: Afterlife | `<img src='http://image.tmdb.org/t/p/w185//wKrH...` | NaN | NaN | 2.00 | 10.00 | 0.48 | NaN | NaN |
| Avetik | `<img src='http://image.tmdb.org/t/p/w185//cyc8...` | NaN | NaN | 3.00 | 10.00 | 0.15 | NaN | NaN |
| Acéphale | `<img src='http://image.tmdb.org/t/p/w185//b6ps...` | NaN | NaN | 1.00 | 10.00 | 0.05 | NaN | NaN |
| Symphony of the Soil | `<img src='http://image.tmdb.org/t/p/w185//2ECO...` | NaN | NaN | 1.00 | 10.00 | 0.05 | NaN | NaN |
| Titus Andronicus | `<img src='http://image.tmdb.org/t/p/w185//p74s...` | NaN | NaN | 1.00 | 10.00 | 0.23 | NaN | NaN |
| In Search of Ancient Astronauts | `<img src='http://image.tmdb.org/t/p/w185//xYet...` | NaN | NaN | 1.00 | 10.00 | 0.00 | NaN | NaN |


In [86]:
df_best.sort_values(by = 'ROI', ascending = False)

| Title |  | Budget | Revenue | Voters | Average Rating | Popularity | Profit | ROI |
|---|---|---|---|---|---|---|---|---|
| Less Than Zero | `<img src='http://image.tmdb.org/t/p/w185//1GY0...` | 0.00 | 12.40 | 77.00 | 6.10 | 4.03 | 12.40 | 12396383.00 |
| Modern Times | `<img src='http://image.tmdb.org/t/p/w185//7uoi...` | 0.00 | 8.50 | 881.00 | 8.10 | 8.16 | 8.50 | 8500000.00 |
| Welcome to Dongmakgol | `<img src='http://image.tmdb.org/t/p/w185//5iGV...` | 0.00 | 33.58 | 49.00 | 7.70 | 4.22 | 33.58 | 4197476.62 |
| Aquí Entre Nos | `<img src='http://image.tmdb.org/t/p/w185//oflx...` | 0.00 | 2.76 | 3.00 | 6.00 | 0.23 | 2.76 | 2755584.00 |
| The Karate Kid, Part II | `<img src='http://image.tmdb.org/t/p/w185//mSne...` | 0.00 | 115.10 | 457.00 | 5.90 | 9.23 | 115.10 | 1018619.28 |
| Nurse 3-D | `<img src='http://image.tmdb.org/t/p/w185//ny0N...` | 0.00 | 10.00 | 120.00 | 4.90 | 5.19 | 10.00 | 1000000.00 |
| From Prada to Nada | `<img src='http://image.tmdb.org/t/p/w185//jAJa...` | 0.00 | 2.50 | 87.00 | 5.00 | 11.10 | 2.50 | 26881.72 |
| Paranormal Activity | `<img src='http://image.tmdb.org/t/p/w185//1bjA...` | 0.01 | 193.36 | 1351.00 | 5.90 | 12.71 | 193.34 | 12890.39 |
| Tarnation | `<img src='http://image.tmdb.org/t/p/w185//7zeQ...` | 0.00 | 1.16 | 22.00 | 7.50 | 1.62 | 1.16 | 5330.34 |
| The Blair Witch Project | `<img src='http://image.tmdb.org/t/p/w185//bFmb...` | 0.06 | 248.00 | 1090.00 | 6.30 | 14.84 | 247.94 | 4133.33 |


__Movies with a budget of at least 5 (million USD), sorted by ROI__

In [87]:
df_best.loc[df_best.Budget>=5].sort_values(by = 'ROI', ascending = False)

| Title |  | Budget | Revenue | Voters | Average Rating | Popularity | Profit | ROI |
|---|---|---|---|---|---|---|---|---|
| E.T. the Extra-Terrestrial | `<img src='http://image.tmdb.org/t/p/w185//cBfk...` | 10.50 | 792.97 | 3359.00 | 7.30 | 19.36 | 782.47 | 75.52 |
| My Big Fat Greek Wedding | `<img src='http://image.tmdb.org/t/p/w185//3TB2...` | 5.00 | 368.74 | 686.00 | 6.20 | 6.72 | 363.74 | 73.75 |
| Star Wars | `<img src='http://image.tmdb.org/t/p/w185//6FfC...` | 11.00 | 775.40 | 6778.00 | 8.10 | 42.15 | 764.40 | 70.49 |
| Jaws | `<img src='http://image.tmdb.org/t/p/w185//s2xc...` | 7.00 | 470.65 | 2628.00 | 7.50 | 19.73 | 463.65 | 67.24 |
| Crocodile Dundee | `<img src='http://image.tmdb.org/t/p/w185//kiwO...` | 5.00 | 328.20 | 512.00 | 6.30 | 7.79 | 323.20 | 65.64 |
| The Exorcist | `<img src='http://image.tmdb.org/t/p/w185//4ucL...` | 8.00 | 441.31 | 2046.00 | 7.50 | 12.14 | 433.31 | 55.16 |
| Get Out | `<img src='http://image.tmdb.org/t/p/w185//qbaI...` | 5.00 | 252.43 | 2978.00 | 7.20 | 36.89 | 247.43 | 50.49 |
| Four Weddings and a Funeral | `<img src='http://image.tmdb.org/t/p/w185//qa72...` | 6.00 | 254.70 | 654.00 | 6.60 | 8.99 | 248.70 | 42.45 |
| Paranormal Activity 3 | `<img src='http://image.tmdb.org/t/p/w185//zPXA...` | 5.00 | 205.70 | 685.00 | 5.90 | 11.00 | 200.70 | 41.14 |
| The Godfather | `<img src='http://image.tmdb.org/t/p/w185//iVZ3...` | 6.00 | 245.07 | 6024.00 | 8.50 | 41.11 | 239.07 | 40.84 |


In [88]:
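# replace missing budgets/vote counts with 0 so these movies still pass a threshold of 0 in best_worst
df_best['Budget'] = df_best['Budget'].fillna(0)
df_best['Voters'] = df_best['Voters'].fillna(0)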
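The `fillna(0)` step matters because any comparison with `NaN` evaluates to `False`. Without it, a filter such as `Budget >= 0` would silently drop every movie with an unknown budget, even from rankings (e.g. by revenue) that do not depend on the budget. A minimal sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

budget = pd.Series([30.0, np.nan, 16.0], index=["A", "B", "C"])  # hypothetical budgets

# NaN >= 0 is False, so movie "B" disappears from the filtered result
print(budget[budget >= 0].index.tolist())   # ['A', 'C']

# After filling missing budgets with 0, every movie passes a threshold of 0
filled = budget.fillna(0)
print(filled[filled >= 0].index.tolist())   # ['A', 'B', 'C']
```

Note that a filled movie still fails a positive threshold such as `min_bud=10`, which is the intended behaviour for the ROI rankings.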

In [90]:
#df_best.info()

__Helper function for the best/worst rankings__

In [104]:
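def best_worst(n, by, ascending = False, min_bud = 0, min_votes = 0):
    # keep only movies meeting the minimum budget (million USD) and minimum number of votes;
    # each condition needs its own parentheses, combined element-wise with &
    df2 = df_best.loc[(df_best.Budget >= min_bud) & (df_best.Voters >= min_votes),
                      ["", by]].sort_values(by = by, ascending = ascending).head(n).copy()
    # escape=False renders the <img> tags of the poster column as images
    return HTML(df2.to_html(escape=False))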
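Writing the voter filter as a chained comparison, e.g. `df_best.Voters >= min_votes >= min_votes`, fails with `ValueError: The truth value of a Series is ambiguous`. Python expands `a >= b >= c` into `(a >= b) and (b >= c)`, and the implicit `and` asks for a single `True`/`False` from a whole boolean Series. The sketch below (hypothetical vote counts and budgets) reproduces the error and shows the element-wise alternative:

```python
import pandas as pd

voters = pd.Series([1, 15, 5415])       # hypothetical vote counts
budget = pd.Series([12.0, 0.0, 30.0])   # hypothetical budgets in million USD
min_votes = 10

error_message = None
try:
    # evaluated as (voters >= min_votes) and (min_votes >= min_votes);
    # the `and` calls bool() on a boolean Series, which pandas refuses
    _ = voters >= min_votes >= min_votes
except ValueError as err:
    error_message = str(err)
print(error_message)

# Element-wise version: each condition in its own parentheses, combined with &
mask = (budget >= 10) & (voters >= min_votes)
print(mask.tolist())  # [False, False, True]
```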

__Movies Top 5 - Highest Revenue__

In [106]:
best_worst(n=5, by='Revenue')

__Movies Top 5 - Highest Budget__

__Movies Top 5 - Highest Profit__

__Movies Top 5 - Lowest Profit__

__Movies Top 5 - Highest ROI__

__Movies Top 5 - Lowest ROI__

__Movies Top 5 - Most Votes__

__Movies Top 5 - Highest Rating__

__Movies Top 5 - Lowest Rating__

__Movies Top 5 - Most Popular__
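The remaining rankings (Highest Budget through Most Popular) differ only in the arguments passed to `best_worst`. The self-contained sketch below re-creates a stripped-down variant of the helper on a hypothetical four-movie frame (titles and numbers invented; no poster column, and it returns a DataFrame instead of rendered HTML) and lists one plausible argument set per task. In the notebook the corresponding calls would look like `best_worst(n=5, by='ROI', min_bud=10)`.

```python
import pandas as pd

# Hypothetical stand-in for df_best (values invented for illustration)
movies = pd.DataFrame(
    {"Budget": [30.0, 0.0, 200.0, 12.0],
     "Revenue": [373.6, 2.0, 150.0, 5.0],
     "Voters": [5415, 3, 900, 40],
     "Average Rating": [7.7, 9.5, 5.1, 6.0],
     "Popularity": [21.9, 0.1, 12.0, 3.2]},
    index=pd.Index(["Movie A", "Movie B", "Movie C", "Movie D"], name="Title"),
)
movies["Profit"] = movies["Revenue"] - movies["Budget"]
movies["ROI"] = movies["Revenue"] / movies["Budget"]  # Movie B: 2.0 / 0.0 -> inf

def best_worst(df, n, by, ascending=False, min_bud=0, min_votes=0):
    """Return the top (or bottom) n rows by `by`, after budget and vote filters."""
    mask = (df["Budget"] >= min_bud) & (df["Voters"] >= min_votes)
    return df.loc[mask, [by]].sort_values(by=by, ascending=ascending).head(n)

# One plausible argument set per ranking task
tasks = {
    "Highest Budget": dict(by="Budget"),
    "Highest Profit": dict(by="Profit"),
    "Lowest Profit": dict(by="Profit", ascending=True),
    "Highest ROI": dict(by="ROI", min_bud=10),
    "Lowest ROI": dict(by="ROI", ascending=True, min_bud=10),
    "Most Votes": dict(by="Voters"),
    "Highest Rating": dict(by="Average Rating", min_votes=10),
    "Lowest Rating": dict(by="Average Rating", ascending=True, min_votes=10),
    "Most Popular": dict(by="Popularity"),
}
results = {name: best_worst(movies, n=2, **kwargs) for name, kwargs in tasks.items()}
print(results["Highest ROI"])
```

The budget and vote thresholds keep the zero-budget, three-vote Movie B out of the ROI and rating rankings, mirroring the near-zero budgets and single-vote ratings seen in the unfiltered rankings above.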

## Find your next Movie

3. __Filter__ the Dataset for movies that meet the following conditions:

__Search 1: Science Fiction Action Movie with Bruce Willis (sorted from high to low Rating)__
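Because `genres` and `cast` hold pipe-separated strings (e.g. `Animation|Comedy|Family`), one way to express Search 1 is with `Series.str.contains` on those columns. A sketch on a hypothetical three-row frame; `regex=False` treats the search term literally and `na=False` treats missing values as non-matches. In the notebook the same masks would be applied to `df`.

```python
import pandas as pd

# Hypothetical rows mimicking the pipe-separated 'genres' and 'cast' columns
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Science Fiction|Action", "Action|Thriller",
               "Action|Science Fiction|Adventure"],
    "cast": ["Bruce Willis|Actor X", "Bruce Willis", "Actor Y|Bruce Willis"],
    "vote_average": [6.8, 7.1, 7.4],
})

# Literal substring tests on the pipe-separated strings
sci_fi = df["genres"].str.contains("Science Fiction", regex=False, na=False)
action = df["genres"].str.contains("Action", regex=False, na=False)
willis = df["cast"].str.contains("Bruce Willis", regex=False, na=False)

# Rows meeting all three conditions, best-rated first
search_1 = df.loc[sci_fi & action & willis].sort_values("vote_average", ascending=False)
print(search_1[["title", "vote_average"]])
```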

__Search 2: Movies with Uma Thurman and directed by Quentin Tarantino (sorted from short to long runtime)__
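Search 2 combines a substring test on `cast` with an equality test on `director`, assuming (as in the rows shown by `df.head()`) that `director` holds a single name. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with 'cast', 'director' and 'runtime' (minutes)
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "cast": ["Uma Thurman|Actor X", "Uma Thurman", "Actor Y", "Actor W|Uma Thurman"],
    "director": ["Quentin Tarantino", "Director Z", "Quentin Tarantino", "Quentin Tarantino"],
    "runtime": [154.0, 95.0, 111.0, 112.0],
})

uma = df["cast"].str.contains("Uma Thurman", regex=False, na=False)
tarantino = df["director"] == "Quentin Tarantino"  # exact match on a single name

# sort_values is ascending by default: shortest runtime first
search_2 = df.loc[uma & tarantino].sort_values("runtime")
print(search_2[["title", "runtime"]])
```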

__Search 3: Most Successful Pixar Studio Movies between 2010 and 2015 (sorted from high to low Revenue)__
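For Search 3, `release_date` is read as text by a plain `pd.read_csv` call, so converting it with `pd.to_datetime` makes the year available via `.dt.year`. The sketch below assumes "between 2010 and 2015" includes both years and matches Pixar by the substring `"Pixar"` in `production_companies`; rows and revenues are hypothetical.

```python
import pandas as pd

# Hypothetical rows; release_date starts out as text, as after a plain read_csv
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "production_companies": ["Pixar Animation Studios",
                             "Walt Disney Pictures|Pixar Animation Studios",
                             "Other Studio", "Pixar Animation Studios"],
    "release_date": ["2010-06-16", "2013-06-20", "2012-01-01", "2008-06-22"],
    "revenue_musd": [900.0, 700.0, 50.0, 500.0],
})
df["release_date"] = pd.to_datetime(df["release_date"])  # text -> datetime64

pixar = df["production_companies"].str.contains("Pixar", regex=False, na=False)
in_period = df["release_date"].dt.year.between(2010, 2015)  # both years included

# Highest revenue first
search_3 = df.loc[pixar & in_period].sort_values("revenue_musd", ascending=False)
print(search_3[["title", "release_date", "revenue_musd"]])
```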

__Search 4: Action or Thriller Movie with original language English and minimum Rating of 7.5 (most recent movies first)__
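In Search 4 the "or" can be written as a regular expression: with `regex=True`, the pattern `"Action|Thriller"` matches either genre (here `|` is regex alternation, not the column's separator). The language code `"en"` follows the `original_language` values shown by `df.head()`; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Action|Crime", "Thriller", "Comedy", "Action"],
    "original_language": ["en", "en", "en", "fr"],
    "vote_average": [7.8, 7.5, 8.0, 7.9],
    "release_date": pd.to_datetime(["2008-07-16", "2014-10-01", "2012-01-01", "2015-05-05"]),
})

# Regex alternation: "Action" OR "Thriller"
action_or_thriller = df["genres"].str.contains("Action|Thriller", regex=True, na=False)
english = df["original_language"] == "en"
good_rating = df["vote_average"] >= 7.5

# Most recent release first
search_4 = (df.loc[action_or_thriller & english & good_rating]
              .sort_values("release_date", ascending=False))
print(search_4[["title", "release_date", "vote_average"]])
```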

## Are Franchises more successful?

4. __Analyze__ the Dataset and __find out whether Franchises (Movies that belong to a collection) are more successful than stand-alone movies__ in terms of:

- mean revenue
- median Return on Investment
- mean budget raised
- mean popularity
- mean rating

Hint: use `groupby()`.
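One possible sketch of the `groupby()` approach: label each movie as franchise or stand-alone depending on whether `belongs_to_collection` is filled, then compute all five metrics in one call with named aggregation. The median is used for ROI because a few near-zero budgets produce enormous ratios (see the unfiltered ROI ranking above) that would dominate a mean. The frame and the `Type` column name are hypothetical; `mean` and `median` skip `NaN` values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical movies; NaN in belongs_to_collection marks a stand-alone film
df = pd.DataFrame({
    "belongs_to_collection": ["Collection X", np.nan, "Collection Y", np.nan],
    "budget_musd": [100.0, 20.0, 50.0, 10.0],
    "revenue_musd": [500.0, 30.0, 150.0, 5.0],
    "popularity": [20.0, 5.0, 10.0, 1.0],
    "vote_average": [7.0, 6.0, 6.5, 5.5],
})
df["Type"] = np.where(df["belongs_to_collection"].notna(), "Franchise", "Stand-alone")
df["ROI"] = df["revenue_musd"] / df["budget_musd"]

# Named aggregation: output column = (input column, aggregation function)
summary = df.groupby("Type").agg(
    mean_revenue=("revenue_musd", "mean"),
    median_roi=("ROI", "median"),
    mean_budget=("budget_musd", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(summary)
```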

__Franchise vs. Stand-alone: Average Revenue__

__Franchise vs. Stand-alone: Return on Investment / Profitability (median)__

__Franchise vs. Stand-alone: Average Budget__

__Franchise vs. Stand-alone: Average Popularity__

__Franchise vs. Stand-alone: Average Rating__

## Most Successful Franchises

5. __Find__ the __most successful Franchises__ in terms of

- __total number of movies__
- __total & mean budget__
- __total & mean revenue__
- __mean rating__
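A sketch for ranking franchises, assuming each collection name identifies one franchise: grouping by `belongs_to_collection` drops rows whose key is `NaN` by default, so stand-alone movies are excluded automatically. Note that `sum` skips missing budgets, so a franchise total only reflects movies with a known budget. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical movies; NaN in belongs_to_collection marks a stand-alone film
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "belongs_to_collection": ["Collection X", "Collection X", "Collection Y",
                              np.nan, "Collection X"],
    "budget_musd": [10.0, 20.0, 30.0, 5.0, np.nan],
    "revenue_musd": [100.0, 50.0, 90.0, 10.0, 60.0],
    "vote_average": [7.0, 6.0, 8.0, 5.0, 6.5],
})

franchises = df.groupby("belongs_to_collection").agg(
    movies=("title", "count"),
    total_budget=("budget_musd", "sum"),     # NaN budgets are skipped
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)

# Rank by whichever criterion is of interest, e.g. number of movies or total revenue
print(franchises.sort_values("movies", ascending=False))
print(franchises.sort_values("total_revenue", ascending=False))
```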

## Most Successful Directors

6. __Find__ the __most successful Directors__ in terms of

- __total number of movies__
- __total revenue__
- __mean rating__
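The same pattern works per director. In this sketch the director names and values are hypothetical, and the minimum of two movies for the rating ranking (`min_movies`) is an assumption, analogous to the minimum number of votes used for the movie ratings, so that a single highly rated film does not dominate. `DataFrame.nlargest` returns the top rows by a column.

```python
import numpy as np
import pandas as pd

# Hypothetical movies (director names are placeholders)
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "director": ["Director 1", "Director 1", "Director 2",
                 "Director 3", "Director 3", "Director 3"],
    "revenue_musd": [100.0, 200.0, 500.0, 10.0, 20.0, np.nan],
    "vote_average": [7.0, 8.0, 9.0, 6.0, 6.5, 7.0],
})

directors = df.groupby("director").agg(
    movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),   # NaN revenues are skipped
    mean_rating=("vote_average", "mean"),
)

top_by_count = directors.nlargest(3, "movies")
top_by_revenue = directors.nlargest(3, "total_revenue")

# Assumed threshold: only directors with at least 2 movies enter the rating ranking
min_movies = 2
top_by_rating = directors.loc[directors["movies"] >= min_movies].nlargest(3, "mean_rating")
print(top_by_count, top_by_revenue, top_by_rating, sep="\n\n")
```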