# Project 1: Explanatory Data Analysis & Data Presentation (Movies Dataset)

# Project Brief for Self-Coders

Here you'll have the opportunity to code major parts of Project 1 on your own. If you need any help or inspiration, have a look at the videos or the Jupyter Notebook with the full code.

Keep in mind that it's all about __getting the right results/conclusions__. It's not about finding the identical code. Things can be coded in many different ways. Even if you come to the same conclusions, it's very unlikely that we have the very same code.

```python
import pandas as pd
import numpy as np
import re
import seaborn as sns
from typing import List
from pandas import DataFrame

# Show floats with two decimals in all DataFrame outputs
pd.options.display.float_format = "{:.2f}".format
```

## Data Import and first Inspection

1. __Import__ the movies dataset from the CSV file "movies_complete.csv". __Inspect__ the data.

```python
# Parse release_date as datetime while reading the CSV
movies_dataframe: DataFrame = pd.read_csv("./Datasets/movies_complete.csv", parse_dates=["release_date"])
movies_dataframe.head(10)
```

```text
Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,...,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
0,862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.554033,Pixar Animation Studios,...,7.7,21.946943,81.0,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter
1,8844,Jumanji,Roll the dice and unleash the excitement!,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.797249,TriStar Pictures|Teitler Film|Interscope Commu...,...,6.9,17.015539,104.0,When siblings Judy and Peter discover an encha...,English|Français,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...,26,16,Joe Johnston
2,15602,Grumpier Old Men,Still Yelling. Still Fighting. Still Ready for...,1995-12-22,Romance|Comedy,Grumpy Old Men Collection,en,,,Warner Bros.|Lancaster Gate,...,6.5,11.7129,101.0,A family wedding reignites the ancient feud be...,English,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Walter Matthau|Jack Lemmon|Ann-Margret|Sophia ...,7,4,Howard Deutch
3,31357,Waiting to Exhale,Friends are the people who let you be yourself...,1995-12-22,Comedy|Drama|Romance,,en,16.0,81.452156,Twentieth Century Fox Film Corporation,...,6.1,3.859495,127.0,"Cheated on, mistreated and stepped on, the wom...",English,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Whitney Houston|Angela Bassett|Loretta Devine|...,10,10,Forest Whitaker
4,11862,Father of the Bride Part II,Just When His World Is Back To Normal... He's ...,1995-02-10,Comedy,Father of the Bride Collection,en,,76.578911,Sandollar Productions|Touchstone Pictures,...,5.7,8.387519,106.0,Just when George Banks has recovered from his ...,English,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Steve Martin|Diane Keaton|Martin Short|Kimberl...,12,7,Charles Shyer
5,949,Heat,A Los Angeles Crime Saga,1995-12-15,Action|Crime|Drama|Thriller,,en,60.0,187.436818,Regency Enterprises|Forward Pass|Warner Bros.,...,7.7,17.924927,170.0,"Obsessive master thief, Neil McCauley leads a ...",English|Español,<img src='http://image.tmdb.org/t/p/w185//lbf2...,Al Pacino|Robert De Niro|Val Kilmer|Jon Voight...,65,71,Michael Mann
6,11860,Sabrina,You are cordially invited to the most surprisi...,1995-12-15,Comedy|Romance,,en,58.0,,Paramount Pictures|Scott Rudin Productions|Mir...,...,6.2,6.677277,127.0,An ugly duckling having undergone a remarkable...,Français|English,<img src='http://image.tmdb.org/t/p/w185//z1oN...,Harrison Ford|Julia Ormond|Greg Kinnear|Angie ...,57,53,Sydney Pollack
7,45325,Tom and Huck,The Original Bad Boys.,1995-12-22,Action|Adventure|Drama|Family,,en,,,Walt Disney Pictures,...,5.4,2.561161,97.0,"A mischievous young boy, Tom Sawyer, witnesses...",English|Deutsch,<img src='http://image.tmdb.org/t/p/w185//6yox...,Jonathan Taylor Thomas|Brad Renfro|Rachael Lei...,7,4,Peter Hewitt
8,9091,Sudden Death,Terror goes into overtime.,1995-12-22,Action|Adventure|Thriller,,en,35.0,64.350171,Universal Pictures|Imperial Entertainment|Sign...,...,5.5,5.23158,106.0,International action superstar Jean Claude Van...,English,<img src='http://image.tmdb.org/t/p/w185//gV1V...,Jean-Claude Van Damme|Powers Boothe|Dorian Har...,6,9,Peter Hyams
9,710,GoldenEye,No limits. No fears. No substitutes.,1995-11-16,Adventure|Action|Thriller,James Bond Collection,en,58.0,352.194034,United Artists|Eon Productions,...,6.6,14.686036,130.0,James Bond must unmask the mysterious head of ...,English|Pусский|Español,<img src='http://image.tmdb.org/t/p/w185//z0lj...,Pierce Brosnan|Sean Bean|Izabella Scorupco|Fam...,20,46,Martin Campbell
```


```python
# Get the shape of the movies_dataframe
print("The shape of the movies_dataframe is {}".format(movies_dataframe.shape))
```

```text
The shape of the movies_dataframe is (44691, 22)
```


__Some additional information on Features/Columns__:

* **id:** The ID of the movie (clear/unique identifier).
* **title:** The official title of the movie.
* **tagline:** The tagline of the movie.
* **release_date:** The theatrical release date of the movie.
* **genres:** Genres associated with the movie.
* **belongs_to_collection:** Gives information on the movie series/franchise the particular film belongs to.
* **original_language:** The language in which the movie was originally shot.
* **budget_musd:** The budget of the movie in million dollars.
* **revenue_musd:** The total revenue of the movie in million dollars.
* **production_companies:** Production companies involved with the making of the movie.
* **production_countries:** Countries where the movie was shot/produced.
* **vote_count:** The number of votes by users, as counted by TMDB.
* **vote_average:** The average rating of the movie.
* **popularity:** The popularity score assigned by TMDB.
* **runtime:** The runtime of the movie in minutes.
* **overview:** A brief blurb of the movie.
* **spoken_languages:** Spoken languages in the film.
* **poster_path:** The URL of the poster image.
* **cast:** (Main) actors appearing in the movie.
* **cast_size:** Number of actors appearing in the movie.
* **director:** Director of the movie.
* **crew_size:** Size of the film crew (incl. director, excl. actors).

## The best and the worst movies...

2. __Filter__ the Dataset and __find the best/worst n Movies__ with the

- Highest Revenue
- Highest Budget
- Highest Profit (=Revenue - Budget)
- Lowest Profit (=Revenue - Budget)
- Highest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10 million USD)
- Lowest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10 million USD)
- Highest number of Votes
- Highest Rating (only movies with 10 or more Ratings)
- Lowest Rating (only movies with 10 or more Ratings)
- Highest Popularity

__Define__ an appropriate __user-defined function__ to reuse code.
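One possible shape for such a function is sketched below, assuming a helper named `best_worst` and two derived columns `profit_musd` and `return` (all three names are hypothetical, not taken from the full notebook). It runs on a small made-up DataFrame that only borrows the column names from the feature list above, so the printed rankings say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow movies_complete.csv
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "budget_musd": [100.0, 5.0, 20.0, 50.0, None],
    "revenue_musd": [500.0, 60.0, 10.0, None, 80.0],
    "vote_count": [1000, 5, 300, 50, 20],
    "vote_average": [7.5, 9.0, 5.0, 6.0, 8.0],
    "popularity": [20.0, 1.0, 5.0, 8.0, 3.0],
})

# Derived metrics (hypothetical column names); NaN whenever an input is missing
movies["profit_musd"] = movies["revenue_musd"] - movies["budget_musd"]
movies["return"] = movies["revenue_musd"] / movies["budget_musd"]


def best_worst(df: pd.DataFrame, n: int, by: str, ascending: bool = False,
               min_budget: float = 0, min_votes: int = 0) -> pd.DataFrame:
    """Return the n highest (or, with ascending=True, lowest) movies ranked by `by`."""
    # Missing budgets/vote counts are treated as 0, so they only drop out when a minimum is set
    mask = (df["budget_musd"].fillna(0) >= min_budget) & (df["vote_count"].fillna(0) >= min_votes)
    ranked = df.loc[mask].dropna(subset=[by]).sort_values(by=by, ascending=ascending)
    return ranked.head(n)[["title", by]]


print(best_worst(movies, n=5, by="revenue_musd"))                 # Highest Revenue
print(best_worst(movies, n=5, by="profit_musd", ascending=True))  # Lowest Profit
print(best_worst(movies, n=5, by="return", min_budget=10))        # Highest ROI
print(best_worst(movies, n=5, by="vote_average", min_votes=10))   # Highest Rating
```

The toy "Movie B" illustrates why the brief restricts ROI and rating rankings: its ROI of 12 comes from a 5 million USD budget and its 9.0 rating from only 5 votes, so it tops both lists unless `min_budget` or `min_votes` is set.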

__Movies Top 5 - Highest Revenue__

__Movies Top 5 - Highest Budget__

__Movies Top 5 - Highest Profit__

__Movies Top 5 - Lowest Profit__

__Movies Top 5 - Highest ROI__

__Movies Top 5 - Lowest ROI__

__Movies Top 5 - Most Votes__

__Movies Top 5 - Highest Rating__

__Movies Top 5 - Lowest Rating__

__Movies Top 5 - Most Popular__

## Find your next Movie

3. __Filter__ the Dataset for movies that meet the following conditions:

__Search 1: Science Fiction Action Movie with Bruce Willis (sorted from high to low Rating)__

__Search 2: Movies with Uma Thurman and directed by Quentin Tarantino (sorted from short to long runtime)__

__Search 3: Most Successful Pixar Studio Movies between 2010 and 2015 (sorted from high to low Revenue)__

__Search 4: Action or Thriller Movie with original language English and minimum Rating of 7.5 (most recent movies first)__
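Because `genres`, `cast` and `production_companies` store several values separated by `|`, one way to run these searches is substring matching with `Series.str.contains`, combined into boolean masks. The sketch below uses hypothetical films ("Film 1" to "Film 5") with invented values; only the column names and search terms come from the brief.

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow movies_complete.csv
movies = pd.DataFrame({
    "title": ["Film 1", "Film 2", "Film 3", "Film 4", "Film 5"],
    "genres": ["Science Fiction|Action", "Action|Science Fiction|Thriller",
               "Crime|Thriller", "Animation|Family", "Drama"],
    "cast": ["Bruce Willis|Actor X", "Actor Y|Bruce Willis", "Uma Thurman|Actor Z",
             "Actor W", "Uma Thurman|Actor V"],
    "director": ["Director A", "Director B", "Quentin Tarantino", "Director C",
                 "Quentin Tarantino"],
    "production_companies": [None, "Studio Y", "Studio Z", "Pixar Animation Studios", "Studio X"],
    "original_language": ["en", "en", "en", "en", "fr"],
    "release_date": pd.to_datetime(["1997-05-01", "2005-03-01", "2003-10-10",
                                    "2012-06-01", "2019-01-01"]),
    "revenue_musd": [200.0, 150.0, 180.0, 500.0, 10.0],
    "vote_average": [7.2, 7.8, 7.9, 7.6, 8.5],
    "runtime": [120.0, 110.0, 111.0, 95.0, 100.0],
})

# Pipe-separated columns are searched as substrings; na=False treats missing values as "no match"
genres = movies["genres"]
cast = movies["cast"]

# Search 1: Science Fiction + Action with Bruce Willis, best rated first
mask1 = (genres.str.contains("Action", na=False)
         & genres.str.contains("Science Fiction", na=False)
         & cast.str.contains("Bruce Willis", na=False))
search1 = movies.loc[mask1].sort_values("vote_average", ascending=False)

# Search 2: Uma Thurman, directed by Quentin Tarantino, shortest first
mask2 = cast.str.contains("Uma Thurman", na=False) & (movies["director"] == "Quentin Tarantino")
search2 = movies.loc[mask2].sort_values("runtime")

# Search 3: Pixar movies released 2010-2015 (inclusive), highest revenue first
mask3 = (movies["production_companies"].str.contains("Pixar", na=False)
         & movies["release_date"].dt.year.between(2010, 2015))
search3 = movies.loc[mask3].sort_values("revenue_musd", ascending=False)

# Search 4: Action or Thriller (regex alternation), English, rating >= 7.5, newest first
mask4 = (genres.str.contains("Action|Thriller", na=False)
         & (movies["original_language"] == "en")
         & (movies["vote_average"] >= 7.5))
search4 = movies.loc[mask4].sort_values("release_date", ascending=False)

for name, result in [("Search 1", search1), ("Search 2", search2),
                     ("Search 3", search3), ("Search 4", search4)]:
    print(name, result["title"].tolist())
```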

## Are Franchises more successful?

4. __Analyze__ the Dataset and __find out whether Franchises (Movies that belong to a collection) are more successful than stand-alone movies__ in terms of:

- mean revenue
- median Return on Investment
- mean budget
- mean popularity
- mean rating

Hint: use `groupby()`.
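Following the hint, a minimal sketch: derive a label from whether `belongs_to_collection` is filled, then compute all five metrics in a single `groupby` with named aggregations. The label column `Franchise`, the ratio column `return` and the four toy rows are assumptions for illustration. The median is used for ROI as the brief asks, which keeps a few extreme revenue/budget ratios from dominating the comparison.

```python
import pandas as pd

# Hypothetical toy rows; belongs_to_collection is NaN for stand-alone movies
movies = pd.DataFrame({
    "title": ["F1", "F2", "S1", "S2"],
    "belongs_to_collection": ["Collection A", "Collection A", None, None],
    "budget_musd": [100.0, 50.0, 25.0, 30.0],
    "revenue_musd": [300.0, 100.0, 50.0, 30.0],
    "popularity": [10.0, 6.0, 4.0, 2.0],
    "vote_average": [7.0, 6.0, 6.5, 5.5],
})

# Label each movie by whether it belongs to a collection
movies["Franchise"] = movies["belongs_to_collection"].notna().map(
    {True: "Franchise", False: "Stand-alone"})
movies["return"] = movies["revenue_musd"] / movies["budget_musd"]

# One groupby with named aggregations covers all five metrics
summary = movies.groupby("Franchise").agg(
    mean_revenue=("revenue_musd", "mean"),
    median_roi=("return", "median"),
    mean_budget=("budget_musd", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(summary)
```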

__Franchise vs. Stand-alone: Average Revenue__

__Franchise vs. Stand-alone: Return on Investment / Profitability (median)__

__Franchise vs. Stand-alone: Average Budget__

__Franchise vs. Stand-alone: Average Popularity__

__Franchise vs. Stand-alone: Average Rating__

## Most Successful Franchises

5. __Find__ the __most successful Franchises__ in terms of

- __total number of movies__
- __total & mean budget__
- __total & mean revenue__
- __mean rating__
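A possible approach is to group directly by `belongs_to_collection` and aggregate once, then sort the result by whichever criterion is of interest. The sketch assumes invented collections ("Series A", "Series B") and made-up values; the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; stand-alone movies have no collection (NaN)
movies = pd.DataFrame({
    "title": ["A1", "A2", "A3", "B1", "B2", "Solo"],
    "belongs_to_collection": ["Series A", "Series A", "Series A", "Series B", "Series B", None],
    "budget_musd": [10.0, 20.0, None, 50.0, 70.0, 5.0],
    "revenue_musd": [100.0, 80.0, 60.0, 300.0, 200.0, 1.0],
    "vote_average": [7.0, 6.0, 5.0, 6.5, 7.5, 9.0],
})

# groupby drops rows with a NaN key by default, so stand-alone movies are excluded
franchises = movies.groupby("belongs_to_collection").agg(
    n_movies=("title", "count"),
    total_budget=("budget_musd", "sum"),
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)

# Rank by whichever criterion is of interest
print(franchises.sort_values("n_movies", ascending=False))
print(franchises.sort_values("total_revenue", ascending=False))
```

Note that `sum` skips missing values: the toy "Series A" total budget covers only two of its three movies, and a group whose budgets are all missing would sum to 0 rather than NaN. Keep this in mind when comparing totals.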

## Most Successful Directors

6. __Find__ the __most successful Directors__ in terms of

- __total number of movies__
- __total revenue__
- __mean rating__
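The same `groupby` pattern works for directors. In this sketch the director names ("Dir X", "Dir Y", "Dir Z") and all values are placeholders, and `idxmax` simply picks the top director for each criterion.

```python
import pandas as pd

# Hypothetical toy rows; director names are placeholders
movies = pd.DataFrame({
    "title": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "director": ["Dir X", "Dir X", "Dir Y", "Dir Z", "Dir Z", "Dir Z"],
    "revenue_musd": [100.0, 200.0, 900.0, 50.0, None, 70.0],
    "vote_average": [8.0, 7.0, 6.0, 6.5, 7.5, 7.0],
})

directors = movies.groupby("director").agg(
    n_movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),
    mean_rating=("vote_average", "mean"),
)

# Top director per criterion
for column in ["n_movies", "total_revenue", "mean_rating"]:
    print(column, directors[column].idxmax())
```

On the real data, a mean rating computed from one or two movies can easily top the list; applying a minimum (for example on `n_movies`, or on `vote_count` as the brief does for the rating rankings) is one option, though the threshold is a judgment call.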