# IMDB Data Analysis & Data Presentation

## Project Description

The IMDb Data Analysis Project investigates a large IMDb movie dataset to uncover trends and insights in the film industry. By analyzing movie ratings, genres, release dates, cast and crew data, and audience preferences, it aims to identify the factors that influence a movie's success. It will also explore the impact of technology on film and offer insights useful to filmmakers and production studios.

## Importing Libraries and modules

In [1]:
import pandas as pd
import numpy as np
pd.options.display.max_columns = None  # show every column of wide DataFrames
pd.options.display.float_format = '{:.2f}'.format  # display floats with two decimals

## Loading the dataset and Inspecting the data

In [2]:
df=pd.read_csv('movies_complete.csv')
df.head()

,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
0,862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.55,Pixar Animation Studios,United States of America,5415.0,7.7,21.95,81.0,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter
1,8844,Jumanji,Roll the dice and unleash the excitement!,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.8,TriStar Pictures|Teitler Film|Interscope Commu...,United States of America,2413.0,6.9,17.02,104.0,When siblings Judy and Peter discover an encha...,English|Français,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...,26,16,Joe Johnston
2,15602,Grumpier Old Men,Still Yelling. Still Fighting. Still Ready for...,1995-12-22,Romance|Comedy,Grumpy Old Men Collection,en,,,Warner Bros.|Lancaster Gate,United States of America,92.0,6.5,11.71,101.0,A family wedding reignites the ancient feud be...,English,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Walter Matthau|Jack Lemmon|Ann-Margret|Sophia ...,7,4,Howard Deutch
3,31357,Waiting to Exhale,Friends are the people who let you be yourself...,1995-12-22,Comedy|Drama|Romance,,en,16.0,81.45,Twentieth Century Fox Film Corporation,United States of America,34.0,6.1,3.86,127.0,"Cheated on, mistreated and stepped on, the wom...",English,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Whitney Houston|Angela Bassett|Loretta Devine|...,10,10,Forest Whitaker
4,11862,Father of the Bride Part II,Just When His World Is Back To Normal... He's ...,1995-02-10,Comedy,Father of the Bride Collection,en,,76.58,Sandollar Productions|Touchstone Pictures,United States of America,173.0,5.7,8.39,106.0,Just when George Banks has recovered from his ...,English,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Steve Martin|Diane Keaton|Martin Short|Kimberl...,12,7,Charles Shyer


## Getting Info about the data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44691 entries, 0 to 44690
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     44691 non-null  int64  
 1   title                  44691 non-null  object 
 2   tagline                20284 non-null  object 
 3   release_date           44657 non-null  object 
 4   genres                 42586 non-null  object 
 5   belongs_to_collection  4463 non-null   object 
 6   original_language      44681 non-null  object 
 7   budget_musd            8854 non-null   float64
 8   revenue_musd           7385 non-null   float64
 9   production_companies   33356 non-null  object 
 10  production_countries   38835 non-null  object 
 11  vote_count             44691 non-null  float64
 12  vote_average           42077 non-null  float64
 13  popularity             44691 non-null  float64
 14  runtime                43179 non-null  float64
 15  ov

## Converting release_date column to DateTime

In [4]:
df['release_date']=pd.to_datetime(df['release_date'])
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44691 entries, 0 to 44690
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   id                     44691 non-null  int64         
 1   title                  44691 non-null  object        
 2   tagline                20284 non-null  object        
 3   release_date           44657 non-null  datetime64[ns]
 4   genres                 42586 non-null  object        
 5   belongs_to_collection  4463 non-null   object        
 6   original_language      44681 non-null  object        
 7   budget_musd            8854 non-null   float64       
 8   revenue_musd           7385 non-null   float64       
 9   production_companies   33356 non-null  object        
 10  production_countries   38835 non-null  object        
 11  vote_count             44691 non-null  float64       
 12  vote_average           42077 non-null  float64       
 13  p

,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
0,862,Toy Story,,1995-10-30,Animation|Comedy|Family,Toy Story Collection,en,30.0,373.55,Pixar Animation Studios,United States of America,5415.0,7.7,21.95,81.0,"Led by Woody, Andy's toys live happily in his ...",English,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...,13,106,John Lasseter
1,8844,Jumanji,Roll the dice and unleash the excitement!,1995-12-15,Adventure|Fantasy|Family,,en,65.0,262.8,TriStar Pictures|Teitler Film|Interscope Commu...,United States of America,2413.0,6.9,17.02,104.0,When siblings Judy and Peter discover an encha...,English|Français,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...,26,16,Joe Johnston
2,15602,Grumpier Old Men,Still Yelling. Still Fighting. Still Ready for...,1995-12-22,Romance|Comedy,Grumpy Old Men Collection,en,,,Warner Bros.|Lancaster Gate,United States of America,92.0,6.5,11.71,101.0,A family wedding reignites the ancient feud be...,English,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Walter Matthau|Jack Lemmon|Ann-Margret|Sophia ...,7,4,Howard Deutch
3,31357,Waiting to Exhale,Friends are the people who let you be yourself...,1995-12-22,Comedy|Drama|Romance,,en,16.0,81.45,Twentieth Century Fox Film Corporation,United States of America,34.0,6.1,3.86,127.0,"Cheated on, mistreated and stepped on, the wom...",English,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Whitney Houston|Angela Bassett|Loretta Devine|...,10,10,Forest Whitaker
4,11862,Father of the Bride Part II,Just When His World Is Back To Normal... He's ...,1995-02-10,Comedy,Father of the Bride Collection,en,,76.58,Sandollar Productions|Touchstone Pictures,United States of America,173.0,5.7,8.39,106.0,Just when George Banks has recovered from his ...,English,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Steve Martin|Diane Keaton|Martin Short|Kimberl...,12,7,Charles Shyer
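A minimal sketch of the same conversion on a few hand-made strings. It assumes, purely for illustration, that a raw value could be malformed: `errors='coerce'` turns such a value into `NaT` instead of raising, and once the dtype is datetime the `.dt` accessor exposes parts such as the year.

```python
import pandas as pd

# Hypothetical raw release dates; the last one is deliberately malformed
raw = pd.Series(['1995-10-30', '1995-12-15', 'not a date'])

# errors='coerce' maps unparsable strings to NaT instead of raising an error
dates = pd.to_datetime(raw, errors='coerce')

# The .dt accessor works once the Series has a datetime dtype (NaT becomes NaN)
years = dates.dt.year

print(dates)
print(years)
```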


## Statistical Summary

In [5]:
df.describe(include=[np.number])

,id,budget_musd,revenue_musd,vote_count,vote_average,popularity,runtime,cast_size,crew_size
count,44691.0,8854.0,7385.0,44691.0,42077.0,44691.0,43179.0,44691.0,44691.0
mean,107186.24,21.67,68.97,111.65,6.0,2.96,97.57,12.48,10.31
std,111806.36,34.36,146.61,495.32,1.28,6.04,34.65,12.12,15.89
min,2.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
25%,26033.5,2.0,2.41,3.0,5.3,0.4,86.0,6.0,2.0
50%,59110.0,8.2,16.87,10.0,6.1,1.15,95.0,10.0,6.0
75%,154251.0,25.0,67.64,35.0,6.8,3.77,107.0,15.0,12.0
max,469172.0,380.0,2787.97,14075.0,10.0,547.49,1256.0,313.0,435.0


## Columns of the dataset

In [6]:
list(df.columns)

['id', 'title', 'tagline', 'release_date', 'genres',
       'belongs_to_collection', 'original_language', 'budget_musd',
       'revenue_musd', 'production_companies', 'production_countries',
       'vote_count', 'vote_average', 'popularity', 'runtime', 'overview',
       'spoken_languages', 'poster_path', 'cast', 'cast_size', 'crew_size',
       'director']

## Selecting the Columns Used to Rank the Best and Worst Movies

In [7]:
# .copy() makes ndf an independent DataFrame, so the columns added below
# do not trigger pandas' SettingWithCopyWarning
ndf=df[['poster_path','title','budget_musd','revenue_musd','vote_count', 'vote_average', 'popularity']].copy()
ndf

,poster_path,title,budget_musd,revenue_musd,vote_count,vote_average,popularity
0,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Toy Story,30.00,373.55,5415.00,7.70,21.95
1,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Jumanji,65.00,262.80,2413.00,6.90,17.02
2,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Grumpier Old Men,,,92.00,6.50,11.71
3,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Waiting to Exhale,16.00,81.45,34.00,6.10,3.86
4,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Father of the Bride Part II,,76.58,173.00,5.70,8.39
...,...,...,...,...,...,...,...
44686,<img src='http://image.tmdb.org/t/p/w185//pfC8...,Subdue,,,1.00,4.00,0.07
44687,<img src='http://image.tmdb.org/t/p/w185//xZkm...,Century of Birthing,,,3.00,9.00,0.18
44688,<img src='http://image.tmdb.org/t/p/w185//eGga...,Betrayal,,,6.00,3.80,0.90
44689,<img src='http://image.tmdb.org/t/p/w185//aorB...,Satan Triumphant,,,0.00,,0.00


## Create a column 'profit_musd' (revenue - budget)

In [8]:
ndf['profit_musd']=ndf['revenue_musd']-ndf['budget_musd']
ndf.head()

,poster_path,title,budget_musd,revenue_musd,vote_count,vote_average,popularity,profit_musd
0,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Toy Story,30.0,373.55,5415.0,7.7,21.95,343.55
1,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Jumanji,65.0,262.8,2413.0,6.9,17.02,197.8
2,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Grumpier Old Men,,,92.0,6.5,11.71,
3,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Waiting to Exhale,16.0,81.45,34.0,6.1,3.86,65.45
4,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Father of the Bride Part II,,76.58,173.0,5.7,8.39,


## Create a column 'return_musd' (revenue/budget)

In [9]:
ndf['return_musd']=ndf['revenue_musd']/ndf['budget_musd']
ndf.head()


,poster_path,title,budget_musd,revenue_musd,vote_count,vote_average,popularity,profit_musd,return_musd
0,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Toy Story,30.0,373.55,5415.0,7.7,21.95,343.55,12.45
1,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Jumanji,65.0,262.8,2413.0,6.9,17.02,197.8,4.04
2,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Grumpier Old Men,,,92.0,6.5,11.71,,
3,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Waiting to Exhale,16.0,81.45,34.0,6.1,3.86,65.45,5.09
4,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Father of the Bride Part II,,76.58,173.0,5.7,8.39,,
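Selecting a column subset with `df[[...]]` and then assigning new columns to it can trigger pandas' `SettingWithCopyWarning`; taking an explicit `.copy()` of the subset avoids that. The sketch below builds both derived columns on a small made-up frame and shows how missing values propagate: a `NaN` budget or revenue gives a `NaN` profit and return, while a zero budget gives an infinite return.

```python
import numpy as np
import pandas as pd

# Made-up figures in millions of USD, for illustration only
movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'budget_musd': [30.0, np.nan, 0.0, 10.0],
    'revenue_musd': [373.55, 76.58, 12.4, 5.0],
})

# .copy() gives an independent frame, so adding columns is unambiguous
ndf = movies[['title', 'budget_musd', 'revenue_musd']].copy()

# Vectorized arithmetic: NaN in either operand yields NaN
ndf['profit_musd'] = ndf['revenue_musd'] - ndf['budget_musd']

# Dividing by a 0.0 budget yields inf rather than raising ZeroDivisionError
ndf['return_musd'] = ndf['revenue_musd'] / ndf['budget_musd']

print(ndf)
```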


## Renaming Columns to Meaningful Labels for Later Use in Graphs

In [10]:
# Map raw column names to display labels; the poster column gets an empty label
ndf = ndf.rename(columns={'poster_path': '',
                          'budget_musd': 'Budget',
                          'revenue_musd': 'Revenue',
                          'vote_count': 'Vote',
                          'vote_average': 'Average Rating',
                          'profit_musd': 'Profit',
                          'return_musd': 'Return'})
# Title-case the remaining names ('title' -> 'Title', 'popularity' -> 'Popularity')
ndf.columns = ndf.columns.str.title()
ndf.head()

,,Title,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
0,<img src='http://image.tmdb.org/t/p/w185//uXDf...,Toy Story,30.0,373.55,5415.0,7.7,21.95,343.55,12.45
1,<img src='http://image.tmdb.org/t/p/w185//vgpX...,Jumanji,65.0,262.8,2413.0,6.9,17.02,197.8,4.04
2,<img src='http://image.tmdb.org/t/p/w185//1FSX...,Grumpier Old Men,,,92.0,6.5,11.71,,
3,<img src='http://image.tmdb.org/t/p/w185//4wjG...,Waiting to Exhale,16.0,81.45,34.0,6.1,3.86,65.45,5.09
4,<img src='http://image.tmdb.org/t/p/w185//lf9R...,Father of the Bride Part II,,76.58,173.0,5.7,8.39,,


## Set Title as Index

In [11]:
ndf=ndf.set_index('Title')
ndf.head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Toy Story,<img src='http://image.tmdb.org/t/p/w185//uXDf...,30.0,373.55,5415.0,7.7,21.95,343.55,12.45
Jumanji,<img src='http://image.tmdb.org/t/p/w185//vgpX...,65.0,262.8,2413.0,6.9,17.02,197.8,4.04
Grumpier Old Men,<img src='http://image.tmdb.org/t/p/w185//1FSX...,,,92.0,6.5,11.71,,
Waiting to Exhale,<img src='http://image.tmdb.org/t/p/w185//4wjG...,16.0,81.45,34.0,6.1,3.86,65.45,5.09
Father of the Bride Part II,<img src='http://image.tmdb.org/t/p/w185//lf9R...,,76.58,173.0,5.7,8.39,,


## Highest Rated Movies

In [12]:
ndf.sort_values(by='Average Rating',ascending=False)

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Portrait of a Young Man in Three Movements,,,,1.00,10.00,0.04,,
Brave Revolutionary,<img src='http://image.tmdb.org/t/p/w185//zAb2...,,,1.00,10.00,0.32,,
Other Voices Other Rooms,<img src='http://image.tmdb.org/t/p/w185//4ifP...,,,1.00,10.00,0.04,,
The Lion of Thebes,<img src='http://image.tmdb.org/t/p/w185//tdOc...,,,1.00,10.00,1.78,,
Katt Williams: Priceless: Afterlife,<img src='http://image.tmdb.org/t/p/w185//wKrH...,,,2.00,10.00,0.48,,
...,...,...,...,...,...,...,...,...
Altar of Fire,<img src='http://image.tmdb.org/t/p/w185//iJ78...,,,0.00,,0.00,,
The Wonders of Aladdin,<img src='http://image.tmdb.org/t/p/w185//AvfX...,,,0.00,,0.09,,
Deep Hearts,<img src='http://image.tmdb.org/t/p/w185//8jI4...,,,0.00,,0.01,,
Satan Triumphant,<img src='http://image.tmdb.org/t/p/w185//aorB...,,,0.00,,0.00,,


As the output shows, the highest-rated movies have only one or two votes each, which is not enough to judge a movie by its rating. So let's compute the median number of votes and keep only movies with at least that many votes.


In [13]:
median_vote=ndf['Vote'].median()
median_vote

10.0

## Filtering Movies by Median Vote Count and Finding the Highest-Rated Movies

In [14]:
ndf.loc[ndf['Vote']>=median_vote].sort_values(by='Average Rating',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
As I Was Moving Ahead Occasionally I Saw Brief Glimpses of Beauty,<img src='http://image.tmdb.org/t/p/w185//k0I6...,,,10.0,9.5,0.8,,
Planet Earth II,<img src='http://image.tmdb.org/t/p/w185//gTvA...,,,50.0,9.5,5.65,,
The Civil War,<img src='http://image.tmdb.org/t/p/w185//r4sW...,,,15.0,9.2,3.43,,
Dilwale Dulhania Le Jayenge,<img src='http://image.tmdb.org/t/p/w185//2CAL...,13.2,100.0,661.0,9.1,34.46,86.8,7.58
Cosmos,<img src='http://image.tmdb.org/t/p/w185//mYrn...,,,41.0,9.1,0.28,,
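The filter-then-sort pattern above can be illustrated on a toy frame with made-up titles, where a perfect score from a single vote would otherwise top the list. Only titles with at least the median vote count survive before ranking.

```python
import pandas as pd

# Made-up ratings; 'Obscure Short' has a 10.0 from a single vote
ratings = pd.DataFrame(
    {'Vote': [1, 50, 661, 15, 3],
     'Average Rating': [10.0, 9.5, 9.1, 9.2, 9.8]},
    index=['Obscure Short', 'Doc A', 'Feature B', 'Doc C', 'Tiny D'],
)

# Keep only titles with at least the median number of votes
median_vote = ratings['Vote'].median()
credible = ratings.loc[ratings['Vote'] >= median_vote]

# Rank the remaining titles by rating, best first
top = credible.sort_values(by='Average Rating', ascending=False)
print(top)
```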


## Movies With Highest ROI

In [15]:
ndf.sort_values(by='Return',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Less Than Zero,<img src='http://image.tmdb.org/t/p/w185//1GY0...,0.0,12.4,77.0,6.1,4.03,12.4,12396383.0
Modern Times,<img src='http://image.tmdb.org/t/p/w185//7uoi...,0.0,8.5,881.0,8.1,8.16,8.5,8500000.0
Welcome to Dongmakgol,<img src='http://image.tmdb.org/t/p/w185//5iGV...,0.0,33.58,49.0,7.7,4.22,33.58,4197476.62
Aquí Entre Nos,<img src='http://image.tmdb.org/t/p/w185//oflx...,0.0,2.76,3.0,6.0,0.23,2.76,2755584.0
"The Karate Kid, Part II",<img src='http://image.tmdb.org/t/p/w185//mSne...,0.0,115.1,457.0,5.9,9.23,115.1,1018619.28


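As the output shows, the movies with the highest ROI have recorded budgets so small that they display as 0.0 (their Return values imply budgets of at most a few hundred dollars), so Return alone cannot identify the highest-ROI movie. Instead, let's compute the median budget and consider only movies with at least that budget.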

In [16]:
median_budget=ndf['Budget'].median()
median_budget

8.2

## Filtering Movies by Median Budget and Finding the Highest-ROI Movies

In [17]:
ndf.loc[ndf['Budget']>=median_budget].sort_values(by='Return',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
E.T. the Extra-Terrestrial,<img src='http://image.tmdb.org/t/p/w185//cBfk...,10.5,792.97,3359.0,7.3,19.36,782.47,75.52
Star Wars,<img src='http://image.tmdb.org/t/p/w185//6FfC...,11.0,775.4,6778.0,8.1,42.15,764.4,70.49
The Sound of Music,<img src='http://image.tmdb.org/t/p/w185//5qQT...,8.2,286.21,966.0,7.4,9.07,278.01,34.9
Pretty Woman,<img src='http://image.tmdb.org/t/p/w185//hMVM...,14.0,463.0,1807.0,7.0,13.35,449.0,33.07
The Intouchables,<img src='http://image.tmdb.org/t/p/w185//w7Wx...,13.0,426.48,5410.0,8.2,16.09,413.48,32.81


## Filling NaN Values in the Budget Column with 0

In [18]:
ndf['Budget']=ndf['Budget'].fillna(0)
ndf.head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Toy Story,<img src='http://image.tmdb.org/t/p/w185//uXDf...,30.0,373.55,5415.0,7.7,21.95,343.55,12.45
Jumanji,<img src='http://image.tmdb.org/t/p/w185//vgpX...,65.0,262.8,2413.0,6.9,17.02,197.8,4.04
Grumpier Old Men,<img src='http://image.tmdb.org/t/p/w185//1FSX...,0.0,,92.0,6.5,11.71,,
Waiting to Exhale,<img src='http://image.tmdb.org/t/p/w185//4wjG...,16.0,81.45,34.0,6.1,3.86,65.45,5.09
Father of the Bride Part II,<img src='http://image.tmdb.org/t/p/w185//lf9R...,0.0,76.58,173.0,5.7,8.39,,


## Create Functions to Find the Best and Worst Movies

In [19]:
def best_movie():
    """Movies with at least the median budget and vote count, highest Return first (more votes break ties)."""
    eligible = ndf.loc[(ndf['Budget'] >= median_budget) & (ndf['Vote'] >= median_vote)]
    return eligible.sort_values(by=['Return', 'Vote'], ascending=[False, False])


def worst_movie():
    """Movies with at least the median budget and vote count, lowest Return first (fewer votes break ties)."""
    eligible = ndf.loc[(ndf['Budget'] >= median_budget) & (ndf['Vote'] >= median_vote)]
    return eligible.sort_values(by=['Return', 'Vote'], ascending=[True, True])

## Top 5 - Highest Revenue

In [20]:
ndf.sort_values(by='Revenue',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Avatar,<img src='http://image.tmdb.org/t/p/w185//btnl...,237.0,2787.97,12114.0,7.2,185.07,2550.97,11.76
Star Wars: The Force Awakens,<img src='http://image.tmdb.org/t/p/w185//9rd0...,245.0,2068.22,7993.0,7.5,31.63,1823.22,8.44
Titanic,<img src='http://image.tmdb.org/t/p/w185//9xjZ...,200.0,1845.03,7770.0,7.5,26.89,1645.03,9.23
The Avengers,<img src='http://image.tmdb.org/t/p/w185//RYMX...,220.0,1519.56,12000.0,7.4,89.89,1299.56,6.91
Jurassic World,<img src='http://image.tmdb.org/t/p/w185//rhr4...,150.0,1513.53,8842.0,6.5,32.79,1363.53,10.09
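The Top 5 cells in this section use `sort_values(...).head()`. For a numeric column, `DataFrame.nlargest(n, columns)` and `DataFrame.nsmallest(n, columns)` typically return the same rows and skip missing values. A small sketch on made-up revenues:

```python
import numpy as np
import pandas as pd

# Made-up revenues in millions of USD, including one missing value
revenue = pd.DataFrame(
    {'Revenue': [2787.97, np.nan, 1845.03, 262.8, 2068.22]},
    index=['P', 'Q', 'R', 'S', 'T'],
)

# Top 3 by revenue, two equivalent ways
top3 = revenue.nlargest(3, 'Revenue')
via_sort = revenue.sort_values(by='Revenue', ascending=False).head(3)

# Lowest revenue; the NaN row is not treated as a value
lowest = revenue.nsmallest(1, 'Revenue')

print(top3)
print(lowest)
```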


## Top 5 - Highest Budget

In [21]:
ndf.sort_values(by='Budget',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Pirates of the Caribbean: On Stranger Tides,<img src='http://image.tmdb.org/t/p/w185//keGf...,380.0,1045.71,5068.0,6.4,27.89,665.71,2.75
Pirates of the Caribbean: At World's End,<img src='http://image.tmdb.org/t/p/w185//oVh3...,300.0,961.0,4627.0,6.9,31.36,661.0,3.2
Avengers: Age of Ultron,<img src='http://image.tmdb.org/t/p/w185//4ssD...,280.0,1405.4,6908.0,7.3,37.38,1125.4,5.02
Superman Returns,<img src='http://image.tmdb.org/t/p/w185//6ZYO...,270.0,391.08,1429.0,5.4,13.28,121.08,1.45
Transformers: The Last Knight,<img src='http://image.tmdb.org/t/p/w185//s5HQ...,260.0,604.94,1440.0,6.2,39.19,344.94,2.33


## Top 5 - Highest Profit

In [22]:
ndf.sort_values(by='Profit',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Avatar,<img src='http://image.tmdb.org/t/p/w185//btnl...,237.0,2787.97,12114.0,7.2,185.07,2550.97,11.76
Star Wars: The Force Awakens,<img src='http://image.tmdb.org/t/p/w185//9rd0...,245.0,2068.22,7993.0,7.5,31.63,1823.22,8.44
Titanic,<img src='http://image.tmdb.org/t/p/w185//9xjZ...,200.0,1845.03,7770.0,7.5,26.89,1645.03,9.23
Jurassic World,<img src='http://image.tmdb.org/t/p/w185//rhr4...,150.0,1513.53,8842.0,6.5,32.79,1363.53,10.09
Furious 7,<img src='http://image.tmdb.org/t/p/w185//d9jZ...,190.0,1506.25,4253.0,7.3,27.28,1316.25,7.93


## Top 5 - Highest ROI (Budget and Votes at or Above the Median)

In [23]:
best_movie().head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
E.T. the Extra-Terrestrial,<img src='http://image.tmdb.org/t/p/w185//cBfk...,10.5,792.97,3359.0,7.3,19.36,782.47,75.52
Star Wars,<img src='http://image.tmdb.org/t/p/w185//6FfC...,11.0,775.4,6778.0,8.1,42.15,764.4,70.49
The Sound of Music,<img src='http://image.tmdb.org/t/p/w185//5qQT...,8.2,286.21,966.0,7.4,9.07,278.01,34.9
Pretty Woman,<img src='http://image.tmdb.org/t/p/w185//hMVM...,14.0,463.0,1807.0,7.0,13.35,449.0,33.07
The Intouchables,<img src='http://image.tmdb.org/t/p/w185//w7Wx...,13.0,426.48,5410.0,8.2,16.09,413.48,32.81
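`best_movie` and `worst_movie` repeat the same filter and read `ndf`, `median_budget`, and `median_vote` from the global scope. A minimal sketch of one parameterized alternative follows; `rank_movies` and the demo frame are hypothetical, and passing the frame and thresholds explicitly makes it easy to try other cut-offs.

```python
import pandas as pd


def rank_movies(frame, min_budget, min_votes, best=True):
    """Hypothetical helper: rank movies by Return, breaking ties on Vote.

    Keeps rows with Budget >= min_budget and Vote >= min_votes.
    best=True sorts both keys descending; best=False sorts both ascending.
    """
    eligible = frame.loc[(frame['Budget'] >= min_budget) & (frame['Vote'] >= min_votes)]
    return eligible.sort_values(by=['Return', 'Vote'], ascending=[not best, not best])


# Made-up demo data: 'Y' fails the budget floor, 'Z' fails the vote floor
demo = pd.DataFrame(
    {'Budget': [10.0, 20.0, 1.0, 15.0],
     'Vote': [100, 50, 500, 5],
     'Return': [5.0, 2.0, 50.0, 9.0]},
    index=['W', 'X', 'Y', 'Z'],
)

print(rank_movies(demo, min_budget=10.0, min_votes=10))
print(rank_movies(demo, min_budget=10.0, min_votes=10, best=False))
```

With the notebook's variables, `rank_movies(ndf, median_budget, median_vote)` would then mirror `best_movie()`.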


## Top 5 - Lowest Profit

In [24]:
ndf.sort_values(by='Profit',ascending=True).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
The Lone Ranger,<img src='http://image.tmdb.org/t/p/w185//b2je...,255.0,89.29,2361.0,5.9,12.73,-165.71,0.35
The Alamo,<img src='http://image.tmdb.org/t/p/w185//aZrW...,145.0,25.82,108.0,5.8,12.24,-119.18,0.18
Mars Needs Moms,<img src='http://image.tmdb.org/t/p/w185//lOKq...,150.0,38.99,202.0,5.6,7.25,-111.01,0.26
Valerian and the City of a Thousand Planets,<img src='http://image.tmdb.org/t/p/w185//jfIp...,197.47,90.02,905.0,6.7,15.26,-107.45,0.46
The 13th Warrior,<img src='http://image.tmdb.org/t/p/w185//7pyh...,160.0,61.7,524.0,6.4,10.31,-98.3,0.39


## Top 5 - Most Popular

In [25]:
ndf.sort_values(by='Popularity',ascending=False).head()

Title,,Budget,Revenue,Vote,Average Rating,Popularity,Profit,Return
Minions,<img src='http://image.tmdb.org/t/p/w185//tMaG...,74.0,1156.73,4729.0,6.4,547.49,1082.73,15.63
Wonder Woman,<img src='http://image.tmdb.org/t/p/w185//gfJG...,149.0,820.58,5025.0,7.2,294.34,671.58,5.51
Beauty and the Beast,<img src='http://image.tmdb.org/t/p/w185//tWqi...,160.0,1262.89,5530.0,6.8,287.25,1102.89,7.89
Baby Driver,<img src='http://image.tmdb.org/t/p/w185//rmnQ...,34.0,224.51,2083.0,7.2,228.03,190.51,6.6
Big Hero 6,<img src='http://image.tmdb.org/t/p/w185//xozr...,165.0,652.11,6289.0,7.8,213.85,487.11,3.95


## Find Science Fiction Action Movies With Bruce Willis

In [26]:
# na=False treats missing genres or cast entries as non-matches
df.loc[df['genres'].str.contains('Science Fiction', na=False)
       & df['genres'].str.contains('Action', na=False)
       & df['cast'].str.contains('Bruce Willis', na=False)]

,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
1448,18,The Fifth Element,There is no future without it.,1997-05-07,Adventure|Fantasy|Action|Thriller|Science Fiction,,en,90.0,263.92,Columbia Pictures|Gaumont,France,3962.0,7.3,24.31,126.0,"In 2257, a taxi driver is unintentionally give...",English|svenska|Deutsch,<img src='http://image.tmdb.org/t/p/w185//fPtl...,Bruce Willis|Gary Oldman|Ian Holm|Milla Jovovi...,114,134,Luc Besson
1786,95,Armageddon,The Earth's Darkest Day Will Be Man's Finest Hour,1998-07-01,Action|Thriller|Science Fiction|Adventure,,en,140.0,553.8,Jerry Bruckheimer Films|Touchstone Pictures|Va...,United States of America,2540.0,6.5,13.24,151.0,When an asteroid threatens to collide with Ear...,English|Pусский,<img src='http://image.tmdb.org/t/p/w185//fMtO...,Bruce Willis|Billy Bob Thornton|Ben Affleck|Li...,67,108,Michael Bay
14135,19959,Surrogates,How do you save humanity when the only thing t...,2009-09-24,Action|Science Fiction|Thriller,,en,80.0,122.44,Touchstone Pictures|Mandeville Films|Wintergre...,United States of America,1219.0,5.9,16.21,89.0,Set in a futuristic world where humans live in...,English|Français,<img src='http://image.tmdb.org/t/p/w185//v3Z0...,Bruce Willis|Radha Mitchell|Rosamund Pike|Jame...,44,25,Jonathan Mostow
19218,59967,Looper,"Hunted By Your Future, Haunted By Your Past",2012-09-26,Action|Thriller|Science Fiction,,en,30.0,47.04,Endgame Entertainment|FilmDistrict|DMG Enterta...,China|United States of America,4777.0,6.6,12.73,118.0,"In the futuristic action thriller Looper, time...",English,<img src='http://image.tmdb.org/t/p/w185//sNjL...,Joseph Gordon-Levitt|Bruce Willis|Emily Blunt|...,34,42,Rian Johnson
20333,72559,G.I. Joe: Retaliation,,2013-03-26,Adventure|Action|Science Fiction|Thriller,G.I. Joe (Live-Action) Collection,en,130.0,371.88,Paramount Pictures|Di Bonaventura Pictures|Has...,United States of America,3045.0,5.4,10.56,110.0,"Framed for crimes against the country, the G.I...",English,<img src='http://image.tmdb.org/t/p/w185//3rWI...,Dwayne Johnson|D.J. Cotrona|Adrianne Palicki|B...,20,28,Jon M. Chu
27619,307663,Vice,Where the future is your past.,2015-01-16,Thriller|Science Fiction|Action|Adventure,,en,10.0,,Grindstone Entertainment Group|K5 Internationa...,United States of America,245.0,4.1,19.24,96.0,Julian Michaels has designed the ultimate reso...,English,<img src='http://image.tmdb.org/t/p/w185//nPqN...,Ambyr Childers|Thomas Jane|Bryan Greenberg|Bru...,51,56,Brian A Miller
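`genres` has missing values (42586 non-null out of 44691 rows, per `df.info()`), and `Series.str.contains` returns `NaN` for such entries. Passing `na=False` makes every mask strictly boolean, so conditions can be combined with `&` and passed to `.loc` safely. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows; Film 2 has no genres and Film 3 has no cast listed
movies = pd.DataFrame({
    'title': ['Film 1', 'Film 2', 'Film 3'],
    'genres': ['Action|Science Fiction', np.nan, 'Action|Comedy'],
    'cast': ['Bruce Willis|Gary Oldman', 'Bruce Willis', np.nan],
})

# na=False treats missing text as "no match", so each mask is plain bool
is_scifi = movies['genres'].str.contains('Science Fiction', na=False)
is_action = movies['genres'].str.contains('Action', na=False)
has_willis = movies['cast'].str.contains('Bruce Willis', na=False)

matches = movies.loc[is_scifi & is_action & has_willis, 'title'].tolist()
print(matches)
```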


## Filter Movies With Actor Uma Thurman and Director Quentin Tarantino

In [27]:
df.loc[df['cast'].str.contains('Uma Thurman', na=False) & df['director'].str.contains('Quentin Tarantino', na=False)]

,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
291,680,Pulp Fiction,Just because you are a character doesn't mean ...,1994-09-10,Thriller|Crime,,en,8.0,213.93,Miramax Films|A Band Apart|Jersey Films,United States of America,8670.0,8.3,140.95,154.0,"A burger-loving hit man, his philosophical par...",English|Español|Français,<img src='http://image.tmdb.org/t/p/w185//d5iI...,John Travolta|Samuel L. Jackson|Uma Thurman|Br...,54,87,Quentin Tarantino
6667,24,Kill Bill: Vol. 1,Go for the kill.,2003-10-10,Action|Crime,Kill Bill Collection,en,30.0,180.95,Miramax Films|A Band Apart|Super Cool ManChu,United States of America,5091.0,7.7,25.26,111.0,An assassin is shot at the altar by her ruthle...,English|日本語|Français,<img src='http://image.tmdb.org/t/p/w185//v7Ta...,Uma Thurman|Lucy Liu|Vivica A. Fox|Daryl Hanna...,36,161,Quentin Tarantino
7208,393,Kill Bill: Vol. 2,The bride is back for the final cut.,2004-04-16,Action|Crime|Thriller,Kill Bill Collection,en,30.0,152.16,Miramax Films|A Band Apart|Super Cool ManChu,United States of America,4061.0,7.7,21.53,136.0,The Bride unwaveringly continues on her roarin...,English|普通话|Español|广州话 / 廣州話,<img src='http://image.tmdb.org/t/p/w185//2yhg...,Uma Thurman|David Carradine|Daryl Hannah|Micha...,27,130,Quentin Tarantino


## Most Successful Pixar Movies from 2010 to 2015 (Highest Revenue)

In [28]:
# na=False treats missing production companies as non-matches
pixar_movie = df.loc[df['production_companies'].str.contains('Pixar', na=False)]
# between() includes both endpoints by default
pixar_movie = pixar_movie.loc[pixar_movie['release_date'].between('2010-01-01', '2015-12-31')]
pixar_movie.sort_values(by='revenue_musd', ascending=False).head()

,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
15236,10193,Toy Story 3,No toy gets left behind.,2010-06-16,Animation|Family|Comedy,Toy Story Collection,en,200.0,1066.97,Walt Disney Pictures|Pixar Animation Studios,United States of America,4710.0,7.6,16.97,103.0,"Woody, Buzz, and the rest of Andy's toys haven...",English|Español,<img src='http://image.tmdb.org/t/p/w185//amY0...,Tom Hanks|Tim Allen|Ned Beatty|Joan Cusack|Mic...,45,38,Lee Unkrich
29957,150540,Inside Out,Meet the little voices inside your head.,2015-06-09,Drama|Comedy|Animation|Family,,en,175.0,857.61,Walt Disney Pictures|Pixar Animation Studios,United States of America,6737.0,7.9,23.99,94.0,"Growing up can be a bumpy road, and it's no ex...",English,<img src='http://image.tmdb.org/t/p/w185//lRHE...,Amy Poehler|Phyllis Smith|Richard Kind|Bill Ha...,65,50,Pete Docter
20888,62211,Monsters University,School never looked this scary.,2013-06-20,Animation|Family,"Monsters, Inc. Collection",en,200.0,743.56,Walt Disney Pictures|Pixar Animation Studios,United States of America,3622.0,7.0,16.27,104.0,A look at the relationship between Mike and Su...,English,<img src='http://image.tmdb.org/t/p/w185//tyHH...,Billy Crystal|John Goodman|Steve Buscemi|Helen...,24,13,Dan Scanlon
17220,49013,Cars 2,Ka-ciao!,2011-06-11,Animation|Family|Adventure|Comedy,Cars Collection,en,200.0,559.85,Walt Disney Pictures|Pixar Animation Studios,United States of America,2088.0,5.8,13.69,106.0,Star race car Lightning McQueen and his pal Ma...,English|日本語|Italiano|Français,<img src='http://image.tmdb.org/t/p/w185//okIz...,Owen Wilson|Larry the Cable Guy|Michael Caine|...,47,40,John Lasseter
18900,62177,Brave,Change your fate.,2012-06-21,Animation|Adventure|Comedy|Family|Action|Fantasy,,en,185.0,538.98,Walt Disney Pictures|Pixar Animation Studios,United States of America,4760.0,6.7,15.88,93.0,Brave is set in the mystical Scottish Highland...,English,<img src='http://image.tmdb.org/t/p/w185//8l0p...,Kelly Macdonald|Billy Connolly|Emma Thompson|J...,15,44,Brenda Chapman
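`Series.between` includes both endpoints by default (`inclusive='both'` in recent pandas versions), so titles released exactly on 2010-01-01 or 2015-12-31 are kept, and a datetime column can be compared directly against date strings. A sketch with made-up release dates and revenues:

```python
import pandas as pd

# Made-up titles straddling the 2010-2015 window
releases = pd.DataFrame({
    'title': ['Early', 'Start', 'Middle', 'End', 'Late'],
    'release_date': pd.to_datetime(
        ['2009-12-31', '2010-01-01', '2012-06-21', '2015-12-31', '2016-01-01']
    ),
    'revenue_musd': [100.0, 50.0, 538.98, 75.0, 300.0],
})

# Both endpoints are inclusive by default, so 'Start' and 'End' are kept
in_range = releases['release_date'].between('2010-01-01', '2015-12-31')

# Highest revenue first, as in the Pixar cell
window = releases.loc[in_range].sort_values(by='revenue_musd', ascending=False)
print(window['title'].tolist())
```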


## Action or Thriller Movies in Original Language English with a Minimum Rating of 7.5 and at Least 10 Votes (Most Recent)

In [29]:
# Keep movies tagged Action or Thriller; na=False treats missing genres as non-matches
action_movie=df.loc[df['genres'].str.contains('Action|Thriller', na=False)]
action_movie=action_movie.loc[action_movie['original_language']=='en']
# Require a rating of at least 7.5 from at least 10 votes, newest releases first
action_movie.loc[(action_movie['vote_average']>=7.5) & (action_movie['vote_count']>=10)].sort_values(by='release_date',ascending=False).head()

Unnamed: 0,id,title,tagline,release_date,genres,belongs_to_collection,original_language,budget_musd,revenue_musd,production_companies,production_countries,vote_count,vote_average,popularity,runtime,overview,spoken_languages,poster_path,cast,cast_size,crew_size,director
44490,417320,Descendants 2,Long live evil.,2017-07-21,TV Movie|Family|Action|Comedy|Music|Adventure,Descendants Collection,en,,,Walt Disney Television,United States of America,171.0,7.5,15.84,111.0,When the pressure to be royal becomes too much...,Dansk,<img src='http://image.tmdb.org/t/p/w185//8BNy...,Dove Cameron|Sofia Carson|Cameron Boyce|Booboo...,17,3,Kenny Ortega
43941,374720,Dunkirk,The event that shaped our world,2017-07-19,Action|Drama|History|Thriller|War,,en,100.0,519.88,Canal+|Studio Canal|Warner Bros.|Syncopy|RatPa...,Netherlands|France|United Kingdom|United State...,2712.0,7.5,30.94,107.0,The miraculous evacuation of Allied soldiers f...,English|Français|Deutsch,<img src='http://image.tmdb.org/t/p/w185//ebSn...,Fionn Whitehead|Tom Glynn-Carney|Jack Lowden|H...,66,214,Christopher Nolan
42624,382614,The Book of Henry,Never leave things undone.,2017-06-16,Thriller|Drama|Crime,,en,10.0,4.22,Sidney Kimmel Entertainment|Double Nickel Ente...,United States of America,84.0,7.6,24.55,105.0,"Naomi Watts stars as Susan, a single mother of...",English,<img src='http://image.tmdb.org/t/p/w185//suLF...,Naomi Watts|Jaeden Lieberher|Jacob Tremblay|Sa...,27,27,Colin Trevorrow
26273,283995,Guardians of the Galaxy Vol. 2,Obviously.,2017-04-19,Action|Adventure|Comedy|Science Fiction,Guardians of the Galaxy Collection,en,200.0,863.42,Walt Disney Pictures|Marvel Studios,United States of America,4858.0,7.6,185.33,137.0,The Guardians must fight to keep their newfoun...,English,<img src='http://image.tmdb.org/t/p/w185//y4MB...,Chris Pratt|Zoe Saldana|Dave Bautista|Vin Dies...,63,131,James Gunn
41506,263115,Logan,His time has come,2017-02-28,Action|Drama|Science Fiction,The Wolverine Collection,en,97.0,616.8,Twentieth Century Fox Film Corporation|Donners...,United States of America,6310.0,7.6,54.58,137.0,"In the near future, a weary Logan cares for an...",English|Español,<img src='http://image.tmdb.org/t/p/w185//fnbj...,Hugh Jackman|Patrick Stewart|Dafne Keen|Boyd H...,104,250,James Mangold
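A minimal sketch of the same filter on a made-up toy frame (titles and values are hypothetical). A regex alternation in `str.contains` matches either genre in one call, and `na=False` keeps missing genres from turning into `NaN` inside the boolean mask.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), mimicking the pipe-separated 'genres' column
movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'genres': ['Action|Drama', 'Comedy', np.nan, 'Thriller|Crime'],
    'original_language': ['en', 'en', 'en', 'fr'],
    'vote_average': [7.8, 8.0, 9.0, 7.6],
    'vote_count': [150, 300, 5, 40],
    'release_date': pd.to_datetime(['2015-01-01', '2016-01-01', '2017-01-01', '2014-01-01']),
})

# One regex covers both genres; na=False makes a missing genre count as False
is_action_or_thriller = movies['genres'].str.contains('Action|Thriller', na=False)

result = movies.loc[
    is_action_or_thriller
    & (movies['original_language'] == 'en')
    & (movies['vote_average'] >= 7.5)
    & (movies['vote_count'] >= 10)
].sort_values('release_date', ascending=False)

print(result['title'].tolist())  # ['A']
```

Movie C is dropped because its genre is missing, and D because it is not English, so only A passes every condition.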


## Are Franchises More Successful?

### Flag Movies That Belong to a Franchise

In [30]:
df['franchise']=df['belongs_to_collection'].notnull()

### Count Franchise/Standalone Movies

In [31]:
df['franchise'].value_counts()

franchise
False    40228
True      4463
Name: count, dtype: int64
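Roughly 10% of the movies (4,463 of 44,691) belong to a collection. A minimal sketch on made-up toy data showing how the `notnull()` flag behaves and how `value_counts(normalize=True)` turns the counts into shares:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): NaN means the movie is not part of a collection
movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E'],
    'belongs_to_collection': ['X Collection', np.nan, np.nan, 'Y Collection', np.nan],
})

# Same flag as above: True when the movie belongs to a collection
movies['franchise'] = movies['belongs_to_collection'].notnull()

counts = movies['franchise'].value_counts()
shares = movies['franchise'].value_counts(normalize=True)
print(counts.to_dict())  # {False: 3, True: 2}
print(shares.to_dict())  # {False: 0.6, True: 0.4}
```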

### Revenue (Franchise Vs Standalone Movies)

In [32]:
df.groupby('franchise').agg(
    avg_revenue=('revenue_musd','mean')
)

franchise,avg_revenue
False,44.74
True,165.71


### Budget (Franchise Vs Standalone Movies)

In [33]:
df.groupby('franchise').agg(
    avg_budget=('budget_musd','mean')
)

franchise,avg_budget
False,18.05
True,38.32


### Average Rating (Franchise Vs Standalone Movies)

In [34]:
df.groupby('franchise').agg(
    avg_rating=('vote_average','mean')
)

franchise,avg_rating
False,6.01
True,5.96


### Popularity (Franchise Vs Standalone Movies)

In [35]:
df.groupby('franchise').agg(
    avg_popularity=('popularity','mean')
)

franchise,avg_popularity
False,2.59
True,6.25


### Return on Investment (Franchise Vs Standalone Movies)

In [36]:
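# ROI = revenue / budget; movies missing either value get NaN
df['ROI']=df['revenue_musd']/df['budget_musd']
# Use the median, since a ratio like ROI is easily skewed by a few extreme values
df.groupby('franchise').agg(
    median_roi=('ROI','median')
)

franchise,median_roi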
False,1.62
True,3.71
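A minimal sketch, on hypothetical toy numbers, of why the median is the safer summary for ROI: one tiny-budget hit inflates the mean but barely moves the median. The `replace(0, np.nan)` guard is only a precaution; in the rows shown above, missing budgets already appear as `NaN` rather than `0`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one tiny-budget hit distorts the mean ROI
movies = pd.DataFrame({
    'franchise': [False, False, False, True, True],
    'budget_musd': [10.0, 20.0, 0.1, 50.0, 0.0],
    'revenue_musd': [15.0, 30.0, 25.0, 150.0, 10.0],
})

# Treat a zero budget as unknown so the division never produces inf
budget = movies['budget_musd'].replace(0, np.nan)
movies['ROI'] = movies['revenue_musd'] / budget

summary = movies.groupby('franchise')['ROI'].agg(['mean', 'median'])
print(summary)
```

For the standalone group the ROIs are 1.5, 1.5 and about 250, so the mean lands above 80 while the median stays at 1.5.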


### Summary Statistics (Franchise Vs Standalone Movies)

In [37]:
df.groupby('franchise').agg(
    {
        'title':'count',
        'revenue_musd':['mean','sum'],
        'budget_musd':['mean','sum'],
        'ROI':'median',
        'vote_average':'mean',
        'popularity':'mean',
        'vote_count':['mean','sum']
    }

)

franchise,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
False,40228,44.74,264251.06,18.05,131243.17,1.62,6.01,2.59,78.29,3149432.0
True,4463,165.71,245082.42,38.32,60622.0,3.71,5.96,6.25,412.39,1840487.0
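Passing a dict of lists to `agg` produces two-level column labels (column, function), which is why the table above needs a combined header. The named-aggregation syntax already used in the earlier cells gives flat, self-describing columns instead. A minimal sketch on hypothetical toy data; the output column names are illustrative choices:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical)
movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'franchise': [False, False, True, True],
    'revenue_musd': [10.0, np.nan, 300.0, 100.0],
    'budget_musd': [5.0, 2.0, 100.0, 50.0],
    'vote_average': [6.0, 7.0, 6.5, 5.5],
})

# Named aggregation: output_column=(input_column, function) yields one column level
summary = movies.groupby('franchise').agg(
    movies=('title', 'count'),
    mean_revenue=('revenue_musd', 'mean'),   # NaN revenues are skipped
    total_revenue=('revenue_musd', 'sum'),
    mean_budget=('budget_musd', 'mean'),
    mean_rating=('vote_average', 'mean'),
)
print(summary.columns.tolist())
# ['movies', 'mean_revenue', 'total_revenue', 'mean_budget', 'mean_rating']
```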


## Which Franchises Are Most Successful?

In [38]:
# Keep only movies that belong to a collection
franchise_collection=df.dropna(subset=['belongs_to_collection'])
franchise_collection=franchise_collection.groupby('belongs_to_collection').agg(
    {
        'title':'count',
        'revenue_musd':['mean','sum'],
        'budget_musd':['mean','sum'],
        'ROI':'median',
        'vote_average':'mean',
        'popularity':'mean',
        'vote_count':['mean','sum']
    }
)
franchise_collection

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
... Has Fallen Collection,2,183.39,366.78,65.00,130.00,2.86,6.00,13.01,2333.00,4666.00
00 Schneider Filmreihe,1,,0.00,,0.00,,6.50,1.93,16.00,16.00
08/15 Collection,1,,0.00,,0.00,,5.90,0.63,4.00,4.00
100 Girls Collection,2,,0.00,,0.00,,5.15,3.08,64.00,128.00
101 Dalmatians (Animated) Collection,2,215.88,215.88,4.00,4.00,53.97,6.25,13.06,937.00,1874.00
...,...,...,...,...,...,...,...,...,...,...
Сказки Чуковского,1,,0.00,,0.00,,3.00,0.73,3.00,3.00
Чебурашка и крокодил Гена,1,,0.00,,0.00,,6.70,0.88,7.00,7.00
Что Творят мужчины! (Коллекция),2,,0.00,2.00,2.00,,3.15,1.30,5.50,11.00
男はつらいよ シリーズ,3,,0.00,,0.00,,7.00,0.04,0.67,2.00
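In the table above, collections with no known revenue show a `revenue_musd` sum of `0.00` while the mean is empty (e.g. `00 Schneider Filmreihe`). That is pandas' default behavior: summing a group whose values are all `NaN` returns 0. A minimal sketch on hypothetical toy data showing how `min_count=1` keeps such totals as `NaN`:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): collection Y has no known revenue
movies = pd.DataFrame({
    'belongs_to_collection': ['X', 'X', 'Y'],
    'revenue_musd': [100.0, 50.0, np.nan],
})

grouped = movies.groupby('belongs_to_collection')['revenue_musd']

default_sum = grouped.sum()             # Y -> 0.0 even though nothing is known
strict_sum = grouped.sum(min_count=1)   # Y -> NaN: requires at least one value

print(default_sum.to_dict())  # {'X': 150.0, 'Y': 0.0}
print(strict_sum.to_dict())   # {'X': 150.0, 'Y': nan}
```

Inside a dict passed to `agg`, the same effect could be obtained with a lambda such as `lambda s: s.sum(min_count=1)`, at the cost of a less readable column label.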


### Largest Franchises (Number of Movies)

In [39]:
franchise_collection.sort_values(by=('title','count'),ascending=False)

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
The Bowery Boys,29,,0.00,,0.00,,6.67,0.20,0.72,21.00
Totò Collection,27,,0.00,,0.00,,6.84,1.05,18.04,487.00
James Bond Collection,26,273.35,7106.97,59.22,1539.65,6.13,6.34,13.45,1284.31,33392.00
Zatôichi: The Blind Swordsman,26,,0.00,,0.00,,6.40,1.10,11.19,291.00
The Carry On Collection,25,,0.00,,0.00,,6.17,3.22,21.04,526.00
...,...,...,...,...,...,...,...,...,...,...
Göta kanal collection,1,,0.00,,0.00,,5.40,0.80,12.00,12.00
Hailey Dean Mystery Collection,1,,0.00,,0.00,,6.00,0.22,1.00,1.00
The Adventures of Mickey Matson Collection,1,,0.00,,0.00,,4.80,0.60,6.00,6.00
The 1997 Trilogy,1,,0.00,,0.00,,6.70,2.40,11.00,11.00


### Highest Total Revenue

In [40]:
franchise_collection.sort_values(by=('revenue_musd','sum'),ascending=False)

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
Harry Potter Collection,8,963.42,7707.37,160.00,1280.00,6.17,7.54,26.25,5983.25,47866.00
Star Wars Collection,8,929.31,7434.49,106.79,854.35,8.24,7.38,23.41,5430.38,43443.00
James Bond Collection,26,273.35,7106.97,59.22,1539.65,6.13,6.34,13.45,1284.31,33392.00
The Fast and the Furious Collection,8,640.64,5125.10,126.12,1009.00,4.94,6.66,10.80,3197.00,25576.00
Pirates of the Caribbean Collection,5,904.32,4521.58,250.00,1250.00,3.45,6.88,53.97,5016.00,25080.00
...,...,...,...,...,...,...,...,...,...,...
Les Profs,1,,0.00,12.00,12.00,,5.40,0.24,367.00,367.00
Les Mystères de l'ouest (Collection),2,,0.00,,0.00,,5.70,0.07,2.50,5.00
Les Charlots - Saga,1,,0.00,,0.00,,6.50,2.79,13.00,13.00
Les Boys,1,,0.00,,0.00,,5.40,0.53,7.00,7.00


### Highest Average Revenue

In [41]:
franchise_collection.sort_values(by=('revenue_musd','mean'),ascending=False)

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
Avatar Collection,1,2787.97,2787.97,237.00,237.00,11.76,7.20,185.07,12114.00,12114.00
The Avengers Collection,2,1462.48,2924.96,250.00,500.00,5.96,7.35,63.63,9454.00,18908.00
Frozen Collection,2,1274.22,1274.22,150.00,150.00,8.49,7.10,16.88,3035.00,6070.00
Finding Nemo Collection,2,984.45,1968.91,147.00,294.00,7.57,7.20,19.99,5312.50,10625.00
The Hobbit Collection,3,978.51,2935.52,250.00,750.00,3.83,7.23,25.21,5981.33,17944.00
...,...,...,...,...,...,...,...,...,...,...
Сказки Чуковского,1,,0.00,,0.00,,3.00,0.73,3.00,3.00
Чебурашка и крокодил Гена,1,,0.00,,0.00,,6.70,0.88,7.00,7.00
Что Творят мужчины! (Коллекция),2,,0.00,2.00,2.00,,3.15,1.30,5.50,11.00
男はつらいよ シリーズ,3,,0.00,,0.00,,7.00,0.04,0.67,2.00


### Most Expensive Franchises (Total Budget)

In [42]:
franchise_collection.sort_values(by=('budget_musd','sum'),ascending=False)

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
James Bond Collection,26,273.35,7106.97,59.22,1539.65,6.13,6.34,13.45,1284.31,33392.00
Harry Potter Collection,8,963.42,7707.37,160.00,1280.00,6.17,7.54,26.25,5983.25,47866.00
Pirates of the Caribbean Collection,5,904.32,4521.58,250.00,1250.00,3.45,6.88,53.97,5016.00,25080.00
The Fast and the Furious Collection,8,640.64,5125.10,126.12,1009.00,4.94,6.66,10.80,3197.00,25576.00
X-Men Collection,6,468.14,2808.83,163.83,983.00,3.02,6.82,9.71,4593.83,27563.00
...,...,...,...,...,...,...,...,...,...,...
Little Bear,1,,0.00,,0.00,,6.00,0.08,2.00,2.00
Lilla Jönsonligan Collection,2,,0.00,,0.00,,5.10,1.03,7.00,14.00
Library Wars Collection,2,,0.00,,0.00,,5.50,0.69,1.50,3.00
Lezioni di Cioccolato Collection,1,,0.00,,0.00,,5.80,2.06,42.00,42.00


### Highest Rated Franchises (Average Vote Count of at Least 1,000)

In [43]:
franchise_collection.loc[franchise_collection[('vote_count','mean')]>=1000].sort_values(by=('vote_average','mean'),ascending=False)

belongs_to_collection,title (count),revenue_musd (mean),revenue_musd (sum),budget_musd (mean),budget_musd (sum),ROI (median),vote_average (mean),popularity (mean),vote_count (mean),vote_count (sum)
The Lord of the Rings Collection,3,972.18,2916.54,88.67,266.00,11.73,8.03,30.27,8253.00,24759.00
The Godfather Collection,3,143.13,429.38,24.33,73.00,3.66,7.97,31.64,3677.00,11031.00
Blade Runner Collection,1,33.14,33.14,28.00,28.00,1.18,7.90,96.27,3833.00,3833.00
The Man With No Name Collection,3,11.83,35.50,0.67,2.00,25.00,7.83,14.17,1422.67,4268.00
The Dark Knight Collection,3,821.24,2463.72,195.00,585.00,4.34,7.80,57.42,9681.00,29043.00
...,...,...,...,...,...,...,...,...,...,...
Zoolander Collection,2,58.37,116.75,39.00,78.00,1.65,5.40,10.28,1088.50,2177.00
Dumb and Dumber Collection,3,152.13,456.38,25.00,75.00,4.25,5.33,11.51,1081.67,3245.00
xXx Collection,3,231.56,694.67,71.67,215.00,3.96,5.33,16.60,1172.00,3516.00
The Mask Collection,2,351.58,351.58,53.50,107.00,15.29,5.10,10.32,1448.00,2896.00


## Most Successful Directors

### Most Movies Directed (Top 5)

In [44]:
df['director'].value_counts().head()

director
John Ford           66
Michael Curtiz      65
Werner Herzog       54
Alfred Hitchcock    53
Georges Méliès      49
Name: count, dtype: int64

### Highest Total Revenue by Director (Top 5)

In [45]:
df.groupby('director')['revenue_musd'].sum().sort_values(ascending=False).head()

director
Steven Spielberg   9256.62
Peter Jackson      6528.24
Michael Bay        6437.47
James Cameron      5900.61
David Yates        5334.56
Name: revenue_musd, dtype: float64

### Most Franchise Movies Directed (Top 5)

In [46]:
df.loc[df['belongs_to_collection'].notna(),'director'].value_counts().head()

director
Gerald Thomas       25
William Beaudine    19
Ere Kokkonen        17
Kunihiko Yuyama     15
Robert Rodriguez    13
Name: count, dtype: int64
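The cell above counts franchise *movies* per director, so a director with many entries in a single long-running collection ranks high. Counting *distinct* collections per director answers a different question. A minimal sketch on hypothetical toy data (director and collection names are made up):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical)
movies = pd.DataFrame({
    'director': ['D1', 'D1', 'D1', 'D2', 'D2'],
    'belongs_to_collection': ['Saga A', 'Saga A', 'Saga B', 'Saga C', np.nan],
})

franchise_movies = movies.dropna(subset=['belongs_to_collection'])

# Franchise movies per director (what the cell above measures)
movie_counts = franchise_movies['director'].value_counts()
# Distinct collections per director
collection_counts = franchise_movies.groupby('director')['belongs_to_collection'].nunique()

print(movie_counts.to_dict())       # {'D1': 3, 'D2': 1}
print(collection_counts.to_dict())  # {'D1': 2, 'D2': 1}
```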

### Highest Rated Directors (More Than 10 Movies and More Than 10,000 Total Votes)

In [47]:
director_data=df.groupby('director').agg(
    {
        'title':'count',
        'vote_count':'sum',
        'vote_average':'mean'
    }
)
director_data.loc[(director_data['title']>10) & (director_data['vote_count']>10000)].sort_values(by='vote_average',ascending=False).head()

director,title,vote_count,vote_average
Hayao Miyazaki,14,14700.0,7.7
Christopher Nolan,11,67344.0,7.62
Martin Scorsese,39,35541.0,7.22
Peter Jackson,13,47571.0,7.14
Joel Coen,17,18139.0,7.02


### Most Successful Directors in a Specific Genre (e.g. Action)

In [48]:
ndf2=df.loc[df['genres'].notnull()]
ndf2=ndf2.loc[ndf2['genres'].str.contains('Action')]
ndf2.groupby('director').agg(
    {
    'revenue_musd':'sum'
    }
).sort_values(by='revenue_musd',ascending=False).head()

director,revenue_musd
Michael Bay,5988.25
Peter Jackson,5443.67
James Cameron,4038.54
Christopher Nolan,3809.13
Roland Emmerich,3782.82


### Most Successful Directors in a Specific Genre (e.g. Horror)

In [49]:
ndf3=df.loc[df['genres'].notnull()]
ndf3=ndf3.loc[ndf3['genres'].str.contains('Horror')]
ndf3.groupby('director').agg(
    {
    'revenue_musd':'sum'
    }
).sort_values(by='revenue_musd',ascending=False).head()

director,revenue_musd
Paul W.S. Anderson,982.29
James Wan,861.31
Wes Craven,834.93
Francis Lawrence,816.23
Ridley Scott,689.0
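The Action and Horror cells repeat the same three steps (drop missing genres, filter by genre, sum revenue per director). A minimal sketch of a reusable helper; `top_directors_by_genre` is a hypothetical name, not part of the original notebook, and the toy data is made up:

```python
import numpy as np
import pandas as pd


def top_directors_by_genre(movies: pd.DataFrame, genre: str, n: int = 5) -> pd.Series:
    """Total revenue (in $M) per director for movies tagged with `genre`."""
    # regex=False: match the genre name literally; na=False: skip missing genres
    mask = movies['genres'].str.contains(genre, regex=False, na=False)
    return (
        movies.loc[mask]
        .groupby('director')['revenue_musd']
        .sum()
        .sort_values(ascending=False)
        .head(n)
    )


# Toy data (hypothetical)
movies = pd.DataFrame({
    'director': ['P', 'P', 'Q', 'R'],
    'genres': ['Horror|Thriller', 'Horror', 'Horror|Comedy', np.nan],
    'revenue_musd': [100.0, 50.0, 120.0, 999.0],
})
print(top_directors_by_genre(movies, 'Horror').to_dict())  # {'P': 150.0, 'Q': 120.0}
```

Because this is a substring test, a genre name contained in another (for instance `Fiction` inside `Science Fiction`) would also match; splitting `genres` on `|` before comparing would avoid that.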


## Most Successful Actors

### The Cast Column

In [50]:
df['cast']

0        Tom Hanks|Tim Allen|Don Rickles|Jim Varney|Wal...
1        Robin Williams|Jonathan Hyde|Kirsten Dunst|Bra...
2        Walter Matthau|Jack Lemmon|Ann-Margret|Sophia ...
3        Whitney Houston|Angela Bassett|Loretta Devine|...
4        Steve Martin|Diane Keaton|Martin Short|Kimberl...
                               ...                        
44686              Leila Hatami|Kourosh Tahami|Elham Korda
44687    Angel Aquino|Perry Dizon|Hazel Orencio|Joel To...
44688    Erika Eleniak|Adam Baldwin|Julie du Page|James...
44689    Iwan Mosschuchin|Nathalie Lissenko|Pavel Pavlo...
44690                                                  NaN
Name: cast, Length: 44691, dtype: object

### Split Actor Names into a DataFrame

In [51]:
df=df.set_index('id')

In [52]:
actor_df=df['cast'].str.split('|',expand=True)
actor_df

id,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312
862,Tom Hanks,Tim Allen,Don Rickles,Jim Varney,Wallace Shawn,John Ratzenberger,Annie Potts,John Morris,Erik von Detten,Laurie Metcalf,R. Lee Ermey,Sarah Freeman,Penn Jillette,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8844,Robin Williams,Jonathan Hyde,Kirsten Dunst,Bradley Pierce,Bonnie Hunt,Bebe Neuwirth,David Alan Grier,Patricia Clarkson,Adam Hann-Byrd,Laura Bell Bundy,James Handy,Gillian Barber,Brandon Obray,Cyrus Thiedeke,Gary Joseph Thorup,Leonard Zola,Lloyd Berry,Malcolm Stewart,Annabel Kershaw,Darryl Henriques,Robyn Driscoll,Peter Bryant,Sarah Gilson,Florica Vlad,June Lion,Brenda Lockmuller,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
15602,Walter Matthau,Jack Lemmon,Ann-Margret,Sophia Loren,Daryl Hannah,Burgess Meredith,Kevin Pollak,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
31357,Whitney Houston,Angela Bassett,Loretta Devine,Lela Rochon,Gregory Hines,Dennis Haysbert,Michael Beach,Mykelti Williamson,Lamont Johnson,Wesley Snipes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
11862,Steve Martin,Diane Keaton,Martin Short,Kimberly Williams-Paisley,George Newbern,Kieran Culkin,BD Wong,Peter Michael Goetz,Kate McGregor-Stewart,Jane Adams,Eugene Levy,Lori Alan,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
439050,Leila Hatami,Kourosh Tahami,Elham Korda,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
111109,Angel Aquino,Perry Dizon,Hazel Orencio,Joel Torre,Bart Guingona,Soliman Cruz,Roeder,Angeli Bayani,Dante Perez,Betty Uy-Regala,Modesta,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
67758,Erika Eleniak,Adam Baldwin,Julie du Page,James Remar,Damian Chapa,Louis Mandylor,Tom Wright,Jeremy Lelliott,James Quattrochi,Jason Widener,Joe Sabatino,Kiko Ellsworth,Don Swayze,Peter Dobson,Darrell Dubovsky,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
227506,Iwan Mosschuchin,Nathalie Lissenko,Pavel Pavlov,Aleksandr Chabrov,Vera Orlova,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [53]:
actor_df=actor_df.stack().reset_index(drop=True,level=1)
actor_df=actor_df.to_frame()
actor_df

id,0
862,Tom Hanks
862,Tim Allen
862,Don Rickles
862,Jim Varney
862,Wallace Shawn
...,...
227506,Iwan Mosschuchin
227506,Nathalie Lissenko
227506,Pavel Pavlov
227506,Aleksandr Chabrov
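The split-then-`stack` approach first builds a very wide frame (313 columns here) and relies on `stack` dropping the empty cells. `Series.explode` reaches the same long format directly from the lists returned by `str.split`. A minimal sketch on hypothetical toy data; note that a movie with no cast yields one `NaN` row after `explode`, so it is dropped explicitly:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), indexed by movie id like df after set_index('id')
movies = pd.DataFrame(
    {'cast': ['Tom Hanks|Tim Allen', np.nan, 'Leila Hatami']},
    index=pd.Index([862, 1, 439050], name='id'),
)

# One row per (movie id, actor); explode repeats the 'id' index for each actor
actors = (
    movies['cast']
    .str.split('|')     # a single-character pattern is treated literally
    .explode()
    .dropna()           # remove the NaN row produced by the movie without a cast
    .to_frame('Actor')
)
print(actors)
```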


### Rename column label from 0 to 'Actor'

In [54]:
actor_df.columns=['Actor']
actor_df

id,Actor
862,Tom Hanks
862,Tim Allen
862,Don Rickles
862,Jim Varney
862,Wallace Shawn
...,...
227506,Iwan Mosschuchin
227506,Nathalie Lissenko
227506,Pavel Pavlov
227506,Aleksandr Chabrov


### Merge the Actors DataFrame with Movie Details

In [55]:
actor_df=actor_df.merge(df[['title','revenue_musd','vote_average','popularity']],on='id')
actor_df

id,Actor,title,revenue_musd,vote_average,popularity
862,Tom Hanks,Toy Story,373.55,7.70,21.95
862,Tim Allen,Toy Story,373.55,7.70,21.95
862,Don Rickles,Toy Story,373.55,7.70,21.95
862,Jim Varney,Toy Story,373.55,7.70,21.95
862,Wallace Shawn,Toy Story,373.55,7.70,21.95
...,...,...,...,...,...
227506,Iwan Mosschuchin,Satan Triumphant,,,0.00
227506,Nathalie Lissenko,Satan Triumphant,,,0.00
227506,Pavel Pavlov,Satan Triumphant,,,0.00
227506,Aleksandr Chabrov,Satan Triumphant,,,0.00


### Number of Unique Actors

In [56]:
actor_df['Actor'].nunique()

201501

### Actors with the Most Movies (Top 5)

In [57]:
actor_df['Actor'].value_counts().head()

Actor
Bess Flowers         240
Christopher Lee      148
John Wayne           125
Samuel L. Jackson    122
Michael Caine        110
Name: count, dtype: int64

### Aggregate Statistics per Actor (Named Aggregation)

In [58]:
actor_df_imp_data=actor_df.groupby('Actor').agg(
    total_revenue=('revenue_musd','sum'),
    average_rating=('vote_average','mean'),
    average_popularity=('popularity','mean'),
    movies=('title','count'),
    average_revenue=('revenue_musd','mean')
)
actor_df_imp_data

Actor,total_revenue,average_rating,average_popularity,movies,average_revenue
\tCheung Chi-Sing,0.00,5.90,3.05,1,
\tDouglas Hegdahl,0.00,4.00,0.15,1,
\tRobert Osth,0.00,6.00,1.81,1,
\tYip Chun,0.00,6.75,1.80,2,
Jorge de los Reyes,0.00,8.10,3.47,1,
...,...,...,...,...,...
长泽雅美,0.35,6.40,2.82,11,0.35
陳美貞,83.06,7.00,6.49,1,83.06
高桥一生,333.11,6.74,9.10,8,166.55
강계열,0.00,6.00,0.44,1,
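Several names in the output above appear to begin with a tab character (shown as `\t`, e.g. `\tYip Chun`), which means the same person could be counted under two spellings. Assuming these are real whitespace characters rather than a literal backslash followed by `t`, stripping the names before grouping merges the variants. A minimal sketch on hypothetical toy rows:

```python
import pandas as pd

# Toy data (hypothetical): the same actor with and without a leading tab
actor_rows = pd.DataFrame({
    'Actor': ['\tYip Chun', 'Yip Chun', ' Jorge de los Reyes'],
    'title': ['M1', 'M2', 'M3'],
})

# Remove leading/trailing whitespace (tabs included) so variants collapse together
actor_rows['Actor'] = actor_rows['Actor'].str.strip()

movies_per_actor = actor_rows.groupby('Actor')['title'].count()
print(movies_per_actor.to_dict())  # {'Jorge de los Reyes': 1, 'Yip Chun': 2}
```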


### Actors Who Have Appeared in at Least 10 Films

In [59]:
actor_df_imp_data.loc[actor_df_imp_data['movies']>=10]


Actor,total_revenue,average_rating,average_popularity,movies,average_revenue
"""Weird Al"" Yankovic",205.40,6.04,7.70,10,34.23
'Snub' Pollard,20.45,6.24,2.85,25,5.11
50 Cent,1054.76,5.86,7.36,23,81.14
A Martinez,7.53,5.73,3.20,11,3.77
A.J. Buckley,332.41,5.47,2.48,12,83.10
...,...,...,...,...,...
Патрик О’Нил,235.94,6.14,4.11,17,78.65
Том Ву,462.54,5.80,9.46,10,231.27
Эрика Элениак,964.97,5.63,4.94,14,192.99
松田龙平,2.63,6.46,2.08,11,2.63


### Actors with the Highest Total Revenue

In [60]:
df1=actor_df_imp_data.sort_values(by='total_revenue',ascending=False).head(20)
df1

Unnamed: 0_level_0,total_revenue,average_rating,average_popularity,movies,average_revenue
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Stan Lee,19414.96,6.51,29.94,48,647.17
Samuel L. Jackson,17109.62,6.27,11.7,122,213.87
Warwick Davis,13256.03,6.29,13.09,34,662.8
Frank Welker,13044.15,6.31,9.57,107,326.1
John Ratzenberger,12596.13,6.48,10.96,46,449.86
Jess Harnell,12234.61,6.44,10.92,35,611.73
Hugo Weaving,11027.58,6.47,10.97,40,459.48
Ian McKellen,11015.59,6.35,15.45,44,478.94
Johnny Depp,10653.76,6.44,12.38,69,217.42
Alan Rickman,10612.63,6.72,10.4,45,353.75


### Actors with the Most Films

In [61]:
df2=actor_df_imp_data.sort_values(by='movies',ascending=False).head(20)
df2

Unnamed: 0_level_0,total_revenue,average_rating,average_popularity,movies,average_revenue
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Bess Flowers,368.91,6.18,2.03,240,14.76
Christopher Lee,9417.05,5.91,4.75,148,324.73
John Wayne,236.09,5.71,3.09,125,11.24
Samuel L. Jackson,17109.62,6.27,11.7,122,213.87
Michael Caine,8053.4,6.27,8.27,110,191.75
Gérard Depardieu,1247.61,6.05,3.7,109,95.97
John Carradine,255.84,5.55,2.43,109,19.68
Donald Sutherland,5390.77,6.23,7.0,108,138.22
Jackie Chan,4699.19,6.28,5.86,108,146.85
Frank Welker,13044.15,6.31,9.57,107,326.1


### Actors with the Highest Average Rating

In [62]:
df3=actor_df_imp_data.sort_values(by='average_rating',ascending=False).head(20)
df3

Unnamed: 0_level_0,total_revenue,average_rating,average_popularity,movies,average_revenue
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Julián Infantino,0.0,10.0,0.35,1,
Sharon Danziger,0.0,10.0,0.06,1,
Tang Ching,0.0,10.0,0.12,1,
Tania Grant,0.0,10.0,0.06,1,
Tobias Nilsson,0.0,10.0,0.37,1,
Ewa Swann,0.0,10.0,0.57,1,
Bela Lugosi Jr.,0.0,10.0,0.04,1,
Georgette Baudry,0.0,10.0,0.01,1,
Emerson Collins,0.0,10.0,0.14,1,
Valmike Rampersad,0.0,10.0,0.38,1,
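Every actor in this top list appears in a single film rated 10.0, so the ranking mostly reflects tiny samples. One simple mitigation is to require a minimum number of films before ranking by average rating. The sketch below applies such a threshold to a hypothetical frame shaped like `actor_df_imp_data`. The cut-off of 10 is an assumption, chosen to match the earlier "10 or more films" filter:

```python
import pandas as pd

# Hypothetical per-actor summary in the shape of actor_df_imp_data
stats = pd.DataFrame(
    {"average_rating": [10.0, 7.2, 6.8], "movies": [1, 12, 40]},
    index=pd.Index(["One-Film Actor", "Actor X", "Actor Y"], name="Actor"),
)

MIN_MOVIES = 10  # assumed cut-off; tune as needed
ranked = (
    stats.loc[stats["movies"] >= MIN_MOVIES]
    .sort_values(by="average_rating", ascending=False)
)
print(ranked.index.tolist())  # ['Actor X', 'Actor Y']
```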


### Actors with the Highest Average Popularity

In [63]:
df4=actor_df_imp_data.sort_values(by='average_popularity',ascending=False).head(20)
df4

Unnamed: 0_level_0,total_revenue,average_rating,average_popularity,movies,average_revenue
Actor,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Alex Dowding,1156.73,6.4,547.49,1,1156.73
Emily Carey,820.58,7.2,294.34,1,820.58
Brooke Ence,820.58,7.2,294.34,1,820.58
Roy Martin Thorn,820.58,7.2,294.34,1,820.58
Steve Doyle,820.58,7.2,294.34,1,820.58
Karl Fredrick Hiemeyer,820.58,7.2,294.34,1,820.58
Kattreya Scheurer-Smith,820.58,7.2,294.34,1,820.58
Fred Fergus,820.58,7.2,294.34,1,820.58
Edward Wolstenholme,820.58,7.2,294.34,1,820.58
Betty Adewole,820.58,7.2,294.34,1,820.58


## Find Actors Common to the Top Lists

In [64]:
common_actors=pd.concat([df1,df2,df3,df4])
common_actors=common_actors.reset_index()
common_actors

Unnamed: 0,Actor,total_revenue,average_rating,average_popularity,movies,average_revenue
0,Stan Lee,19414.96,6.51,29.94,48,647.17
1,Samuel L. Jackson,17109.62,6.27,11.70,122,213.87
2,Warwick Davis,13256.03,6.29,13.09,34,662.80
3,Frank Welker,13044.15,6.31,9.57,107,326.10
4,John Ratzenberger,12596.13,6.48,10.96,46,449.86
...,...,...,...,...,...,...
75,Hari James,820.58,7.20,294.34,1,820.58
76,Freddy Carter,820.58,7.20,294.34,1,820.58
77,Zac Whitehead,820.58,7.20,294.34,1,820.58
78,Adam Sef,820.58,7.20,294.34,1,820.58


### Find Duplicate Records of Actors

In [65]:
common_actors.loc[common_actors['Actor'].duplicated()]

Unnamed: 0,Actor,total_revenue,average_rating,average_popularity,movies,average_revenue
23,Samuel L. Jackson,17109.62,6.27,11.7,122,213.87
29,Frank Welker,13044.15,6.31,9.57,107,326.1
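`duplicated()` flags only the repeat occurrences. It does not show how many of the four lists each actor appears in. Counting index labels gives that directly. On the notebook's data this would look something like `pd.concat([df1, df2, df3, df4]).index.value_counts()`. The sketch below uses plain lists of names taken from the outputs above as a stand-in for the indexes of `df1` to `df4`:

```python
import pandas as pd

# Hypothetical stand-ins for the indexes of df1..df4 (names taken from the outputs above)
top_lists = [
    ["Stan Lee", "Samuel L. Jackson", "Frank Welker"],      # by total revenue
    ["Bess Flowers", "Samuel L. Jackson", "Frank Welker"],  # by number of films
    ["Julián Infantino"],                                   # by average rating
    ["Alex Dowding"],                                       # by average popularity
]

# How many lists does each actor appear in?
appearances = pd.Series([name for names in top_lists for name in names]).value_counts()
in_several = sorted(appearances[appearances > 1].index)
print(in_several)  # ['Frank Welker', 'Samuel L. Jackson']
```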


## What are the most successful/popular genres? Has this changed over time (here: releases from 1900–1999 vs. releases from 2000 onward)?

### Splitting Genres into a DataFrame

In [66]:
genre_df=df['genres'].str.split('|',expand=True)
genre_df

Unnamed: 0_level_0,0,1,2,3,4,5,6,7
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
862,Animation,Comedy,Family,,,,,
8844,Adventure,Fantasy,Family,,,,,
15602,Romance,Comedy,,,,,,
31357,Comedy,Drama,Romance,,,,,
11862,Comedy,,,,,,,
...,...,...,...,...,...,...,...,...
439050,Drama,Family,,,,,,
111109,Drama,,,,,,,
67758,Action,Drama,Thriller,,,,,
227506,,,,,,,,


In [67]:
genre_df=genre_df.stack().reset_index(drop=True,level=1)
genre_df=genre_df.to_frame()
genre_df

Unnamed: 0_level_0,0
id,Unnamed: 1_level_1
862,Animation
862,Comedy
862,Family
8844,Adventure
8844,Fantasy
...,...
439050,Family
111109,Drama
67758,Action
67758,Drama
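The `split(expand=True)` + `stack()` pair above can also be written with `Series.explode()`, available since pandas 0.25, which turns each list element into its own row and repeats the index. One difference is that `explode()` keeps a movie with no genres as a NaN row, whereas `stack()` drops it, so a `dropna()` is needed to match. A sketch on a hypothetical slice of `df`:

```python
import pandas as pd

# Hypothetical slice of df: genres are '|'-separated, and one movie has none
movies = pd.DataFrame(
    {"genres": ["Animation|Comedy|Family", "Romance|Comedy", None]},
    index=pd.Index([862, 15602, 227506], name="id"),
)

genres_long = (
    movies["genres"].str.split("|")  # one list per movie (NaN stays NaN)
    .explode()                       # one row per genre; the id index repeats
    .dropna()                        # drop the genre-less movie, as stack() would
    .rename("Genre")
    .to_frame()
)
print(genres_long)
```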


### Rename column label from 0 to 'Genre'

In [68]:
genre_df.columns=['Genre']
genre_df

Unnamed: 0_level_0,Genre
id,Unnamed: 1_level_1
862,Animation
862,Comedy
862,Family
8844,Adventure
8844,Fantasy
...,...
439050,Family
111109,Drama
67758,Action
67758,Drama


### Merge genre_df and original dataframe

In [69]:
genre_df=genre_df.merge(df[['title','revenue_musd','popularity','vote_average','release_date']],on='id')
genre_df

Unnamed: 0_level_0,Genre,title,revenue_musd,popularity,vote_average,release_date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
862,Animation,Toy Story,373.55,21.95,7.70,1995-10-30
862,Comedy,Toy Story,373.55,21.95,7.70,1995-10-30
862,Family,Toy Story,373.55,21.95,7.70,1995-10-30
8844,Adventure,Jumanji,262.80,17.02,6.90,1995-12-15
8844,Fantasy,Jumanji,262.80,17.02,6.90,1995-12-15
...,...,...,...,...,...,...
439050,Family,Subdue,,0.07,4.00,NaT
111109,Drama,Century of Birthing,,0.18,9.00,2011-11-17
67758,Action,Betrayal,,0.90,3.80,2003-08-01
67758,Drama,Betrayal,,0.90,3.80,2003-08-01


### Creating a named aggregation of genre_df

In [70]:
genre_df_imp_data=genre_df.groupby('Genre').agg(
    revenue_total=('revenue_musd','sum'),
    average_popularity=('popularity','mean'),
    average_vote=('vote_average','mean')
)
genre_df_imp_data

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
Genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Action,201388.05,4.78,5.75
Adventure,199978.67,6.0,5.88
Animation,67432.97,4.75,6.45
Comedy,166845.05,3.25,5.97
Crime,63375.73,4.15,6.1
Documentary,1449.11,0.96,6.66
Drama,160754.36,3.03,6.17
Family,107076.78,4.77,5.93
Fantasy,103920.15,5.36,5.93
Foreign,291.54,0.77,5.97


### Genres with the Highest Total Revenue

In [71]:
genre_df_imp_data.sort_values(by='revenue_total',ascending=False).head()

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
Genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Action,201388.05,4.78,5.75
Adventure,199978.67,6.0,5.88
Comedy,166845.05,3.25,5.97
Drama,160754.36,3.03,6.17
Thriller,129724.55,4.51,5.74
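These totals count a movie's full revenue once for every genre it is tagged with. As a result, genre totals cannot be summed to a grand total, and heavily multi-tagged genres get a boost. The sketch below shows the overcount on a hypothetical two-genre movie with Toy Story's revenue. Splitting a movie's revenue evenly across its genres is one possible alternative, not what the notebook does:

```python
import pandas as pd

# Hypothetical: one movie (id 862) tagged with two genres
toy = pd.DataFrame(
    {"Genre": ["Animation", "Comedy"], "revenue_musd": [373.55, 373.55]},
    index=pd.Index([862, 862], name="id"),
)

# Summing per-genre totals counts the movie twice
per_genre_total = toy.groupby("Genre")["revenue_musd"].sum().sum()
# Taking one row per movie counts it once
actual_total = toy.groupby(level="id")["revenue_musd"].first().sum()

print(per_genre_total, actual_total)
```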


### Genres with the Highest Average Rating

In [72]:
genre_df_imp_data.sort_values(by='average_vote',ascending=False).head()

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
Genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Documentary,1449.11,0.96,6.66
Animation,67432.97,4.75,6.45
History,14902.2,3.48,6.41
Music,13370.29,2.56,6.33
War,15910.46,3.35,6.29


### Genres with the Highest Average Popularity

In [73]:
genre_df_imp_data.sort_values(by='average_popularity',ascending=False).head()

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
Genre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Adventure,199978.67,6.0,5.88
Fantasy,103920.15,5.36,5.93
Science Fiction,97847.96,5.0,5.48
Action,201388.05,4.78,5.75
Family,107076.78,4.77,5.93


### Highest Revenue by Genre, Releases from 1900–1999

In [74]:
ninetees_df=genre_df.loc[(genre_df['release_date']>='1900') & (genre_df['release_date']<'2000')]
ninetees_df

Unnamed: 0_level_0,Genre,title,revenue_musd,popularity,vote_average,release_date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
862,Animation,Toy Story,373.55,21.95,7.70,1995-10-30
862,Comedy,Toy Story,373.55,21.95,7.70,1995-10-30
862,Family,Toy Story,373.55,21.95,7.70,1995-10-30
8844,Adventure,Jumanji,262.80,17.02,6.90,1995-12-15
8844,Fantasy,Jumanji,262.80,17.02,6.90,1995-12-15
...,...,...,...,...,...,...
84419,Thriller,House of Horrors,,0.22,6.30,1946-03-29
222848,Science Fiction,Caged Heat 3000,,0.66,3.50,1995-01-01
30840,Drama,Robin Hood,,5.68,5.70,1991-05-13
30840,Action,Robin Hood,,5.68,5.70,1991-05-13
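The filter above keeps every release from 1900 up to the end of 1999. That means `ninetees_df` spans the whole 20th century rather than only the 1990s; House of Horrors (1946) appears in the output. Filtering on the release year makes the boundaries explicit. The sketch below uses hypothetical rows shaped like `genre_df`. It assumes `release_date` is already a datetime column, as the `NaT` values in the outputs suggest:

```python
import pandas as pd

# Hypothetical rows in the shape of genre_df
toy = pd.DataFrame({
    "Genre": ["Thriller", "Animation", "Drama", "Action"],
    "release_date": pd.to_datetime(["1946-03-29", "1995-10-30", "2011-11-17", None]),
})

year = toy["release_date"].dt.year  # NaT becomes NaN, which between() treats as False

nineties = toy.loc[year.between(1990, 1999)]           # the 1990s only
twentieth_century = toy.loc[year.between(1900, 1999)]  # what the '1900'/'2000' filter keeps
```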


In [75]:
ninetees_df.groupby('Genre')['revenue_musd'].sum().sort_values(ascending=False).head()

Genre
Drama       54700.53
Comedy      46649.82
Action      43691.10
Thriller    39902.46
Adventure   39462.55
Name: revenue_musd, dtype: float64

### Popularity of Genres, Releases from 1900–1999

In [76]:
ninetees_df.groupby('Genre')['popularity'].mean().sort_values(ascending=False).head()

Genre
Adventure         3.80
Thriller          3.71
Fantasy           3.69
Science Fiction   3.61
Family            3.57
Name: popularity, dtype: float64

### Average Rating of Genres, Releases from 1900–1999

In [77]:
ninetees_df.groupby('Genre')['vote_average'].mean().sort_values(ascending=False).head()

Genre
Documentary   6.66
Animation     6.46
History       6.37
War           6.24
Drama         6.23
Name: vote_average, dtype: float64

### Highest Revenue by Genre, Releases from 2000 Onward

In [78]:
twenties=genre_df.loc[genre_df['release_date']>='2000']
twenties

Unnamed: 0_level_0,Genre,title,revenue_musd,popularity,vote_average,release_date
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
131232,Drama,Two Friends,,0.00,,2002-03-20
131232,Foreign,Two Friends,,0.00,,2002-03-20
79782,Drama,Venice,,0.15,7.50,2010-05-25
79782,Romance,Venice,,0.15,7.50,2010-05-25
141210,Comedy,The Sleepover,,0.14,8.00,2013-10-12
...,...,...,...,...,...,...
289923,Horror,The Burkittsville 7,,0.39,7.00,2000-10-03
111109,Drama,Century of Birthing,,0.18,9.00,2011-11-17
67758,Action,Betrayal,,0.90,3.80,2003-08-01
67758,Drama,Betrayal,,0.90,3.80,2003-08-01


In [79]:
twenties.groupby('Genre')['revenue_musd'].sum().sort_values(ascending=False).head()

Genre
Adventure   160516.12
Action      157696.95
Comedy      120195.23
Drama       106053.84
Thriller     89822.09
Name: revenue_musd, dtype: float64

### Popularity of Genres, Releases from 2000 Onward

In [80]:
twenties.groupby('Genre')['popularity'].mean().sort_values(ascending=False).head()

Genre
Adventure         8.60
Fantasy           7.16
Science Fiction   6.39
Action            6.23
Animation         5.92
Name: popularity, dtype: float64

### Average Rating of Genres, Releases from 2000 Onward

In [81]:
twenties.groupby('Genre')['vote_average'].mean().sort_values(ascending=False).head()

Genre
Documentary   6.67
Music         6.51
History       6.46
Animation     6.44
War           6.37
Name: vote_average, dtype: float64
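Comparing only two broad periods hides change within each period. A finer view of "has this changed over time" is to group by decade and genre at once. The sketch below computes decade-by-genre revenue totals on hypothetical rows shaped like `genre_df`. The decade key (`year // 10 * 10`) is an assumption of this sketch, not something the notebook defines:

```python
import pandas as pd

# Hypothetical rows in the shape of genre_df
toy = pd.DataFrame({
    "Genre": ["Drama", "Drama", "Comedy", "Comedy"],
    "revenue_musd": [10.0, 30.0, 5.0, 20.0],
    "release_date": pd.to_datetime(["1994-05-01", "2003-01-01", "1998-07-01", "2005-03-01"]),
})

# 1994 -> 1990, 2003 -> 2000, ...
decade = (toy["release_date"].dt.year // 10 * 10).rename("decade")

# Rows: decades, columns: genres, values: total revenue
by_decade = toy.groupby([decade, "Genre"])["revenue_musd"].sum().unstack("Genre")
print(by_decade)
```

Swapping `sum` for `mean` and `revenue_musd` for `popularity` or `vote_average` gives the decade-level versions of the other two tables.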

## Which Production Companies Are the Most Successful?

In [82]:
production_df=df['production_companies'].str.split('|',expand=True)
production_df=production_df.stack().reset_index(drop=True,level=1)
production_df=production_df.to_frame()
production_df

Unnamed: 0_level_0,0
id,Unnamed: 1_level_1
862,Pixar Animation Studios
8844,TriStar Pictures
8844,Teitler Film
8844,Interscope Communications
15602,Warner Bros.
...,...
30840,20th Century Fox Television
30840,CanWest Global Communications
111109,Sine Olivia
67758,American World Pictures


### Rename column 0 to production_house

In [83]:
production_df.columns=['production_house']
production_df

Unnamed: 0_level_0,production_house
id,Unnamed: 1_level_1
862,Pixar Animation Studios
8844,TriStar Pictures
8844,Teitler Film
8844,Interscope Communications
15602,Warner Bros.
...,...
30840,20th Century Fox Television
30840,CanWest Global Communications
111109,Sine Olivia
67758,American World Pictures


### Merge production_df and original dataframe

In [84]:
production_df=production_df.merge(df[['title','revenue_musd','vote_average','popularity']],on='id')
production_df

Unnamed: 0_level_0,production_house,title,revenue_musd,vote_average,popularity
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
862,Pixar Animation Studios,Toy Story,373.55,7.70,21.95
8844,TriStar Pictures,Jumanji,262.80,6.90,17.02
8844,Teitler Film,Jumanji,262.80,6.90,17.02
8844,Interscope Communications,Jumanji,262.80,6.90,17.02
15602,Warner Bros.,Grumpier Old Men,,6.50,11.71
...,...,...,...,...,...
30840,20th Century Fox Television,Robin Hood,,5.70,5.68
30840,CanWest Global Communications,Robin Hood,,5.70,5.68
111109,Sine Olivia,Century of Birthing,,9.00,0.18
67758,American World Pictures,Betrayal,,3.80,0.90


### Creating a named aggregation of production_df

In [85]:
production_df_imp_data=production_df.groupby('production_house').agg(
    revenue_total=('revenue_musd','sum'),
    average_popularity=('popularity','mean'),
    average_vote=('vote_average','mean')
)
production_df_imp_data

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
production_house,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Movie Studios,0.00,0.79,5.50
"""DIA"" Productions GmbH & Co. KG",44.35,17.71,5.80
# Andrea Sperling Productions,0.00,3.72,5.70
# Lexyn Productions,0.00,6.27,6.30
'A' Production Committee,0.00,0.00,
...,...,...,...
영화사 집,0.00,1.89,7.00
이디오플랜,0.00,1.96,6.90
인벤트 디,0.00,1.22,6.90
타임스토리그룹,0.00,1.26,6.50
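The production-company steps repeat the genre pipeline: split a `|`-separated column, turn it into one row per value, attach movie attributes, then aggregate. A small helper can capture that pattern. `explode_and_aggregate` below is a hypothetical name, and it uses `explode()` + `join()` in place of the notebook's `stack()` + `merge()`. It assumes the input frame is indexed by movie `id`:

```python
import pandas as pd

def explode_and_aggregate(frame, column, label, **aggregations):
    """Hypothetical helper: one row per '|'-separated value of `column`,
    joined back to the other movie columns, then a named aggregation by `label`."""
    long = frame[column].str.split("|").explode().dropna().rename(label).to_frame()
    long = long.join(frame.drop(columns=[column]))  # many-to-one join on the id index
    return long.groupby(label).agg(**aggregations)

# Usage on hypothetical data shaped like df
movies = pd.DataFrame(
    {"production_companies": ["Pixar|Disney", "Disney", None],
     "revenue_musd": [100.0, 50.0, 10.0]},
    index=pd.Index([1, 2, 3], name="id"),
)
result = explode_and_aggregate(
    movies, "production_companies", "production_house",
    revenue_total=("revenue_musd", "sum"),
)
print(result)
```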


### Production Companies with the Highest Revenue

In [86]:
production_df_imp_data.sort_values(by='revenue_total',ascending=False).head(10)

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
production_house,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Warner Bros.,63525.19,6.22,6.15
Universal Pictures,55259.19,7.76,6.17
Paramount Pictures,48769.4,5.68,6.12
Twentieth Century Fox Film Corporation,47687.75,6.3,6.19
Walt Disney Pictures,40837.27,12.46,6.29
Columbia Pictures,32279.74,7.51,6.03
New Line Cinema,22173.39,8.7,5.9
Amblin Entertainment,17343.72,11.49,6.53
DreamWorks SKG,15475.75,11.49,6.41
Dune Entertainment,15003.79,18.71,5.92


### Production Companies with the Highest Popularity

In [87]:
production_df_imp_data.sort_values(by='average_popularity',ascending=False).head(10)

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
production_house,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
TENCENT PICTURES,820.58,294.34,7.2
The Donners' Company,783.11,187.86,7.4
DefyNite Films,88.76,183.87,7.0
Cruel & Unusual Films,1693.84,162.89,6.45
Wanda Pictures,1123.72,157.32,6.95
Artemple - Hollywood,369.33,154.8,7.9
Vita-Ray Dutch Productions (III),1153.3,145.88,7.1
Deluxe Digital Studios,1153.3,145.88,7.1
Kinberg Genre,1327.05,108.29,6.9
Image Nation,20.5,88.44,5.4


### Production Companies with the Highest Rating

In [88]:
production_df_imp_data.sort_values(by='average_vote',ascending=False).head(10)

Unnamed: 0_level_0,revenue_total,average_popularity,average_vote
production_house,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Chase Productions,0.57,0.57,10.0
Rhino Media,0.0,0.72,10.0
Boot Strapped Films,0.0,0.06,10.0
Wood-Thomas Pictures,0.0,0.04,10.0
BMore Pictures,0.0,0.53,10.0
Lanterna Editrice,0.0,0.17,10.0
Capital Film s.p.a.,0.0,0.6,10.0
Goldig Film Company,0.0,0.12,10.0
Den Danske Filmskole,0.0,0.38,10.0
UFO Pictures,0.0,0.06,10.0
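As with actors, these perfect 10.0 averages come from companies with very little data. `df` carries a `vote_count` column, so one option is to shrink each movie's rating toward a global mean before averaging. This uses the Bayesian-style weighted rating IMDb has published for its Top 250: WR = v/(v+m)·R + m/(v+m)·C. Here R is the movie's mean rating, v its vote count, m a prior vote weight, and C the overall mean rating. The values of `M` and `C` below are assumptions. On the real data, C could be `df['vote_average'].mean()` and m a vote-count quantile. `production_df` would first need `vote_count` merged in:

```python
import pandas as pd

def weighted_rating(vote_average, vote_count, m, c):
    """Shrink a mean rating towards c; m acts as a prior number of votes."""
    v = vote_count
    return (v / (v + m)) * vote_average + (m / (v + m)) * c

# Hypothetical movies: a 10.0 with one vote vs. Toy Story-like values from df.head()
movies = pd.DataFrame({"vote_average": [10.0, 7.7], "vote_count": [1.0, 5415.0]})

C = 6.0  # assumed global mean rating
M = 100  # assumed prior weight (vote-count threshold)
movies["weighted"] = weighted_rating(movies["vote_average"], movies["vote_count"], M, C)
print(movies)
```

The one-vote 10.0 drops to roughly 6.04, while the well-voted 7.7 barely moves. Averaging `weighted` per company would then rank companies on better-supported scores.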
