[Pixar Animation](https://en.wikipedia.org/wiki/Pixar) is one of the best-known animation studios in the world, and many people worldwide religiously watch every newly released film.

Here we use a dataset on Pixar movies gathered from multiple sources including:
* Box Office Mojo
* IMDB
* Rotten Tomatoes
* Metacritic

Here are some of the columns in our dataset **PixarMovies.csv**:
* Year Released - the year the movie was released.
* Movie - the name of the movie.
* RT Score - the Rotten Tomatoes rating for the movie.
* IMDB Score - the IMDB rating for the movie.
* Metacritic Score - the Metacritic rating for the movie.
* Opening Weekend - the amount of revenue the movie made on opening weekend (in millions of dollars).
* Worldwide Gross - the total amount of revenue the movie has made to date (in millions of dollars).
* Production Budget - the amount of money spent to produce the film (in millions of dollars).
* Oscars Won - the number of Oscar awards the movie won.

In [1]:
# Set up the environment by importing the libraries we need
import pandas as pd
import matplotlib.pyplot as plt
# Note: Importing seaborn affects all matplotlib and pandas plots as well
import seaborn as sns

# Run the Jupyter magic so that plots are displayed in the notebook
%matplotlib notebook

In [3]:
# Read the dataset into a DataFrame and determine the dimensions
pixar_movies = pd.read_csv('../data/PixarMovies.csv')
pixar_movies.shape

(15, 16)

In [4]:
# Display the entire dataset since it isn't too big
pixar_movies.head(15)

Unnamed: 0,Year Released,Movie,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
0,1995,Toy Story,81,100,8.3,92,29.14,362.0,191.8,356.21,170.2,52.98%,47.02%,30,3.0,0.0
1,1998,A Bug's Life,96,92,7.2,77,33.26,363.4,162.8,277.18,200.6,44.80%,55.20%,45,1.0,0.0
2,1999,Toy Story 2,92,100,7.9,88,57.39,485.0,245.9,388.43,239.2,50.70%,49.32%,90,1.0,0.0
3,2001,"Monsters, Inc.",90,96,8.1,78,62.58,528.8,255.9,366.12,272.9,48.39%,51.61%,115,3.0,1.0
4,2003,Finding Nemo,104,99,8.2,90,70.25,895.6,339.7,457.46,555.9,37.93%,62.07%,94,4.0,1.0
5,2004,The Incredibles,115,97,8.0,90,70.47,631.4,261.4,341.28,370.0,41.40%,58.60%,92,4.0,2.0
6,2006,Cars,116,74,7.2,73,60.12,462.0,244.1,302.59,217.9,52.84%,47.16%,70,2.0,0.0
7,2007,Ratatouille,111,96,8.0,96,47.0,623.7,206.4,243.65,417.3,33.09%,66.91%,150,5.0,1.0
8,2008,WALL-E,97,96,8.4,94,63.1,521.3,223.8,253.11,297.5,42.93%,57.07%,180,6.0,1.0
9,2009,Up,96,98,8.3,88,68.11,731.3,293.0,318.9,438.3,40.07%,59.93%,175,5.0,2.0


In [5]:
# Get the datatypes for each column
pixar_movies.dtypes

Year Released                int64
Movie                       object
Length                       int64
RT Score                     int64
IMDB Score                 float64
Metacritic Score             int64
Opening Weekend            float64
Worldwide Gross            float64
Domestic Gross             float64
Adjusted Domestic Gross    float64
International Gross        float64
Domestic %                  object
International %             object
Production Budget            int64
Oscars Nominated           float64
Oscars Won                 float64
dtype: object

In [6]:
# Generate some summary statistics
pixar_movies.describe()

Unnamed: 0,Year Released,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Production Budget,Oscars Nominated,Oscars Won
count,15.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,15.0,14.0,14.0
mean,2006.066667,101.533333,89.333333,7.846667,82.8,67.990667,612.486667,258.506667,318.448,353.986667,133.4,2.857143,0.785714
std,5.933761,9.927355,16.45195,0.655599,12.119642,23.270468,190.193934,66.518284,73.321064,135.061615,59.696614,2.0327,0.801784
min,1995.0,81.0,39.0,6.3,57.0,29.14,362.0,162.8,194.43,170.2,30.0,0.0,0.0
25%,2002.0,96.0,85.0,7.3,75.0,58.755,503.15,215.1,261.35,256.05,91.0,1.0,0.0
50%,2007.0,102.0,96.0,8.0,88.0,66.3,559.9,245.9,318.9,336.6,150.0,3.0,1.0
75%,2010.5,109.0,98.5,8.3,92.0,76.45,704.2,280.75,361.165,427.8,182.5,4.75,1.0
max,2015.0,116.0,100.0,8.8,96.0,110.31,1063.2,415.0,457.46,648.2,200.0,6.0,2.0


In [7]:
# Strip the percentage sign (%) from the end of values and convert to float
pixar_movies['Domestic %'] = pixar_movies['Domestic %'].str.rstrip('%').astype(float)
pixar_movies['International %'] = pixar_movies['International %'].str.rstrip('%').astype(float)
pixar_movies[['Domestic %', 'International %']].head()

Unnamed: 0,Domestic %,International %
0,52.98,47.02
1,44.8,55.2
2,50.7,49.32
3,48.39,51.61
4,37.93,62.07
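
As an optional sanity check on the conversion, the two share columns should add up to roughly 100 for every movie. The sketch below rebuilds only the five rows shown above (the name `shares` and the 0.01 tolerance are assumptions for illustration) and lists any row whose total is off.

```python
import pandas as pd

# First five rows of the two share columns, copied from the output above
shares = pd.DataFrame({
    'Domestic %': ['52.98%', '44.80%', '50.70%', '48.39%', '37.93%'],
    'International %': ['47.02%', '55.20%', '49.32%', '51.61%', '62.07%'],
})

# Same conversion as in the cell above: strip the % sign, then cast to float
for col in shares.columns:
    shares[col] = shares[col].str.rstrip('%').astype(float)

# Each row should total roughly 100; keep rows that are off by more than 0.01
total = shares.sum(axis=1)
off_rows = shares[(total - 100).abs() > 0.01]
print(off_rows)
```

In the rows shown, only index 2 (Toy Story 2) is flagged, because 50.70 and 49.32 add up to 100.02 in the source data.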


In [8]:
# Multiply IMDB Score column by 10 to convert to a 100 point scale
pixar_movies['IMDB Score'] = pixar_movies['IMDB Score'] * 10
pixar_movies.head()

Unnamed: 0,Year Released,Movie,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
0,1995,Toy Story,81,100,83.0,92,29.14,362.0,191.8,356.21,170.2,52.98,47.02,30,3.0,0.0
1,1998,A Bug's Life,96,92,72.0,77,33.26,363.4,162.8,277.18,200.6,44.8,55.2,45,1.0,0.0
2,1999,Toy Story 2,92,100,79.0,88,57.39,485.0,245.9,388.43,239.2,50.7,49.32,90,1.0,0.0
3,2001,"Monsters, Inc.",90,96,81.0,78,62.58,528.8,255.9,366.12,272.9,48.39,51.61,115,3.0,1.0
4,2003,Finding Nemo,104,99,82.0,90,70.25,895.6,339.7,457.46,555.9,37.93,62.07,94,4.0,1.0


In [9]:
# Create a new DataFrame without rows that contain missing values (here, only the last row)
filtered_pixar = pixar_movies.dropna()
filtered_pixar

Unnamed: 0,Year Released,Movie,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
0,1995,Toy Story,81,100,83.0,92,29.14,362.0,191.8,356.21,170.2,52.98,47.02,30,3.0,0.0
1,1998,A Bug's Life,96,92,72.0,77,33.26,363.4,162.8,277.18,200.6,44.8,55.2,45,1.0,0.0
2,1999,Toy Story 2,92,100,79.0,88,57.39,485.0,245.9,388.43,239.2,50.7,49.32,90,1.0,0.0
3,2001,"Monsters, Inc.",90,96,81.0,78,62.58,528.8,255.9,366.12,272.9,48.39,51.61,115,3.0,1.0
4,2003,Finding Nemo,104,99,82.0,90,70.25,895.6,339.7,457.46,555.9,37.93,62.07,94,4.0,1.0
5,2004,The Incredibles,115,97,80.0,90,70.47,631.4,261.4,341.28,370.0,41.4,58.6,92,4.0,2.0
6,2006,Cars,116,74,72.0,73,60.12,462.0,244.1,302.59,217.9,52.84,47.16,70,2.0,0.0
7,2007,Ratatouille,111,96,80.0,96,47.0,623.7,206.4,243.65,417.3,33.09,66.91,150,5.0,1.0
8,2008,WALL-E,97,96,84.0,94,63.1,521.3,223.8,253.11,297.5,42.93,57.07,180,6.0,1.0
9,2009,Up,96,98,83.0,88,68.11,731.3,293.0,318.9,438.3,40.07,59.93,175,5.0,2.0
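
Before dropping rows, it can help to see which rows `dropna()` would remove and which columns hold the missing values. This is a minimal sketch on a toy frame: the first two rows come from the table above, while the third row (`Hypothetical Movie`) is made up purely to have something with missing Oscar data.

```python
import numpy as np
import pandas as pd

# Two real rows from the table above plus one made-up row with missing Oscar data
sample = pd.DataFrame({
    'Movie': ['Toy Story', 'Up', 'Hypothetical Movie'],
    'Oscars Nominated': [3.0, 5.0, np.nan],
    'Oscars Won': [0.0, 2.0, np.nan],
})

# Rows that dropna() would remove: any row with at least one missing value
rows_with_missing = sample[sample.isna().any(axis=1)]

# Number of missing values in each column
missing_per_column = sample.isna().sum()

print(rows_with_missing)
print(missing_per_column)
```

Running the same two expressions on `pixar_movies` would show which row is filtered out and why.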


In [10]:
# Set the Movie column as the index for both DataFrames
pixar_movies.set_index('Movie', inplace=True)
filtered_pixar.set_index('Movie', inplace=True)
pixar_movies.head()

Movie,Year Released,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
Toy Story,1995,81,100,83.0,92,29.14,362.0,191.8,356.21,170.2,52.98,47.02,30,3.0,0.0
A Bug's Life,1998,96,92,72.0,77,33.26,363.4,162.8,277.18,200.6,44.8,55.2,45,1.0,0.0
Toy Story 2,1999,92,100,79.0,88,57.39,485.0,245.9,388.43,239.2,50.7,49.32,90,1.0,0.0
"Monsters, Inc.",2001,90,96,81.0,78,62.58,528.8,255.9,366.12,272.9,48.39,51.61,115,3.0,1.0
Finding Nemo,2003,104,99,82.0,90,70.25,895.6,339.7,457.46,555.9,37.93,62.07,94,4.0,1.0


In [11]:
# Create a new DataFrame containing just the review scores
critics_reviews = pixar_movies[['RT Score', 'IMDB Score', 'Metacritic Score']]
critics_reviews.head()

Movie,RT Score,IMDB Score,Metacritic Score
Toy Story,100,83.0,92
A Bug's Life,92,72.0,77
Toy Story 2,100,79.0,88
"Monsters, Inc.",96,81.0,78
Finding Nemo,99,82.0,90


In [12]:
# Use the DataFrame plot() method to visualize this new DataFrame
critics_reviews.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x117a99940>

In [13]:
# The resulting plot is a little cramped, so let's tweak the figure size
critics_reviews.plot(figsize=(9,6))

<matplotlib.axes._subplots.AxesSubplot at 0x117ca3780>

Note: Not all movie names are listed on the x-axis, and the vertical grid lines on the x-axis appear only for every other movie.
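
One possible way to label every movie is to set the tick positions and labels explicitly on the Axes object that `plot()` returns. The sketch below uses only the five rows of `critics_reviews` shown earlier (the name `critics_sample` and the rotation settings are assumptions for illustration).

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of critics_reviews, copied from the output above
critics_sample = pd.DataFrame(
    {
        'RT Score': [100, 92, 100, 96, 99],
        'IMDB Score': [83.0, 72.0, 79.0, 81.0, 82.0],
        'Metacritic Score': [92, 77, 88, 78, 90],
    },
    index=pd.Index(
        ['Toy Story', "A Bug's Life", 'Toy Story 2', 'Monsters, Inc.', 'Finding Nemo'],
        name='Movie',
    ),
)

ax = critics_sample.plot(figsize=(9, 6))

# Place one tick per movie and label each tick with the movie name
ax.set_xticks(range(len(critics_sample)))
ax.set_xticklabels(critics_sample.index, rotation=45, ha='right')
```

Because the grid lines follow the major ticks, placing a tick at every movie should also give each movie its own vertical grid line.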

In [14]:
# Box plot
critics_reviews.plot(kind='box', figsize=(9,5))

<matplotlib.axes._subplots.AxesSubplot at 0x1182bb080>

In [16]:
# Stacked bar plot
revenue_proportions = filtered_pixar[['Domestic %', 'International %']]
revenue_proportions.plot(kind='bar', stacked=True, figsize=(9,6))

<matplotlib.axes._subplots.AxesSubplot at 0x119384940>

Create a grouped bar plot to explore if there's any correlation between the number of Oscars a movie was nominated for and the number it actually won.

In [21]:
# Create a grouped bar plot
movie_oscars = filtered_pixar[['Oscars Nominated', 'Oscars Won']]
movie_oscars.plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x11a1dfa58>
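
The bar plot gives a visual impression; a single number can back it up. As a rough sketch, the Pearson correlation between the two Oscar columns can be computed with `Series.corr()`. The frame below (`oscars_sample`, a name chosen for illustration) holds only the ten rows displayed earlier, so the value may differ from one computed on the full `filtered_pixar`.

```python
import pandas as pd

# Oscar columns for the ten movies displayed above (Toy Story through Up)
oscars_sample = pd.DataFrame({
    'Oscars Nominated': [3.0, 1.0, 1.0, 3.0, 4.0, 4.0, 2.0, 5.0, 6.0, 5.0],
    'Oscars Won': [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 1.0, 2.0],
})

# Pearson correlation between nominations and wins
oscar_corr = oscars_sample['Oscars Nominated'].corr(oscars_sample['Oscars Won'])
print(round(oscar_corr, 3))
```

A positive value here would be consistent with movies that earn more nominations also tending to win more awards.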

What plots can you generate to better understand which columns correlate with the Adjusted Domestic Gross revenue column, which describes the total domestic revenue adjusted for economic and ticket price inflation?

In [24]:
# Generate plots to better understand which columns correlate with the Adjusted Domestic Gross revenue

# Compute pairwise correlation of columns to understand which columns may have interesting correlation
pixar_movies.corr()

Unnamed: 0,Year Released,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,Adjusted Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
Year Released,1.0,0.534099,-0.385842,-0.076138,-0.258043,0.740376,0.503504,0.417111,-0.373099,0.503543,-0.438875,0.438724,0.89241,-0.014485,0.296232
Length,0.534099,1.0,-0.492301,-0.406876,-0.288763,0.471698,0.3124,0.123327,-0.310743,0.379133,-0.500744,0.500658,0.307082,-0.069799,0.125731
RT Score,-0.385842,-0.492301,1.0,0.876587,0.867997,-0.423794,0.167612,0.349368,0.654057,0.064001,0.30044,-0.300373,-0.331351,0.632767,0.438174
IMDB Score,-0.076138,-0.406876,0.876587,1.0,0.891237,-0.096896,0.347247,0.548467,0.619059,0.218876,0.240026,-0.240066,-0.033545,0.817596,0.575408
Metacritic Score,-0.258043,-0.288763,0.867997,0.891237,1.0,-0.273258,0.220493,0.341773,0.532589,0.142197,0.157633,-0.157579,-0.213031,0.838883,0.503676
Opening Weekend,0.740376,0.471698,-0.423794,-0.096896,-0.273258,1.0,0.688564,0.624218,0.026921,0.662183,-0.421982,0.421986,0.757867,-0.056367,0.316225
Worldwide Gross,0.503504,0.3124,0.167612,0.347247,0.220493,0.688564,1.0,0.883506,0.460129,0.973036,-0.543928,0.543915,0.551215,0.384349,0.621834
Domestic Gross,0.417111,0.123327,0.349368,0.548467,0.341773,0.624218,0.883506,1.0,0.672332,0.751641,-0.099319,0.099302,0.385702,0.402677,0.625018
Adjusted Domestic Gross,-0.373099,-0.310743,0.654057,0.619059,0.532589,0.026921,0.460129,0.672332,1.0,0.316879,0.299128,-0.298995,-0.34764,0.325413,0.311573
International Gross,0.503543,0.379133,0.064001,0.218876,0.142197,0.662183,0.973036,0.751641,0.316879,1.0,-0.716986,0.716976,0.586223,0.352585,0.582735


Domestic Gross has the strongest positive correlation with Adjusted Domestic Gross (about 0.67), as expected. It also makes sense that all three review scores correlate positively with money made. What isn't obvious a priori is which review score correlates most strongly with box office success. Here RT Score has the strongest correlation of the three (about 0.65), followed by IMDB Score (0.62) and Metacritic Score (0.53).
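
Reading one row out of a wide correlation matrix is error-prone, so it may be easier to pull out the single column of interest and sort it. The sketch below does that on a reduced frame built from the ten rows displayed earlier (`sample` and `ranked` are names chosen for illustration), so its numbers will not match the full-dataset matrix above.

```python
import pandas as pd

# A few columns for the ten movies displayed earlier (Toy Story through Up)
sample = pd.DataFrame({
    'RT Score': [100, 92, 100, 96, 99, 97, 74, 96, 96, 98],
    'IMDB Score': [83.0, 72.0, 79.0, 81.0, 82.0, 80.0, 72.0, 80.0, 84.0, 83.0],
    'Metacritic Score': [92, 77, 88, 78, 90, 90, 73, 96, 94, 88],
    'Domestic Gross': [191.8, 162.8, 245.9, 255.9, 339.7, 261.4, 244.1, 206.4, 223.8, 293.0],
    'Adjusted Domestic Gross': [356.21, 277.18, 388.43, 366.12, 457.46,
                                341.28, 302.59, 243.65, 253.11, 318.9],
})

target = 'Adjusted Domestic Gross'

# Correlation of every other column with the target, strongest first
ranked = sample.corr()[target].drop(target).sort_values(ascending=False)
print(ranked)
```

Applied to the full data, the same expression would be `pixar_movies.corr()['Adjusted Domestic Gross']` followed by the same `drop` and `sort_values` calls.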

In [32]:
pixar_movies.plot(x='RT Score', y='Adjusted Domestic Gross', kind='scatter')

<matplotlib.axes._subplots.AxesSubplot at 0x12954c2b0>

In [34]:
adjusted_gross = pixar_movies.copy()
adjusted_gross.set_index('Adjusted Domestic Gross', inplace=True)
adjusted_gross.head()

Adjusted Domestic Gross,Year Released,Length,RT Score,IMDB Score,Metacritic Score,Opening Weekend,Worldwide Gross,Domestic Gross,International Gross,Domestic %,International %,Production Budget,Oscars Nominated,Oscars Won
356.21,1995,81,100,83.0,92,29.14,362.0,191.8,170.2,52.98,47.02,30,3.0,0.0
277.18,1998,96,92,72.0,77,33.26,363.4,162.8,200.6,44.8,55.2,45,1.0,0.0
388.43,1999,92,100,79.0,88,57.39,485.0,245.9,239.2,50.7,49.32,90,1.0,0.0
366.12,2001,90,96,81.0,78,62.58,528.8,255.9,272.9,48.39,51.61,115,3.0,1.0
457.46,2003,104,99,82.0,90,70.25,895.6,339.7,555.9,37.93,62.07,94,4.0,1.0
