# Data analysis of Disney datasets - by Andrew Ostensen

# Introduction

## Question of interest
In this analysis, I will investigate the following question associated with this collection of Disney datasets.

**QUESTION:**  Which person (actor or director) has had the greatest financial impact on Disney revenues? This is interesting because Disney has several revenue streams, and I would like to see which Disney animated movie(s), and therefore which actor or director, is most impactful. I would expect the people associated with a blockbuster movie such as **The Lion King** to have had the most impact.

## Dataset description 

The datasets were obtained from [data.world](https://data.world/kgarrett/disney-character-success-00-16) [1], and the following quote is from [Wikipedia](https://en.wikipedia.org/wiki/List_of_Walt_Disney_Animation_Studios_films) [2].

"Walt Disney Animation Studios is an American animation studio headquartered in Burbank, California, the original feature film division of The Walt Disney Company. The studio's films are also often called "Disney Classics", or "Disney Animated Canon"."

The Disney dataset is composed of $5$ tables: `disney_movies_total_gross.csv`, `disney_revenue_1991-2016.csv`, `disney-characters.csv`, `disney-director.csv` and `disney-voice-actors.csv`. Each table is stored in a `.csv` file and contains different information about Disney animated characters, movies, actors, directors and Disney corporate revenues. I will use the `disney_movies_total_gross.csv`, `disney_revenue_1991-2016.csv`, `disney-director.csv` and `disney-voice-actors.csv` tables, described below:

* **disney_movies_total_gross.csv**
    * This file contains financial information on Disney movies (including non-animated movies).  Columns include the movie title, the release date of the movie, the genre and MPAA rating, and its total gross and inflation-adjusted gross take.
* **disney_revenue_1991-2016.csv**
    * This file contains financial information on Disney corporate revenue between 1991 and 2016, including the year, revenue from the Studio Entertainment division, revenue from the Consumer Products division, revenue from the Disney Interactive division, revenue from Parks & Resorts, and revenue from Disney Media Networks.
* **disney-director.csv**
    * This file contains information on the movie directors for Disney animated films, including the movie name and the name of the movie director.
* **disney-voice-actors.csv**
    * This file includes information on the voice actors for Disney animated films, including the movie character name, the voice actor name and the movie name.

# Methods and Results

I am interested in analysing how the different people involved in making a movie can impact the revenue of the Disney corporation. For this analysis I will use the **disney_movies_total_gross**, the **disney_revenue_1991_2016**, the **disney_director** and the **disney_voice_actors** tables.

## Step 1
The first step is to import the tables and do some basic analysis such as observing null values and the data types for each dataframe.

In [1]:
# Import the libraries needed for this analysis
import altair as alt
import pandas as pd

# Read the four CSV files into DataFrames
directors = pd.read_csv('data/disney-director.csv')
actors = pd.read_csv('data/disney-voice-actors.csv')
# parse_dates converts the 'release_date' column to a datetime64 datatype
movies = pd.read_csv('data/disney_movies_total_gross.csv', parse_dates=['release_date'])
revenue = pd.read_csv('data/disney_revenue_1991-2016.csv')

#### Step 1(a)
**directors** dataframe

In [2]:
directors.head()

Unnamed: 0,name,director
0,Snow White and the Seven Dwarfs,David Hand
1,Pinocchio,Ben Sharpsteen
2,Fantasia,full credits
3,Dumbo,Ben Sharpsteen
4,Bambi,David Hand


In [3]:
directors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   name      56 non-null     object
 1   director  56 non-null     object
dtypes: object(2)
memory usage: 1.0+ KB


The **directors** table has 56 rows and 2 columns. Each movie name has an associated director.  There are no missing entries and the datatype is appropriate for these columns.  Both the **name** and **director** columns are of interest for this analysis.

#### Step 1(b)
**actors** dataframe

In [4]:
actors.head()

Unnamed: 0,character,voice-actor,movie
0,Abby Mallard,Joan Cusack,Chicken Little
1,Abigail Gabble,Monica Evans,The Aristocats
2,Abis Mal,Jason Alexander,The Return of Jafar
3,Abu,Frank Welker,Aladdin
4,Achilles,,The Hunchback of Notre Dame


In [5]:
actors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 935 entries, 0 to 934
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   character    935 non-null    object
 1   voice-actor  882 non-null    object
 2   movie        935 non-null    object
dtypes: object(3)
memory usage: 22.0+ KB


In [6]:
actors.describe()

Unnamed: 0,character,voice-actor,movie
count,935,882,935
unique,922,655,139
top,Penny,Frank Welker,DuckTales
freq,3,24,31


The **actors** table has 935 rows and 3 columns. Each movie character has a corresponding movie and, in most cases, a voice actor.  A voice actor can be associated with more than one movie character.  The **voice-actor** column has 53 missing entries (882 of 935 values are non-null), and the datatype is appropriate for all three columns.

Rows without a voice actor (recorded as **None** in the source file) are not useful for this analysis and should therefore be removed.

The **voice-actor** and **movie** columns are of interest for this analysis.

#### Step 1(c)
**movies** dataframe

In [7]:
movies.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross
0,Snow White and the Seven Dwarfs,1937-12-21,Musical,G,"$184,925,485","$5,228,953,251"
1,Pinocchio,1940-02-09,Adventure,G,"$84,300,000","$2,188,229,052"
2,Fantasia,1940-11-13,Musical,G,"$83,320,000","$2,187,090,808"
3,Song of the South,1946-11-12,Adventure,G,"$65,000,000","$1,078,510,579"
4,Cinderella,1950-02-15,Drama,G,"$85,000,000","$920,608,730"


In [8]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 579 entries, 0 to 578
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   movie_title               579 non-null    object        
 1   release_date              579 non-null    datetime64[ns]
 2   genre                     562 non-null    object        
 3   MPAA_rating               523 non-null    object        
 4   total_gross               579 non-null    object        
 5   inflation_adjusted_gross  579 non-null    object        
dtypes: datetime64[ns](1), object(5)
memory usage: 27.3+ KB


The **movies** dataframe has 579 rows and 6 columns. Each movie has a corresponding release date, genre, MPAA rating, total gross and inflation adjusted gross.  The **genre** and **MPAA_rating** columns both have null values but are not of interest for this analysis.  

The **total_gross** and **inflation_adjusted_gross** columns have the wrong datatype - they need to be either int64 or float64.  The **movie_title**, **release_date** and **inflation_adjusted_gross** columns are of interest for this analysis.

#### Step 1(d)
**revenue** dataframe

In [9]:
revenue.head()

Unnamed: 0,Year,Studio Entertainment[NI 1],Disney Consumer Products[NI 2],Disney Interactive[NI 3][Rev 1],Walt Disney Parks and Resorts,Disney Media Networks,Total
0,1991,2593.0,724.0,,2794.0,,6111
1,1992,3115.0,1081.0,,3306.0,,7502
2,1993,3673.4,1415.1,,3440.7,,8529
3,1994,4793.0,1798.2,,3463.6,359.0,10414
4,1995,6001.5,2150.0,,3959.8,414.0,12525


In [10]:
revenue.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 7 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Year                             26 non-null     int64  
 1   Studio Entertainment[NI 1]       25 non-null     float64
 2   Disney Consumer Products[NI 2]   24 non-null     float64
 3   Disney Interactive[NI 3][Rev 1]  12 non-null     float64
 4   Walt Disney Parks and Resorts    26 non-null     float64
 5   Disney Media Networks            23 non-null     object 
 6   Total                            26 non-null     int64  
dtypes: float64(4), int64(2), object(1)
memory usage: 1.5+ KB


The **revenue** dataframe has 26 rows and 7 columns. Each year has a corresponding revenue number from the following Disney divisions:

* Studio Entertainment
* Disney Consumer Products
* Disney Interactive
* Walt Disney Parks and Resorts
* Disney Media Networks

A final column with the total revenue per year is also provided.  Four of the columns have null data.

The **Disney Media Networks** column has the wrong datatype - it needs to be float64.  

The **Year**, **Studio Entertainment** and **Disney Consumer Products** columns are of interest for this analysis.

## Step 2
The next step is to do some basic data wrangling such as fixing the improper data types and dealing with the null values observed in **Step 1**.

#### Step 2(a)
**movies** dataframe:

In [11]:
# import the custom script
import string_conversion as sc

# Change the 'total_gross' and 'inflation_adjusted_gross' columns from str to int
movies = sc.string_conversion(movies, 'total_gross')
movies = sc.string_conversion(movies, 'inflation_adjusted_gross')

# Confirm the change is successful
movies.dtypes

movie_title                         object
release_date                datetime64[ns]
genre                               object
MPAA_rating                         object
total_gross                          int64
inflation_adjusted_gross             int64
dtype: object
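
The `string_conversion` module is a custom script whose source is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical function `currency_to_int` below assumes the conversion only needs to strip the `$` and `,` characters and cast the result to an integer; the real script may differ.

```python
import pandas as pd


def currency_to_int(df, column):
    """Hypothetical stand-in for sc.string_conversion: '$1,234' -> 1234."""
    cleaned = (df[column].str.replace('$', '', regex=False)
                         .str.replace(',', '', regex=False)
                         .astype('int64'))
    # Return a copy of the frame with the cleaned column in place
    return df.assign(**{column: cleaned})


# Two values copied from the movies.head() output in Step 1(c)
demo = pd.DataFrame({'total_gross': ['$184,925,485', '$84,300,000']})
demo = currency_to_int(demo, 'total_gross')
print(demo.dtypes)
```

Passing `regex=False` treats `$` as a literal character rather than a regular-expression anchor.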

In [12]:
# Add a new 'Year' column to the dataframe
movies = movies.assign(Year=movies['release_date'].dt.year)

# Only keep the columns of interest
movies = movies[['movie_title', 'Year', 'inflation_adjusted_gross']]

# Confirm changes by viewing dataframe
movies.head()

Unnamed: 0,movie_title,Year,inflation_adjusted_gross
0,Snow White and the Seven Dwarfs,1937,5228953251
1,Pinocchio,1940,2188229052
2,Fantasia,1940,2187090808
3,Song of the South,1946,1078510579
4,Cinderella,1950,920608730


Create a **movies_gross** dataframe to plot the top 10 grossing movies.

In [13]:
# Sort the 'movies' dataframe by 'inflation_adjusted_gross'
movies_gross = (movies.sort_values(by='inflation_adjusted_gross', ascending=False)
                      .reset_index()
                      .loc[:9, ['movie_title', 'inflation_adjusted_gross']]
             )

# Plot the top 10 movies using a bar chart
top_10_movies_plt = (alt.Chart(movies_gross, width=500, height=300)
                        .mark_bar()
                        .encode(x=alt.X('movie_title:N', sort='-y', title='Movie'),
                                y=alt.Y('inflation_adjusted_gross:Q', title='Movie Gross ($)'))
                        .properties(title="Figure 1 - Top Grossing Disney Movies")
                    )
top_10_movies_plt

#### Step 2(b)
**revenue** dataframe:

First create a **Discount** column to account for the effects of inflation.

The calculation is [3]:

$$PV = \frac{FV}{(1 + i)^n}$$

where $PV$ is the present value, $FV$ is the future value, $i$ is the inflation rate and $n$ is the number of years.

The inflation rate is assumed to be 2.51% [4].

In [15]:
# Create 'Discount' column
inflation = 0.0251
revenue = revenue.assign(Discount=(1.0/(1.0+inflation)**(revenue['Year']-revenue.loc[0, 'Year'])))

# Discount rates are relative to row 0 in the DataFrame (1991 dollars)
revenue[['Year', 'Discount']].head()

Unnamed: 0,Year,Discount
0,1991,1.0
1,1992,0.975515
2,1993,0.951629
3,1994,0.928328
4,1995,0.905597
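
The discount factors can be checked by hand with the present value formula. The short sketch below uses plain Python and the same assumed inflation rate; the helper name `discount_factor` is introduced here for illustration only.

```python
inflation = 0.0251   # assumed annual inflation rate, as in the cell above
base_year = 1991     # discounting is relative to the first row (1991)


def discount_factor(year):
    """Factor that converts a value in `year` dollars to 1991 dollars."""
    return 1.0 / (1.0 + inflation) ** (year - base_year)


# Factors for the first five years, rounded like the DataFrame display
factors = {year: round(discount_factor(year), 6) for year in range(1991, 1996)}
print(factors)
```

A factor below 1 means that a dollar earned in a later year is worth less than a 1991 dollar under the assumed inflation rate.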


Create an **Entertainment** column from the existing **Studio Entertainment[NI 1]** column.

Modify as follows:

            Fill null values with the next row's data (backward fill).
            Create a rolling average with 2 years of data.
            Shift the data back one year.
            Apply the discount rate.

Example:  1995 Studio Entertainment revenue = average of 1995/1996 revenue (in 1991 dollars).

The assumption in this calculation is that a movie impacts revenue in both the current and subsequent year.

In [16]:
# Create 'Entertainment' column: backfill gaps, average each year with the
# following year, then discount to 1991 dollars
revenue = (revenue.assign(Entertainment=revenue['Studio Entertainment[NI 1]']
                  .bfill()
                  .rolling(window=2)
                  .mean()
                  .shift(-1))
           )
revenue = revenue.assign(Entertainment=revenue['Entertainment']*revenue['Discount'])
revenue[['Year', 'Entertainment']].head()

Unnamed: 0,Year,Entertainment
0,1991,2854.0
1,1992,3311.091601
2,1993,4028.434628
3,1994,5010.416564
4,1995,5878.457755
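
To make the fill, rolling average and shift steps easier to follow, here is a minimal sketch on a toy series. The numbers are made up for illustration and are not taken from the revenue table.

```python
import pandas as pd

# Toy series with one gap; the values are made up for illustration
s = pd.Series([10.0, 20.0, None, 40.0])

filled = s.bfill()                          # the gap takes the next valid value
two_year = filled.rolling(window=2).mean()  # mean of the previous and current row
aligned = two_year.shift(-1)                # row t now holds the mean of t and t+1

print(pd.DataFrame({'raw': s, 'filled': filled,
                    'two_year': two_year, 'aligned': aligned}))
```

The last row of `aligned` is null because there is no following year to average with, which is why a null row has to be dropped later.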


Create a **Consumer** column from the existing **Disney Consumer Products[NI 2]** column.

Modify using the same assumptions as the **Entertainment** column.

In [17]:
# Create 'Consumer' column: backfill gaps, average each year with the
# following year, then discount to 1991 dollars
revenue = (revenue.assign(Consumer=revenue['Disney Consumer Products[NI 2]']
                  .bfill()
                  .rolling(window=2)
                  .mean()
                  .shift(-1))
           )
revenue = revenue.assign(Consumer=revenue['Consumer']*revenue['Discount'])
revenue[['Year', 'Consumer']].head()

Unnamed: 0,Year,Consumer
0,1991,902.5
1,1992,1217.490976
2,1993,1528.934256
3,1994,1832.611671
4,1995,2686.001263


Create a **Total** column, which sums the **Entertainment** and **Consumer** columns. The source figures are in millions of dollars, so the sum is multiplied by 1000000 to express the value in dollars.

The assumption in this calculation is that a movie impacts the Studio Entertainment and Consumer Products revenue but has minimal impact on Parks and Resorts, Media Networks and Interactive revenue.

Finally, drop any null values and only keep the required columns.

In [18]:
# Create 'Total' column
revenue = (revenue.assign(Total=(revenue['Consumer']+revenue['Entertainment'])*1000000.0)
                  .reset_index()
           )

# Only keep the columns of interest
revenue = revenue[['Year', 'Consumer', 'Entertainment', 'Total']]

# Drop any null values
revenue = revenue.dropna()

# Confirm changes by viewing DataFrame
revenue

Unnamed: 0,Year,Consumer,Entertainment,Total
0,1991,902.5,2854.0,3756500000.0
1,1992,1217.490976,3311.091601,4528583000.0
2,1993,1528.934256,4028.434628,5557369000.0
3,1994,1832.611671,5010.416564,6843028000.0
4,1995,2686.001263,5878.457755,8564459000.0
5,1996,3341.10679,6167.177815,9508285000.0
6,1997,3005.500573,5959.29361,8964794000.0
7,1998,2615.809849,5631.368239,8247178000.0
8,1999,2309.419254,5142.886415,7452306000.0
9,2000,2076.866525,5199.366544,7276233000.0


#### Step 2(c)
**actors** dataframe:

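Remove the rows with no voice actor from the dataframe.

In [19]:
# Remove rows whose voice actor is the string 'None'.
# Note: missing names that were read in as NaN are not matched by this
# comparison; actors.dropna(subset=['voice-actor']) would remove those.
actors = actors[~(actors['voice-actor']=='None')]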
actors.describe()

Unnamed: 0,character,voice-actor,movie
count,935,882,935
unique,922,655,139
top,Penny,Frank Welker,DuckTales
freq,3,24,31
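
The counts above are unchanged by the filter, which suggests the missing names were read in as null values rather than as the string `'None'`. A minimal sketch of a filter that handles both cases is shown below on a two-row toy frame; the name `actors_demo` is introduced here for illustration, and applying this to the full table would change the downstream numbers reported in this notebook.

```python
import pandas as pd

# Rows copied from the actors.head() output; Achilles has no voice actor
actors_demo = pd.DataFrame({
    'character': ['Abu', 'Achilles'],
    'voice-actor': ['Frank Welker', None],
    'movie': ['Aladdin', 'The Hunchback of Notre Dame'],
})

# Keep rows where the voice actor is neither null nor the literal string 'None'
has_actor = actors_demo['voice-actor'].notna() & (actors_demo['voice-actor'] != 'None')
actors_demo = actors_demo[has_actor]
print(actors_demo)
```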


## Step 3
The next step is to combine the dataframes modified in **Step 2** to create a master dataframe for final visualization and analysis.

#### Step 3(a)
Merge **movies** and **revenue** dataframes.

In [20]:
# Merge the 'movies' and 'revenue' dataframes
disney_df = movies.merge(revenue, how='inner', left_on='Year', right_on='Year')
disney_df.head()

Unnamed: 0,movie_title,Year,inflation_adjusted_gross,Consumer,Entertainment,Total
0,White Fang,1991,69540672,902.5,2854.0,3756500000.0
1,Scenes from a Mall,1991,19149495,902.5,2854.0,3756500000.0
2,Haakon Haakonsen,1991,30084149,902.5,2854.0,3756500000.0
3,The Marrying Man,1991,24939118,902.5,2854.0,3756500000.0
4,Oscar,1991,47181395,902.5,2854.0,3756500000.0


#### Step 3(b)
Merge with the **actors** dataframe.

In [21]:
# Merge the 'actors' dataframe on the movie title
# Drop the 'character' column and the redundant 'movie' column
disney_df = (disney_df.merge(actors, how='inner', left_on='movie_title', right_on='movie')
                                       .drop(columns=['character', 'movie'])
                        )
disney_df.head()

Unnamed: 0,movie_title,Year,inflation_adjusted_gross,Consumer,Entertainment,Total,voice-actor
0,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Jo Anne Worley
1,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Mary Kay Bergman
2,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Alex Murphey
3,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Robby Benson
4,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Paige O'Hara
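
An inner merge keeps only the movies that appear in both tables and repeats each movie row once per matching voice actor. The sketch below illustrates this on two small toy frames (`movies_demo` and `actors_demo` are names introduced here); in the toy data, **White Fang** has no voice-actor rows and is therefore dropped.

```python
import pandas as pd

# Small toy frames built from titles and names that appear in the outputs above
movies_demo = pd.DataFrame({'movie_title': ['Beauty and the Beast', 'White Fang'],
                            'Year': [1991, 1991]})
actors_demo = pd.DataFrame({'movie': ['Beauty and the Beast', 'Beauty and the Beast'],
                            'voice-actor': ['Robby Benson', "Paige O'Hara"]})

# Inner merge on differently named key columns, then drop the duplicate key
merged = (movies_demo.merge(actors_demo, how='inner',
                            left_on='movie_title', right_on='movie')
                     .drop(columns=['movie']))
print(merged)
```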


#### Step 3(c)
Merge with the **directors** dataframe.

In [22]:
# Merge the 'directors' dataframe
disney_df = (disney_df.merge(directors, how='inner', left_on='movie_title', right_on='name')
                                       .drop(columns=['name'])
            )
disney_df.head()

Unnamed: 0,movie_title,Year,inflation_adjusted_gross,Consumer,Entertainment,Total,voice-actor,director
0,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Jo Anne Worley,Gary Trousdale
1,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Mary Kay Bergman,Gary Trousdale
2,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Alex Murphey,Gary Trousdale
3,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Robby Benson,Gary Trousdale
4,Beauty and the Beast,1991,363017667,902.5,2854.0,3756500000.0,Paige O'Hara,Gary Trousdale


#### Step 3(d)
Combine the **voice-actor** and **director** columns using the **melt()** method.

In [23]:
# Melt the 'voice-actor' and 'director' columns into long form:
# 'role' holds the source column name and 'name' holds the person
disney_df = (disney_df.melt(id_vars=['Year', 'movie_title', 'inflation_adjusted_gross', 'Total'],
                         value_vars=['voice-actor', 'director'],
                         var_name='role',
                         value_name='name')
                     .drop_duplicates()
                     .reset_index(drop=True)
               )
disney_df.head()

Unnamed: 0,Year,movie_title,inflation_adjusted_gross,Total,role,name
0,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Jo Anne Worley
1,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Mary Kay Bergman
2,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Alex Murphey
3,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Robby Benson
4,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Paige O'Hara
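
As a small illustration of what the melt does, the sketch below reshapes a two-row toy frame (named `wide` here for illustration). The director appears on every actor row before the melt, so `drop_duplicates()` is needed to keep a single director row per movie.

```python
import pandas as pd

# Toy frame: one movie, two voice actors, the same director on both rows
wide = pd.DataFrame({'movie_title': ['Beauty and the Beast'] * 2,
                     'voice-actor': ['Robby Benson', "Paige O'Hara"],
                     'director': ['Gary Trousdale', 'Gary Trousdale']})

# Stack the two people columns into 'role' (source column) and 'name' (person)
long = (wide.melt(id_vars=['movie_title'],
                  value_vars=['voice-actor', 'director'],
                  var_name='role', value_name='name')
            .drop_duplicates()
            .reset_index(drop=True))
print(long)
```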


## Step 4
The final step is to analyze the master **disney_df** dataframe to determine which person (actor or director) has had the largest amount of revenue attributed to them.

#### Step 4(a)
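First, determine the total movie gross on a per year basis. Because **disney_df** holds one row per movie and person, each movie's gross is counted once for every person credited on it (for example, the 1991 total of 7,260,353,340 is exactly 20 times the **Beauty and the Beast** gross of 363,017,667).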

In [24]:
# Create a 'yr_movies' DataFrame
# Select the gross column before summing so only that column is aggregated
yr_movies = (disney_df.groupby(by='Year')['inflation_adjusted_gross']
                      .sum()
                      .reset_index()
                      .rename(columns={'inflation_adjusted_gross': 'total_movies_gross'})
            )
yr_movies

Unnamed: 0,Year,total_movies_gross
0,1991,7260353340
1,1992,6187568492
2,1994,12936428927
3,1995,3841193398
4,1996,4429436871
5,1997,4186676476
6,1998,3685733144
7,1999,3122902794
8,2000,3816371088
9,2001,1752633708


#### Step 4(b)
Merge the **yr_movies** DataFrame with the **disney_df** DataFrame.

Create an **adjusted_revenue** column in the new DataFrame.

Calculation is:

            (Movie gross / Total movie gross for the year) *
            Total Consumer Products & Entertainment revenue for the year

The **adjusted_revenue** is assumed to be the movie's contribution to corporate revenue for that year.

In [25]:
# Merge the 'yr_movies' DataFrame
final_df = disney_df.merge(yr_movies, how='left', left_on='Year', right_on='Year')
final_df.head()

Unnamed: 0,Year,movie_title,inflation_adjusted_gross,Total,role,name,total_movies_gross
0,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Jo Anne Worley,7260353340
1,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Mary Kay Bergman,7260353340
2,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Alex Murphey,7260353340
3,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Robby Benson,7260353340
4,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Paige O'Hara,7260353340


In [26]:
# Create 'adjusted_revenue' column
final_df = final_df.assign(adjusted_revenue=final_df['inflation_adjusted_gross']/final_df['total_movies_gross']*final_df['Total'])
final_df.head()

Unnamed: 0,Year,movie_title,inflation_adjusted_gross,Total,role,name,total_movies_gross,adjusted_revenue
0,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Jo Anne Worley,7260353340,187825000.0
1,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Mary Kay Bergman,7260353340,187825000.0
2,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Alex Murphey,7260353340,187825000.0
3,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Robby Benson,7260353340,187825000.0
4,1991,Beauty and the Beast,363017667,3756500000.0,voice-actor,Paige O'Hara,7260353340,187825000.0
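
The first row can be checked by hand. The sketch below repeats the calculation in plain Python with the values copied from the table above; the variable names are introduced here for illustration.

```python
# Values copied from the first row of final_df above
movie_gross = 363_017_667          # inflation_adjusted_gross
year_total_gross = 7_260_353_340   # total_movies_gross for 1991
year_revenue = 3_756_500_000.0     # Total (Consumer + Entertainment), in dollars

# Share of the yearly gross attributed to this row, then scaled by revenue
share = movie_gross / year_total_gross
adjusted_revenue = share * year_revenue
print(share, adjusted_revenue)
```

The share works out to 1/20, so each of these rows is credited with one twentieth of the 1991 revenue.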


#### Step 4(c)
Group the DataFrame by the actor/director **name** column.

Sum the **adjusted_revenue** column and then sort in a descending order.

This will show the person(s) with the highest total revenue contribution.

In [27]:
# Sum the adjusted revenue per person and sort in descending order
names = (final_df.groupby(by='name')['adjusted_revenue']
               .sum()
               .sort_values(ascending=False)
               .reset_index()
        )

# Keep the top 10 names for plotting
names_plt = names.loc[:9, ['name', 'adjusted_revenue']]
                            
names_plt

Unnamed: 0,name,adjusted_revenue
0,Daniel Henney,3269300000.0
1,Don Hall,3269300000.0
2,David Ogden Stiers,2185488000.0
3,Frank Welker,1682018000.0
4,Jim Cummings,1418298000.0
5,D. B. Sweeney,1382371000.0
6,Chris Buck,1347168000.0
7,Mark Dindal,1241987000.0
8,Ron Clements,1135523000.0
9,Gary Trousdale,1113422000.0


#### Step 4(d)
Next, plot the results to display each person's impact on adjusted revenue.

In [40]:
# Visualize the top 10 people's contributions to revenue using a bar plot.
top_10_plot = (alt.Chart(names_plt, width=500, height=300).mark_bar()
                                                .encode(x=alt.X('name:N', sort='-y', title=None, 
                                                                axis=alt.Axis(labelFontSize=12)),
                                                        y=alt.Y('adjusted_revenue:Q', title='Adjusted Revenue ($)'))
                                                .properties(title="Figure 2 - Contributions to Revenue by Person")
         )
top_10_plot

#### Step 4(e)
Finally, view the movie(s) associated with each person in the previous step.

In [29]:
# Merge the 'names_plt' dataframe with the 'disney_df' dataframe
# to link the highest contributors to their associated movies
name_movies = names_plt.merge(disney_df, how='inner', left_on='name', right_on='name')
name_movies = name_movies[['name', 'movie_title']]

name_movies

Unnamed: 0,name,movie_title
0,Daniel Henney,Big Hero 6
1,Don Hall,Big Hero 6
2,David Ogden Stiers,Beauty and the Beast
3,David Ogden Stiers,Pocahontas
4,David Ogden Stiers,The Hunchback of Notre Dame
5,David Ogden Stiers,Atlantis: The Lost Empire
6,David Ogden Stiers,Lilo & Stitch
7,Frank Welker,Beauty and the Beast
8,Frank Welker,Aladdin
9,Frank Welker,Pocahontas


# Discussions

In this work, I analyzed the Disney datasets and tried to compute which person(s) had the most impact on company revenue. To complete this analysis I made several assumptions, the biggest being that the fraction of a movie's gross take for the year is directly correlated to the Disney corporation's Studio Entertainment and Consumer Products revenue.

It is quite surprising that **Don Hall** and **Daniel Henney**, who worked on **Big Hero 6**, are the biggest contributors. I would have guessed that someone who worked on a bigger hit movie such as **The Lion King**, or someone with multiple movie credits such as third-place contributor **David Ogden Stiers**, would have been the biggest contributor to company revenue.

The biggest limitation of this analysis is that the datasets for actors and directors only include Disney Animation movies and do not include other Disney properties such as **Star Wars Ep. VII: The Force Awakens**. Consequently, if a movie from outside Disney Animation is a huge hit, the total movie gross will be larger for that year and the fraction attributed to the Disney Animation movie will be skewed.  Having additional data with actors and directors for all Disney movies would be an improvement.

# References

The following resources were used for this analysis.

## Resources used

* [Data Source](https://data.world/kgarrett/disney-character-success-00-16) [1]
    * The Disney datasets used in this work were curated by **Kelly Garrett**.
* [Quote Source](https://en.wikipedia.org/wiki/List_of_Walt_Disney_Animation_Studios_films) [2]
    * Information on Disney Studios was taken from **Wikipedia**.
* [Present Value Calculation](https://www.calculatorsoup.com/calculators/financial/present-value-calculator.php) [3]
    * The Present Value calculation is from **calculatorsoup.com**.
* [Inflation Rate](https://www.minneapolisfed.org/about-us/monetary-policy/inflation-calculator/consumer-price-index-1913-) [4]
    * The Inflation Rate is from the **Federal Reserve Bank of Minneapolis**.