# The influence of yearly production on the gross income

As a second visualization, let's look at the total gross income by release year. For this, only the year of each release date is considered.

In [1]:
# Import the libraries required for this analysis
import altair as alt
import pandas as pd

# Load the data
total_gross = pd.read_csv("data/disney_movies_total_gross.csv")

In [2]:
# Strip the '$' signs and thousands separators, then convert the column to int64.
total_gross['inflation_adjusted_gross'] = (
    total_gross['inflation_adjusted_gross']
    .str.replace(r'[$,]', '', regex=True)
    .astype('int64')
)

In [3]:
# Split the release date on the comma and keep only the year
dates = total_gross['release_date'].str.split(',', expand=True).rename(columns={0: 'Month_day', 1: 'Year'})
total_gross_year = total_gross.assign(year=dates['Year'].astype('int'))
total_gross_year

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,year
0,Snow White and the Seven Dwarfs,"Dec 21, 1937",Musical,G,"$184,925,485",5228953251,1937
1,Pinocchio,"Feb 9, 1940",Adventure,G,"$84,300,000",2188229052,1940
2,Fantasia,"Nov 13, 1940",Musical,G,"$83,320,000",2187090808,1940
3,Song of the South,"Nov 12, 1946",Adventure,G,"$65,000,000",1078510579,1946
4,Cinderella,"Feb 15, 1950",Drama,G,"$85,000,000",920608730,1950
...,...,...,...,...,...,...,...
574,The Light Between Oceans,"Sep 2, 2016",Drama,PG-13,"$12,545,979",12545979,2016
575,Queen of Katwe,"Sep 23, 2016",Drama,PG,"$8,874,389",8874389,2016
576,Doctor Strange,"Nov 4, 2016",Adventure,PG-13,"$232,532,923",232532923,2016
577,Moana,"Nov 23, 2016",Adventure,PG,"$246,082,029",246082029,2016
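
As a cross-check on the string split above, the year can also be parsed from the full date. The snippet below is a minimal sketch using the standard-library `datetime` module. It assumes every `release_date` follows the "Dec 21, 1937" pattern seen in the table, and `release_year` is a hypothetical helper that is not part of the original notebook.

```python
from datetime import datetime


def release_year(release_date: str) -> int:
    """Return the year from a date string such as 'Dec 21, 1937'."""
    # %b = abbreviated month name, %d = day of month, %Y = four-digit year
    return datetime.strptime(release_date.strip(), "%b %d, %Y").year


# Dates copied from the table above
sample_dates = ["Dec 21, 1937", "Feb 9, 1940", "Nov 23, 2016"]
sample_years = [release_year(d) for d in sample_dates]
print(sample_years)
```

The helper could be applied to the whole column with `total_gross['release_date'].map(release_year)`. Unlike the split, it raises a `ValueError` on a malformed date instead of silently producing a wrong year.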


To facilitate the analysis, let's first categorize the years into groups, stored in a column named year_group. According to the literature, Disney filmmaking is commonly divided into 7 eras. We will use the following conditions, one per era, to reflect the different stages in the development of Disney filmmaking:

* from 1937 to 1942 → "The Golden Age"
* from 1943 to 1949 → "The Wartime Era"
* from 1950 to 1959 → "The Silver Age"
* from 1970 to 1988 → "The Bronze Age"
* from 1989 to 1999 → "The Disney Renaissance"
* from 2000 to 2009 → "Post Renaissance Era"
* from 2010 to present → "The Revival Era"

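The total gross income for each group is the sum over the films released in that group: $\sum_{i=1}^{n} x_i$, where $x_i$ is the inflation-adjusted gross of the $i$-th film and $n$ is the number of films in the group. Note that these ranges leave 1960 to 1969 unassigned, so any film released in those years receives no year_group and is left out of the grouped results.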


In [4]:
# Assign each film to an era, then sum the inflation-adjusted gross per era
total_gross_year.loc[(total_gross_year['year']<=1942)&(total_gross_year['year']>=1937),'year_group']='The Golden Age'
total_gross_year.loc[(total_gross_year['year']<=1949)&(total_gross_year['year']>=1943),'year_group']='The Wartime Era'
total_gross_year.loc[(total_gross_year['year']<=1959)&(total_gross_year['year']>=1950),'year_group']='The Silver Age'
total_gross_year.loc[(total_gross_year['year']<=1988)&(total_gross_year['year']>=1970),'year_group']='The Bronze Age'
total_gross_year.loc[(total_gross_year['year']<=1999)&(total_gross_year['year']>=1989),'year_group']='The Disney Renaissance'
total_gross_year.loc[(total_gross_year['year']<=2009)&(total_gross_year['year']>=2000),'year_group']='Post Renaissance Era'
total_gross_year.loc[(total_gross_year['year']>=2010),'year_group']='The Revival Era'
total_gross_stage = pd.DataFrame(total_gross_year.groupby('year_group')['inflation_adjusted_gross'].sum().sort_values(ascending=False))
total_gross_stage = total_gross_stage.reset_index()
total_gross_stage

Unnamed: 0,year_group,inflation_adjusted_gross
0,The Disney Renaissance,18841155750
1,Post Renaissance Era,15791503349
2,The Revival Era,13150493912
3,The Golden Age,9604273111
4,The Bronze Age,4601649994
5,The Silver Age,2706430071
6,The Wartime Era,1078510579
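
The era conditions can also be kept in a single lookup table, which makes the boundaries easier to read and to change. The following is only a sketch of that idea, not the notebook's method: `ERAS` and `era_for_year` are hypothetical names introduced here, and the year ranges are copied from the list above.

```python
from typing import Optional

# (first_year, last_year, label); last_year=None means the era is open-ended.
ERAS = [
    (1937, 1942, "The Golden Age"),
    (1943, 1949, "The Wartime Era"),
    (1950, 1959, "The Silver Age"),
    (1970, 1988, "The Bronze Age"),
    (1989, 1999, "The Disney Renaissance"),
    (2000, 2009, "Post Renaissance Era"),
    (2010, None, "The Revival Era"),
]


def era_for_year(year: int) -> Optional[str]:
    """Return the era label for a release year, or None if no range covers it."""
    for first, last, label in ERAS:
        if year >= first and (last is None or year <= last):
            return label
    return None  # for example 1960 to 1969, which none of the ranges cover


print(era_for_year(1937), era_for_year(1965), era_for_year(2016))
```

With the notebook's data, `total_gross_year['year'].map(era_for_year)` would give the same labels as the `.loc` assignments, and `.isna().sum()` on the result would count the films that fall outside every era.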


Now that the data is in the proper format, we can generate a bar plot to visualize it.

In [5]:
# Use altair to generate a bar plot
total_gross_stage_plot = (
    alt.Chart(total_gross_stage, width=500, height=300)
    .mark_bar()
    .encode(
        x=alt.X("year_group:O", sort='-y', title="Stage of Disney filmmaking"),
        y=alt.Y("inflation_adjusted_gross:Q", title="Inflation-adjusted total gross"),
    )
    .properties(title="Total gross income from movies in different stages of Disney filmmaking")
)
total_gross_stage_plot

The plot shows that The Disney Renaissance (1989 to 1999) produced the highest gross income, followed by the Post Renaissance Era (2000 to 2009) and The Revival Era (2010 to now). In other words, total gross income per era has decreased since 1989, although The Revival Era is still ongoing and the data shown end in 2016. There is another decrease from The Golden Age (1937 to 1942) to The Wartime Era (1943 to 1949), which suggests a significant influence of the war. Two questions arise here: 1) why does The Bronze Age out-gross The Silver Age, contrary to what the names suggest, and 2) why is 1937 to 1942 called The Golden Age when it did not generate the highest gross income?

Next, let us take a look at the production in these different periods.

In [6]:
# Count the movies released in each era
movie_number_period = pd.DataFrame(total_gross_year.groupby('year_group')['movie_title'].agg('count'))
movie_number_period = movie_number_period['movie_title'].sort_values(ascending=False)
movie_number_period

year_group
The Disney Renaissance    247
Post Renaissance Era      172
The Revival Era            86
The Bronze Age             59
The Silver Age              4
The Golden Age              3
The Wartime Era             1
Name: movie_title, dtype: int64

From this table, total gross income broadly follows the number of films produced: the two rankings agree except for The Golden Age, which had only 3 films yet ranks fourth in total gross. This suggests that a high gross income in a period is mainly due to high production. However, the popularity of the movies in a period also contributes, as The Golden Age shows. To clarify this, we next look at the average gross income in each period.
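
One way to put a number on how closely production and total gross move together is a rank correlation. The sketch below copies the totals and counts from the two tables above into a small DataFrame (the column names `total_gross` and `n_movies` are chosen here for illustration) and computes Spearman's rho as the Pearson correlation of the ranks. With only seven groups this is a rough indication, not a statistical test.

```python
import pandas as pd

# Values copied from the two output tables above
eras = pd.DataFrame(
    {
        "year_group": [
            "The Disney Renaissance",
            "Post Renaissance Era",
            "The Revival Era",
            "The Golden Age",
            "The Bronze Age",
            "The Silver Age",
            "The Wartime Era",
        ],
        "total_gross": [
            18841155750,
            15791503349,
            13150493912,
            9604273111,
            4601649994,
            2706430071,
            1078510579,
        ],
        "n_movies": [247, 172, 86, 3, 59, 4, 1],
    }
)

# Spearman's rho is the Pearson correlation computed on the ranks
rank_corr = eras["total_gross"].rank().corr(eras["n_movies"].rank())
print(round(rank_corr, 3))
```

A value close to 1 means the eras are ordered almost identically by production and by total gross. Here the only disagreements come from The Golden Age, The Bronze Age, and The Silver Age.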


The average gross income is the mean defined in the math review section, which can be written as: $\frac{1}{n}\left(\sum_{i=1}^{n}{x_i}\right)$

In [7]:
# generate the average gross income data
total_gross_year_average = pd.DataFrame(total_gross_year.groupby('year_group')['inflation_adjusted_gross'].mean().sort_values(ascending=False))
total_gross_year_average = total_gross_year_average.reset_index()
total_gross_year_average

Unnamed: 0,year_group,inflation_adjusted_gross
0,The Golden Age,3201424000.0
1,The Wartime Era,1078511000.0
2,The Silver Age,676607500.0
3,The Revival Era,152912700.0
4,Post Renaissance Era,91811070.0
5,The Bronze Age,77994070.0
6,The Disney Renaissance,76279980.0
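
The total, the number of films, and the average can also be produced side by side in one table, which makes it easy to see that the average is simply the total divided by the count. This is a minimal sketch on three rows taken from the first table, not on the full dataset. The names `toy` and `summary` and the output column names are assumptions made for this example.

```python
import pandas as pd

# Three films from the first table: Pinocchio, Fantasia, Song of the South
toy = pd.DataFrame(
    {
        "year_group": ["The Golden Age", "The Golden Age", "The Wartime Era"],
        "inflation_adjusted_gross": [2188229052, 2187090808, 1078510579],
    }
)

# Named aggregation: one output column per statistic
summary = (
    toy.groupby("year_group")["inflation_adjusted_gross"]
    .agg(total="sum", n_movies="count", average="mean")
    .reset_index()
)
print(summary)
```

Applied to `total_gross_year`, the same `agg` call would give the three tables of this section in a single DataFrame.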


In [8]:
# Use altair to generate a bar plot
total_gross_year_average_plot = (
    alt.Chart(total_gross_year_average, width=500, height=300)
    .mark_bar()
    .encode(
        x=alt.X("year_group:O", sort='-y', title="Stage of Disney filmmaking"),
        y=alt.Y("inflation_adjusted_gross:Q", title="Average inflation-adjusted gross"),
    )
    .properties(title="Average gross income per movie in different stages of Disney filmmaking")
)
total_gross_year_average_plot 

The plot shows that Disney movies in The Golden Age (1937 to 1942) generated an enormous average gross income, followed by a dramatic drop in The Wartime Era and again in The Silver Age. The lowest averages belong to The Bronze Age and The Disney Renaissance, which suggests a low point in the development of Disney filmmaking. After that, the average gross income increases up to now, but it is still only about one fourth of that in The Silver Age. The average gross income partially reflects the popularity of Disney films in the different stages of development. By this measure, films made in The Golden Age are the most popular in Disney filmmaking history, although the averages for the three earliest eras rest on only 3, 1, and 4 films respectively.