# The influence of movie genre on the gross income

First, I am interested in how the gross income of movies varies with their genre. For this I need a table that contains the gross income of individual movies, which is the disney_movies_total_gross table.

Before moving further, let us import the table and get the basic information about it.

In [1]:
# Let's import all the libraries needed for this analysis
import altair as alt
import pandas as pd
# Load the data file used in this analysis
total_gross = pd.read_csv("data/disney_movies_total_gross.csv")



Let's see what the table looks like.

In [2]:
total_gross.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross
0,Snow White and the Seven Dwarfs,"Dec 21, 1937",Musical,G,"$184,925,485","$5,228,953,251"
1,Pinocchio,"Feb 9, 1940",Adventure,G,"$84,300,000","$2,188,229,052"
2,Fantasia,"Nov 13, 1940",Musical,G,"$83,320,000","$2,187,090,808"
3,Song of the South,"Nov 12, 1946",Adventure,G,"$65,000,000","$1,078,510,579"
4,Cinderella,"Feb 15, 1950",Drama,G,"$85,000,000","$920,608,730"


Let's get some other information about the table.

In [3]:
total_gross.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 579 entries, 0 to 578
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   movie_title               579 non-null    object
 1   release_date              579 non-null    object
 2   genre                     562 non-null    object
 3   MPAA_rating               523 non-null    object
 4   total_gross               579 non-null    object
 5   inflation_adjusted_gross  579 non-null    object
dtypes: object(6)
memory usage: 27.3+ KB


The gross income table has 579 rows and 6 columns. Each row describes one movie: its movie_title, release_date, genre, MPAA_rating, total_gross, and inflation_adjusted_gross, which restates the gross in inflation-adjusted terms so that movies released in different years can be compared.
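
The info() output also shows that genre has 562 non-null values and MPAA_rating has 523, so 17 and 56 of the 579 rows respectively are missing those fields. A minimal sketch of how such gaps could be counted directly is given below. It uses a small toy frame with hypothetical rows (the name `sample` and its contents are assumptions for illustration, not the real data):

```python
import pandas as pd

# Toy frame with hypothetical rows, used only to illustrate the check
sample = pd.DataFrame(
    {
        "movie_title": ["A", "B", "C"],
        "genre": ["Musical", None, "Drama"],
        "MPAA_rating": ["G", None, None],
    }
)

# Count the missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Applied to total_gross, the same call would be expected to report the gaps that info() implies.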

As a first visualization, let's look at the influence of movie genre on gross income. To do this, I will select two columns from the total_gross table: genre and inflation_adjusted_gross. I will first remove the '$' and ',' characters from inflation_adjusted_gross and change its data type from object to int so that it can be used in computations. Then I will group by genre and compute the average inflation-adjusted gross for each genre.

In [4]:
# Use replace to strip the '$' and ',' characters, then change the data type to int64.
# Raw strings (r'...') avoid invalid-escape warnings for the regex patterns.
total_gross['inflation_adjusted_gross'] = total_gross['inflation_adjusted_gross'].replace({r'\$': ''}, regex=True)
total_gross['inflation_adjusted_gross'] = total_gross['inflation_adjusted_gross'].replace({r',': ''}, regex=True).astype('int64')
total_gross['inflation_adjusted_gross']

0      5228953251
1      2188229052
2      2187090808
3      1078510579
4       920608730
          ...    
574      12545979
575       8874389
576     232532923
577     246082029
578     529483936
Name: inflation_adjusted_gross, Length: 579, dtype: int64
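
As an alternative to the two replace calls above, the same cleaning can be sketched in a single pass with the string accessor. This is only an illustration on three values copied from the head of the table; the names `raw` and `cleaned` are made up for this example:

```python
import pandas as pd

# Values copied from the first rows of the table shown earlier
raw = pd.Series(["$5,228,953,251", "$2,188,229,052", "$920,608,730"])

# Strip every '$' and ',' in one pass, then convert to 64-bit integers.
# Inside a character class, '$' is a literal character and needs no escape.
cleaned = raw.str.replace(r"[$,]", "", regex=True).astype("int64")
print(cleaned.tolist())
```

Either approach should give the same integers. The single pattern simply avoids touching the column twice.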

In [5]:
# Group by genre and compute the average inflation_adjusted_gross of each genre.
total_gross_genre_average = pd.DataFrame(total_gross.groupby('genre')['inflation_adjusted_gross'].mean().sort_values(ascending=False))

# Reset the index so we can plot using altair
total_gross_genre_average = total_gross_genre_average.reset_index()
total_gross_genre_average

Unnamed: 0,genre,inflation_adjusted_gross
0,Musical,603597900.0
1,Adventure,190397400.0
2,Action,137473400.0
3,Thriller/Suspense,89653790.0
4,Comedy,84667730.0
5,Romantic Comedy,77777080.0
6,Western,73815710.0
7,Drama,71893020.0
8,Concert/Performance,57410840.0
9,Black Comedy,52243490.0


Now that we have it in the proper format, we can generate a bar plot to visualize it.

In [6]:
# Use altair to generate a bar plot
total_gross_genre_average_plot = (
    alt.Chart(total_gross_genre_average, width=500, height=300)
    .mark_bar()
    .encode(
        x=alt.X("genre:O", sort='-y', title="Movie Genre"),
        y=alt.Y("inflation_adjusted_gross:Q", title="Average Inflation Adjusted Gross"),
    )
    .properties(title="Average inflation-adjusted gross income of different movie genres")
)
total_gross_genre_average_plot 

Because the bars are sorted from highest to lowest, the plot shows how quickly the average gross falls off from one genre to the next. At first sight, movie genre appears to have a strong influence on gross income. The steepest drop is at the start: the average for the Musical genre is unusually high, about three times that of Adventure, which is the second highest. The lowest average gross is associated with the Documentary genre. To validate this impression, however, further analysis is needed.
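
One modest first step toward that further analysis might be to check how many movies sit behind each average and whether the median tells the same story, since a mean can be pulled up by a few very large values. The sketch below uses the five rows shown in the head of the table plus one hypothetical row with a missing genre. The names `sample` and `summary` and the extra row are assumptions for illustration only:

```python
import pandas as pd

# First five rows of the table shown earlier, plus one hypothetical row
# with a missing genre to show how groupby treats it.
sample = pd.DataFrame(
    {
        "movie_title": [
            "Snow White and the Seven Dwarfs",
            "Pinocchio",
            "Fantasia",
            "Song of the South",
            "Cinderella",
            "Hypothetical untitled movie",
        ],
        "genre": ["Musical", "Adventure", "Musical", "Adventure", "Drama", None],
        "inflation_adjusted_gross": [
            5228953251,
            2188229052,
            2187090808,
            1078510579,
            920608730,
            1000000,
        ],
    }
)

# Mean, median, and number of movies per genre.
# By default, groupby leaves out rows whose genre is missing.
summary = (
    sample.groupby("genre")["inflation_adjusted_gross"]
    .agg(mean_gross="mean", median_gross="median", n_movies="count")
    .sort_values("mean_gross", ascending=False)
    .reset_index()
)
print(summary)
```

On the full table, a genre with a high mean but only a handful of movies, or with a median far below its mean, would suggest that a few titles dominate its average.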