## Final Proposal: Disney and Marvel merger: who saved whom?

#### Principal investigator: Aigul Saiapova <br>  Email: as10369@nyu.edu



This project will study the dynamics of the 2009 acquisition of Marvel Entertainment by The Walt Disney Company. Since the merger, Marvel Cinematic Universe (MCU) productions have been breaking box office records around the world, while The Walt Disney Company has kept growing in size and profitability. According to recent news reports, the MCU has become the highest-grossing film franchise in the history of cinema. Disney, meanwhile, has gone on a "shopping spree" over the years, acquiring the ABC broadcasting company (1995), Pixar, Lucasfilm (famous for producing Star Wars), 20th Century Fox and 21st Century Fox, and FOX Television Studios. With these extended capabilities, The Walt Disney Company has become a massive conglomerate that strongly exhibits monopolistic behavior. Some [media sources](https://www.fool.com/investing/2018/04/25/how-the-avengers-saved-disneys-movie-business.aspx) claim that the acquisition of Marvel Entertainment is what "saved Disney's movie business", while [others](http://fortune.com/2015/10/08/disney-marvel/) claim the opposite. My objective for this project is therefore to disentangle causation from correlation in the effects of this merger. Here is how I plan to proceed:

1. Look at the box office data over the full run of the Marvel Cinematic Universe, from its first release to its most recent, marking the date of the acquisition.


2. Examine the review dynamics for the MCU productions, using data on average film ratings and on the variation in the number of reviewers from websites like [IMDb](https://www.imdb.com/list/ls066946827/) (Internet Movie Database) and [Rotten Tomatoes](https://www.rottentomatoes.com/franchise/marvel_cinematic_universe/), both widely used as references by audiences.


3. Screen effect: how many theatres, on average, showed the MCU films, and whether that had an effect on profitability.


4. Look at general entertainment trends: what if the popularity of Marvel content is a consequence of shifting entertainment preferences rather than the sheer excellence of the franchise?


The project will have two parts: an analytical, programming-based assignment presented in this Jupyter Notebook, and a separate PowerPoint presentation explaining the data points and providing additional reference information.


#### Import all the required packages for the case analysis...

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import weightedcalcs as wc

#### Import the data.
My Excel document has four sheets: the first describes the contents, the second contains box office statistics, the third holds the review data, and the last has the industry statistics. I will import the three data sheets as separate variables.

In [15]:
path="C:\\Users\\Aygul Sayapova\\Desktop\\NYU New York\\Data bootcamp\\MARVEL DISNEY PROJECT\\MCU_box.xlsx"
box_office=pd.read_excel(path, sheet_name="box_office")

In [16]:
review=pd.read_excel(path, sheet_name="review")

In [19]:
industry_stats=pd.read_excel(path, sheet_name="industry_stat")

In [17]:
box_office.set_index("Name", inplace=True)

Although I compiled the data manually, it still helps to check its size:

In [18]:
box_office.shape

(22, 12)

In [20]:
review.shape

(22, 11)

In [24]:
industry_stats.shape

(12, 5)

In [21]:
box_office.head()

| Name | US_release_date | open_week_revenue | total_gross_US | total_gross_intl | total_world | inflation_adj_open_week_US | inlfation_adjusted_US | inlfation_adjusted_intl | inlfation_adjusted_world | num_theatres | num_tix_sold | CPI_adj_2008 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Iron Man | 2008-02-05 | 98618668 | 318412101 | 266762121 | 585174222 | 45805230.0 | 147892300.0 | 123902500.0 | 271794800.0 | 4154 | 44347089 | 0.464468 |
| The Incredible Hulk | 2008-06-13 | 55414050 | 134806913 | 128620638 | 263427551 | 25738060.0 | 62613520.0 | 59740190.0 | 122353700.0 | 3508 | 18775336 | |
| Iron Man 2 | 2010-05-07 | 128122480 | 312433331 | 311500000 | 623933331 | 59508820.0 | 145115300.0 | 144681800.0 | 289797200.0 | 4390 | 39598647 | |
| Thor | 2011-05-06 | 65723338 | 181030624 | 268295994 | 449326618 | 30526400.0 | 84082970.0 | 124615000.0 | 208697900.0 | 3963 | 22828578 | |
| Captain America: The First Avenger | 2011-07-22 | 65058524 | 176654505 | 193915269 | 370569774 | 30217610.0 | 82050400.0 | 90067470.0 | 172117900.0 | 3715 | 22276735 | |


As we can see, there are 12 descriptive columns for the data, including:
- The release date.
- Revenue in the opening week (nominal).
- Total box office revenue in the US and internationally (nominal). For most of the movies, at least 50% of the revenue is collected outside the US, so we can trace the popularity of the franchise around the world as well.
- Real values of the revenue figures above, to account for price inflation. (The deflator is in the last column, `CPI_adj_2008`.)
- The screen effect: the number of theatres where the films were screened.
- The number of tickets sold for each movie in the US.
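
As a rough check of the international-share remark and of step 1 of the plan, the sketch below rebuilds the five rows shown above by hand, computes each film's share of nominal revenue earned outside the US, and flags films released after the acquisition. The names `sample`, `intl_share`, `post_acquisition` and `ACQUISITION_DATE` are illustrative, and the closing date of 31 December 2009 is my assumption rather than a value from the spreadsheet.

```python
import pandas as pd

# First five rows of the box_office sheet, copied from the output above.
sample = pd.DataFrame(
    {
        "US_release_date": pd.to_datetime(
            ["2008-02-05", "2008-06-13", "2010-05-07", "2011-05-06", "2011-07-22"]
        ),
        "total_gross_US": [318412101, 134806913, 312433331, 181030624, 176654505],
        "total_gross_intl": [266762121, 128620638, 311500000, 268295994, 193915269],
        "total_world": [585174222, 263427551, 623933331, 449326618, 370569774],
    },
    index=pd.Index(
        [
            "Iron Man",
            "The Incredible Hulk",
            "Iron Man 2",
            "Thor",
            "Captain America: The First Avenger",
        ],
        name="Name",
    ),
)

# Assumption: the deal is treated as closed on 31 December 2009.
ACQUISITION_DATE = pd.Timestamp("2009-12-31")

# Share of nominal worldwide revenue collected outside the US.
sample["intl_share"] = sample["total_gross_intl"] / sample["total_world"]

# True for films released after the assumed acquisition date.
sample["post_acquisition"] = sample["US_release_date"] > ACQUISITION_DATE

print(sample[["intl_share", "post_acquisition"]].round(3))
```

In these five rows only Thor and Captain America: The First Avenger clear the 50% mark (Iron Man 2 falls just short), so the share is worth recomputing on all 22 films before relying on it.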

In [13]:
review.head()

| | title | genre | year | runtime_min | imdb_rating | imdb_vote | imdb_metascore | rt_audience_pct | rt_prof_rev | rt_rev_num | rt_user_rating_num |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Iron Man | Action,Adventure,Sci-Fi | 2008.0 | 126 | 7.9 | 737719 | 79 | 0.91 | 0.93 | 275 | 1081473 |
| 1 | The Incredible Hulk | Action,Adventure,Sci-Fi | 2008.0 | 112 | 6.8 | 342355 | 61 | 0.7 | 0.67 | 229 | 738333 |
| 2 | Iron Man 2 | Action,Adventure,Sci-Fi | 2010.0 | 124 | 7.0 | 556666 | 57 | 0.71 | 0.73 | 289 | 479864 |
| 3 | Thor | Action,Adventure,Fantasy | 2011.0 | 115 | 7.0 | 570814 | 57 | 0.76 | 0.77 | 282 | 246791 |
| 4 | Captain America: The First Avenger | Action,Adventure,Sci-Fi | 2011.0 | 124 | 6.9 | 547368 | 66 | 0.74 | 0.8 | 263 | 187978 |


This sheet contains:
- Genre
- Year of release
- Runtime in minutes
- Ratings. IMDb: the rating (the average of individual user ratings), the number of users who voted, and the Metascore (a rating given by critics). Rotten Tomatoes: the audience and critic scores, and the number of ratings behind each of them.
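
To illustrate how these rating columns could feed the review analysis, here is a minimal sketch on the five rows shown above. It computes the gap between critic and audience scores on Rotten Tomatoes and compares a simple mean of the IMDb ratings with a vote-weighted mean. The names `review_sample` and `critic_minus_audience` are illustrative, and the weighting is done with plain pandas rather than `weightedcalcs`, purely to keep the example self-contained.

```python
import pandas as pd

# First five rows of the review sheet, copied from the output above.
review_sample = pd.DataFrame(
    {
        "title": [
            "Iron Man",
            "The Incredible Hulk",
            "Iron Man 2",
            "Thor",
            "Captain America: The First Avenger",
        ],
        "imdb_rating": [7.9, 6.8, 7.0, 7.0, 6.9],
        "imdb_vote": [737719, 342355, 556666, 570814, 547368],
        "rt_audience_pct": [0.91, 0.70, 0.71, 0.76, 0.74],
        "rt_prof_rev": [0.93, 0.67, 0.73, 0.77, 0.80],
    }
)

# Positive values: critics scored the film higher than the audience did.
review_sample["critic_minus_audience"] = (
    review_sample["rt_prof_rev"] - review_sample["rt_audience_pct"]
).round(2)

# Unweighted average of the IMDb ratings.
simple_mean = review_sample["imdb_rating"].mean()

# Average weighted by the number of votes behind each rating.
weighted_mean = (
    review_sample["imdb_rating"] * review_sample["imdb_vote"]
).sum() / review_sample["imdb_vote"].sum()

print(review_sample[["title", "critic_minus_audience"]])
print(round(simple_mean, 2), round(weighted_mean, 2))
```

Weighting by votes gives more influence to films that many people rated, which matters here because the vote counts differ by a factor of about two across these five titles.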

In [23]:
industry_stats.head()

| | year | num_tix_sold | box_office_nominal | box_office_adj | avg_tix_price |
|---|---|---|---|---|---|
| 0 | 2008 | 1358041408 | 9750739371 | 12371757232 | 7.18 |
| 1 | 2009 | 1418567388 | 10639257284 | 12923123576 | 7.5 |
| 2 | 2010 | 1328549021 | 10482254025 | 12103081587 | 7.89 |
| 3 | 2011 | 1282891721 | 10173333767 | 11687143588 | 7.93 |
| 4 | 2012 | 1402603148 | 11164723987 | 12777714678 | 7.96 |


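The industry stats are pretty straightforward: for each year, the number of tickets sold, the nominal and inflation-adjusted box office, and the average ticket price.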
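
Since the data was compiled by hand, a quick internal consistency check seems worthwhile. The sketch below, using the five rows shown above, tests whether nominal box office divided by tickets sold reproduces the reported average ticket price. The names `industry_sample`, `implied_price` and `price_gap`, and the half-cent tolerance, are my own choices for illustration.

```python
import pandas as pd

# First five rows of the industry_stat sheet, copied from the output above.
industry_sample = pd.DataFrame(
    {
        "year": [2008, 2009, 2010, 2011, 2012],
        "num_tix_sold": [1358041408, 1418567388, 1328549021, 1282891721, 1402603148],
        "box_office_nominal": [9750739371, 10639257284, 10482254025, 10173333767, 11164723987],
        "avg_tix_price": [7.18, 7.5, 7.89, 7.93, 7.96],
    }
)

# Ticket price implied by the two aggregate columns.
industry_sample["implied_price"] = (
    industry_sample["box_office_nominal"] / industry_sample["num_tix_sold"]
)

# Absolute difference from the reported average ticket price.
industry_sample["price_gap"] = (
    industry_sample["implied_price"] - industry_sample["avg_tix_price"]
).abs()

# Assumed tolerance: half a cent, i.e. agreement after rounding to cents.
print((industry_sample["price_gap"] < 0.005).all())
```

For these five years the implied and reported prices agree to the cent, which suggests the three columns were entered consistently.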

In [10]:
box_office.dtypes

US_release_date               datetime64[ns]
open_week_revenue                      int64
total_gross_US                         int64
total_gross_intl                       int64
total_world                            int64
inflation_adj_open_week_US           float64
inlfation_adjusted_US                float64
inlfation_adjusted_intl              float64
inlfation_adjusted_world             float64
num_theatres                           int64
CPI_adj_2008                         float64
dtype: object

In [14]:
review.dtypes

title                  object
genre                  object
year                  float64
runtime_min             int64
imdb_rating           float64
imdb_vote               int64
imdb_metascore          int64
rt_audience_pct       float64
rt_prof_rev           float64
rt_rev_num              int64
rt_user_rating_num      int64
dtype: object
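
Note that `year` is stored as `float64` while the other count-like columns are `int64`. In pandas this often means the column holds at least one missing value, although I have not confirmed that for this sheet. A minimal sketch of one way to handle it, assuming a hypothetical missing entry, is to convert to the nullable `Int64` type:

```python
import numpy as np
import pandas as pd

# Hypothetical example: the NaN stands in for a possible missing year.
year = pd.Series([2008.0, 2008.0, 2010.0, np.nan], name="year")

# Nullable integer dtype keeps whole-number years and represents gaps as <NA>.
year_int = year.astype("Int64")

print(year_int.dtype)
print(year_int.isna().sum())
```

This keeps the years as integers for grouping and plotting without having to drop or fill the incomplete rows first.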

In [25]:
industry_stats.dtypes

year                    int64
num_tix_sold            int64
box_office_nominal      int64
box_office_adj          int64
avg_tix_price         float64
dtype: object

## Summary

It looks like I have enough data to conduct my analysis. Next: graphs!