The dataset contains 6820 movies (220 movies per year, 1986-2016). To establish the main points of interest in the dataset, we used Exploratory Data Analysis (EDA) and correlation methods.
We explore the dataset through some visualizations to answer the following questions.
- What are the top 5 movies by gross revenue?
- Which stars made the most movies in this period? And which directors?
- Which directors have generated the most revenue?
- What are the best movies by score?
- What is the volume of movies coming out per year?
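The questions above map onto a handful of aggregations. The following is a minimal sketch using pandas, not the exact code behind our plots. The column names (`name`, `year`, `gross`, `score`, `star`, `director`) are assumptions about how the dataset is laid out, and the rows are invented purely for illustration.

```python
import pandas as pd

# Invented sample rows; the column names are assumed, not taken from the real file.
movies = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "year": [1986, 1986, 1987, 1987, 1987, 1988],
    "gross": [100.0, 250.0, 80.0, 300.0, 120.0, 90.0],
    "score": [7.1, 8.3, 6.0, 7.9, 8.8, 5.5],
    "star": ["S1", "S2", "S1", "S3", "S1", "S2"],
    "director": ["D1", "D2", "D1", "D2", "D3", "D1"],
})

# Top 5 movies by gross revenue.
top_by_gross = movies.nlargest(5, "gross")[["name", "gross"]]

# Number of movies per star and per director, most prolific first.
movies_per_star = movies["star"].value_counts()
movies_per_director = movies["director"].value_counts()

# Total gross revenue per director, highest first.
gross_per_director = (
    movies.groupby("director")["gross"].sum().sort_values(ascending=False)
)

# Best movies by score.
top_by_score = movies.nlargest(5, "score")[["name", "score"]]

# Volume of movies released per year.
movies_per_year = movies.groupby("year").size()
```

Each result is a small Series or DataFrame that can be passed directly to a plotting call such as `.plot(kind="bar")`.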
In our EDA, it was clear that over the last decade the average revenue was notably stable, with a few peaks (e.g., 2015, the year of Star Wars VII, Jurassic World, Avengers: Age of Ultron, etc.). In 2020, however, when cinemas and studios shut down, the industry's revenue fell by ~89%.
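A year-over-year comparison like this can be computed from the mean gross per year. The sketch below assumes `year` and `gross` columns and uses invented numbers and placeholder years, so it does not reproduce the figures quoted above.

```python
import pandas as pd

# Invented values for three hypothetical years; not real box-office data.
yearly = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002, 2003, 2003],
    "gross": [100.0, 200.0, 150.0, 150.0, 10.0, 20.0],
})

# Average gross revenue per year.
mean_gross = yearly.groupby("year")["gross"].mean()

# Percentage change relative to the previous year (first year is NaN).
pct_change = mean_gross.pct_change() * 100
```

A flat stretch shows up as changes near zero, while a sharp drop appears as a large negative percentage.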
Correlation: This analysis shows that variables such as company, director, star, and the country where a movie is released have little to no correlation with the actual revenue, whereas budget and the number of votes a movie gets seem to have a greater impact on its earnings. Runtime also shows some correlation with budget, as longer films tend to cost more to produce. Votes and budget have the highest correlations, while gross and company have a low correlation.
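One way to obtain a correlation table that includes categorical variables is to replace each category with an integer code and then compute pairwise Pearson correlations. The sketch below shows that approach under stated assumptions: the column names are guesses at the dataset's layout, the rows are invented, and the integer codes are arbitrary, so correlations involving encoded columns are only a rough check.

```python
import pandas as pd

# Invented sample rows; column names are assumed.
sample = pd.DataFrame({
    "budget": [10.0, 20.0, 30.0, 40.0, 50.0],
    "votes": [1000, 1800, 3200, 3900, 5100],
    "runtime": [90, 95, 110, 120, 130],
    "company": ["Z", "X", "Y", "X", "Z"],
    "gross": [25.0, 38.0, 66.0, 79.0, 104.0],
})

# Columns treated as categorical (listed explicitly for this sketch).
categorical_cols = ["company"]

# Replace each category with an integer code so it can enter the matrix.
encoded = sample.copy()
for col in categorical_cols:
    encoded[col] = encoded[col].astype("category").cat.codes

# Pairwise Pearson correlation between all columns.
corr = encoded.corr(method="pearson")

# Correlation of every other variable with gross, strongest first.
gross_corr = corr["gross"].drop("gross").sort_values(ascending=False)
```

Sorting the `gross` column of the matrix gives a quick ranking of which variables move with revenue; a heatmap of `corr` shows the same information for every pair at once.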