- Project Introduction
- Technologies Used
- Methods Used
- Project Description
- Featured Notebooks and Deliverables
- Most Important Findings
- 1. What is the correlation between budget and profit? Which budget ranges should be considered for making a Box Office success?
- 2. Do actors/actresses and directors play a role in a movie's success?
- 3. How does a movie's score rating impact profit?
- 4. How do genres play in with profit and profit margin?
- 5. Are duration and content rating relevant to profit?
- 6. Conclusion and Future Recommendations
- Acknowledgments
- Licences
- Contact
The goal of this project was to recommend strategies for a Movie Studio on how to make a Box Office success.
- Data Processing / Data Cleaning
- Data Analysis
- Descriptive Statistics
- Feature Engineering
- Data Visualization
- Reporting
As part of my mentorship, I was tasked to imagine being a Data Scientist for a Top Movie Studio. After a series of box office flops, the studio's producers are starting to question their strategy and need some direction. I suggest a new approach: using data to determine what factors go into making a successful film. Luckily, I have a dataset of over 5000 films to mine for insights. My producers ask me to spend some time analyzing the data and present a report detailing my findings, along with recommendations on how to revamp the studio’s strategy.
To analyze the data and find the best recommendations for the studio, I explored several questions based on research and various assumptions:
- What is the correlation between budget and profit? Which budget ranges should be considered for making a Box Office success?
- Do actors/actresses and directors play a role in a movie's success?
- How does a movie's score rating impact profit?
- How does the trend of profit, revenue, profit margin, and other attributes change over years, and can it be relevant to future strategy?
- How do genres play in with profit and profit margin?
- Is there a pattern in common plot keywords with successful movies?
- Are duration and content rating relevant to profit?
- What can we learn from number of votes, and critic reviews in regard to genres and profit?
The dataset for this analysis was downloaded from the Kaggle website: movie_dataset.csv
- Data - folder containing raw and processed data
- movie_data.csv
- movies_preprocessed.csv
- Assets - folder containing all images used for Readme.md, presentation, and blog posts.
- 1. Data Preprocessing and Early EDA - notebook containing early data exploration and data cleaning
- 2. Feature Engineering and Data Analysis - notebook containing feature engineering and strategy recommendations
- Blog post on Data Processing and Data Cleaning: how to | Cleaning and preparing a movie dataset
- Blog post on Feature Engineering and Data Analysis: project | What makes a movie a Box Office success?
- Data Preprocessing and Early EDA
- 1. Imports
- 2. Data
- 3. Early EDA and Data Cleaning
- 3.1 Missing values
- 3.2 Duplicate rows
- 3.3 Outliers and noisiness
- 3.4 Mismatched data types
- 3.5 Structural errors
- 3.6 Saving data for the next stage
- Feature Engineering and Data Analysis
- 1. Imports
- 2. Data
- 3. Feature Engineering
- 3.1 Profit
- 3.2 ROI
- 3.3 Profit margin
- 3.4 VAR (Value Above Replacement)
- 3.5 Removing irrelevant features
- 4. Data Analysis
- 4.1 What is the correlation between budget and profit? Which budget ranges should be considered for making a Box Office success?
- 4.2 Do actors/actresses and directors play a role in a movie’s success?
- 4.3 How does a movie’s score rating impact profit?
- 4.4 How does the trend of profit, revenue, profit margin, and other attributes change over years, and can it be relevant to future strategy?
- 4.5 How do genres play in with profit and profit margin?
- 4.6 Is there a pattern in common plot keywords with successful movies?
- 4.7 Are duration and content rating relevant to profit?
- 4.8 What can we learn from number of votes, and critic reviews in regard to genres and profit?
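The cleaning and feature-engineering steps outlined for the two notebooks can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the notebooks' actual code: the column names `movie_title`, `budget`, and `gross` are assumptions about the dataset, and ROI and profit margin are computed with their common definitions (profit over budget, and profit over revenue).

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
# The column names "movie_title", "budget", and "gross" are assumptions.
raw = pd.DataFrame({
    "movie_title": ["A", "A", "B", "C", "D"],
    "budget": [50e6, 50e6, 100e6, None, 40e6],
    "gross": [200e6, 200e6, 80e6, 30e6, 100e6],
})


def clean(df):
    """Drop exact duplicate rows and rows missing budget or gross."""
    out = df.drop_duplicates()
    out = out.dropna(subset=["budget", "gross"])
    return out.reset_index(drop=True)


def add_features(df):
    """Add profit, ROI, and profit margin columns (common definitions)."""
    out = df.copy()
    out["profit"] = out["gross"] - out["budget"]
    out["roi"] = out["profit"] / out["budget"]           # return per dollar spent
    out["profit_margin"] = out["profit"] / out["gross"]  # share of revenue kept
    return out


movies = add_features(clean(raw))
print(movies)
```

Dropping rows with a missing budget or gross is only one option; whether the notebooks drop or impute them is not stated here.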
Link to the presentation: presentation.pdf
1. What is the correlation between budget and profit? Which budget ranges should be considered for making a Box Office success?
*Visualizing the relationship between budget and profit*
- Strategy recommendation: I recommend a budget of no less than $40 MM, and around $75 MM on average, as these budgets result on average in a profit margin above 0.6, the threshold this recommendation is based on. The analysis shows that higher-budget movies risk a smaller profit margin, so I cannot support the claim that a large budget is a certain indicator of a Box Office success.
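One way to check a budget recommendation like this is to group movies into budget ranges and compare the average profit margin per range. The sketch below uses made-up numbers; the column names `budget` and `gross` and the bin edges are assumptions chosen so that the $40 MM to $75 MM range forms its own bin.

```python
import pandas as pd

# Made-up movies; "budget" and "gross" are assumed column names.
movies = pd.DataFrame({
    "budget": [10e6, 20e6, 45e6, 60e6, 75e6, 150e6, 200e6],
    "gross": [12e6, 15e6, 150e6, 180e6, 250e6, 300e6, 220e6],
})
movies["profit"] = movies["gross"] - movies["budget"]
movies["profit_margin"] = movies["profit"] / movies["gross"]

# Illustrative budget ranges in dollars (right-inclusive bins).
edges = [0, 40e6, 75e6, float("inf")]
labels = ["< $40 MM", "$40-75 MM", "> $75 MM"]
movies["budget_range"] = pd.cut(movies["budget"], bins=edges, labels=labels)

# Mean profit margin and number of movies per budget range.
summary = (
    movies.groupby("budget_range", observed=True)["profit_margin"]
    .agg(["mean", "count"])
)
print(summary)

# Pearson correlation between budget and profit.
corr = movies["budget"].corr(movies["profit"])
print(f"budget/profit correlation: {corr:.2f}")
```

Looking at the margin per range alongside the correlation helps because a positive budget/profit correlation can coexist with shrinking margins at the high end.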
2. Do actors/actresses and directors play a role in a movie's success?
- Strategy recommendation: With great certainty, I can recommend that the studio take into account the VAR score of an actor or actress when hiring, and even more so the VAR score of the person who will direct the movie:
- For actors and actresses, I recommend a VAR value between 1.0 and 3.0
- For directors, I recommend a VAR value between 1.0 and 2.50
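To make the VAR ranges above concrete, here is a small sketch of how such a score could be computed and filtered. The exact VAR formula is not spelled out in this README, so the definition used here (a person's mean profit divided by the mean profit of all movies) is an assumption, as are the column names `director_name` and `profit` and the hypothetical helper `var_scores`.

```python
import pandas as pd

# Made-up movies; "director_name" and "profit" are assumed column names.
movies = pd.DataFrame({
    "director_name": ["X", "X", "Y", "Y", "Z"],
    "profit": [100e6, 60e6, 20e6, 40e6, 30e6],
})


def var_scores(df, person_col, value_col="profit"):
    """Assumed VAR: a person's mean profit relative to the overall mean profit."""
    overall_mean = df[value_col].mean()
    return df.groupby(person_col)[value_col].mean() / overall_mean


director_var = var_scores(movies, "director_name")

# Keep directors inside the recommended range (bounds are inclusive).
in_range = director_var[director_var.between(1.0, 2.5)]
print(in_range)
```

Under this definition, a score of 1.0 means the person's movies earn the overall average profit, so the recommended ranges select people who perform at or above average.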
3. How does a movie's score rating impact profit?
- Strategy recommendation: When hiring a director, I recommend taking into account the average score of their movies (not less than 7.0), as I believe it may have a positive impact on profit. Another recommendation regarding movie scores relates to a movie’s genre and will be detailed later in the report.
4. How do genres play in with profit and profit margin?
- Strategy recommendation: I recommend investing in the Animation genre, within the above-mentioned budget range of $40 MM to $75 MM, as well as in the Family and Adventure genres, as they show a desirable Return on Investment and are not as expensive. Those genres can sit at the lower end of the budget recommendations, around $40 MM.
*Visualizing genres*
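A genre comparison like the one behind this recommendation can be sketched as below. The data is made up, and the layout is an assumption: a `genres` column holding pipe-separated genre names, plus `budget` and `gross` columns. Each movie is counted once for every genre it lists.

```python
import pandas as pd

# Made-up movies; the pipe-separated "genres" column is an assumption.
movies = pd.DataFrame({
    "genres": ["Animation|Family", "Adventure|Action", "Drama", "Animation|Adventure"],
    "budget": [50e6, 100e6, 20e6, 60e6],
    "gross": [200e6, 250e6, 25e6, 180e6],
})
movies["profit"] = movies["gross"] - movies["budget"]
movies["roi"] = movies["profit"] / movies["budget"]

# One row per (movie, genre) pair.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Median ROI and median budget per genre, best ROI first.
genre_summary = (
    by_genre.groupby("genre")[["roi", "budget"]]
    .median()
    .sort_values("roi", ascending=False)
)
print(genre_summary)
```

The median is used here instead of the mean so that a single blockbuster does not dominate a genre; that choice is mine for the sketch and may differ from the notebook.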
5. Are duration and content rating relevant to profit?
- Strategy recommendation: The studio should focus on PG-13 movies, as the most common profitable genres (Animation, Adventure, Family) are in this group.
6. Conclusion and Future Recommendations
The Top Movie Studio producers asked a great business question: What are we doing wrong, and what can we do to change our strategy? Fortunately for them, there are many answers to the question, and some of them were presented in this project. I would single out a couple of important recommendations as part of this conclusion.
The available data showed that budgets between $40 MM and $75 MM are associated with a good profit margin, above 0.6. The most profitable genres are Animation, Adventure, and Family, and all three fall under the PG-13 content rating. The recommendation also covers the importance of directors and actors/actresses. Directors with an average movie score above 7 show great potential to bring significant value to a movie, even more so if their VAR score is between 1.0 and 2.5. The same goes for actors/actresses: if their VAR score is between 1.0 and 3.0, they will bring value to a movie.
As the quest for a more detailed analysis is never over, I will include a couple of future recommendations that I deem potentially valuable to the studio.
I recommend including and exploring data on Oscar-winning directors, actors/actresses, and movies. While working on this project, I wondered whether creating sequels is a good route to a successful movie; this is something I would also consider including. Besides that, I would add demographic information on users, if applicable, and the box office successes of other movie studios to the dataset in order to obtain a more detailed analysis. I believe this information would bring additional insight into the determinants of a Box Office success.
Thanks to jeremy-lee93 for the inspiration regarding the VAR values and the exploration of the profit margin.
Database Contents License (DbCL) v1.0
Find me on LinkedIn, Twitter or adzictanja.com.