# Microsoft Movie Studio Analysis


## 1. Project Overview
### 1.1. Project Goal
To provide Microsoft with actionable insights on the types of films that are currently performing well at the box office. This will help guide their strategy in creating successful original video content for their new movie studio.

#### Specific Goals:
- Identify the top-performing genres at the box office.
- Analyze the characteristics of successful films, such as budget, runtime, cast, and release date.
- Examine the correlation between movie ratings (from IMDB, Rotten Tomatoes, etc.) and box office success.
- Provide three concrete recommendations based on the analysis to inform Microsoft’s movie production strategy.

### 1.2. Audience
The primary audience for this analysis is Microsoft's business stakeholders, specifically the head of the new movie studio. The insights will help them make informed decisions about the studio's production strategy.

### 1.3. Dataset
Data was drawn from several established movie-industry sources to cover the information the analysis needs:

- **IMDB** (Internet Movie Database): movie titles, genres, runtimes, and user ratings
- **Box Office Mojo**: box office gross figures
- **Rotten Tomatoes**: critic and audience review scores
- **TheMovieDB**: a community-built movie database
- **The Numbers**: production budgets and box office revenue

The primary datasets were sourced from `im.db`, a SQLite database containing detailed movie information, and `bom.movie_gross.csv.gz`, a compressed CSV file from Box Office Mojo containing box office gross data.


## 2. Business Understanding


### 2.1. Stakeholder and Key Business Questions
To address the needs of the stakeholders and guide the analysis effectively, the following key business questions were identified:

- What genres of movies are performing best at the box office?
- What are the characteristics (budget, duration, cast, etc.) of high-performing movies?
- How do ratings from different sources (IMDB, Rotten Tomatoes, etc.) correlate with box office success?



## 3. Data Understanding and Preparation
### 3.1. Data Collection
The data comes from the five sources listed in Section 1.3: IMDB, Box Office Mojo, Rotten Tomatoes, TheMovieDB, and The Numbers. The two primary files are:

- `im.db`: a SQLite database containing detailed movie information.
- `bom.movie_gross.csv.gz`: a gzip-compressed CSV file from Box Office Mojo containing box office gross data.


```python
# Import necessary libraries
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns
import gzip
```

```python
# Load datasets
```

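A minimal sketch of the loading step, assuming both files sit in the notebook's working directory. The helper names (`list_sqlite_tables`, `load_sqlite_table`, `load_gzipped_csv`) are hypothetical. Because the table names inside `im.db` are not documented here, the sketch lists them first rather than guessing them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def list_sqlite_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        tables = pd.read_sql_query(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
        )
    return tables["name"].tolist()


def load_sqlite_table(db_path, table):
    """Load every row of one table into a DataFrame."""
    with closing(sqlite3.connect(db_path)) as conn:
        # Table names cannot be bound as SQL parameters, so only pass trusted names.
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


def load_gzipped_csv(csv_path):
    """Read a gzip-compressed CSV file into a DataFrame."""
    return pd.read_csv(csv_path, compression="gzip")


# Only touch the real files when they are present in the working directory.
if Path("im.db").exists() and Path("bom.movie_gross.csv.gz").exists():
    print(list_sqlite_tables("im.db"))
    bom = load_gzipped_csv("bom.movie_gross.csv.gz")
    print(bom.head())
```

The `Path(...).exists()` guard keeps the cell runnable when the files are absent; once the table list is known, individual tables can be loaded with `load_sqlite_table`.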
### 3.2. Data Cleaning
Data cleaning is a crucial step to ensure the integrity and reliability of the analysis. In this phase, the following steps were performed:

- **Handle Missing Values**: Missing values were identified and handled to reduce the risk of biased results.
- **Standardize Data Formats**: Data formats such as dates and currencies were standardized to ensure consistency across the datasets.
- **Merge Datasets**: The datasets were merged on common keys, such as movie titles and release dates, to consolidate the information for analysis.


```python
# Handle missing values
```



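One way to make the missing-value step concrete, sketched under assumptions: first measure how much is missing in each column, then drop rows that lack a value the analysis cannot do without, and fill the remaining gaps with an explicit placeholder. The function names, the column names, and the choice of which columns are required are all hypothetical, and the DataFrame below is made-up example data.

```python
import pandas as pd


def missing_report(df):
    """Count missing values per column and their share of rows, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return report.sort_values("missing", ascending=False)


def handle_missing(df, required, fill_values=None):
    """Drop rows missing any `required` column, then fill other gaps from `fill_values`.

    Which columns are required and what to fill with are analysis choices,
    not fixed rules.
    """
    cleaned = df.dropna(subset=required)
    if fill_values:
        cleaned = cleaned.fillna(value=fill_values)
    return cleaned.reset_index(drop=True)


# Hypothetical example: a gross figure is essential, a missing studio is not.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "studio": ["BV", None, "WB"],
    "domestic_gross": [1.0e6, 2.0e6, None],
})
print(missing_report(movies))
print(handle_missing(movies, required=["domestic_gross"], fill_values={"studio": "Unknown"}))
```

Filling with a visible label such as `"Unknown"` keeps those rows countable while making the imputation obvious in later groupings.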
```python
# Standardize data formats
```



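Standardizing formats mostly means turning text into comparable numbers, dates, and join keys. The helpers below are a hypothetical sketch; they assume currency values may arrive as strings such as `"$425,000,000"`, which is an assumption about the raw files rather than something stated here.

```python
import pandas as pd


def to_number(series):
    """Turn currency-like strings such as "$1,200,000" into floats; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


def normalize_title(series):
    """Trim, lower-case, and collapse repeated whitespace so titles match across sources.

    Expects a column of strings; missing titles stay missing.
    """
    return series.str.strip().str.lower().str.replace(r"\s+", " ", regex=True)


def to_date(series):
    """Parse date strings; unparseable entries become NaT instead of raising."""
    return pd.to_datetime(series, errors="coerce")


# Made-up example values.
print(to_number(pd.Series(["$425,000,000", "1,200", None])).tolist())
print(normalize_title(pd.Series(["  Toy  Story 3 ", "AVATAR"])).tolist())
```

Coercing bad values to `NaN`/`NaT` rather than raising means format problems surface in the missing-value report instead of stopping the notebook.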
```python
# Merge datasets
```


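A sketch of the merge step, assuming both sources have a title column and a release-year column (the names `title` and `year`, and the function `merge_sources`, are hypothetical). Joining on a normalized title plus year, rather than on the title alone, avoids matching remakes that share a name, and returning the match rate shows how much data the join discards.

```python
import pandas as pd


def merge_sources(left, right, title_col="title", year_col="year"):
    """Join `right` onto `left` by normalized title and year; return (matched rows, match rate)."""
    left = left.assign(title_key=left[title_col].str.strip().str.lower())
    right = right.assign(title_key=right[title_col].str.strip().str.lower())
    merged = left.merge(
        right.drop(columns=[title_col]),  # keep the left-hand title spelling
        on=["title_key", year_col],
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
        validate="many_to_one",  # raises if `right` has duplicate title/year keys
    )
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    # Each left row appears at most once, so this is the share of left rows matched.
    match_rate = len(matched) / len(left) if len(left) else 0.0
    return matched.reset_index(drop=True), match_rate
```

`validate="many_to_one"` makes pandas raise a `MergeError` when the right-hand source repeats a title/year key, which would otherwise silently duplicate rows in the result.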
### 3.3. Data Exploration
Exploratory data analysis (EDA) was conducted to gain insights into the dataset and identify patterns and trends. This involved:

- Initial exploration to understand data distributions and relationships.
- Summary statistics and visualizations, such as histograms and box plots, to describe distributions and spot outliers.



```python
# Initial exploration
```


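For the initial exploration, a compact per-column profile complements `DataFrame.describe()`, which by default summarizes only numeric columns. `quick_profile` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def quick_profile(df):
    """One row per column: dtype, non-null count, distinct values, and a sample value."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "distinct": df.nunique(),  # NaN is not counted as a distinct value
        "example": df.apply(lambda col: col.dropna().iloc[0] if col.notna().any() else None),
    })
```

A typical cell would then call `quick_profile(df)` followed by `df.describe()` for the numeric columns.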

```python
# Visualizations
```


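A sketch of the histogram-and-box-plot pairing described in this section, using plain matplotlib. `plot_distribution` is a hypothetical helper; it returns the figure so the caller decides whether to display or save it.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution(series, title):
    """Draw a histogram and a box plot of one numeric column side by side."""
    values = pd.to_numeric(series, errors="coerce").dropna().to_numpy()
    label = str(series.name) if series.name is not None else ""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    # Histogram: overall shape of the distribution.
    ax_hist.hist(values, bins=30)
    ax_hist.set_title(f"{title}: histogram")
    ax_hist.set_xlabel(label)
    ax_hist.set_ylabel("Number of movies")
    # Box plot: median, quartiles, and points beyond the whiskers.
    ax_box.boxplot(values)
    ax_box.set_title(f"{title}: box plot")
    ax_box.set_ylabel(label)
    fig.tight_layout()
    return fig
```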
```python
# Example: distribution of movie budgets
```

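Movie budgets are typically right-skewed, with a few very expensive productions, so a linear-axis histogram tends to squeeze most films into the first bin. The sketch below uses log-spaced bins and a log x-axis. The function name is hypothetical, and it assumes budgets have already been converted to numbers (for example a `production_budget` column from The Numbers, which is an assumed name).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_budget_distribution(budgets, bins=30):
    """Histogram of production budgets on a log-scaled x-axis.

    Missing or non-positive budgets are dropped because they cannot sit on a log axis.
    """
    values = pd.to_numeric(budgets, errors="coerce")
    values = values[values > 0].to_numpy()
    edges = np.geomspace(values.min(), values.max(), bins + 1)
    # Pin the outer edges exactly so floating-point rounding cannot drop the extremes.
    edges[0], edges[-1] = values.min(), values.max()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(values, bins=edges)
    ax.set_xscale("log")
    ax.set_xlabel("Production budget (log scale)")
    ax.set_ylabel("Number of movies")
    ax.set_title("Distribution of movie budgets")
    return fig
```

Log-spaced bins give each order of magnitude the same visual width, so small, mid-sized, and blockbuster budgets can be compared in one chart.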
## 4. Data Analysis
### 4.1. Exploratory Data Analysis (EDA)
EDA was conducted to analyze the dataset and address the key business questions. The analysis included:

- Analyzing genre performance at the box office.
- Investigating the relationship between movie ratings and box office revenue.
- Studying the impact of movie budgets on profitability.




```python
# Analyze genre performance

# Ratings vs Box Office Revenue
```

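A sketch of the genre-performance analysis, assuming genres are stored as a comma-separated string per movie (such as `"Action,Adventure"`) and that a numeric gross column is available; the function and column names are hypothetical. The median is used rather than the mean so that a handful of blockbusters does not dominate a genre's figure, and genres with few films are dropped because their medians are unstable.

```python
import pandas as pd


def genre_performance(df, genre_col="genres", gross_col="worldwide_gross", min_movies=10):
    """Movie count and median gross per genre, best genre first.

    A film tagged with several genres counts once towards each of them.
    """
    exploded = (
        df.assign(genre=df[genre_col].str.split(","))
        .explode("genre")  # one row per (movie, genre) pair
        .dropna(subset=["genre", gross_col])
        .assign(genre=lambda d: d["genre"].str.strip())
    )
    summary = exploded.groupby("genre")[gross_col].agg(movies="count", median_gross="median")
    summary = summary[summary["movies"] >= min_movies]
    return summary.sort_values("median_gross", ascending=False)
```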

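For the ratings-versus-revenue question, a rank correlation is a reasonable first measure: it asks whether higher-rated films tend to earn more without assuming a linear relationship. The sketch computes Spearman's rho directly as the Pearson correlation of the two rank vectors. The function name, the rating and revenue column names, and the `min_rows` threshold are assumptions.

```python
import pandas as pd


def rating_revenue_correlation(df, rating_cols, revenue_col="worldwide_gross", min_rows=30):
    """Spearman rank correlation between each rating column and revenue.

    Rows missing either value are dropped separately for each rating column,
    and columns with fewer than `min_rows` complete pairs are skipped.
    """
    results = {}
    for col in rating_cols:
        pair = df[[col, revenue_col]].dropna()
        if len(pair) < min_rows:
            continue
        # Spearman's rho is the Pearson correlation of the ranks (ties get average ranks).
        results[col] = pair[col].rank().corr(pair[revenue_col].rank())
    return pd.Series(results, name="spearman_rho", dtype=float)
```

A value near +1 means higher ratings go with higher revenue; a value near 0 means ratings say little about revenue on their own.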
### 4.2. Feature Engineering
Feature engineering involved creating new features from existing data and converting categorical data to numerical form where necessary. This step aimed to produce variables that directly address the business questions, such as measures of profitability.



```python
# Create new features
```

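A sketch of the feature-engineering step under assumed column names (`production_budget`, `worldwide_gross`, `release_date`, `genres`) and a hypothetical function name. Profit and ROI address the budget and profitability question, the release month supports the release-date question, and one-hot genre columns turn the categorical genre string into numbers. ROI is defined here as profit divided by budget, which is one common convention rather than a definition taken from this analysis.

```python
import pandas as pd


def add_features(df, budget_col="production_budget", gross_col="worldwide_gross",
                 date_col="release_date", genre_col="genres"):
    """Return a copy of `df` with profit, ROI, release month, and one-hot genre columns."""
    out = df.copy()
    out["profit"] = out[gross_col] - out[budget_col]
    # Zero or missing budgets become NaN so ROI is NaN instead of infinite.
    budget = out[budget_col].where(out[budget_col] > 0)
    out["roi"] = out["profit"] / budget
    out["release_month"] = pd.to_datetime(out[date_col], errors="coerce").dt.month
    # One 0/1 column per genre, e.g. "genre_Action", from a comma-separated string.
    dummies = out[genre_col].str.get_dummies(sep=",").add_prefix("genre_")
    return pd.concat([out, dummies], axis=1)
```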

### 4.3. Visualization
Visualization played a crucial role in presenting the findings of the analysis in a clear and understandable manner. Visualizations such as bar charts and scatter plots were used to support the analysis and make it accessible to a non-technical audience.



```python
# Visualizations for findings

# Scatter plot for ratings vs revenue
```

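A sketch of the two chart types named in this section: a horizontal bar chart of median gross by genre and a rating-versus-revenue scatter plot with revenue on a log axis, so that a few blockbusters do not flatten the rest. `plot_findings` is hypothetical; it expects a genre summary with a `median_gross` column indexed by genre, plus assumed rating and revenue column names.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_findings(genre_summary, df, rating_col="averagerating", revenue_col="worldwide_gross"):
    """Bar chart of median gross by genre next to a rating-vs-revenue scatter plot."""
    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

    # Up to ten genres, sorted so the largest bar ends up at the top.
    top = genre_summary["median_gross"].head(10).sort_values()
    ax_bar.barh(top.index.astype(str).tolist(), top.to_numpy())
    ax_bar.set_xlabel("Median gross")
    ax_bar.set_title("Genres by median gross")

    # Only complete, positive-revenue rows can be drawn on a log axis.
    pair = df[[rating_col, revenue_col]].dropna()
    pair = pair[pair[revenue_col] > 0]
    ax_scatter.scatter(pair[rating_col], pair[revenue_col], alpha=0.3, s=10)
    ax_scatter.set_yscale("log")
    ax_scatter.set_xlabel(rating_col)
    ax_scatter.set_ylabel(f"{revenue_col} (log scale)")
    ax_scatter.set_title("Rating vs. revenue")

    fig.tight_layout()
    return fig
```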

## 5. Recommendations
### 5.1. Business Recommendations
Based on the analysis, the following recommendations were made to inform Microsoft’s movie production strategy:

- Focus on producing films in genres that consistently perform well at the box office.
- Invest in films with moderate budgets to maximize return on investment (ROI).
- Consider movie ratings from IMDB and Rotten Tomatoes when greenlighting new projects.
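
The moderate-budget recommendation can be checked by grouping films into budget quantile tiers and comparing median ROI across them. This is a hypothetical sketch that assumes an `roi` column (profit divided by budget) already exists, and it treats the middle tiers as "moderate", which is a labeling choice rather than a fixed threshold.

```python
import pandas as pd


def roi_by_budget_tier(df, budget_col="production_budget", roi_col="roi", tiers=4):
    """Movie count and median ROI within budget quantile tiers (quartiles by default)."""
    valid = df.dropna(subset=[budget_col, roi_col])
    labels = [f"Q{i}" for i in range(1, tiers + 1)]  # Q1 = cheapest tier
    tier = pd.qcut(valid[budget_col], q=tiers, labels=labels)
    return valid.groupby(tier, observed=True)[roi_col].agg(movies="count", median_roi="median")
```

`pd.qcut` raises a `ValueError` when many films share the same budget and two quantile edges coincide; fewer tiers or a manual `pd.cut` with chosen edges avoids that.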