# Project Overview (Phase 1 Project)
### Using exploratory data analysis to generate insights for a business stakeholder.

## 1. The Business problem
### a) Specifying the Business problem
> Microsoft sees all the big companies creating original video content and they want to get in on the fun. 
> They have decided to create a new movie studio, but they don’t know anything about creating movies.

> **Problem Statement:** You are charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

### b) Defining the Metric for Success
To define the metric for success in this project, the following factors should be considered:

1. `Box Office Revenue` : The primary metric for success in the movie industry is the box office revenue generated by a film. This analysis should aim to identify the types of films that have been most successful in terms of generating high box office revenue.

2. `Audience Reception` : While revenue is crucial, the audience's reception and satisfaction are also important indicators of success. Factors such as ratings, reviews, and audience engagement can provide insights into the types of films that resonate well with viewers.

3. `Return on Investment (ROI)`: Another important metric is the return on investment, which measures the profitability of a film. It is calculated by dividing the film's profit (total revenue minus production budget) by its production budget, so a film that loses money has a negative ROI. Identifying films with high ROI can help Microsoft's new movie studio prioritize its investments.

Overall, success can be measured by identifying the most profitable film genres, optimizing ROI, satisfying audience preferences, gaining market share, and outperforming competitors. 
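
As a rough illustration of the ROI metric above, the sketch below computes both the plain revenue-to-budget ratio and the net ROI (profit divided by budget) for a few made-up films. The column names `production_budget` and `worldwide_gross` and all figures are assumptions for illustration, not values from the project datasets.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from the project datasets.
films = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C"],
        "production_budget": [100_000_000, 20_000_000, 50_000_000],
        "worldwide_gross": [250_000_000, 90_000_000, 40_000_000],
    }
)

# Revenue-to-budget ratio: dollars earned per dollar spent.
films["revenue_to_budget"] = films["worldwide_gross"] / films["production_budget"]

# Net ROI: profit relative to budget, so a loss shows up as a negative value.
films["roi"] = (
    films["worldwide_gross"] - films["production_budget"]
) / films["production_budget"]

# Rank films from most to least profitable relative to their budget.
ranked = films.sort_values("roi", ascending=False).reset_index(drop=True)
print(ranked[["title", "revenue_to_budget", "roi"]])
```

The two columns always differ by exactly 1, so they rank films identically; the net form is simply easier to read because break-even sits at 0.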

### c) Understanding the context
In the given context, Microsoft has decided to venture into the movie industry and create its own movie studio. However, they lack knowledge and experience in the field of movie production.

The objective is to analyze the existing movie landscape and provide actionable insights to the head of Microsoft's new movie studio. These insights will help Microsoft make informed decisions on the types of films they should create to maximize their chances of success.


### d) Recording the Experimental Design

The exploratory data analysis will follow the experimental design below to generate insights for Microsoft's new movie studio:

1. `Data Collection`: Gathering relevant data on movie box office performance, including film genres, box office revenue, production budgets, ratings, and other relevant variables.

2. `Data Cleaning and Preparation`: Cleaning the collected data to ensure its quality and reliability. This step involves handling missing values, removing duplicates, standardizing variables, and transforming data if necessary. 

3. `Data Exploration and Descriptive Analysis`: Performing exploratory data analysis to gain initial insights into the dataset. This can include examining the distribution of box office revenue, analyzing the relationship between revenue and different variables (e.g., genres, production budgets), identifying trends or patterns in film performance, and summarizing descriptive statistics.

4. `Analysis and Visualization`: Using appropriate statistical techniques and visualization tools to analyze the data and uncover insights. This includes exploring the relationships between variables, identifying correlations or trends, and visualizing the findings to make them easier to interpret.

5. `Insights Generation`: Generating actionable insights, based on the analysis results, that Microsoft's movie studio can use to make informed decisions. This includes identifying the film genres that have performed well in terms of revenue and ROI, highlighting audience preferences and trends, and suggesting potential opportunities or gaps in the market.

6. `Reporting and Presentation`: Preparing a comprehensive report or presentation that summarizes the findings, insights, and recommendations, using visualizations, charts, and tables to present the results effectively.
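
The cleaning and descriptive steps in this design can be sketched roughly as follows. This is a minimal example on a small made-up table, assuming hypothetical column names (`title`, `genre`, `budget`, `revenue`); the real datasets may need different handling.

```python
import pandas as pd

# Hypothetical raw records; values and column names are illustrative assumptions.
raw = pd.DataFrame(
    {
        "title": ["Film A", "Film A", "Film B", "Film C", "Film D"],
        "genre": ["Action", "Action", "drama ", None, "Drama"],
        "budget": [100.0, 100.0, 20.0, 50.0, None],
        "revenue": [250.0, 250.0, 90.0, 40.0, 10.0],
    }
)

cleaned = (
    raw.drop_duplicates()                # remove exact duplicate rows
    .dropna(subset=["genre", "budget"])  # drop rows missing key fields
    .assign(genre=lambda d: d["genre"].str.strip().str.title())  # standardize labels
    .reset_index(drop=True)
)

# Descriptive step: count films and average revenue per genre.
summary = cleaned.groupby("genre")["revenue"].agg(["count", "mean"])
print(summary)
```

Dropping rows with missing values is only one option; imputing or keeping them may be more appropriate depending on how much data would otherwise be lost.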



## 2. Reading the Data
For this analysis, we will use three datasets:

1. `title.basics.csv` - To get to know the various movies and their genres
2. `tmdb.movies.csv` - To get to know the various movies and their popularity
3. `tn.movie_budgets.csv` - To get the revenue and production budget


In [51]:
# Importing the necessary libraries
import numpy as np
import pandas as pd


In [61]:
# Loading the three datasets from the Data-files directory
dataset1 = pd.read_csv('Data-files/title.basics.csv')
dataset2 = pd.read_csv('Data-files/tmdb.movies.csv')
dataset3 = pd.read_csv('Data-files/tn.movie_budgets.csv')

## 3. Checking Our Datasets
We will check each of the three datasets to get a better understanding of what it contains.

In [57]:
# Determining the no. of records in all our datasets
num_records1 = dataset1.shape[0]
print("Number of records in Dataset 1 is:", num_records1)
num_records2 = dataset2.shape[0]
print("Number of records in Dataset 2 is:", num_records2)
num_records3 = dataset3.shape[0]
print("Number of records in Dataset 3 is:", num_records3)

Number of records in Dataset 1 is: 146144
Number of records in Dataset 2 is: 26517
Number of records in Dataset 3 is: 5782
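
Beyond record counts, a compact overview of each dataset's shape and missing values can be useful at this stage. The sketch below uses a hypothetical helper, `summarize`, and small stand-in frames; in the notebook, the dictionary would presumably hold `dataset1`, `dataset2`, and `dataset3` instead.

```python
import pandas as pd


def summarize(datasets):
    """Return one row per dataset: record count, column count, missing cells."""
    rows = [
        {
            "dataset": name,
            "records": df.shape[0],
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
        }
        for name, df in datasets.items()
    ]
    return pd.DataFrame(rows).set_index("dataset")


# Small stand-in frames (hypothetical data), used only to show the output shape.
demo = {
    "basics": pd.DataFrame(
        {"title": ["A", "B", "C"], "genres": ["Drama", None, "Action"]}
    ),
    "budgets": pd.DataFrame({"movie": ["A", "B"], "production_budget": [100, 200]}),
}

overview = summarize(demo)
print(overview)
```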


In [None]:
# Previewing the top of our dataset (starting with dataset1)
dataset1.head()