# Title

## Overview

## Business Problem

Microsoft sees the big companies creating original video content and wants to get in on the fun. It has decided to create a new movie studio, but it has no experience creating movies. You are charged with exploring what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

Movie production has significant up-front costs that will require internal stakeholder support to adequately fund new projects. Further, it will be important to generate engagement with Microsoft's titles both to maximize return on investment and to legitimize Microsoft as a content producer in the future. Therefore, this analysis aims to generate recommendations on how best to deploy the content production budget. 

We will use data from IMDB, The Numbers and The Movie Database to determine answers to the following questions:

- What genres of movie are likely to optimize return on investment?

- What is the relationship between a movie's budget and its popularity, as measured by The Movie Database's popularity score? Similarly, can we use the popularity score as a proxy for the engagement generated by spending?

- In what season or month should we target releases to optimize our return on investment?





## Data Understanding

For this analysis, we will use a database from IMDB and two datasets, one from The Numbers and one from The Movie Database (TMDB). The IMDB database contains detailed information for each title, most notably the title, release date and relevant genres. The dataset from The Numbers will primarily be used for return on investment data, including worldwide gross revenue and movie production budget. The TMDB dataset contains a proprietary popularity score, which is calculated from a number of factors to measure engagement. Documentation for TMDB's popularity score can be found [here](https://developers.themoviedb.org/3/getting-started/popularity).
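
To make the return on investment measure concrete, here is a minimal sketch of how it might be computed from budget and gross revenue. It assumes the common definition ROI = (worldwide gross - production budget) / production budget. The column names `production_budget` and `worldwide_gross` and the sample rows are hypothetical and are not taken from The Numbers dataset itself.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions for illustration
budgets = pd.DataFrame({
    'movie': ['Film A', 'Film B', 'Film C'],
    'production_budget': [10_000_000, 50_000_000, 20_000_000],
    'worldwide_gross': [30_000_000, 40_000_000, 20_000_000],
})

# ROI as profit relative to budget (assumed definition):
# positive means the film earned back more than it cost
budgets['roi'] = (
    (budgets['worldwide_gross'] - budgets['production_budget'])
    / budgets['production_budget']
)

print(budgets[['movie', 'roi']])
```

Expressing ROI as a ratio rather than as raw profit makes films with very different budgets comparable, which matters when ranking genres.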

In [None]:
# import necessary packages
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
import zipfile
import numpy as np
%matplotlib inline

The IMDB database includes information on movies spanning nearly a century. Because audience appetites change over time, we limit our dataset to movies released in 2010 or later.

In [None]:
# Extract IMDb SQL .db file
with zipfile.ZipFile('./Data/im.db.zip') as zipObj:
    # Extract all contents of the .zip file into the ./Data/ directory
    zipObj.extractall(path='./Data/')

# Create connection to the extracted IMDb DB
# (directory casing must match the extraction path on case-sensitive filesystems)
conn = sqlite3.connect('./Data/im.db')
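
The 2010 cutoff described above could be applied at query time rather than after loading the full table. The sketch below is one possible approach, not the notebook's confirmed method: the table name `movie_basics` and the columns `movie_id`, `primary_title`, `start_year` and `genres` are assumptions about the schema, and an in-memory database with made-up rows stands in for `conn` so the example runs on its own.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the real connection; the table name, column
# names and rows are assumptions, not confirmed by this notebook
demo_conn = sqlite3.connect(':memory:')
demo_conn.executescript("""
    CREATE TABLE movie_basics (
        movie_id TEXT PRIMARY KEY,
        primary_title TEXT,
        start_year INTEGER,
        genres TEXT
    );
    INSERT INTO movie_basics VALUES
        ('tt0000001', 'Older Film', 2005, 'Drama'),
        ('tt0000002', 'Cutoff Film', 2010, 'Action,Comedy'),
        ('tt0000003', 'Recent Film', 2018, 'Horror');
""")

MIN_YEAR = 2010  # keep movies released in 2010 or later

# Parameterized query so the cutoff year is passed separately from the SQL
recent_movies = pd.read_sql(
    "SELECT movie_id, primary_title, start_year, genres "
    "FROM movie_basics WHERE start_year >= ? ORDER BY start_year",
    demo_conn,
    params=(MIN_YEAR,),
)
demo_conn.close()

print(recent_movies)
```

Filtering in SQL keeps the resulting DataFrame small; against the real database, `demo_conn` would be replaced by `conn` and the names adjusted to the actual schema.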

## Data Preparation

### Data Cleaning

### Merging Datasets

### Feature Engineering

## Analysis

### Return on Investment by Genre


### Popularity by Investment

### Popularity by Release Month

## Conclusions


### Next Steps