![example](images/director_shot.jpeg)

# Project Title

**Authors:** Muiru Kamau
***

## Overview


## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve them.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

We will work with movie datasets from three sources:

1. [The Numbers](https://www.the-numbers.com/): Each record represents a movie, with details about that film such as its <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">production_budget</span>. Financial figures such as budget and gross earnings are in U.S. dollars. To evaluate a film's success, we will rely primarily on <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">worldwide_gross</span> and <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">production_budget</span>, which we will use to compute return on investment (ROI).


2. [Box Office Mojo](https://www.boxofficemojo.com/): Each record represents a movie, with details about that film such as its <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">domestic_gross</span>.


3. [IMDB](https://www.imdb.com/): A database with multiple tables of movie attributes, e.g., ratings. The <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">movie_basics</span> and <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">movie_ratings</span> tables will be the most useful.
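The ROI metric mentioned for The Numbers is not defined explicitly here. A common definition, assumed in the sketch below, is net profit divided by cost: ROI = (worldwide_gross - production_budget) / production_budget. The helper name `add_roi` and the toy figures are hypothetical, and the sketch assumes both columns are already numeric.

```python
import pandas as pd

def add_roi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'roi' column (hypothetical helper).

    Assumes numeric 'production_budget' and 'worldwide_gross' columns in USD.
    ROI = (worldwide_gross - production_budget) / production_budget
    """
    out = df.copy()
    # Treat a zero or negative budget as unknown (NaN) to avoid division by zero
    budget = out["production_budget"].where(out["production_budget"] > 0)
    out["roi"] = (out["worldwide_gross"] - budget) / budget
    return out

# Hypothetical toy values, not taken from The Numbers
films = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C"],
    "production_budget": [10_000_000, 50_000_000, 0],
    "worldwide_gross": [40_000_000, 25_000_000, 1_000_000],
})
films = add_roi(films)
print(films[["movie", "roi"]])
```

Here Film A returns 3.0 (three dollars of profit per dollar spent), Film B returns -0.5, and Film C gets NaN because its budget is unknown.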

The Box Office Mojo and The Numbers data are CSV files that we can open with <span style="background-color: #3d3d3d; color: #ffffff; padding: 2px 5px; border-radius: 3px;">pd.read_csv</span>. The IMDB data is stored in a zipped SQLite database, so we must unzip it and then query it with SQLite.
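Dollar amounts in CSV exports are often stored as text such as `"$10,000,000"`, which pandas reads as strings rather than numbers. Assuming The Numbers file uses that layout (not confirmed here), a sketch of reading it with `pd.read_csv` and converting the money columns might look like this. The in-memory sample and its column set are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the assumed layout; the real file name
# and exact formatting of The Numbers CSV may differ.
sample_csv = io.StringIO(
    'movie,production_budget,worldwide_gross\n'
    'Film A,"$10,000,000","$40,000,000"\n'
    'Film B,"$50,000,000","$25,000,000"\n'
)
budgets = pd.read_csv(sample_csv)

# Strip "$" and thousands separators, then convert to 64-bit integers
for col in ["production_budget", "worldwide_gross"]:
    budgets[col] = (
        budgets[col]
        .str.replace(r"[$,]", "", regex=True)
        .astype("int64")
    )

print(budgets.dtypes)
```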

***

We were also provided two other datasets for this analysis, which I won't be exploring:

1. [TheMovieDB](https://www.themoviedb.org/)

2. [Rotten Tomatoes](https://www.rottentomatoes.com/)

## Imports and Data Description

```python
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile
import sqlite3

%matplotlib inline
```

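```python
# Extract im.db into the same "zippedData" directory that the connection
# below reads from (directory names are case-sensitive on most Linux filesystems)
with zipfile.ZipFile('zippedData/im.db.zip') as my_zip:
    my_zip.extractall(path='zippedData')

# This creates zippedData/im.db, which is listed in .gitignore
# because it is too large to upload to GitHub
```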
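`extractall` overwrites `im.db` every time the cell is re-run. A minimal sketch of extracting only when the file is missing follows; it assumes the archive stores the file at its top level under the name `im.db`, and `extract_db` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

def extract_db(zip_path: str, out_dir: str, db_name: str = "im.db") -> Path:
    """Extract db_name from zip_path into out_dir unless it already exists.

    Hypothetical helper; assumes the archive member is named exactly db_name.
    """
    target = Path(out_dir) / db_name
    if not target.exists():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extract(db_name, path=out_dir)
    return target

# Usage with the paths from the cell above:
# db_path = extract_db("zippedData/im.db.zip", "zippedData")
```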

```python
# Connect to the SQLite database file "im.db" in the "zippedData" directory
con = sqlite3.connect('zippedData/im.db')
```
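`sqlite3.connect` creates a new, empty database file when the path does not exist, so a mistyped path fails silently and later queries only report missing tables. One way to fail fast, sketched below, is to open the file read-only through an SQLite URI (`mode=ro`), which raises `sqlite3.OperationalError` instead of creating a file. The helper name `connect_existing` is hypothetical.

```python
import sqlite3
from pathlib import Path

def connect_existing(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only; raise if it is missing.

    Hypothetical helper. mode=ro stops SQLite from creating an empty file.
    """
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)

# con = connect_existing("zippedData/im.db")
```

Read-only mode suits this notebook, which only queries the IMDB tables and never writes to them.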

```python
# List every table in the database by querying SQLite's schema table
# (sqlite_schema requires SQLite 3.33.0+; older versions call it sqlite_master)
pd.read_sql("""
SELECT *
FROM sqlite_schema
WHERE type = 'table'
""", con)
```

| | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | movie_basics | movie_basics | 2 | `CREATE TABLE "movie_basics" (\n"movie_id" TEXT...` |
| 1 | table | directors | directors | 3 | `CREATE TABLE "directors" (\n"movie_id" TEXT,\n...` |
| 2 | table | known_for | known_for | 4 | `CREATE TABLE "known_for" (\n"person_id" TEXT,\...` |
| 3 | table | movie_akas | movie_akas | 5 | `CREATE TABLE "movie_akas" (\n"movie_id" TEXT,\...` |
| 4 | table | movie_ratings | movie_ratings | 6 | `CREATE TABLE "movie_ratings" (\n"movie_id" TEX...` |
| 5 | table | persons | persons | 7 | `CREATE TABLE "persons" (\n"person_id" TEXT,\n ...` |
| 6 | table | principals | principals | 8 | `CREATE TABLE "principals" (\n"movie_id" TEXT,\...` |
| 7 | table | writers | writers | 9 | `CREATE TABLE "writers" (\n"movie_id" TEXT,\n ...` |
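The `sql` column above is truncated, so it does not show each table's full column list. A sketch of listing columns without loading a table uses SQLite's `PRAGMA table_info`, whose rows are `(cid, name, type, notnull, dflt_value, pk)`. The demo runs on an in-memory table shaped like `movie_basics`; only the column names come from the preview in this notebook, and the declared types other than `movie_id TEXT` are assumptions. `table_columns` is a hypothetical helper.

```python
import sqlite3

def table_columns(con: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of `table` (hypothetical helper).

    The table name is interpolated directly, so only pass trusted names.
    """
    rows = con.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]  # row[1] is the column name

# Demo on an in-memory database; column types here are assumed
demo = sqlite3.connect(":memory:")
demo.execute(
    'CREATE TABLE "movie_basics" ("movie_id" TEXT, "primary_title" TEXT, '
    '"original_title" TEXT, "start_year" INTEGER, '
    '"runtime_minutes" REAL, "genres" TEXT)'
)
print(table_columns(demo, "movie_basics"))

# With the notebook's connection: table_columns(con, "movie_ratings")
```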


```python
# Load the movie_basics table into a DataFrame and preview the first five rows
query = """
SELECT *
FROM movie_basics
"""

movie_basics = pd.read_sql(query, con)
movie_basics.head()
```

| | movie_id | primary_title | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|
| 0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| 1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| 2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| 3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| 4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
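Since `movie_basics` and `movie_ratings` are both keyed by `movie_id`, they can be combined with a SQL join before loading into pandas. The sketch below runs against an in-memory stand-in for `im.db`: the `movie_ratings` column names (`averagerating`, `numvotes`) and every row value are assumptions for illustration, so check them against the real schema first.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for im.db; ratings column names and all values are hypothetical
demo_con = sqlite3.connect(":memory:")
demo_con.executescript("""
CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT, genres TEXT);
CREATE TABLE movie_ratings (movie_id TEXT, averagerating REAL, numvotes INTEGER);
INSERT INTO movie_basics VALUES
  ('m1', 'Film A', 'Drama'),
  ('m2', 'Film B', 'Comedy,Drama');
INSERT INTO movie_ratings VALUES ('m1', 7.0, 120);
""")

# An INNER JOIN keeps only movies that have a matching ratings row
query = """
SELECT b.movie_id, b.primary_title, b.genres, r.averagerating, r.numvotes
FROM movie_basics AS b
JOIN movie_ratings AS r
  ON b.movie_id = r.movie_id
"""
rated = pd.read_sql(query, demo_con)
print(rated)
```

Film B drops out because it has no rating; a `LEFT JOIN` would keep it with missing rating values instead.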


## Data Preparation







```python
# Here you run your code to clean the data
```
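The `movie_basics` preview shows `genres` stored as one comma-separated string per film. One possible cleaning step, sketched here as an assumption rather than the project's chosen method, is to split that string and give each genre its own row so films can later be grouped by genre. The rows below are copied from the preview (subset of columns).

```python
import pandas as pd

# Rows copied from the movie_basics preview (movie_id and genres only)
basics = pd.DataFrame({
    "movie_id": ["tt0063540", "tt0069049", "tt0069204"],
    "genres": ["Action,Crime,Drama", "Drama", "Comedy,Drama"],
})

# Drop rows with no genres, split the string into a list, then
# explode so each (movie, genre) pair becomes its own row
by_genre = (
    basics.dropna(subset=["genres"])
    .assign(genre=lambda d: d["genres"].str.split(","))
    .explode("genre")
    .reset_index(drop=True)
)
print(by_genre[["movie_id", "genre"]])
```

A film with three genres now counts once in each of them, which is worth keeping in mind when summing or averaging by genre.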

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

```python
# Here you run your code to model the data
```


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***