![example](images/director_shot.jpeg)

# Project Title

**Authors:** Student 1, Student 2, Student 3
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve and the data questions you plan to answer in order to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [17]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [83]:
# Load file
df = pd.read_csv('tn.movie_budgets.csv')

# Strip spaces, "$" and "," from the money columns, then drop the last six
# digits so each figure is in whole millions (amounts under $1M become '')
money_cols = ['production_budget', 'domestic_gross', 'worldwide_gross']
for col in money_cols:
    df[col] = df[col].str.replace(' ', '', regex=False)
    # regex=False treats "$" as a literal character, not the end-of-string anchor
    df[col] = df[col].str.replace('$', '', regex=False)
    df[col] = df[col].str.replace(',', '', regex=False)
    df[col] = df[col].str[:-6]

# Filter rows where any money column is now a blank string
df = df[(df[money_cols] != '').all(axis=1)].copy()

# Change figures from str to int class type
for col in money_cols:
    df[col] = df[col].astype(int)

# Create international_gross col
df['international_gross'] = df['worldwide_gross'] - df['domestic_gross']

# Drop rows with zero international gross
## TODO: decide whether to keep this filter
df = df[df.international_gross != 0]

df


| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross | international_gross |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Dec 18, 2009 | Avatar | 425 | 760 | 2776 | 2016 |
| 1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410 | 241 | 1045 | 804 |
| 2 | 3 | Jun 7, 2019 | Dark Phoenix | 350 | 42 | 149 | 107 |
| 3 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330 | 459 | 1403 | 944 |
| 4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317 | 620 | 1316 | 696 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5242 | 43 | Aug 3, 2005 | Junebug | 1 | 2 | 3 | 1 |
| 5243 | 44 | Aug 1, 2008 | Frozen River | 1 | 2 | 6 | 4 |
| 5244 | 45 | Nov 21, 2001 | Sidewalks of New York | 1 | 2 | 3 | 1 |
| 5246 | 47 | Sep 29, 2000 | The Broken Hearts Club: A Romantic Comedy | 1 | 1 | 2 | 1 |
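
The `.str[:-6]` slice in the cell above truncates each amount to whole millions, and any amount below $1,000,000 becomes an empty string that the blank-string filter then removes. The following is a minimal sketch of a numeric conversion that keeps those rows, assuming the raw columns hold strings such as `'$425,000,000'`. The helper name `to_millions` and the sample values are hypothetical and not part of the original notebook.

```python
import pandas as pd

def to_millions(series: pd.Series) -> pd.Series:
    """Convert currency strings like '$425,000,000' to floats in millions."""
    cleaned = (
        series.str.replace('$', '', regex=False)   # literal "$", not a regex anchor
              .str.replace(',', '', regex=False)
              .str.strip()
    )
    # errors='coerce' turns unparseable entries into NaN instead of raising
    return pd.to_numeric(cleaned, errors='coerce') / 1_000_000

# Hypothetical sample values, for illustration only
sample = pd.Series(['$425,000,000', '$1,500,000', '$7,000'])
millions = to_millions(sample)
print(millions.tolist())
```

Whether fractional millions are preferable to whole millions depends on how the figures are used later; the sketch only shows that small budgets need not be dropped as a side effect of cleaning.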


In [95]:
# Load genre file
genre_df = pd.read_csv('tmdb.movies.csv')

# Filter to relevant columns (copy so later assignments do not act on a view)
genre_df = genre_df[['genre_ids', 'title']].copy()

# Replace genre id with corresponding genre
## Remove "-"?, remove " "? from genres?
genre_names = {
    '12': 'adventure', '14': 'fantasy', '16': 'animation', '18': 'drama',
    '27': 'horror', '28': 'action', '35': 'comedy', '36': 'history',
    '37': 'western', '53': 'thriller', '80': 'crime', '99': 'documentary',
    '878': 'sci-fi', '9648': 'mystery', '10402': 'music', '10749': 'romance',
    '10751': 'family', '10752': 'war', '10770': 'tv movie',
}
# Apply the replacements in the order listed, treating each id as literal text
for genre_id, genre_name in genre_names.items():
    genre_df['genre_ids'] = genre_df['genre_ids'].str.replace(genre_id, genre_name, regex=False)

genre_df

| | genre_ids | title |
|---|---|---|
| 0 | [adventure, fantasy, family] | Harry Potter and the Deathly Hallows: Part 1 |
| 1 | [fantasy, adventure, animation, family] | How to Train Your Dragon |
| 2 | [adventure, action, sci-fi] | Iron Man 2 |
| 3 | [animation, comedy, family] | Toy Story |
| 4 | [action, sci-fi, adventure] | Inception |
| ... | ... | ... |
| 26512 | [horror, drama] | Laboratory Conditions |
| 26513 | [drama, thriller] | _EXHIBIT_84xxx_ |
| 26514 | [fantasy, action, adventure] | The Last One |
| 26515 | [family, adventure, action] | Trailer Made |
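
Because `genre_ids` is stored as text such as `'[12, 14, 10751]'`, replacing ids as substrings only works as long as no id appears inside another. As a hedged alternative, the sketch below parses each string into a real list and looks every id up in a dictionary. `GENRE_NAMES` holds only a subset of the ids used above, and `ids_to_names` is a hypothetical helper name.

```python
import ast

# Subset of the id-to-genre mapping used above, for illustration
GENRE_NAMES = {12: 'adventure', 14: 'fantasy', 16: 'animation', 10751: 'family'}

def ids_to_names(raw: str) -> list:
    """Parse a string like '[12, 14]' and map each id to its genre name."""
    ids = ast.literal_eval(raw)  # safely evaluates the list literal
    # Unknown ids are kept as strings so they stay visible rather than vanish
    return [GENRE_NAMES.get(genre_id, str(genre_id)) for genre_id in ids]

print(ids_to_names('[12, 14, 10751]'))
```

Applied to the raw column, this would look like `genre_df['genre_ids'].apply(ids_to_names)`, which yields one Python list per movie instead of a modified string.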


In [120]:
# Columns planned for the combined DataFrame
useful_columns = ['Movie', 'Genre', 'Production_Budget', 'Domestic_Gross', 'International_Gross', 'Worldwide_Gross']
same_title_list = []

# Placeholder for the combined table (header row only for now)
#combined_df = pd.DataFrame(columns=useful_columns)
combined_df = [useful_columns]

# Convert both title columns to lists for comparison
genre_title_list = genre_df['title'].tolist()
gross_title_list = df['movie'].tolist()

# Collect titles that appear in both datasets
# (a title is appended once per matching pair, so repeated titles are counted more than once)
for title in genre_title_list:
    for movie in gross_title_list:
        if title == movie:
            same_title_list.append(title)

# TODO: convert same_title_list to a DataFrame and add genre_ids to combined_df

print(len(same_title_list))

1565
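
The nested loops above compare every genre title with every movie title. The same match can be sketched as a pandas inner merge, which also carries the other columns along. The two small frames below stand in for `df` and `genre_df`; their rows are made up for illustration, and only the column names `movie`, `title`, `genre_ids` and `worldwide_gross` come from the cells above.

```python
import pandas as pd

# Made-up stand-ins for df and genre_df
gross = pd.DataFrame({
    'movie': ['Movie A', 'Movie B', 'Movie C'],
    'worldwide_gross': [300, 150, 3],
})
genres = pd.DataFrame({
    'title': ['Movie A', 'Movie D', 'Movie B'],
    'genre_ids': ['[action, adventure]', '[drama]', '[comedy]'],
})

# Keep only rows whose title appears in both frames
combined = gross.merge(genres, left_on='movie', right_on='title', how='inner')
print(len(combined))
print(combined[['movie', 'genre_ids', 'worldwide_gross']])
```

Like the loops, an inner merge produces one row per matching pair, so a title that occurs several times in either frame yields several rows. Deduplicating first, or matching on title plus release year, may be worth considering.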


In [None]:
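# Work in progress: build one record per matched title, then convert to a DataFrame
# (assumes same_title_list from the cell above; remaining columns still to be added)
d = []
for title in same_title_list:
    d.append(
        {
            'Movie': title,
        }
    )
same_title_df = pd.DataFrame(d)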

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [6]:
# Here you run your code to clean the data
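
As a possible starting point for the missing-value and outlier questions above, the sketch below counts missing entries and flags outliers with the common 1.5 × IQR rule. The series `budgets` contains made-up values, and the IQR rule is one assumed choice among several, not a method prescribed by this template.

```python
import pandas as pd

# Made-up budget figures in millions, including one missing value
budgets = pd.Series([1, 2, 3, 4, 100, None], name='production_budget')

# Count missing entries
missing = budgets.isna().sum()

# Flag values more than 1.5 * IQR outside the middle 50% of the data
q1, q3 = budgets.quantile(0.25), budgets.quantile(0.75)
iqr = q3 - q1
outliers = budgets[(budgets < q1 - 1.5 * iqr) | (budgets > q3 + 1.5 * iqr)]

print(missing)
print(outliers.tolist())
```

Whether flagged rows should be dropped, capped or kept depends on the business problem; very large budgets may be exactly the cases the analysis needs to explain.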

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [108]:
# Here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***