## BUSINESS PROBLEM 
Your company has seen the big companies creating original video content and wants to get in on the fun. It has decided to create a new movie studio, but it has no experience making movies. You are charged with exploring which types of films are currently performing best at the box office. You must then translate those findings into actionable insights that the head of the new movie studio can use to decide what types of films to create.


## BUSINESS SOLUTION  
By analyzing domestic profit, worldwide profit, and production profit ratios, the new movie studio can identify which films offer the best returns on investment. These insights help guide decisions on budgeting, genre selection, and marketing strategy, enabling the company to focus on producing movies that maximize profitability and audience appeal.
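
The profit measures named above can be sketched as follows. This is a minimal illustration, not the notebook's actual computation: the column names `domestic_profit`, `worldwide_profit`, and `profit_ratio` are assumptions, and the ratio is assumed to mean worldwide profit divided by production budget. The three rows are copied from the budgets dataset loaded later in this notebook.

```python
import pandas as pd

# Three rows taken from the budgets dataset used in this notebook.
movies = pd.DataFrame({
    "title": ["Avatar", "Dark Phoenix", "Avengers: Age of Ultron"],
    "production_budget": [425_000_000.0, 350_000_000.0, 330_600_000.0],
    "domestic_gross": [760_507_625.0, 42_762_350.0, 459_005_868.0],
    "worldwide_gross": [2_776_345_000.0, 149_762_400.0, 1_403_014_000.0],
})

# Profit = gross revenue minus production budget (assumed definition).
movies["domestic_profit"] = movies["domestic_gross"] - movies["production_budget"]
movies["worldwide_profit"] = movies["worldwide_gross"] - movies["production_budget"]

# Assumed ratio: worldwide profit earned per dollar of production budget.
movies["profit_ratio"] = movies["worldwide_profit"] / movies["production_budget"]

print(movies[["title", "domestic_profit", "worldwide_profit", "profit_ratio"]])
```

A negative `profit_ratio` marks a film that did not recover its production budget at the worldwide box office, as happens for Dark Phoenix in this sample.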


## Get started 


In [45]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3


## Data Analysis  

### Objectives  
1: Load the three datasets 




In [46]:
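# Use a forward slash: '\c' is an invalid escape sequence and the backslash path only works on Windows
df1 = pd.read_csv('Data/cleaned_movie_budgets.csv')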
df1.head()


Unnamed: 0,release_date,title,production_budget,domestic_gross,worldwide_gross,year
0,2009-12-18,Avatar,425000000.0,760507625.0,2776345000.0,2009
1,2011-05-20,Pirates of the Caribbean: On Stranger Tides,410600000.0,241063875.0,1045664000.0,2011
2,2019-06-07,Dark Phoenix,350000000.0,42762350.0,149762400.0,2019
3,2015-05-01,Avengers: Age of Ultron,330600000.0,459005868.0,1403014000.0,2015
4,2017-12-15,Star Wars Ep. VIII: The Last Jedi,317000000.0,620181382.0,1316722000.0,2017


In [47]:
# Forward slash keeps the path portable across operating systems
df2 = pd.read_csv('Data/cleaned_bom_movie_gross.csv')
df2.head()

Unnamed: 0,title,domestic_gross,foreign_gross
0,Toy Story 3,415000000.0,652000000.0
1,Alice in Wonderland (2010),334200000.0,691300000.0
2,Harry Potter and the Deathly Hallows Part 1,296000000.0,664300000.0
3,Inception,292600000.0,535700000.0
4,Shrek Forever After,238700000.0,513900000.0


In [52]:
df3 = pd.read_csv('Data/tmdb.movies.csv')
df3.head()


Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


## Statistics of each dataset  

In [None]:
df1.describe()

Unnamed: 0,production_budget,domestic_gross,worldwide_gross,year
count,5782.0,5782.0,5782.0,5782.0
mean,31587760.0,41873330.0,91487460.0,2003.967139
std,41812080.0,68240600.0,174720000.0,12.724386
min,1100.0,0.0,0.0,1915.0
25%,5000000.0,1429534.0,4125415.0,2000.0
50%,17000000.0,17225940.0,27984450.0,2007.0
75%,40000000.0,52348660.0,97645840.0,2012.0
max,425000000.0,936662200.0,2776345000.0,2020.0


In [None]:
df2.describe()

Unnamed: 0,domestic_gross,foreign_gross
count,3359.0,2037.0
mean,28745850.0,74872810.0
std,66982500.0,137410600.0
min,100.0,600.0
25%,120000.0,3700000.0
50%,1400000.0,18700000.0
75%,27900000.0,74900000.0
max,936700000.0,960500000.0
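
The counts above differ between `domestic_gross` (3359) and `foreign_gross` (2037), which suggests missing values in the foreign column. A minimal sketch of how that could be quantified is shown here on a toy frame: two rows come from the `df2` preview, and the third row (`Hypothetical Film`) is invented purely to show a missing value.

```python
import numpy as np
import pandas as pd

# Two rows from the df2 preview plus one hypothetical row with a missing value.
gross = pd.DataFrame({
    "title": ["Toy Story 3", "Inception", "Hypothetical Film"],
    "domestic_gross": [415_000_000.0, 292_600_000.0, 1_400_000.0],
    "foreign_gross": [652_000_000.0, 535_700_000.0, np.nan],
})

# Number of missing entries per column.
missing_counts = gross.isna().sum()

# Share of missing entries per column (mean of the boolean mask).
missing_share = gross.isna().mean()

print(missing_counts)
print(missing_share)
```

Applied to `df2`, the same two calls would show how many films lack a foreign gross figure before any profit comparison relies on that column.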


In [None]:
df3.describe()

Unnamed: 0,id,top_critic
count,5503.0,5503.0
mean,1060.814828,0.202435
std,578.983778,0.401851
min,3.0,0.0
25%,573.0,0.0
50%,1092.0,0.0
75%,1545.0,0.0
max,2000.0,1.0


## Finding the relationships


In [56]:
# Join the budget data (df1) to the TMDB metadata (df3) on movie title,
# keeping only titles that appear in both datasets
merged_df1_df3 = pd.merge(df1, df3, on='title', how='inner')
merged_df1_df3


Unnamed: 0.1,release_date_x,title,production_budget,domestic_gross,worldwide_gross,year,Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date_y,vote_average,vote_count
0,2009-12-18,Avatar,425000000.0,760507625.0,2.776345e+09,2009,6,"[28, 12, 14, 878]",19995,en,Avatar,26.526,2009-12-18,7.4,18676
1,2011-05-20,Pirates of the Caribbean: On Stranger Tides,410600000.0,241063875.0,1.045664e+09,2011,2470,"[12, 28, 14]",1865,en,Pirates of the Caribbean: On Stranger Tides,30.579,2011-05-20,6.4,8571
2,2015-05-01,Avengers: Age of Ultron,330600000.0,459005868.0,1.403014e+09,2015,14169,"[28, 12, 878]",99861,en,Avengers: Age of Ultron,44.383,2015-05-01,7.3,13457
3,2018-04-27,Avengers: Infinity War,300000000.0,678815482.0,2.048134e+09,2018,23811,"[12, 28, 14]",299536,en,Avengers: Infinity War,80.773,2018-04-27,8.3,13948
4,2017-11-17,Justice League,300000000.0,229024295.0,6.559452e+08,2017,20623,"[28, 12, 14, 878]",141052,en,Justice League,34.953,2017-11-17,6.2,7510
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2380,2015-09-01,Exeter,25000.0,0.0,4.897920e+05,2015,14678,"[53, 27]",226458,en,Exeter,5.934,2015-03-26,4.7,121
2381,2015-04-21,Ten,25000.0,0.0,0.000000e+00,2015,12326,"[12, 27, 9648, 53]",279516,en,Ten,1.575,2014-03-28,5.4,5
2382,2014-12-31,Dry Spell,22000.0,0.0,0.000000e+00,2014,10470,"[35, 10749]",255266,en,Dry Spell,0.600,2013-02-14,6.0,1
2383,2013-01-04,All Superheroes Must Die,20000.0,0.0,0.000000e+00,2013,8893,"[878, 53]",86304,en,All Superheroes Must Die,2.078,2013-01-04,3.9,19
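
To compare performance by genre, the `genre_ids` column in the merged table would need to be split into one row per genre. The sketch below assumes the column is read from CSV as a string such as `"[28, 12, 14, 878]"`, which is typical for list-like values but is not confirmed by the output above. The names `merged`, `by_genre`, and `median_gross` are illustrative, and the two rows are copied from the merged output.

```python
import ast
import pandas as pd

# Two rows from the merged output, with genre_ids held as strings (assumed).
merged = pd.DataFrame({
    "title": ["Avatar", "Avengers: Age of Ultron"],
    "worldwide_gross": [2_776_345_000.0, 1_403_014_000.0],
    "genre_ids": ["[28, 12, 14, 878]", "[28, 12, 878]"],
})

# Parse each string into a real Python list.
merged["genre_ids"] = merged["genre_ids"].apply(ast.literal_eval)

# One row per (movie, genre) pair.
by_genre = merged.explode("genre_ids").rename(columns={"genre_ids": "genre_id"})
by_genre["genre_id"] = by_genre["genre_id"].astype(int)

# Median worldwide gross per genre id.
median_gross = by_genre.groupby("genre_id")["worldwide_gross"].median()
print(median_gross)
```

A film with several genres counts once toward each of them, so per-genre totals from this layout should not be summed across genres.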
