## Final Project Submission

Please fill out:
* Student name: Cameryn
* Student pace: part time
* Scheduled project review date/time: 
* Instructor name: Victor Geislinger
* Blog post URL: 


# Importing important packages
For this project, the pandas, numpy, seaborn, matplotlib, and sqlite3 packages are imported, along with create_engine from sqlalchemy. The %matplotlib inline magic is also set so that charts render directly in the notebook.

In [1]:
import pandas as pd # for dataframes
import numpy as np
import seaborn as sns # for making our charts more readable
import matplotlib.pyplot as plt
import sqlite3
from sqlalchemy import create_engine

%matplotlib inline

# Importing all data as CSV and TSV files

First, it is important to bring all of the data into a usable, readable format, in this case, pandas dataframes.

**Note that one file, 'rt.reviews.tsv', is encoded differently from the others, so it is read with an explicit encoding declaration (encoding='latin1').**

In [2]:
#Import data
bom_df = pd.read_csv('zippedData/bom.movie_gross.csv')
im_n_basics_df = pd.read_csv('zippedData/imdb.name.basics.csv')
im_akas_df = pd.read_csv('zippedData/imdb.title.akas.csv')
im_basics_df = pd.read_csv('zippedData/imdb.title.basics.csv')
im_crew_df = pd.read_csv('zippedData/imdb.title.crew.csv')
im_principals_df = pd.read_csv('zippedData/imdb.title.principals.csv')
im_ratings_df = pd.read_csv('zippedData/imdb.title.ratings.csv')
rt_movie_info_df = pd.read_csv('zippedData/rt.movie_info.tsv', sep='\t')
rt_reviews_df = pd.read_csv('zippedData/rt.reviews.tsv', sep='\t', encoding = 'latin1')
tmdb_movies_df = pd.read_csv('zippedData/tmdb.movies.csv')
tn_budgets_df = pd.read_csv('zippedData/tn.movie_budgets.csv')

After importing the files, .info() was checked for each dataframe to determine the data type of each column and what cleaning would be needed in the cells below. To save space, the output is shown here only for the first dataframe, bom_df.

**Note the type of the 'foreign_gross' column: it is stored as object rather than as a numeric type. This would prevent statistical analysis, so it is addressed in the Data Cleaning section below.**

In [3]:
bom_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3387 entries, 0 to 3386
Data columns (total 5 columns):
title             3387 non-null object
studio            3382 non-null object
domestic_gross    3359 non-null float64
foreign_gross     2037 non-null object
year              3387 non-null int64
dtypes: float64(1), int64(1), object(3)
memory usage: 132.4+ KB
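
One way to see why a column such as 'foreign_gross' is read as object is to look for the entries that fail a numeric conversion. The following is a minimal sketch on made-up values, not on the real file; the Series contents are assumptions chosen only for illustration.

```python
import pandas as pd

# Toy stand-in for an object-typed money column; the values are illustrative only.
sample = pd.Series(['652000000', '1,131.6', None, '535700000'], name='foreign_gross')

# Lenient conversion: anything that cannot be parsed becomes NaN.
converted = pd.to_numeric(sample, errors='coerce')

# Entries that were present in the original but failed to convert.
problem_rows = sample[converted.isna() & sample.notna()]
print(problem_rows.tolist())
```

Any value listed in problem_rows contains characters (here a thousands separator) that must be stripped before the column can be treated as numeric.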


# Data Cleaning
In order to understand and interpret each dataframe in the context of the others, it is important to clean and appropriately format each column, removing unhelpful and ill-formatted data and replacing it where possible.

### Changing Column Type
The issue noted previously with bom_df['foreign_gross'] is common across a few of our dataframes, so the function below has been written to prevent unnecessary repetition.

In [4]:
def column_type_changer(column, df):
    """Returns stripped, numeric column from object-type column
       Args:
           column: Name of the column to be converted.
           df: DataFrame that column is located in.
       Returns:
           Series data converted to numeric data type"""
    
    # regex=False treats '$' as a literal character, not as a regex end-of-string anchor
    df[column] = df[column].str.replace('$', '', regex=False) # Removing dollar signs
    df[column] = df[column].str.replace(',', '', regex=False) # Removing thousands separators
    df[column] = df[column].str.replace(' minutes', '', regex=False) # Removing runtime unit
    df[column] = pd.to_numeric(df[column]) # Converts the whole column to numbers
    return df[column]
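
As an alternative sketch (not the function used in this notebook), the same cleanup can be written so that it does not modify the dataframe in place and does not raise on unexpected values. The helper name to_numeric_series and the sample values are hypothetical.

```python
import pandas as pd

def to_numeric_series(series):
    """Hypothetical helper: strip '$', ',' and ' minutes', then convert.

    Returns a new Series and leaves the input untouched.
    """
    # Inside a character class, '$' is a literal dollar sign.
    cleaned = series.str.replace(r'[$,]| minutes', '', regex=True)
    # errors='coerce' turns anything still unparseable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors='coerce')

# Illustrative values only.
raw = pd.Series(['$600,000', '108 minutes', None, '1,131.6'])
result = to_numeric_series(raw)
print(result)
```

Because the input Series is not overwritten, the original strings remain available for checking which values were coerced to NaN.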

In [5]:
bom_df['foreign_gross'] = column_type_changer('foreign_gross', bom_df)

In [6]:
print(bom_df.describe())
bom_df.head()

       domestic_gross  foreign_gross         year
count    3.359000e+03   2.037000e+03  3387.000000
mean     2.874585e+07   7.487281e+07  2013.958075
std      6.698250e+07   1.374106e+08     2.478141
min      1.000000e+02   6.000000e+02  2010.000000
25%      1.200000e+05   3.700000e+06  2012.000000
50%      1.400000e+06   1.870000e+07  2014.000000
75%      2.790000e+07   7.490000e+07  2016.000000
max      9.367000e+08   9.605000e+08  2018.000000


Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000.0,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000.0,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000.0,2010
3,Inception,WB,292600000.0,535700000.0,2010
4,Shrek Forever After,P/DW,238700000.0,513900000.0,2010


### Data from Box Office Mojo

In [7]:
bom_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3387 entries, 0 to 3386
Data columns (total 5 columns):
title             3387 non-null object
studio            3382 non-null object
domestic_gross    3359 non-null float64
foreign_gross     2037 non-null float64
year              3387 non-null int64
dtypes: float64(2), int64(1), object(2)
memory usage: 132.4+ KB


Note the number of null values in the 'foreign_gross' column (only 2037 of 3387 rows are populated). This could be attributed to a number of things, though often it could simply be that the film wasn't released abroad. Without context, replacing those values may cause more problems than it would solve. Further exploration of the other data may be necessary.
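
Before deciding how to handle the gaps, it can help to measure how much is missing and whether the gaps cluster in particular years. This is a rough sketch on a made-up frame; the rows below are assumptions, and only the column names mirror bom_df.

```python
import pandas as pd

# Toy frame with the same column names as bom_df; the rows are illustrative only.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'foreign_gross': [300.0, None, None, 20.0],
    'year': [2010, 2010, 2010, 2011],
})

# Overall share of missing values in the column.
missing_share = toy['foreign_gross'].isna().mean()

# Share of missing values per release year.
missing_by_year = toy['foreign_gross'].isna().groupby(toy['year']).mean()

print(missing_share)
print(missing_by_year)
```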

### Data from IMDb

In [8]:
im_n_basics_df.info()
im_n_basics_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606648 entries, 0 to 606647
Data columns (total 6 columns):
nconst                606648 non-null object
primary_name          606648 non-null object
birth_year            82736 non-null float64
death_year            6783 non-null float64
primary_profession    555308 non-null object
known_for_titles      576444 non-null object
dtypes: float64(2), object(4)
memory usage: 27.8+ MB


Unnamed: 0,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer","tt0837562,tt2398241,tt0844471,tt0118553"
1,nm0061865,Joseph Bauer,,,"composer,music_department,sound_department","tt0896534,tt6791238,tt0287072,tt1682940"
2,nm0062070,Bruce Baum,,,"miscellaneous,actor,writer","tt1470654,tt0363631,tt0104030,tt0102898"
3,nm0062195,Axel Baumann,,,"camera_department,cinematographer,art_department","tt0114371,tt2004304,tt1618448,tt1224387"
4,nm0062798,Pete Baxter,,,"production_designer,art_department,set_decorator","tt0452644,tt0452692,tt3458030,tt2178256"


In [9]:
im_akas_df.info()
im_akas_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 331703 entries, 0 to 331702
Data columns (total 8 columns):
title_id             331703 non-null object
ordering             331703 non-null int64
title                331703 non-null object
region               278410 non-null object
language             41715 non-null object
types                168447 non-null object
attributes           14925 non-null object
is_original_title    331678 non-null float64
dtypes: float64(1), int64(1), object(6)
memory usage: 20.2+ MB


Unnamed: 0,title_id,ordering,title,region,language,types,attributes,is_original_title
0,tt0369610,10,Джурасик свят,BG,bg,,,0.0
1,tt0369610,11,Jurashikku warudo,JP,,imdbDisplay,,0.0
2,tt0369610,12,Jurassic World: O Mundo dos Dinossauros,BR,,imdbDisplay,,0.0
3,tt0369610,13,O Mundo dos Dinossauros,BR,,,short title,0.0
4,tt0369610,14,Jurassic World,FR,,imdbDisplay,,0.0


### Merging DataFrames
To make this data easier to use, the ratings dataframe is merged into the basics dataframe, since the ratings dataframe has no key other than 'tconst' that could link it to the other dataframes later. The crew dataframe is merged in as well, again on the 'tconst' column, which keeps the key consistent and makes future joins and subqueries much simpler.

**These merges are inner joins, so titles without a rating (or without a crew entry) are dropped from the data. Because the client is trying to determine the best way to enter the film business, requiring a rating for every title seems to be a reasonable choice.**

In [10]:
im_crew_df.head() # Quick preview of crew dataframe

Unnamed: 0,tconst,directors,writers
0,tt0285252,nm0899854,nm0899854
1,tt0438973,,"nm0175726,nm1802864"
2,tt0462036,nm1940585,nm1940585
3,tt0835418,nm0151540,"nm0310087,nm0841532"
4,tt0878654,"nm0089502,nm2291498,nm2292011",nm0284943


In [11]:
im_ratings_df.head() # Quick preview of ratings dataframe

Unnamed: 0,tconst,averagerating,numvotes
0,tt10356526,8.3,31
1,tt10384606,8.9,559
2,tt1042974,6.4,20
3,tt1043726,4.2,50352
4,tt1060240,6.5,21


In [12]:
im_basics_df = pd.merge(im_basics_df, im_ratings_df, on='tconst')
im_basics_df = pd.merge(im_basics_df, im_crew_df, on='tconst')
im_basics_df.head()

Unnamed: 0,tconst,primary_title,original_title,start_year,runtime_minutes,genres,averagerating,numvotes,directors,writers
0,tt0063540,Sunghursh,Sunghursh,2013,175.0,"Action,Crime,Drama",7.0,77,nm0712540,"nm0023551,nm1194313,nm0347899,nm1391276"
1,tt0066787,One Day Before the Rainy Season,Ashad Ka Ek Din,2019,114.0,"Biography,Drama",7.2,43,nm0002411,
2,tt0069049,The Other Side of the Wind,The Other Side of the Wind,2018,122.0,Drama,6.9,4517,nm0000080,"nm0000080,nm0462648"
3,tt0069204,Sabse Bada Sukh,Sabse Bada Sukh,2018,,"Comedy,Drama",6.1,13,nm0611531,nm0347899
4,tt0100275,The Wandering Soap Opera,La Telenovela Errante,2017,80.0,"Comedy,Drama,Fantasy",6.5,119,"nm0765384,nm0749914","nm1360635,nm0749914"
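
To check how many rows an inner merge like the one above removes, a left merge with indicator=True can be used as an audit step. This is a small sketch on made-up frames; the tconst values and titles are placeholders, not rows from the IMDb files.

```python
import pandas as pd

# Placeholder data; only the column names follow the IMDb dataframes.
basics = pd.DataFrame({'tconst': ['tt1', 'tt2', 'tt3'],
                       'primary_title': ['A', 'B', 'C']})
ratings = pd.DataFrame({'tconst': ['tt1', 'tt3'],
                        'averagerating': [7.0, 6.5]})

# A left merge keeps every title; '_merge' records whether a rating was found.
audit = pd.merge(basics, ratings, on='tconst', how='left', indicator=True)
dropped = int((audit['_merge'] == 'left_only').sum())

# The default merge (how='inner') keeps only titles present in both frames.
inner = pd.merge(basics, ratings, on='tconst')

print(dropped, len(inner))
```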


In [13]:
im_principals_df.info()
im_principals_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1028186 entries, 0 to 1028185
Data columns (total 6 columns):
tconst        1028186 non-null object
ordering      1028186 non-null int64
nconst        1028186 non-null object
category      1028186 non-null object
job           177684 non-null object
characters    393360 non-null object
dtypes: int64(1), object(5)
memory usage: 47.1+ MB


Unnamed: 0,tconst,ordering,nconst,category,job,characters
0,tt0111414,1,nm0246005,actor,,"[""The Man""]"
1,tt0111414,2,nm0398271,director,,
2,tt0111414,3,nm3739909,producer,producer,
3,tt0323808,10,nm0059247,editor,,
4,tt0323808,1,nm3579312,actress,,"[""Beth Boothby""]"


### Data from Rotten Tomatoes

In [14]:
rt_movie_info_df.info()
rt_movie_info_df['box_office'] = column_type_changer('box_office',rt_movie_info_df)
rt_movie_info_df['runtime'] = column_type_changer('runtime',rt_movie_info_df)
rt_movie_info_df.drop(columns='id', inplace=True)
rt_movie_info_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1560 entries, 0 to 1559
Data columns (total 12 columns):
id              1560 non-null int64
synopsis        1498 non-null object
rating          1557 non-null object
genre           1552 non-null object
director        1361 non-null object
writer          1111 non-null object
theater_date    1201 non-null object
dvd_date        1201 non-null object
currency        340 non-null object
box_office      340 non-null object
runtime         1530 non-null object
studio          494 non-null object
dtypes: int64(1), object(11)
memory usage: 146.4+ KB


Unnamed: 0,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104.0,
1,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000.0,108.0,Entertainment One
2,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116.0,
3,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128.0,
4,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200.0,


**This data will be used to identify reviewers for invitations. (Look for fresh and high ratings from top critics.)**

In [15]:
rt_reviews_df.info() # Use this data to potentially invite specific reviewers
rt_reviews_df.drop(columns='id', inplace=True)
rt_reviews_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
id            54432 non-null int64
review        48869 non-null object
rating        40915 non-null object
fresh         54432 non-null object
critic        51710 non-null object
top_critic    54432 non-null int64
publisher     54123 non-null object
date          54432 non-null object
dtypes: int64(2), object(6)
memory usage: 3.3+ MB


Unnamed: 0,review,rating,fresh,critic,top_critic,publisher,date
0,A distinctly gallows take on contemporary fina...,3/5,fresh,PJ Nabarro,0,Patrick Nabarro,"November 10, 2018"
1,It's an allegory in search of a meaning that n...,,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
2,... life lived in a bubble in financial dealin...,,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
3,Continuing along a line introduced in last yea...,,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
4,... a perverse twist on neorealism...,,fresh,,0,Cinema Scope,"October 12, 2017"
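
The idea of looking for fresh ratings from top critics could be sketched as a filter followed by a per-critic summary. The frame below is invented for illustration (the critic names are placeholders), and the ranking rule is an assumption rather than the method used later in this project.

```python
import pandas as pd

# Placeholder reviews; column names follow rt_reviews_df.
reviews = pd.DataFrame({
    'fresh': ['fresh', 'rotten', 'fresh', 'fresh', 'fresh'],
    'critic': ['Critic A', 'Critic A', 'Critic B', 'Critic B', None],
    'top_critic': [1, 1, 1, 1, 0],
})

# Keep only named top critics.
top = reviews[(reviews['top_critic'] == 1) & reviews['critic'].notna()]

# Count reviews and compute the share marked 'fresh' for each critic.
summary = (
    top.assign(is_fresh=top['fresh'].eq('fresh'))
       .groupby('critic')['is_fresh']
       .agg(n_reviews='size', fresh_share='mean')
       .sort_values('fresh_share', ascending=False)
)
print(summary)
```

On real data, a minimum number of reviews per critic would probably be needed so that a single fresh review does not put a critic at the top of the list.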


### Data from TMDb

In [16]:
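tmdb_movies_df.drop('Unnamed: 0', axis=1, inplace=True)
# Note: set_index returns a new DataFrame; because the result is not assigned,
# 'id' stays a regular column (as the preview below shows).
tmdb_movies_df.set_index('id')
tmdb_movies_df.head()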

Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186
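
The preview above still shows 'id' as an ordinary column because set_index returns a new dataframe instead of changing the original. A short sketch on a two-row toy frame (titles copied from the preview, everything else assumed) shows the difference between discarding and keeping the result:

```python
import pandas as pd

toy = pd.DataFrame({
    'id': [12444, 10191],
    'title': ['Harry Potter and the Deathly Hallows: Part 1',
              'How to Train Your Dragon'],
})

# Result is discarded, so toy itself is unchanged.
toy.set_index('id')
unchanged = 'id' in toy.columns

# Assigning the result keeps the re-indexed frame.
indexed = toy.set_index('id')
print(indexed.loc[10191, 'title'])
```

Whether 'id' should become the index depends on how the table is used later; leaving it as a column keeps it available as a plain field once the frame is written to SQL.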


### Data from The Numbers

In [17]:
tn_budgets_df['production_budget'] = column_type_changer('production_budget', tn_budgets_df)
tn_budgets_df['domestic_gross'] = column_type_changer('domestic_gross', tn_budgets_df)
tn_budgets_df['worldwide_gross'] = column_type_changer('worldwide_gross', tn_budgets_df)

# Creating a Database
A database is created next to tie all of the previous data together and to allow it to be queried with SQL.

In [18]:
# Create the movies database
conn = sqlite3.connect('movies.sqlite')
cur = conn.cursor()

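In the cells below, the dataframes are converted to SQLite tables so that they can be queried more easily. Note that the tables are written to the in-memory SQLAlchemy engine ('sqlite://') rather than to the 'movies.sqlite' connection opened above. Having the data in tables allows joins on various columns/keys, as explained later.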
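
As a minimal sketch of this step, a dataframe can also be written through a plain sqlite3 connection and previewed with a LIMIT clause, which avoids printing every row. The connection name conn_demo and the three-row frame are assumptions for illustration; the values are taken from the preview of the merged IMDb dataframe.

```python
import sqlite3
import pandas as pd

toy = pd.DataFrame({
    'tconst': ['tt0063540', 'tt0066787', 'tt0069049'],
    'primary_title': ['Sunghursh', 'One Day Before the Rainy Season',
                      'The Other Side of the Wind'],
    'averagerating': [7.0, 7.2, 6.9],
})

# In-memory database for the demo; a file path would persist the tables instead.
conn_demo = sqlite3.connect(':memory:')
toy.to_sql('movies', con=conn_demo, index=False, if_exists='replace')

# LIMIT keeps the preview short instead of fetching the whole table.
preview = pd.read_sql_query('SELECT * FROM movies LIMIT 2;', conn_demo)
conn_demo.close()
print(preview)
```

This route may also be useful on newer SQLAlchemy releases, since Engine.execute() was removed in SQLAlchemy 2.0.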

In [19]:
engine = create_engine('sqlite://', echo=False)
im_basics_df.to_sql('movies', con=engine)

engine.execute("""SELECT * 
                from movies;""").fetchall()

[(0, 'tt0063540', 'Sunghursh', 'Sunghursh', 2013, 175.0, 'Action,Crime,Drama', 7.0, 77, 'nm0712540', 'nm0023551,nm1194313,nm0347899,nm1391276'),
 (1, 'tt0066787', 'One Day Before the Rainy Season', 'Ashad Ka Ek Din', 2019, 114.0, 'Biography,Drama', 7.2, 43, 'nm0002411', None),
 (2, 'tt0069049', 'The Other Side of the Wind', 'The Other Side of the Wind', 2018, 122.0, 'Drama', 6.9, 4517, 'nm0000080', 'nm0000080,nm0462648'),
 (3, 'tt0069204', 'Sabse Bada Sukh', 'Sabse Bada Sukh', 2018, None, 'Comedy,Drama', 6.1, 13, 'nm0611531', 'nm0347899'),
 (4, 'tt0100275', 'The Wandering Soap Opera', 'La Telenovela Errante', 2017, 80.0, 'Comedy,Drama,Fantasy', 6.5, 119, 'nm0765384,nm0749914', 'nm1360635,nm0749914'),
 (5, 'tt0112502', 'Bigfoot', 'Bigfoot', 2017, None, 'Horror,Thriller', 4.1, 32, 'nm6883878', None),
 (6, 'tt0137204', 'Joe Finds Grace', 'Joe Finds Grace', 2017, 83.0, 'Adventure,Animation,Comedy', 8.1, 263, 'nm0365480', 'nm0365480'),
 (7, 'tt0146592', 'Pál Adrienn', 'Pál Adrienn', 2010, 1

In [20]:
im_n_basics_df.to_sql('workers',con=engine)
engine.execute("""SELECT * 
                from workers;""").fetchall()

[(0, 'nm0061671', 'Mary Ellen Bauder', None, None, 'miscellaneous,production_manager,producer', 'tt0837562,tt2398241,tt0844471,tt0118553'),
 (1, 'nm0061865', 'Joseph Bauer', None, None, 'composer,music_department,sound_department', 'tt0896534,tt6791238,tt0287072,tt1682940'),
 (2, 'nm0062070', 'Bruce Baum', None, None, 'miscellaneous,actor,writer', 'tt1470654,tt0363631,tt0104030,tt0102898'),
 (3, 'nm0062195', 'Axel Baumann', None, None, 'camera_department,cinematographer,art_department', 'tt0114371,tt2004304,tt1618448,tt1224387'),
 (4, 'nm0062798', 'Pete Baxter', None, None, 'production_designer,art_department,set_decorator', 'tt0452644,tt0452692,tt3458030,tt2178256'),
 (5, 'nm0062879', 'Ruel S. Bayani', None, None, 'director,production_manager,miscellaneous', 'tt2590280,tt0352080,tt0216559,tt2057445'),
 (6, 'nm0063198', 'Bayou', None, None, 'actor', 'tt6579724,tt0093116'),
 (7, 'nm0063432', 'Stevie Be-Zet', None, None, 'composer,soundtrack', 'tt3106212,tt0478239,tt0264917,tt1626606'),


In [21]:
im_akas_df.to_sql('movieNames',con=engine)
engine.execute("""SELECT * 
                from movieNames;""").fetchall()

[(0, 'tt0369610', 10, 'Джурасик свят', 'BG', 'bg', None, None, 0.0),
 (1, 'tt0369610', 11, 'Jurashikku warudo', 'JP', None, 'imdbDisplay', None, 0.0),
 (2, 'tt0369610', 12, 'Jurassic World: O Mundo dos Dinossauros', 'BR', None, 'imdbDisplay', None, 0.0),
 (3, 'tt0369610', 13, 'O Mundo dos Dinossauros', 'BR', None, None, 'short title', 0.0),
 (4, 'tt0369610', 14, 'Jurassic World', 'FR', None, 'imdbDisplay', None, 0.0),
 (5, 'tt0369610', 15, 'Jurassic World', 'GR', None, 'imdbDisplay', None, 0.0),
 (6, 'tt0369610', 16, 'Jurassic World', 'IT', None, 'imdbDisplay', None, 0.0),
 (7, 'tt0369610', 17, 'Jurski svijet', 'HR', None, 'imdbDisplay', None, 0.0),
 (8, 'tt0369610', 18, "Olam ha'Yura", 'IL', 'he', 'imdbDisplay', None, 0.0),
 (9, 'tt0369610', 19, 'Jurassic World: Mundo Jurásico', 'MX', None, 'imdbDisplay', None, 0.0),
 (10, 'tt0369610', 1, 'Jurassic World: Sauruste maailm', 'EE', None, 'imdbDisplay', None, 0.0),
 (11, 'tt0369610', 20, 'Jurassic World', 'SE', None, 'imdbDisplay', None, 

In [22]:
im_principals_df.to_sql('principals',con=engine)
engine.execute("""SELECT * 
                from principals;""").fetchall()

[(0, 'tt0111414', 1, 'nm0246005', 'actor', None, '["The Man"]'),
 (1, 'tt0111414', 2, 'nm0398271', 'director', None, None),
 (2, 'tt0111414', 3, 'nm3739909', 'producer', 'producer', None),
 (3, 'tt0323808', 10, 'nm0059247', 'editor', None, None),
 (4, 'tt0323808', 1, 'nm3579312', 'actress', None, '["Beth Boothby"]'),
 (5, 'tt0323808', 2, 'nm2694680', 'actor', None, '["Steve Thomson"]'),
 (6, 'tt0323808', 3, 'nm0574615', 'actor', None, '["Sir Lachlan Morrison"]'),
 (7, 'tt0323808', 4, 'nm0502652', 'actress', None, '["Lady Delia Morrison"]'),
 (8, 'tt0323808', 5, 'nm0362736', 'director', None, None),
 (9, 'tt0323808', 6, 'nm0811056', 'producer', 'producer', None),
 (10, 'tt0323808', 7, 'nm0914939', 'producer', 'producer', None),
 (11, 'tt0323808', 8, 'nm0779346', 'composer', None, None),
 (12, 'tt0323808', 9, 'nm0676104', 'cinematographer', None, None),
 (13, 'tt0417610', 10, 'nm0284261', 'composer', None, None),
 (14, 'tt0417610', 1, 'nm0532721', 'actor', None, '["Lucio"]'),
 (15, 'tt04

In [23]:
rt_reviews_df.to_sql('RTReviews',con=engine)
engine.execute("""SELECT * 
                from RTReviews;""").fetchall()

[(0, "A distinctly gallows take on contemporary financial mores, as one absurdly rich man's limo ride across town for a haircut functions as a state-of-the-nation discourse. ", '3/5', 'fresh', 'PJ Nabarro', 0, 'Patrick Nabarro', 'November 10, 2018'),
 (1, "It's an allegory in search of a meaning that never arrives...It's just old-fashioned bad storytelling.", None, 'rotten', 'Annalee Newitz', 0, 'io9.com', 'May 23, 2018'),
 (2, '... life lived in a bubble in financial dealings and digital communications and brief face-to-face conversations and sexual intermissions in a space shuttle of a limousine creeping through the gridlock of an anonymous New York City.', None, 'fresh', 'Sean Axmaker', 0, 'Stream on Demand', 'January 4, 2018'),
 (3, 'Continuing along a line introduced in last year\'s "A Dangerous Method", David Cronenberg pushes his cinema towards a talky abstraction in his uncanny, perversely funny and frighteningly insular adaptation of Don DeLillo, "Cosmopolis".', None, 'fresh',

In [24]:
tmdb_movies_df.to_sql('TMDbMovies',con=engine)
engine.execute("""SELECT * 
                from TMDbMovies;""").fetchall()

[(0, '[12, 14, 10751]', 12444, 'en', 'Harry Potter and the Deathly Hallows: Part 1', 33.533, '2010-11-19', 'Harry Potter and the Deathly Hallows: Part 1', 7.7, 10788),
 (1, '[14, 12, 16, 10751]', 10191, 'en', 'How to Train Your Dragon', 28.734, '2010-03-26', 'How to Train Your Dragon', 7.7, 7610),
 (2, '[12, 28, 878]', 10138, 'en', 'Iron Man 2', 28.515, '2010-05-07', 'Iron Man 2', 6.8, 12368),
 (3, '[16, 35, 10751]', 862, 'en', 'Toy Story', 28.005, '1995-11-22', 'Toy Story', 7.9, 10174),
 (4, '[28, 878, 12]', 27205, 'en', 'Inception', 27.92, '2010-07-16', 'Inception', 8.3, 22186),
 (5, '[12, 14, 10751]', 32657, 'en', 'Percy Jackson & the Olympians: The Lightning Thief', 26.691, '2010-02-11', 'Percy Jackson & the Olympians: The Lightning Thief', 6.1, 4229),
 (6, '[28, 12, 14, 878]', 19995, 'en', 'Avatar', 26.526, '2009-12-18', 'Avatar', 7.4, 18676),
 (7, '[16, 10751, 35]', 10193, 'en', 'Toy Story 3', 24.445, '2010-06-17', 'Toy Story 3', 7.7, 8340),
 (8, '[16, 10751, 35]', 20352, 'en', '

In [25]:
tn_budgets_df.to_sql('TNBudgets',con=engine)
engine.execute("""SELECT * 
                from TNBudgets;""").fetchall()

[(0, 1, 'Dec 18, 2009', 'Avatar', 425000000, 760507625, 2776345279),
 (1, 2, 'May 20, 2011', 'Pirates of the Caribbean: On Stranger Tides', 410600000, 241063875, 1045663875),
 (2, 3, 'Jun 7, 2019', 'Dark Phoenix', 350000000, 42762350, 149762350),
 (3, 4, 'May 1, 2015', 'Avengers: Age of Ultron', 330600000, 459005868, 1403013963),
 (4, 5, 'Dec 15, 2017', 'Star Wars Ep. VIII: The Last Jedi', 317000000, 620181382, 1316721747),
 (5, 6, 'Dec 18, 2015', 'Star Wars Ep. VII: The Force Awakens', 306000000, 936662225, 2053311220),
 (6, 7, 'Apr 27, 2018', 'Avengers: Infinity War', 300000000, 678815482, 2048134200),
 (7, 8, 'May 24, 2007', 'Pirates of the Caribbean: At Worldâ\x80\x99s End', 300000000, 309420425, 963420425),
 (8, 9, 'Nov 17, 2017', 'Justice League', 300000000, 229024295, 655945209),
 (9, 10, 'Nov 6, 2015', 'Spectre', 300000000, 200074175, 879620923),
 (10, 11, 'Jul 20, 2012', 'The Dark Knight Rises', 275000000, 448139099, 1084439099),
 (11, 12, 'May 25, 2018', 'Solo: A Star Wars St
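
With the tables in place, the kind of join this database is meant to support might look like the sketch below, which links movies to the names of their directors through the principals table. The table names match those created above, but the rows, the names Person A to Person C, and the in-memory connection conn_demo are placeholders.

```python
import sqlite3
import pandas as pd

# Placeholder rows; table and column names follow the tables created above.
movies = pd.DataFrame({'tconst': ['tt1', 'tt2'],
                       'primary_title': ['Film One', 'Film Two'],
                       'averagerating': [7.0, 6.1]})
principals = pd.DataFrame({'tconst': ['tt1', 'tt1', 'tt2'],
                           'nconst': ['nm1', 'nm2', 'nm3'],
                           'category': ['director', 'actor', 'director']})
workers = pd.DataFrame({'nconst': ['nm1', 'nm2', 'nm3'],
                        'primary_name': ['Person A', 'Person B', 'Person C']})

conn_demo = sqlite3.connect(':memory:')
for name, frame in [('movies', movies), ('principals', principals), ('workers', workers)]:
    frame.to_sql(name, con=conn_demo, index=False)

# Join on tconst (titles) and nconst (people), keeping directors only.
query = """
SELECT m.primary_title, w.primary_name, m.averagerating
FROM movies AS m
JOIN principals AS p ON p.tconst = m.tconst
JOIN workers AS w ON w.nconst = p.nconst
WHERE p.category = 'director'
ORDER BY m.averagerating DESC;
"""
directors = pd.read_sql_query(query, conn_demo)
conn_demo.close()
print(directors)
```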

# Visualizations
