# Project 3 - Part 3 (Core)

This assignment appears at the start of the week because the first two weeks of the course already cover everything you need to complete Part 3 of the project.

## Business Problem

    For this project, you have been hired to produce a MySQL database on Movies from a subset of IMDB's publicly available dataset. Ultimately, you will use this database to analyze what makes a movie successful, and will provide recommendations to the stakeholder on how to make a successful movie.

Over the course of this project, you will:

    Part 1: Download several files from IMDB’s movie data set and filter them down to the subset of movies requested by the stakeholder.
    Part 2: Use an API to extract box office revenue and profit data to add to your IMDB data and perform exploratory data analysis.
    Part 3: Construct and export a MySQL database using your data.
    Part 4: Apply hypothesis testing to explore what makes a movie successful.
    Part 5 (Optional): Produce a Linear Regression model to predict movie performance.

## Part 3

    For Part 3 of the project, you will practice applying an ETL (extract, transform, load) process to your previously saved movie data. Specifically, you will prepare the data for a relational database and then create a new MySQL database from it. You will export your database to a .sql file in your repository using MySQL Workbench.

## Specifications - Database

    Your stakeholder wants you to take the data you cleaned and collected in Parts 1 and 2 of the project and build a MySQL database from it.

    Specifically, they want the data from the following files included in your database:
        Title Basics:
            Movie ID (tconst)
            Primary Title
            Start Year
            Runtime (in Minutes)
            Genres
        Title Ratings
            Movie ID (tconst)
            Average Movie Rating
            Number of Votes
        The TMDB API Results (multiple files)
            Movie ID
            Revenue
            Budget
            Certification (MPAA Rating)

    You should normalize the tables as best you can before adding them to your new database.
        Note: an important exception to their request is that they would like you to keep all of the data from the TMDB API in 1 table together (even though it will not be perfectly normalized).
        You only need to keep the imdb_id, revenue, budget, and certification columns


## Required Transformation Steps for Title Basics:

    Normalize Genre:
        Convert the single string of genres from title basics into 2 new tables.

            title_genres: with the columns:
                tconst
                genre_id

            genres:
                genre_id
                genre_name

    Discard unnecessary information:
        For the title basics table, drop the following columns:
            "original_title" (we will use the primary title column instead).
            "isAdult" ("Adult" will show up in the genres, so this is redundant information).
            "titleType" (every row will be a movie).
            "genres" and other variants of genre (genre is now represented in the 2 new tables described above).
        Do not include the title_akas table in your SQL database.
            You have already filtered out the desired movies using this table, and the remaining data is mostly nulls and not of interest to the stakeholder.

## MySQL Database Requirements

    Use sqlalchemy with pandas to execute your SQL queries inside your notebook.

    Create a new database on your MySQL server and call it "movies".

    Make sure to have the following tables in your "movies" database:
        title_basics
        title_ratings
        title_genres
        genres
        tmdb_data

    Make sure to set a Primary Key for each table that isn't a joiner table (e.g. title_genres is a joiner table).

    After creating each table, show the first 5 rows of that table using a SQL query.

    Make sure to run the "SHOW TABLES" SQL query at the end of your notebook to show that all required tables have been created.

## Title Basics:

    Movie ID (tconst)
    Primary Title
    Start Year
    Runtime (in Minutes)
    Genres

In [1]:
import os, time, json
import pandas as pd
import tmdbsimple as tmdb
from tqdm.notebook import tqdm_notebook

# Make sure the data folder exists, then list its contents
FOLDER = "Data/"
os.makedirs(FOLDER, exist_ok=True)
os.listdir(FOLDER)

In [2]:
with open('C:/Users/tulan/.secret/TMDB_api.json', 'r') as f:
    login = json.load(f)

tmdb.API_KEY =  login['api-key']

In [3]:
basics = pd.read_csv('Data/final_basics.csv.gz')
basics

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0035423,movie,Kate & Leopold,Kate & Leopold,0,2001,,118,"Comedy,Fantasy,Romance"
1,tt0062336,movie,The Tango of the Widower and Its Distorting Mi...,El Tango del Viudo y Su Espejo Deformante,0,2020,,70,Drama
2,tt0069049,movie,The Other Side of the Wind,The Other Side of the Wind,0,2018,,122,Drama
3,tt0088751,movie,The Naked Monster,The Naked Monster,0,2005,,100,"Comedy,Horror,Sci-Fi"
4,tt0096056,movie,Crime and Punishment,Crime and Punishment,0,2002,,126,Drama
...,...,...,...,...,...,...,...,...,...
84554,tt9914942,movie,Life Without Sara Amat,La vida sense la Sara Amat,0,2019,,74,Drama
84555,tt9915872,movie,The Last White Witch,My Girlfriend is a Wizard,0,2019,,97,"Comedy,Drama,Fantasy"
84556,tt9916170,movie,The Rehearsal,O Ensaio,0,2019,,51,Drama
84557,tt9916190,movie,Safeguard,Safeguard,0,2020,,95,"Action,Adventure,Thriller"


In [4]:
basics.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84559 entries, 0 to 84558
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   tconst          84559 non-null  object 
 1   titleType       84559 non-null  object 
 2   primaryTitle    84559 non-null  object 
 3   originalTitle   84559 non-null  object 
 4   isAdult         84559 non-null  int64  
 5   startYear       84559 non-null  int64  
 6   endYear         0 non-null      float64
 7   runtimeMinutes  84559 non-null  int64  
 8   genres          84559 non-null  object 
dtypes: float64(1), int64(3), object(5)
memory usage: 5.8+ MB


### Drop unnecessary columns

In [5]:
basics = basics.drop(['originalTitle','isAdult','titleType'], axis=1)
basics

Unnamed: 0,tconst,primaryTitle,startYear,endYear,runtimeMinutes,genres
0,tt0035423,Kate & Leopold,2001,,118,"Comedy,Fantasy,Romance"
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020,,70,Drama
2,tt0069049,The Other Side of the Wind,2018,,122,Drama
3,tt0088751,The Naked Monster,2005,,100,"Comedy,Horror,Sci-Fi"
4,tt0096056,Crime and Punishment,2002,,126,Drama
...,...,...,...,...,...,...
84554,tt9914942,Life Without Sara Amat,2019,,74,Drama
84555,tt9915872,The Last White Witch,2019,,97,"Comedy,Drama,Fantasy"
84556,tt9916170,The Rehearsal,2019,,51,Drama
84557,tt9916190,Safeguard,2020,,95,"Action,Adventure,Thriller"


### 1. Getting List of Unique Genres

In [6]:
## create a col with a list of genres
basics['genres_split'] = basics['genres'].str.split(',')
basics



Unnamed: 0,tconst,primaryTitle,startYear,endYear,runtimeMinutes,genres,genres_split
0,tt0035423,Kate & Leopold,2001,,118,"Comedy,Fantasy,Romance","[Comedy, Fantasy, Romance]"
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020,,70,Drama,[Drama]
2,tt0069049,The Other Side of the Wind,2018,,122,Drama,[Drama]
3,tt0088751,The Naked Monster,2005,,100,"Comedy,Horror,Sci-Fi","[Comedy, Horror, Sci-Fi]"
4,tt0096056,Crime and Punishment,2002,,126,Drama,[Drama]
...,...,...,...,...,...,...,...
84554,tt9914942,Life Without Sara Amat,2019,,74,Drama,[Drama]
84555,tt9915872,The Last White Witch,2019,,97,"Comedy,Drama,Fantasy","[Comedy, Drama, Fantasy]"
84556,tt9916170,The Rehearsal,2019,,51,Drama,[Drama]
84557,tt9916190,Safeguard,2020,,95,"Action,Adventure,Thriller","[Action, Adventure, Thriller]"


In [7]:
exploded_genres = basics.explode('genres_split')
exploded_genres



Unnamed: 0,tconst,primaryTitle,startYear,endYear,runtimeMinutes,genres,genres_split
0,tt0035423,Kate & Leopold,2001,,118,"Comedy,Fantasy,Romance",Comedy
0,tt0035423,Kate & Leopold,2001,,118,"Comedy,Fantasy,Romance",Fantasy
0,tt0035423,Kate & Leopold,2001,,118,"Comedy,Fantasy,Romance",Romance
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020,,70,Drama,Drama
2,tt0069049,The Other Side of the Wind,2018,,122,Drama,Drama
...,...,...,...,...,...,...,...
84557,tt9916190,Safeguard,2020,,95,"Action,Adventure,Thriller",Action
84557,tt9916190,Safeguard,2020,,95,"Action,Adventure,Thriller",Adventure
84557,tt9916190,Safeguard,2020,,95,"Action,Adventure,Thriller",Thriller
84558,tt9916362,Coven,2020,,92,"Drama,History",Drama


In [8]:
genres_split = basics['genres'].str.split(",")

unique_genres = genres_split.explode().unique()
unique_genres

array(['Comedy', 'Fantasy', 'Romance', 'Drama', 'Horror', 'Sci-Fi',
       'Mystery', 'Musical', 'Action', 'Adventure', 'Crime', 'Thriller',
       'Music', 'Animation', 'Family', 'History', 'War', 'Biography',
       'Sport', 'Western', 'Adult', 'Short', 'Reality-TV', 'News',
       'Talk-Show', 'Game-Show'], dtype=object)

In [9]:
unique_genres = sorted(exploded_genres['genres_split'].unique())
unique_genres


['Action',
 'Adult',
 'Adventure',
 'Animation',
 'Biography',
 'Comedy',
 'Crime',
 'Drama',
 'Family',
 'Fantasy',
 'Game-Show',
 'History',
 'Horror',
 'Music',
 'Musical',
 'Mystery',
 'News',
 'Reality-TV',
 'Romance',
 'Sci-Fi',
 'Short',
 'Sport',
 'Talk-Show',
 'Thriller',
 'War',
 'Western']
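Cells 8 and 9 arrive at the same set of genres in two ways. As a compact illustration of the split, explode, and unique steps, here is a minimal sketch on a three-row toy frame (the `toy` and `toy_unique` names are hypothetical, and the rows are copied from the preview above):

```python
import pandas as pd

# Toy frame with three rows taken from the basics preview above
toy = pd.DataFrame({
    "tconst": ["tt0035423", "tt0062336", "tt0088751"],
    "genres": ["Comedy,Fantasy,Romance", "Drama", "Comedy,Horror,Sci-Fi"],
})

# Split the comma-separated string, put one genre per row, then deduplicate and sort
toy_unique = sorted(toy["genres"].str.split(",").explode().unique())
print(toy_unique)
```

Sorting before assigning ids keeps the genre-to-integer mapping stable between runs, as long as the set of genres does not change.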

### 2. Create a new title_genres table

In [10]:
title_genres = exploded_genres[['tconst','genres_split']].copy()

In [11]:
title_genres

Unnamed: 0,tconst,genres_split
0,tt0035423,Comedy
0,tt0035423,Fantasy
0,tt0035423,Romance
1,tt0062336,Drama
2,tt0069049,Drama
...,...,...
84557,tt9916190,Action
84557,tt9916190,Adventure
84557,tt9916190,Thriller
84558,tt9916362,Drama


In [12]:
## 3. Create a genre mapper dictionary to replace string genres with integers
## Making the genre mapper dictionary
genre_ints = range(len(unique_genres))
genre_map = dict(zip(unique_genres, genre_ints))
genre_map



{'Action': 0,
 'Adult': 1,
 'Adventure': 2,
 'Animation': 3,
 'Biography': 4,
 'Comedy': 5,
 'Crime': 6,
 'Drama': 7,
 'Family': 8,
 'Fantasy': 9,
 'Game-Show': 10,
 'History': 11,
 'Horror': 12,
 'Music': 13,
 'Musical': 14,
 'Mystery': 15,
 'News': 16,
 'Reality-TV': 17,
 'Romance': 18,
 'Sci-Fi': 19,
 'Short': 20,
 'Sport': 21,
 'Talk-Show': 22,
 'Thriller': 23,
 'War': 24,
 'Western': 25}

In [13]:
## 4. Replace the string genres in title_genres with the new integer ids.
## make new integer genre_id and drop string genres
title_genres['genre_id'] = title_genres['genres_split'].map(genre_map)
title_genres = title_genres.drop(columns='genres_split')



In [14]:
title_genres

Unnamed: 0,tconst,genre_id
0,tt0035423,5
0,tt0035423,9
0,tt0035423,18
1,tt0062336,7
2,tt0069049,7
...,...,...
84557,tt9916190,0
84557,tt9916190,2
84557,tt9916190,23
84558,tt9916362,7
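Before loading a joiner table, it can help to confirm that every row received a genre id and that no (tconst, genre_id) pair is repeated. The following is only a sketch on toy data copied from the preview above; the variable names are hypothetical, and the same two checks could be run on the full `title_genres` frame.

```python
import pandas as pd

# Toy joiner table: the first four rows of the title_genres preview
toy_title_genres = pd.DataFrame({
    "tconst": ["tt0035423", "tt0035423", "tt0035423", "tt0062336"],
    "genre_id": [5, 9, 18, 7],
})

# A genre missing from the mapper dictionary would show up as NaN after .map()
missing_ids = int(toy_title_genres["genre_id"].isna().sum())

# Each movie/genre pair should appear only once in a joiner table
duplicate_pairs = int(toy_title_genres.duplicated(subset=["tconst", "genre_id"]).sum())

print(missing_ids, duplicate_pairs)
```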


### Convert the genre map dictionary into a dataframe.

In [15]:
genre_lookup = pd.DataFrame({'Genre_name': genre_map.keys(), 'Genre_ID':genre_map.values()})
genre_lookup

Unnamed: 0,Genre_name,Genre_ID
0,Action,0
1,Adult,1
2,Adventure,2
3,Animation,3
4,Biography,4
5,Comedy,5
6,Crime,6
7,Drama,7
8,Family,8
9,Fantasy,9


## Title Ratings

In [16]:
ratings = pd.read_csv('Data/final_ratings.csv.gz')
ratings

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.7,1930
1,tt0000002,5.8,261
2,tt0000005,6.2,2560
3,tt0000006,5.1,176
4,tt0000007,5.4,798
...,...,...,...
479650,tt9916204,8.2,251
479651,tt9916348,8.5,17
479652,tt9916362,6.4,5073
479653,tt9916428,3.8,14


In [17]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479655 entries, 0 to 479654
Data columns (total 3 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   tconst         479655 non-null  object 
 1   averageRating  479655 non-null  float64
 2   numVotes       479655 non-null  int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 11.0+ MB


No changes needed.
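One observation: `ratings` has 479,655 rows while `basics` has 84,559, so many ratings may refer to titles outside the filtered movie set. If that is not intended, one possible step is to keep only ratings whose `tconst` appears in `basics`. The sketch below uses toy frames; the names and the rating values for the two movies are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for basics and ratings (rating values for the two movies are illustrative)
basics_toy = pd.DataFrame({"tconst": ["tt0035423", "tt0062336"]})
ratings_toy = pd.DataFrame({
    "tconst": ["tt0000001", "tt0035423", "tt0062336"],
    "averageRating": [5.7, 6.4, 6.3],
    "numVotes": [1930, 100, 50],
})

# Keep only the ratings that belong to a movie present in basics
keep = ratings_toy["tconst"].isin(basics_toy["tconst"])
ratings_filtered = ratings_toy[keep].reset_index(drop=True)
print(ratings_filtered)
```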

## The TMDB API Results

In [18]:
results = pd.read_csv('Data/tmdb_results_combined.csv.gz')
results

Unnamed: 0,imdb_id,adult,backdrop_path,belongs_to_collection,budget,genres,homepage,id,original_language,original_title,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,certification
0,0,,,,,,,,,,...,,,,,,,,,,
1,tt0035423,0.0,/hfeiSfWYujh6MKhtGTXyK3DD4nN.jpg,,48000000.0,"[{'id': 10749, 'name': 'Romance'}, {'id': 14, ...",,11232.0,en,Kate & Leopold,...,76019048.0,118.0,"[{'english_name': 'French', 'iso_639_1': 'fr',...",Released,"If they lived in the same century, they'd be p...",Kate & Leopold,0.0,6.3,1170.0,PG-13
2,tt0114447,0.0,,,0.0,"[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'n...",,151007.0,en,The Silent Force,...,0.0,90.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,They left him for dead... They should have fin...,The Silent Force,0.0,5.0,3.0,
3,tt0118589,0.0,/9NZAirJahVilTiDNCHLFcdkwkiy.jpg,,22000000.0,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",,10696.0,en,Glitter,...,5271666.0,104.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,"In music she found her dream, her love, herself.",Glitter,0.0,4.6,122.0,PG-13
4,tt0118652,0.0,/mWxJEFRMvkG4UItYJkRDMgWQ08Y.jpg,,1000000.0,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",,17140.0,en,The Attic Expeditions,...,0.0,100.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,His search for peace of mind... will leave his...,The Attic Expeditions,0.0,5.1,29.0,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2502,tt6174238,0.0,,,0.0,"[{'id': 80, 'name': 'Crime'}]",,223878.0,cn,冷战,...,0.0,0.0,"[{'english_name': 'Cantonese', 'iso_639_1': 'c...",Released,,Cold War,0.0,2.0,2.0,
2503,tt7029820,0.0,,,7000.0,[],,604889.0,en,Scream For Christmas,...,0.0,80.0,[],Released,,Scream For Christmas,0.0,0.0,0.0,
2504,tt7197642,0.0,,,0.0,"[{'id': 35, 'name': 'Comedy'}]",,872676.0,en,"Goodbye, Merry-Go-Round",...,0.0,90.0,[],Released,,"Goodbye, Merry-Go-Round",0.0,0.0,0.0,
2505,tt7631368,0.0,/sF0gUHE0YzZNXYugTB2LFxJIppf.jpg,,10000000.0,"[{'id': 27, 'name': 'Horror'}]",,97186.0,fr,"I, Vampire",...,0.0,85.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,,"I, Vampire",0.0,6.4,4.0,NR


In [19]:
results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2507 entries, 0 to 2506
Data columns (total 26 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   imdb_id                2507 non-null   object 
 1   adult                  2505 non-null   float64
 2   backdrop_path          1334 non-null   object 
 3   belongs_to_collection  201 non-null    object 
 4   budget                 2505 non-null   float64
 5   genres                 2505 non-null   object 
 6   homepage               169 non-null    object 
 7   id                     2505 non-null   float64
 8   original_language      2505 non-null   object 
 9   original_title         2505 non-null   object 
 10  overview               2455 non-null   object 
 11  popularity             2505 non-null   float64
 12  poster_path            2242 non-null   object 
 13  production_companies   2505 non-null   object 
 14  production_countries   2505 non-null   object 
 15  rele

In [20]:
results = results[['imdb_id','revenue','budget','certification']]

In [21]:
results

Unnamed: 0,imdb_id,revenue,budget,certification
0,0,,,
1,tt0035423,76019048.0,48000000.0,PG-13
2,tt0114447,0.0,0.0,
3,tt0118589,5271666.0,22000000.0,PG-13
4,tt0118652,0.0,1000000.0,R
...,...,...,...,...
2502,tt6174238,0.0,0.0,
2503,tt7029820,0.0,7000.0,
2504,tt7197642,0.0,0.0,
2505,tt7631368,0.0,10000000.0,NR


In [22]:
results = results.rename(columns={"imdb_id": "tconst"})

In [23]:
results

Unnamed: 0,tconst,revenue,budget,certification
0,0,,,
1,tt0035423,76019048.0,48000000.0,PG-13
2,tt0114447,0.0,0.0,
3,tt0118589,5271666.0,22000000.0,PG-13
4,tt0118652,0.0,1000000.0,R
...,...,...,...,...
2502,tt6174238,0.0,0.0,
2503,tt7029820,0.0,7000.0,
2504,tt7197642,0.0,0.0,
2505,tt7631368,0.0,10000000.0,NR
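The first row of `results` has `0` in the id column and NaN everywhere else, which looks like a placeholder row left over from the API extraction step. A hedged sketch of one way to drop such rows, assuming every real id starts with `tt` (the `results_toy` and `results_clean` names are hypothetical):

```python
import pandas as pd

# Toy version of the results frame, including the placeholder first row
results_toy = pd.DataFrame({
    "tconst": ["0", "tt0035423", "tt0114447"],
    "revenue": [float("nan"), 76019048.0, 0.0],
    "budget": [float("nan"), 48000000.0, 0.0],
    "certification": [None, "PG-13", None],
})

# Assumption: valid IMDB ids always begin with "tt"
is_real_id = results_toy["tconst"].astype(str).str.startswith("tt")
results_clean = results_toy[is_real_id].reset_index(drop=True)
print(results_clean)
```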


## Re-creating Basics 

In [24]:
basics = pd.read_csv('Data/final_basics.csv.gz')
basics

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0035423,movie,Kate & Leopold,Kate & Leopold,0,2001,,118,"Comedy,Fantasy,Romance"
1,tt0062336,movie,The Tango of the Widower and Its Distorting Mi...,El Tango del Viudo y Su Espejo Deformante,0,2020,,70,Drama
2,tt0069049,movie,The Other Side of the Wind,The Other Side of the Wind,0,2018,,122,Drama
3,tt0088751,movie,The Naked Monster,The Naked Monster,0,2005,,100,"Comedy,Horror,Sci-Fi"
4,tt0096056,movie,Crime and Punishment,Crime and Punishment,0,2002,,126,Drama
...,...,...,...,...,...,...,...,...,...
84554,tt9914942,movie,Life Without Sara Amat,La vida sense la Sara Amat,0,2019,,74,Drama
84555,tt9915872,movie,The Last White Witch,My Girlfriend is a Wizard,0,2019,,97,"Comedy,Drama,Fantasy"
84556,tt9916170,movie,The Rehearsal,O Ensaio,0,2019,,51,Drama
84557,tt9916190,movie,Safeguard,Safeguard,0,2020,,95,"Action,Adventure,Thriller"


In [25]:
basics = basics.drop(['originalTitle','isAdult','titleType','genres'], axis=1)
basics

Unnamed: 0,tconst,primaryTitle,startYear,endYear,runtimeMinutes
0,tt0035423,Kate & Leopold,2001,,118
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020,,70
2,tt0069049,The Other Side of the Wind,2018,,122
3,tt0088751,The Naked Monster,2005,,100
4,tt0096056,Crime and Punishment,2002,,126
...,...,...,...,...,...
84554,tt9914942,Life Without Sara Amat,2019,,74
84555,tt9915872,The Last White Witch,2019,,97
84556,tt9916170,The Rehearsal,2019,,51
84557,tt9916190,Safeguard,2020,,95


## Saving the MySQL tables with tconst as the primary key.

In [26]:
results.dtypes

tconst            object
revenue          float64
budget           float64
certification     object
dtype: object

In [27]:
## get max string length
max_str_len = results['tconst'].fillna('').map(len).max()
max_str_len


10
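The same length calculation is repeated for several columns later in the notebook. As a sketch, it could be wrapped in a small helper; `max_str_len_of` is a hypothetical name, not a pandas function.

```python
import pandas as pd

def max_str_len_of(series):
    """Return the length of the longest string in a column, treating missing values as empty."""
    return int(series.fillna("").astype(str).map(len).max())

# Toy column with a 9-character id, a missing value, and a 10-character id
toy_ids = pd.Series(["tt0035423", None, "tt10000000"])
toy_len = max_str_len_of(toy_ids)
print(toy_len)
```

The `astype(str)` call is a defensive addition so that a stray non-string value does not raise inside `len`.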

## Schema for DFs

In [30]:
import pymysql
pymysql.install_as_MySQLdb()

from sqlalchemy import create_engine
import pandas as pd

# Create connection string using credentials following this format
# connection = "dialect+driver://username:password@host:port/database"
username = "root"
password = "root" # (or whatever password you chose during mysql installation)
db_name = "movies"
connection = f"mysql+pymysql://{username}:{password}@localhost/{db_name}"

engine = create_engine(connection)

engine



Engine(mysql+pymysql://root:***@localhost/movies)
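An f-string connection string breaks if the password contains characters such as `@` or `/`. As an alternative sketch, SQLAlchemy (1.4 and later) can assemble the URL from its parts with `URL.create`; the password below is a made-up example, not a real credential.

```python
from sqlalchemy.engine import URL

# Build the connection URL from parts; special characters in the password are handled for us
connection_url = URL.create(
    drivername="mysql+pymysql",
    username="root",
    password="p@ss/word",   # hypothetical password containing special characters
    host="localhost",
    database="movies",
)
print(connection_url)

# Passing the URL object to create_engine would then need a running MySQL server:
# engine = create_engine(connection_url)
```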

### Basics Table

In [31]:
## Example
from sqlalchemy.types import *
## Calculate max string lengths for object columns
key_len = basics['tconst'].fillna('').map(len).max()
title_len = basics['primaryTitle'].fillna('').map(len).max()
## Create a schema dictionary using SQLAlchemy datatype objects
basics_schema = {
    "tconst": String(key_len+1), 
    "primaryTitle": Text(title_len+1),
    'startYear':Float(),
    'endYear':Float(),
    'runtimeMinutes':Integer()}



In [33]:
# Save to sql with dtype and index=False
basics.to_sql('title_basics',engine,dtype=basics_schema,if_exists='replace',index=False)

84559

In [34]:
engine.execute('ALTER TABLE title_basics ADD PRIMARY KEY (`tconst`);')

<sqlalchemy.engine.cursor.LegacyCursorResult at 0x1d8ba61bd00>
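The requirements ask for a primary key on every non-joiner table, and `engine.execute` is a legacy call that was removed in SQLAlchemy 2.0. The sketch below builds the `ALTER TABLE` statements for the remaining non-joiner tables and shows how they could be run with `engine.begin()` and `text()`. The helper names are hypothetical, the key columns are taken from the frames in this notebook, and the statements only succeed once those tables exist and their key columns are unique and non-null.

```python
from sqlalchemy import text

# Remaining non-joiner tables and the column proposed as each primary key
primary_keys = {
    "title_ratings": "tconst",
    "tmdb_data": "tconst",
    "genres": "Genre_ID",
}

def primary_key_statements(keys):
    """Build one ALTER TABLE statement per table."""
    return [
        f"ALTER TABLE `{table}` ADD PRIMARY KEY (`{column}`);"
        for table, column in keys.items()
    ]

def add_primary_keys(engine, keys):
    """Run the statements in a single transaction (works on SQLAlchemy 1.4 and 2.x)."""
    with engine.begin() as conn:
        for stmt in primary_key_statements(keys):
            conn.execute(text(stmt))

statements = primary_key_statements(primary_keys)
print("\n".join(statements))
```

Calling `add_primary_keys(engine, primary_keys)` after the `to_sql` cells would apply the keys. Because `if_exists='replace'` recreates a table, the keys need to be added again whenever a table is reloaded.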

In [42]:
title_basics = """SELECT * FROM title_basics
LIMIT 5;"""
pd.read_sql(title_basics, engine)

Unnamed: 0,tconst,primaryTitle,startYear,endYear,runtimeMinutes
0,tt0035423,Kate & Leopold,2001.0,,118
1,tt0062336,The Tango of the Widower and Its Distorting Mi...,2020.0,,70
2,tt0069049,The Other Side of the Wind,2018.0,,122
3,tt0088751,The Naked Monster,2005.0,,100
4,tt0096056,Crime and Punishment,2002.0,,126


### results Table

In [61]:
results.dtypes

tconst            object
revenue          float64
budget           float64
certification     object
dtype: object

In [62]:
from sqlalchemy.types import *
## Calculate max string lengths for object columns
key_len = results['tconst'].fillna('').map(len).max()
cert_len = results['certification'].fillna('').map(len).max()
## Create a schema dictionary using SQLAlchemy datatype objects
results_schema = {
    "tconst": String(key_len+1), 
    'revenue':Float(),
    'budget':Float(),
    "certification": Text(cert_len+1)}

In [63]:
# Save to sql with dtype and index=False
results.to_sql('tmdb_data',engine,dtype=results_schema,if_exists='replace',index=False)

2507

In [64]:
title_results = """SELECT * FROM tmdb_data
LIMIT 5;"""
pd.read_sql(title_results, engine)




Unnamed: 0,tconst,revenue,budget,certification
0,0,,,
1,tt0035423,76019000.0,48000000.0,PG-13
2,tt0114447,0.0,0.0,
3,tt0118589,5271670.0,22000000.0,PG-13
4,tt0118652,0.0,1000000.0,R


### Ratings Table

In [66]:
ratings.dtypes

tconst            object
averageRating    float64
numVotes           int64
dtype: object

In [67]:
## Calculate max string lengths for object columns
key_len = ratings['tconst'].fillna('').map(len).max()
## Create a schema dictionary using SQLAlchemy datatype objects
ratings_schema = {
    "tconst": String(key_len+1), 
    'averageRating':Float(),
    'numVotes':Float(),}

In [68]:
# Save to sql with dtype and index=False
ratings.to_sql('title_ratings',engine,dtype=ratings_schema,if_exists='replace',index=False)

479655

In [69]:
ratings_results = """SELECT * FROM title_ratings
LIMIT 5;"""
pd.read_sql(ratings_results, engine)

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.7,1930.0
1,tt0000002,5.8,261.0
2,tt0000005,6.2,2560.0
3,tt0000006,5.1,176.0
4,tt0000007,5.4,798.0


### title_genres

In [51]:
title_genres.dtypes

tconst      object
genre_id     int64
dtype: object

In [52]:
## Calculate max string lengths for object columns
key_len = title_genres['tconst'].fillna('').map(len).max()
## Create a schema dictionary using SQLAlchemy datatype objects
title_genres_schema = {
    "tconst": String(key_len+1), 
    'genre_id':Float(),}

In [53]:
title_genres.to_sql('title_genres',engine,dtype=title_genres_schema,if_exists='replace',index=False)

157979

In [54]:
titlegenres_results = """SELECT * FROM title_genres
LIMIT 5;"""
pd.read_sql(titlegenres_results, engine)

Unnamed: 0,tconst,genre_id
0,tt0035423,5.0
1,tt0035423,9.0
2,tt0035423,18.0
3,tt0062336,7.0
4,tt0069049,7.0
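The stored outputs show two side effects of using `Float()` everywhere: integer columns come back as floats (`genre_id` 5.0, `numVotes` 1930.0), and revenue 76019048 is returned as 76019000.0, because MySQL `FLOAT` is single precision. One possible alternative is to derive the schema from the pandas dtypes. This is a sketch only: `schema_from_frame` is a hypothetical helper, and the assumption that `Float(precision=53)` ends up as a double-precision column on MySQL should be verified against your server.

```python
import pandas as pd
from sqlalchemy.types import BigInteger, Float, String

def schema_from_frame(df):
    """Map each column to a SQLAlchemy type based on its pandas dtype."""
    schema = {}
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            schema[col] = BigInteger()            # whole numbers stay whole numbers
        elif pd.api.types.is_float_dtype(df[col]):
            schema[col] = Float(precision=53)     # assumption: double precision on MySQL
        else:
            max_len = int(df[col].fillna("").astype(str).map(len).max())
            schema[col] = String(max_len + 1)     # same +1 padding as the cells above
    return schema

# Toy frame mixing a string key, an integer id, and a float amount
toy = pd.DataFrame({
    "tconst": ["tt0035423", "tt0062336"],
    "genre_id": [5, 7],
    "revenue": [76019048.0, 0.0],
})
toy_schema = schema_from_frame(toy)
print(toy_schema)
```

The resulting dictionary can be passed to `to_sql` through the `dtype` argument in the same way as the hand-written schemas.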


### genres

In [50]:
genre_lookup.dtypes

Genre_name    object
Genre_ID       int64
dtype: object

In [55]:
## Calculate max string lengths for object columns
name_len = genre_lookup['Genre_name'].fillna('').map(len).max()
## Create a schema dictionary using SQLAlchemy datatype objects
## (keys must match the DataFrame column names exactly, including case)
genre_lookup_schema = {
    "Genre_name": String(name_len+1), 
    'Genre_ID':Integer(),}

In [57]:
genre_lookup.to_sql('genres',engine,dtype=genre_lookup_schema,if_exists='replace',index=False)

26

In [59]:
genre_lookup_results = """SELECT * FROM genres
LIMIT 5;"""
pd.read_sql(genre_lookup_results, engine)

Unnamed: 0,Genre_name,Genre_ID
0,Action,0
1,Adult,1
2,Adventure,2
3,Animation,3
4,Biography,4


## Showing Tables

In [70]:
q = """SHOW TABLES;"""
pd.read_sql(q, engine)


Unnamed: 0,Tables_in_movies
0,genres
1,title_basics
2,title_genres
3,title_ratings
4,tmdb_data
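To make the final check explicit rather than visual, the `SHOW TABLES` result can be compared against the list of required tables. In this sketch the `shown` frame is a toy copy of the output above; in the notebook it would be the result of `pd.read_sql(q, engine)`.

```python
import pandas as pd

# Tables required by the project specification
required_tables = {"title_basics", "title_ratings", "title_genres", "genres", "tmdb_data"}

# Toy copy of the SHOW TABLES output
shown = pd.DataFrame({
    "Tables_in_movies": ["genres", "title_basics", "title_genres", "title_ratings", "tmdb_data"]
})

# Any name left over here is a required table that has not been created yet
missing_tables = required_tables - set(shown["Tables_in_movies"])
print(missing_tables)
```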
