# ADS-508-01-SP23 Team 8: Final Project

# Setup Database and Athena Tables

Much of the code is modified from `Fregly, C., & Barth, A. (2021). Data science on AWS: Implementing end-to-end, continuous AI and machine learning pipelines. O’Reilly.`

## Install missing dependencies

[PyAthena](https://pypi.org/project/PyAthena/) is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena.
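As a rough illustration of what DB API 2.0 compliance means in practice, the sketch below walks through the connect, cursor, execute, and fetch sequence that PEP 249 drivers share. It uses the standard-library `sqlite3` module as a stand-in (an assumption made only so the example runs without AWS credentials), and the `schools` table and the second row are made up for the demonstration. Placeholder syntax differs between drivers, since PEP 249 lets each driver declare its own `paramstyle`.

```python
import sqlite3

# sqlite3 stands in for PyAthena here; both expose the PEP 249 interface.
conn_demo = sqlite3.connect(":memory:")
cur = conn_demo.cursor()

# Hypothetical table and rows, used only for illustration.
cur.execute("CREATE TABLE schools (dbn TEXT, borough TEXT)")
cur.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [("01M448", "Manhattan"), ("07X221", "Bronx")],
)

# Execute a parameterized query, then read column names and rows.
cur.execute("SELECT dbn, borough FROM schools WHERE borough = ?", ("Bronx",))
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()

print(column_names)
print(rows)
conn_demo.close()
```

The same cursor methods (`execute`, `description`, `fetchall`) are what `pd.read_sql` relies on when it is handed a PEP 249 connection.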

In [1]:
!pip install --disable-pip-version-check -q PyAthena==2.1.0

## Globally import libraries

In [2]:
import boto3
from botocore.exceptions import ClientError
import sagemaker
import pandas as pd
from pyathena import connect
from IPython.display import display, HTML

## Instantiate AWS SageMaker session

In [3]:
session = boto3.session.Session()
region = session.region_name
sagemaker_session = sagemaker.Session()
def_bucket = sagemaker_session.default_bucket()
bucket = 'sagemaker-us-east-ads508-sp23-t8'

s3 = boto3.Session().client(service_name="s3",
                            region_name=region)

role = sagemaker.get_execution_role()
account_id = boto3.client("sts").get_caller_identity().get("Account")

sm = boto3.Session().client(service_name="sagemaker",
                            region_name=region)

In [4]:
setup_s3_bucket_passed = False
ingest_create_athena_db_passed = False
ingest_create_athena_table_tsv_passed = False

In [5]:
print(f"Default bucket: {def_bucket}")
print(f"Public T8 bucket: {bucket}")

Default bucket: sagemaker-us-east-1-657724983756
Public T8 bucket: sagemaker-us-east-ads508-sp23-t8


## Verify S3 Bucket Creation

In [6]:
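%%bash

# The Python variable `bucket` is not exported to this bash cell, so ${bucket}
# is empty here and the command lists every bucket in the account.
aws s3 ls s3://${bucket}/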

2023-03-16 17:05:02 aws-athena-query-results-657724983756-us-east-1
2023-03-02 16:56:48 sagemaker-studio-657724983756-5nh7ydsouq7
2023-03-02 17:25:41 sagemaker-studio-657724983756-7yc8bp8xk0b
2023-03-02 17:01:51 sagemaker-us-east-1-657724983756
2023-03-17 05:19:31 sagemaker-us-east-ads508-sp23-t8


In [7]:
response = None

try:
    response = s3.head_bucket(Bucket=bucket)
    print(response)
    setup_s3_bucket_passed = True
except ClientError as e:
    # `response` is still None when head_bucket raises, so report only the bucket and the error
    print(f"[ERROR] Cannot access bucket {bucket} due to {e}.")

{'ResponseMetadata': {'RequestId': 'G71EYFCPF5WGCS7F', 'HostId': '7BBsoOUK8lghqHaVwacD82OHE12jU+PHjvo+1Y/hcCFpZLjESs71O5PB2ESSZrG/h8W3+5rfQCQ=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '7BBsoOUK8lghqHaVwacD82OHE12jU+PHjvo+1Y/hcCFpZLjESs71O5PB2ESSZrG/h8W3+5rfQCQ=', 'x-amz-request-id': 'G71EYFCPF5WGCS7F', 'date': 'Wed, 22 Mar 2023 00:30:43 GMT', 'x-amz-bucket-region': 'us-east-1', 'x-amz-access-point-alias': 'false', 'content-type': 'application/xml', 'server': 'AmazonS3'}, 'RetryAttempts': 0}}
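When `head_bucket` fails, the `ClientError` carries a `response` dictionary whose error code usually tells a missing bucket apart from a permissions problem. The helper below is a hypothetical sketch (its name and messages are not part of boto3) that interprets that dictionary; it is exercised here with a hand-written sample rather than a real failure.

```python
def explain_head_bucket_error(error_response):
    """Map the error code of a failed head_bucket call to a readable hint."""
    code = str(error_response.get("Error", {}).get("Code", ""))
    if code in ("404", "NoSuchBucket"):
        return "bucket does not exist"
    if code in ("403", "AccessDenied"):
        return "access to the bucket is denied for this role"
    return f"unexpected error code: {code or 'unknown'}"


# Hand-written sample shaped like ClientError.response (an assumption for the demo).
sample = {"Error": {"Code": "404", "Message": "Not Found"}}
print(explain_head_bucket_error(sample))

# Inside the notebook's except block this would be called as:
#     print(explain_head_bucket_error(e.response))
```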


In [8]:
%store setup_s3_bucket_passed

Stored 'setup_s3_bucket_passed' (bool)


# Set S3 Source Location (Public S3 Bucket)

In [9]:
s3_public_path_tsv = f"s3://{bucket}"

In [10]:
%store s3_public_path_tsv

Stored 's3_public_path_tsv' (str)


# Set S3 Destination Location (Our Private S3 Bucket)

In [11]:
s3_private_path_tsv = f"s3://{def_bucket}/team_8_data"
print(s3_private_path_tsv)

s3://sagemaker-us-east-1-657724983756/team_8_data


In [12]:
%store s3_private_path_tsv

Stored 's3_private_path_tsv' (str)


# Copy Data From the Public S3 Bucket to our Private S3 Bucket in this Account
The command below recursively copies everything in the public bucket into our private bucket, so that later steps read the data from this account.

In [13]:
!aws s3 cp --recursive $s3_public_path_tsv/ $s3_private_path_tsv/

copy: s3://sagemaker-us-east-ads508-sp23-t8/athena/staging/009a0269-b5fd-45e7-88fb-114187fbfc6b.txt to s3://sagemaker-us-east-1-657724983756/team_8_data/athena/staging/009a0269-b5fd-45e7-88fb-114187fbfc6b.txt
copy: s3://sagemaker-us-east-ads508-sp23-t8/athena/staging/0067a694-8480-45bf-8435-6bce6a008b1b.txt to s3://sagemaker-us-east-1-657724983756/team_8_data/athena/staging/0067a694-8480-45bf-8435-6bce6a008b1b.txt
copy: s3://sagemaker-us-east-ads508-sp23-t8/athena/staging/00af0af8-9e09-4929-8a8e-df55f6acc4dc.csv.metadata to s3://sagemaker-us-east-1-657724983756/team_8_data/athena/staging/00af0af8-9e09-4929-8a8e-df55f6acc4dc.csv.metadata
copy: s3://sagemaker-us-east-ads508-sp23-t8/athena/staging/01c8b4bf-387b-4dc0-9288-4345378ef7a9.txt to s3://sagemaker-us-east-1-657724983756/team_8_data/athena/staging/01c8b4bf-387b-4dc0-9288-4345378ef7a9.txt
copy: s3://sagemaker-us-east-ads508-sp23-t8/athena/staging/00fc8bcb-7cff-45a6-a8c9-35eb6ea069f0.txt to s3://sagemaker-us-east-1-657724983756/team_

# _Make sure the S3 copy command above runs successfully. We will need those data files for the rest of this notebook._

# List Files in our Private S3 Bucket in this Account

In [14]:
print(s3_private_path_tsv)

s3://sagemaker-us-east-1-657724983756/team_8_data


In [15]:
!aws s3 ls $s3_private_path_tsv/

                           PRE ABT/
                           PRE athena/
                           PRE columnar/
                           PRE raw_data/


In [16]:
from IPython.display import display, HTML

# Link to the team_8_data prefix in the default SageMaker bucket (sagemaker-<region>-<account_id>)
display(
    HTML(
        '<b>Review <a target="_blank" href="https://s3.console.aws.amazon.com/s3/buckets/sagemaker-{}-{}/team_8_data/?region={}&tab=overview">S3 Bucket</a></b>'.format(
            region, account_id, region
        )
    )
)

## Create Athena Database

In [17]:
database_name = "ads508_t8"

Note: The databases and tables that we create in Athena use a data catalog service to store metadata about our data. For example, the schema (each table's name, its column names, and each column's data type) is saved as metadata in the data catalog.

Athena natively supports the AWS Glue Data Catalog service. When we run `CREATE DATABASE` and `CREATE TABLE` queries in Athena with the AWS Glue Data Catalog as our source, we automatically see the database and table metadata entries being created in the AWS Glue Data Catalog.
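One way to see those catalog entries is to ask Glue directly. The sketch below assumes a boto3 Glue client is passed in and uses its `get_tables` call; the function name `list_catalog_tables` and the `_FakeGlue` stand-in (with its example bucket) are hypothetical and exist only so the example runs without AWS credentials.

```python
def list_catalog_tables(glue_client, database_name):
    """Return {table_name: s3_location} for every table in a Glue database."""
    tables = {}
    kwargs = {"DatabaseName": database_name}
    while True:
        page = glue_client.get_tables(**kwargs)
        for tbl in page.get("TableList", []):
            location = tbl.get("StorageDescriptor", {}).get("Location")
            tables[tbl["Name"]] = location
        # Follow pagination until Glue stops returning a continuation token.
        token = page.get("NextToken")
        if not token:
            return tables
        kwargs["NextToken"] = token


class _FakeGlue:
    """Stand-in client (hypothetical) returning one made-up catalog entry."""

    def get_tables(self, **kwargs):
        return {
            "TableList": [
                {
                    "Name": "grad_outcomes",
                    "StorageDescriptor": {
                        "Location": "s3://example-bucket/raw_data/grad_outcomes"
                    },
                }
            ]
        }


catalog = list_catalog_tables(_FakeGlue(), "ads508_t8")
print(catalog)

# With real credentials, something like:
#     list_catalog_tables(boto3.client("glue", region_name=region), database_name)
```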

In [18]:
# Set S3 staging directory -- this is a temporary directory used for Athena queries
s3_staging_dir = f"s3://{def_bucket}/athena/staging"
print(s3_staging_dir)

s3://sagemaker-us-east-1-657724983756/athena/staging


In [19]:
conn = connect(region_name=region,
               s3_staging_dir=s3_staging_dir)

In [20]:
create_db_stmnt = f"CREATE DATABASE IF NOT EXISTS {database_name}"
print(create_db_stmnt)

CREATE DATABASE IF NOT EXISTS ads508_t8


In [21]:
pd.read_sql(create_db_stmnt,
            conn)

### Verify the Database Has Been Created Successfully

In [22]:
show_db_stmnt = "SHOW DATABASES"

df_show = pd.read_sql(show_db_stmnt,
                      conn)
df_show.head(17)

Unnamed: 0,database_name
0,ads508_t8
1,default
2,dsoaws


In [23]:
if database_name in df_show.values:
    ingest_create_athena_db_passed = True

In [24]:
%store ingest_create_athena_db_passed

Stored 'ingest_create_athena_db_passed' (bool)


## Define custom function to create tables in existing database

In [25]:
def create_athena_tbl_tsv(conn=None,
                          db=None,
                          tbl_name=None,
                          fields='',
                          s3_path=None,
                          delim=',',
                          ret='',
                          comp='',
                          skip=''):
    """Drop and re-create an external Athena table over delimited text files in S3."""
    # Update the notebook-level flag. Without this declaration the assignment
    # below creates a local variable, and the final print raises
    # UnboundLocalError whenever the table is not found.
    global ingest_create_athena_table_tsv_passed

    # SQL statements to execute
    drop_tsv_tbl_stmnt = f"""DROP TABLE IF EXISTS {db}.{tbl_name}"""

    create_tsv_tbl_stmnt = f"""
        CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{tbl_name}({fields})
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY '{delim}'
            LINES
                TERMINATED BY '{ret}\\n'
        LOCATION '{s3_path}'
        TBLPROPERTIES ({comp}{skip})
        """

    print(f'Create table statement:\n{create_tsv_tbl_stmnt}')

    # Drop any previous definition, then create the table
    pd.read_sql(drop_tsv_tbl_stmnt,
                conn)

    pd.read_sql(create_tsv_tbl_stmnt,
                conn)

    # Verify the table has been created successfully
    show_tsv_tbl_stmnt = f"SHOW TABLES IN {db}"

    df_show = pd.read_sql(show_tsv_tbl_stmnt,
                          conn)
    display(df_show.head(17))

    if tbl_name in df_show.values:
        ingest_create_athena_table_tsv_passed = True

    print(f'\nDataframe contains records: {ingest_create_athena_table_tsv_passed}')
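The function above always emits a `TBLPROPERTIES (...)` clause, which would be left empty if both `comp` and `skip` were blank. As a possible variation, the sketch below builds the statement as a plain string and adds the clause only when properties are supplied. The name `build_create_table_stmnt` and the `props` dictionary are hypothetical, not part of the notebook; the bucket name in the demo is a placeholder.

```python
def build_create_table_stmnt(db, tbl_name, fields, s3_path, delim=",", props=None):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files."""
    props = props or {}
    stmnt = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{tbl_name} ({fields})\n"
        "ROW FORMAT DELIMITED\n"
        f"  FIELDS TERMINATED BY '{delim}'\n"
        "  LINES TERMINATED BY '\\n'\n"
        f"LOCATION '{s3_path}'"
    )
    # Only emit TBLPROPERTIES when there is at least one property to set.
    if props:
        prop_list = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
        stmnt += f"\nTBLPROPERTIES ({prop_list})"
    return stmnt


demo_stmnt = build_create_table_stmnt(
    db="ads508_t8",
    tbl_name="grad_outcomes",
    fields="demographic string, dbn string",
    s3_path="s3://example-bucket/raw_data/grad_outcomes",
    delim="\\t",
    props={"skip.header.line.count": "1"},
)
print(demo_stmnt)
```

Keeping the string construction separate from execution also makes the statement easy to inspect or test before it is sent to Athena.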

## Create Athena Table from Local TSV File - `2005-2010_Graduation_Outcomes_-_School_Level.tsv`

### Dataset columns

- `demographic`
- `dbn`
- `school_name`
- `cohort`
- `total_cohort`
- `total_grads_n`
- `total_grads_perc_cohort`
- `total_regents_n`
- `total_regents_perc_cohort`
- `total_regents_perc_grads`
- `advanced_regents_n`
- `advanced_regents_perc_cohort`
- `advanced_regents_perc_grads`
- `regents_wo_advanced_n`
- `regents_wo_advanced_perc_cohort`
- `regents_wo_advanced_perc_grads`
- `local_n`
- `local_perc_cohort`
- `local_perc_grads`
- `still_enrolled_n`
- `still_enrolled_perc_cohort`
- `dropped_out_n`
- `dropped_out_perc_cohort`

In [26]:
grd_tsv_tbl_name = 'grad_outcomes'
grd_tsv_field_list = """
demographic string,
dbn string,
school_name string,
cohort string,
total_cohort string,
total_grads_n string,
total_grads_perc_cohort string,
total_regents_n string,
total_regents_perc_cohort string,
total_regents_perc_grads string,
advanced_regents_n string,
advanced_regents_perc_cohort string,
advanced_regents_perc_grads string,
regents_wo_advanced_n string,
regents_wo_advanced_perc_cohort string,
regents_wo_advanced_perc_grads string,
local_n string,
local_perc_cohort string,
local_perc_grads string,
still_enrolled_n string,
still_enrolled_perc_cohort string,
dropped_out_n string,
dropped_out_perc_cohort string
"""
grd_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/grad_outcomes"
print(grd_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=grd_tsv_tbl_name,
                      fields=grd_tsv_field_list,
                      s3_path=grd_tsv_s3_raw_data_path,
                      delim='\\t',
                      comp='',
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/grad_outcomes
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.grad_outcomes(
demographic string,
dbn string,
school_name string,
cohort string,
total_cohort string,
total_grads_n string,
total_grads_perc_cohort string,
total_regents_n string,
total_regents_perc_cohort string,
total_regents_perc_grads string,
advanced_regents_n string,
advanced_regents_perc_cohort string,
advanced_regents_perc_grads string,
regents_wo_advanced_n string,
regents_wo_advanced_perc_cohort string,
regents_wo_advanced_perc_grads string,
local_n string,
local_perc_cohort string,
local_perc_grads string,
still_enrolled_n string,
still_enrolled_perc_cohort string,
dropped_out_n string,
dropped_out_perc_cohort string
)
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY '\t'
            LINES
                TERMINATED BY '\n'
        LOCATION 's3://sagemaker-us-east-1-657724983756/raw_data/grad_outcomes'
     

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [27]:
grd_dbn_id01 = "01M448"

grd_select_dbn_stmnt = f"""
    SELECT * FROM {database_name}.{grd_tsv_tbl_name}
    WHERE dbn = '{grd_dbn_id01}' LIMIT 17
    """

print(grd_select_dbn_stmnt)


    SELECT * FROM ads508_t8.grad_outcomes
    WHERE dbn = '01M448' LIMIT 17
    


In [28]:
grd_df01_s01 = pd.read_sql(grd_select_dbn_stmnt,
                           conn)
grd_df01_s01.head(17)

Unnamed: 0,demographic,dbn,school_name,cohort,total_cohort,total_grads_n,total_grads_perc_cohort,total_regents_n,total_regents_perc_cohort,total_regents_perc_grads,...,regents_wo_advanced_n,regents_wo_advanced_perc_cohort,regents_wo_advanced_perc_grads,local_n,local_perc_cohort,local_perc_grads,still_enrolled_n,still_enrolled_perc_cohort,dropped_out_n,dropped_out_perc_cohort


In [29]:
if not grd_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Create Athena Table from Local TSV File - `2014_-_2015_DOE_High_School_Directory.tsv`

### Dataset columns

- `dbn`
- `school_name`
- `borough`
- `building_code`
- `phone_number`
- `fax_number`
- `grade_span_min`
- `grade_span_max`
- `expgrade_span_min`
- `expgrade_span_max`
- `bus`
- `subway`
- `primary_address_line_1`
- `city`
- `state_code`
- `postcode`
- `website`
- `total_students`
- `campus_name`
- `school_type`
- `overview_paragraph`
- `program_highlights`
- `language_classes`
- `advancedplacement_courses`
- `online_ap_courses`
- `online_language_courses`
- `extracurricular_activities`
- `psal_sports_boys`
- `psal_sports_girls`
- `psal_sports_coed`
- `school_sports`
- `partner_cbo`
- `partner_hospital`
- `partner_highered`
- `partner_cultural`
- `partner_nonprofit`
- `partner_corporate`
- `partner_financial`
- `partner_other`
- `addtl_info1`
- `addtl_info2`
- `start_time`
- `end_time`
- `se_services`
- `ell_programs`
- `school_accessibility_description`
- `number_programs`
- `priority01`
- `priority02`
- `priority03`
- `priority04`
- `priority05`
- `priority06`
- `priority07`
- `priority08`
- `priority09`
- `priority10`
- `location_1`
- `community_board`
- `council_district`
- `census_tract`
- `bin`
- `bbl`
- `nta`

In [30]:
hsi_tsv_tbl_name = 'hs_info'
hsi_tsv_field_list = """
dbn string,
school_name string,
borough string,
building_code string,
phone_number string,
fax_number string,
grade_span_min string,
grade_span_max string,
expgrade_span_min string,
expgrade_span_max string,
bus string,
subway string,
primary_address_line_1 string,
city string,
state_code string,
postcode string,
website string,
total_students string,
campus_name string,
school_type string,
overview_paragraph string,
program_highlights string,
language_classes string,
advancedplacement_courses string,
online_ap_courses string,
online_language_courses string,
extracurricular_activities string,
psal_sports_boys string,
psal_sports_girls string,
psal_sports_coed string,
school_sports string,
partner_cbo string,
partner_hospital string,
partner_highered string,
partner_cultural string,
partner_nonprofit string,
partner_corporate string,
partner_financial string,
partner_other string,
addtl_info1 string,
addtl_info2 string,
start_time string,
end_time string,
se_services string,
ell_programs string,
school_accessibility_description string,
number_programs string,
priority01 string,
priority02 string,
priority03 string,
priority04 string,
priority05 string,
priority06 string,
priority07 string,
priority08 string,
priority09 string,
priority10 string,
location_1 string,
community_board string,
council_district string,
census_tract string,
bin string,
bbl string,
nta string
"""
hsi_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/hs_dir"
print(hsi_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=hsi_tsv_tbl_name,
                      fields=hsi_tsv_field_list,
                      s3_path=hsi_tsv_s3_raw_data_path,
                      delim='\\t',
                      comp='',
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/hs_dir
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.hs_info(
dbn string,
school_name string,
borough string,
building_code string,
phone_number string,
fax_number string,
grade_span_min string,
grade_span_max string,
expgrade_span_min string,
expgrade_span_max string,
bus string,
subway string,
primary_address_line_1 string,
city string,
state_code string,
postcode string,
website string,
total_students string,
campus_name string,
school_type string,
overview_paragraph string,
program_highlights string,
language_classes string,
advancedplacement_courses string,
online_ap_courses string,
online_language_courses string,
extracurricular_activities string,
psal_sports_boys string,
psal_sports_girls string,
psal_sports_coed string,
school_sports string,
partner_cbo string,
partner_hospital string,
partner_highered string,
partner_cultural string,
partner_nonprofit string,
partner_corporate string,
partner_finan

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [31]:
hsi_dbn_id01 = "01M448"

hsi_select_dbn_stmnt = f"""
    SELECT * FROM {database_name}.{hsi_tsv_tbl_name}
    WHERE dbn = '{hsi_dbn_id01}'
    LIMIT 17
    """

print(hsi_select_dbn_stmnt)


    SELECT * FROM ads508_t8.hs_info
    WHERE dbn = '01M448'
    LIMIT 17
    


In [32]:
hsi_df01_s01 = pd.read_sql(hsi_select_dbn_stmnt,
                           conn)
hsi_df01_s01.head(17)

Unnamed: 0,dbn,school_name,borough,building_code,phone_number,fax_number,grade_span_min,grade_span_max,expgrade_span_min,expgrade_span_max,...,priority08,priority09,priority10,location_1,community_board,council_district,census_tract,bin,bbl,nta


In [33]:
if not hsi_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Create Athena Table from Local CSV File - `nyc_census_tracts.csv`

### Dataset columns

- `censustract`
- `county`
- `borough`
- `totalpop`
- `men`
- `women`
- `hispanic`
- `white`
- `black`
- `native`
- `asian`
- `citizen`
- `income`
- `incomeerr`
- `incomepercap`
- `incomepercaperr`
- `poverty`
- `childpoverty`
- `professional`
- `service`
- `office`
- `construction`
- `production`
- `drive`
- `carpool`
- `transit`
- `walk`
- `othertransp`
- `workathome`
- `meancommute`
- `employed`
- `privatework`
- `publicwork`
- `selfemployed`
- `familywork`
- `unemployment`

In [34]:
cen_tsv_tbl_name = 'census'
cen_tsv_field_list = """
censustract string,
county string,
borough string,
totalpop int,
men int,
women int,
hispanic double,
white double,
black double,
native double,
asian double,
citizen int,
income int,
incomeerr int,
incomepercap int,
incomepercaperr int,
poverty double,
childpoverty double,
professional double,
service double,
office double,
construction double,
production double,
drive double,
carpool double,
transit double,
walk double,
othertransp double,
workathome double,
meancommute double,
employed int,
privatework double,
publicwork double,
selfemployed double,
familywork double,
unemployment double
"""
cen_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/census"
print(cen_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=cen_tsv_tbl_name,
                      fields=cen_tsv_field_list,
                      s3_path=cen_tsv_s3_raw_data_path,
                      comp='',
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/census
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.census(
censustract string,
county string,
borough string,
totalpop int,
men int,
women int,
hispanic double,
white double,
black double,
native double,
asian double,
citizen int,
income int,
incomeerr int,
incomepercap int,
incomepercaperr int,
poverty double,
childpoverty double,
professional double,
service double,
office double,
construction double,
production double,
drive double,
carpool double,
transit double,
walk double,
othertransp double,
workathome double,
meancommute double,
employed int,
privatework double,
publicwork double,
selfemployed double,
familywork double,
unemployment double
)
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY ','
            LINES
                TERMINATED BY '\n'
        LOCATION 's3://sagemaker-us-east-1-657724983756/raw_data/census'
        TBLPROPERTIES ('skip.header.line.count'='1'

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [35]:
cen_bourough_id01 = "Bronx"

cen_select_dbn_stmnt = f"""
    SELECT * FROM {database_name}.{cen_tsv_tbl_name}
    WHERE borough = '{cen_bourough_id01}'
    LIMIT 17
    """

print(cen_select_dbn_stmnt)


    SELECT * FROM ads508_t8.census
    WHERE borough = 'Bronx'
    LIMIT 17
    


In [36]:
cen_df01_s01 = pd.read_sql(cen_select_dbn_stmnt,
                           conn)
cen_df01_s01.head(17)

Unnamed: 0,censustract,county,borough,totalpop,men,women,hispanic,white,black,native,...,walk,othertransp,workathome,meancommute,employed,privatework,publicwork,selfemployed,familywork,unemployment


In [37]:
if not cen_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Create Athena Table from Local TSV File - `NYPD_Complaint_Data_Historic (1).csv`

### Dataset columns

- `cmplnt_num`
- `cmplnt_fr_dt`
- `cmplnt_fr_tm`
- `cmplnt_to_dt`
- `cmplnt_to_tm`
- `addr_pct_cd`
- `rpt_dt`
- `ky_cd`
- `ofns_desc`
- `pd_cd`
- `pd_desc`
- `crm_atpt_cptd_cd`
- `law_cat_cd`
- `boro_nm`
- `loc_of_occur_desc`
- `prem_typ_desc`
- `juris_desc`
- `jurisdiction_code`
- `parks_nm`
- `hadevelopt`
- `housing_psa`
- `x_coord_cd`
- `y_coord_cd`
- `susp_age_group`
- `susp_race`
- `susp_sex`
- `transit_district`
- `latitude`
- `longitude`
- `lat_lon`
- `patrol_boro`
- `station_name`
- `vic_age_group`
- `vic_race`
- `vic_sex`

In [38]:
cri_tsv_tbl_name = 'crime'
cri_tsv_field_list = """
cmplnt_num string,
cmplnt_fr_dt string,
cmplnt_fr_tm string,
cmplnt_to_dt string,
cmplnt_to_tm string,
addr_pct_cd string,
rpt_dt string,
ky_cd string,
ofns_desc string,
pd_cd string,
pd_desc string,
crm_atpt_cptd_cd string,
law_cat_cd string,
borough string,
loc_of_occur_desc string,
prem_typ_desc string,
juris_desc string,
jurisdiction_code string,
parks_nm string,
hadevelopt string,
housing_psa string,
x_coord_cd string,
y_coord_cd string,
susp_age_group string,
susp_race string,
susp_sex string,
transit_district string,
latitude string,
longitude string,
lat_lon string,
patrol_boro string,
station_name string,
vic_age_group string,
vic_race string,
vic_sex string
"""
cri_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/crime"
print(cri_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=cri_tsv_tbl_name,
                      fields=cri_tsv_field_list,
                      s3_path=cri_tsv_s3_raw_data_path,
                      delim='\\t',
                      comp="'compressionType'='gzip', ",
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/crime
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.crime(
cmplnt_num string,
cmplnt_fr_dt string,
cmplnt_fr_tm string,
cmplnt_to_dt string,
cmplnt_to_tm string,
addr_pct_cd string,
rpt_dt string,
ky_cd string,
ofns_desc string,
pd_cd string,
pd_desc string,
crm_atpt_cptd_cd string,
law_cat_cd string,
borough string,
loc_of_occur_desc string,
prem_typ_desc string,
juris_desc string,
jurisdiction_code string,
parks_nm string,
hadevelopt string,
housing_psa string,
x_coord_cd string,
y_coord_cd string,
susp_age_group string,
susp_race string,
susp_sex string,
transit_district string,
latitude string,
longitude string,
lat_lon string,
patrol_boro string,
station_name string,
vic_age_group string,
vic_race string,
vic_sex string
)
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY '\t'
            LINES
                TERMINATED BY '\n'
        LOCATION 's3://sagemaker-us-east-1-6577

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [39]:
cri_law_cat_cd01 = "misdemeanor"
cri_borough01 = "bronx"

cri_select_dbn_stmnt01 = f"""
    SELECT * FROM {database_name}.{cri_tsv_tbl_name}
    WHERE lower(law_cat_cd) = '{cri_law_cat_cd01}' AND lower(borough) = '{cri_borough01}'
    LIMIT 17
    """

print(cri_select_dbn_stmnt01)


    SELECT * FROM ads508_t8.crime
    WHERE lower(law_cat_cd) = 'misdemeanor' AND lower(borough) = 'bronx'
    LIMIT 17
    


In [40]:
cri_df01_s01 = pd.read_sql(cri_select_dbn_stmnt01,
                           conn)
cri_df01_s01.head(17)

Unnamed: 0,cmplnt_num,cmplnt_fr_dt,cmplnt_fr_tm,cmplnt_to_dt,cmplnt_to_tm,addr_pct_cd,rpt_dt,ky_cd,ofns_desc,pd_cd,...,susp_sex,transit_district,latitude,longitude,lat_lon,patrol_boro,station_name,vic_age_group,vic_race,vic_sex
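The sample queries in this notebook interpolate filter values into the SQL text with f-strings, which breaks if a value ever contains a single quote. A minimal sketch of one mitigation is shown below: a hypothetical helper, `sql_string_literal`, that doubles embedded single quotes before the value is placed in the statement. Where the driver supports parameterized queries, those are generally the safer choice; this helper is only an illustration.

```python
def sql_string_literal(value):
    """Quote a Python value as a SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


law_cat = "misdemeanor"
borough = "bronx"

# Same shape as the crime sample query, with the values quoted by the helper.
quoted_stmnt = (
    "SELECT * FROM ads508_t8.crime\n"
    f"WHERE lower(law_cat_cd) = {sql_string_literal(law_cat)} "
    f"AND lower(borough) = {sql_string_literal(borough)}\n"
    "LIMIT 17"
)
print(quoted_stmnt)
print(sql_string_literal("O'Brien"))
```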


In [41]:
if not cri_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Create Athena Table from Local TSV File - `Evictions.tsv`

### Dataset columns

- `court_index_number`
- `docket_number`
- `eviction_address`
- `eviction_apartment_number`
- `executed_date`
- `marshal_first_name`
- `marshal_last_name`
- `residential_or_commercial`
- `borough`
- `eviction_postcode`
- `ejectment`
- `eviction_or_legal_possession`
- `latitude`
- `longitude`
- `community_board`
- `council_district`
- `census_tract`
- `bin`
- `bbl`
- `nta`

In [42]:
evi_tsv_tbl_name = 'evictions'
evi_tsv_field_list = """
court_index_number string,
docket_number string,
eviction_address string,
eviction_apartment_number string,
executed_date string,
marshal_first_name string,
marshal_last_name string,
residential_or_commercial string,
borough string,
eviction_postcode string,
ejectment string,
eviction_or_legal_possession string,
latitude string,
longitude string,
community_board string,
council_district string,
census_tract string,
bin string,
bbl string,
nta string
"""
evi_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/evictions"
print(evi_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=evi_tsv_tbl_name,
                      fields=evi_tsv_field_list,
                      s3_path=evi_tsv_s3_raw_data_path,
                      delim='\\t',
                      comp='',
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/evictions
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.evictions(
court_index_number string,
docket_number string,
eviction_address string,
eviction_apartment_number string,
executed_date string,
marshal_first_name string,
marshal_last_name string,
residential_or_commercial string,
borough string,
eviction_postcode string,
ejectment string,
eviction_or_legal_possession string,
latitude string,
longitude string,
community_board string,
council_district string,
census_tract string,
bin string,
bbl string,
nta string
)
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY '\t'
            LINES
                TERMINATED BY '\n'
        LOCATION 's3://sagemaker-us-east-1-657724983756/raw_data/evictions'
        TBLPROPERTIES ('skip.header.line.count'='1')
        


Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [43]:
evi_borough01 = "BRONX"

evi_select_dbn_stmnt = f"""
    SELECT * FROM {database_name}.{evi_tsv_tbl_name}
    WHERE borough = '{evi_borough01}'
    LIMIT 17
    """

print(evi_select_dbn_stmnt)


    SELECT * FROM ads508_t8.evictions
    WHERE borough = 'BRONX'
    LIMIT 17
    


In [44]:
evi_df01_s01 = pd.read_sql(evi_select_dbn_stmnt,
                           conn)
evi_df01_s01.head(17)

Unnamed: 0,court_index_number,docket_number,eviction_address,eviction_apartment_number,executed_date,marshal_first_name,marshal_last_name,residential_or_commercial,borough,eviction_postcode,ejectment,eviction_or_legal_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta
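Most tables in this notebook declare every column as `string`, including numeric fields such as `latitude` and `longitude`. One way to obtain typed values at query time is Athena's `TRY_CAST`, which yields NULL instead of failing when a value cannot be converted. The sketch below only builds such a query as text; the helper name `build_typed_select` and the choice of columns are assumptions for illustration.

```python
def build_typed_select(db, tbl_name, casts, limit=17):
    """Build a SELECT that casts string columns to the given SQL types."""
    cols = ",\n    ".join(
        f"TRY_CAST({col} AS {sql_type}) AS {col}" for col, sql_type in casts.items()
    )
    return f"SELECT\n    {cols}\nFROM {db}.{tbl_name}\nLIMIT {limit}"


typed_stmnt = build_typed_select(
    "ads508_t8",
    "evictions",
    {"latitude": "double", "longitude": "double"},
)
print(typed_stmnt)

# The resulting text could then be run the same way as the other sample queries,
# e.g. pd.read_sql(typed_stmnt, conn).
```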


In [45]:
if not evi_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Create Athena Table from Local TSV File - `NYC _Jobs.tsv`

### Dataset columns

- `job_id`
- `agency`
- `posting_type`
- `num_of_positions`
- `business_title`
- `civil_service_title`
- `title_classification`
- `title_code_no`
- `level`
- `job_category`
- `fulltime_or_parttime_indicator`
- `career_level`
- `salary_range_from`
- `salary_range_to`
- `salary_frequency`
- `work_location`
- `division_or_work_unit`
- `job_description`
- `minimum_qual_requirements`
- `preferred_skills`
- `additional_information`
- `to_apply`
- `hours_or_shift`
- `work_location_1`
- `recruitment_contact`
- `residency_requirement`
- `posting_date`
- `post_until`
- `posting_updated`
- `process_date`

In [46]:
job_tsv_tbl_name = 'jobs'
job_tsv_field_list = """
job_id string,
agency string,
posting_type string,
num_of_positions string,
business_title string,
civil_service_title string,
title_classification string,
title_code_no string,
level string,
job_category string,
fulltime_or_parttime_indicator string,
career_level string,
salary_range_from string,
salary_range_to string,
salary_frequency string,
work_location string,
division_or_work_unit string,
job_description string,
minimum_qual_requirements string,
preferred_skills string,
additional_information string,
to_apply string,
hours_or_shift string,
work_location_1 string,
recruitment_contact string,
residency_requirement string,
posting_date string,
post_until string,
posting_updated string,
process_date string
"""
job_tsv_s3_raw_data_path = f"s3://{def_bucket}/raw_data/jobs"
print(job_tsv_s3_raw_data_path)

create_athena_tbl_tsv(conn=conn,
                      db=database_name,
                      tbl_name=job_tsv_tbl_name,
                      fields=job_tsv_field_list,
                      s3_path=job_tsv_s3_raw_data_path,
                      delim='\\t',
                      comp='',
                      skip="'skip.header.line.count'='1'")

s3://sagemaker-us-east-1-657724983756/raw_data/jobs
Create table statement:

        CREATE EXTERNAL TABLE IF NOT EXISTS ads508_t8.jobs(
job_id string,
agency string,
posting_type string,
num_of_positions string,
business_title string,
civil_service_title string,
title_classification string,
title_code_no string,
level string,
job_category string,
fulltime_or_parttime_indicator string,
career_level string,
salary_range_from string,
salary_range_to string,
salary_frequency string,
work_location string,
division_or_work_unit string,
job_description string,
minimum_qual_requirements string,
preferred_skills string,
additional_information string,
to_apply string,
hours_or_shift string,
work_location_1 string,
recruitment_contact string,
residency_requirement string,
posting_date string,
post_until string,
posting_updated string,
process_date string
)
        ROW FORMAT DELIMITED
            FIELDS
                TERMINATED BY '\t'
            LINES
                TERMINATED BY '\n'
     

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs



Dataframe contains records: True


### Run A Sample Query

In [47]:
job_agency01 = "HOUSING"

job_select_dbn_stmnt = f"""
    SELECT * FROM {database_name}.{job_tsv_tbl_name}
    WHERE agency LIKE '%{job_agency01}%'
    LIMIT 17
    """

print(job_select_dbn_stmnt)


    SELECT * FROM ads508_t8.jobs
    WHERE agency LIKE '%HOUSING%'
    LIMIT 17
    


In [48]:
job_df01_s01 = pd.read_sql(job_select_dbn_stmnt,
                           conn)
job_df01_s01.head(17)

Unnamed: 0,job_id,agency,posting_type,num_of_positions,business_title,civil_service_title,title_classification,title_code_no,level,job_category,...,additional_information,to_apply,hours_or_shift,work_location_1,recruitment_contact,residency_requirement,posting_date,post_until,posting_updated,process_date
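
The sample query above interpolates the search term directly into the SQL string. As a hedged sketch, a small helper can build the same statement while doubling any single quote in the term, so the term cannot close the SQL literal early. `build_like_query` is a hypothetical name, not part of this notebook or of PyAthena, and it only handles quotes: `%` and `_` inside the term still act as `LIKE` wildcards.

```python
def build_like_query(database, table, column, needle, limit=17):
    """Build a SELECT ... LIKE statement with the search term quote-escaped."""
    # Standard SQL escapes a single quote inside a literal by doubling it.
    safe_needle = needle.replace("'", "''")
    return (
        f"SELECT * FROM {database}.{table}\n"
        f"WHERE {column} LIKE '%{safe_needle}%'\n"
        f"LIMIT {int(limit)}"
    )


stmt = build_like_query("ads508_t8", "jobs", "agency", "HOUSING")
print(stmt)
```

The resulting string can be passed to `pd.read_sql(stmt, conn)` in the same way as `job_select_dbn_stmnt`.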


In [49]:
if not job_df01_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN REGISTERED WITH ATHENA. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++
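
The same empty-result check is repeated for each table in this notebook, so it could be wrapped in one helper. The sketch below assumes the caller passes the row count (for example `len(job_df01_s01)`); `registration_status` is a hypothetical name. An empty result does not necessarily mean the table is unregistered: the `WHERE` filter may simply match no rows, for instance if the stored values differ in case from the search term.

```python
def registration_status(row_count, action="REGISTERED WITH ATHENA"):
    """Return '[OK]' when rows came back, otherwise an error banner."""
    if row_count > 0:
        return "[OK]"
    bar = "+" * 54
    message = (
        f"[ERROR] YOUR DATA HAS NOT BEEN {action}. "
        "LOOK IN PREVIOUS CELLS TO FIND THE ISSUE."
    )
    return "\n".join([bar, message, bar])


print(registration_status(0))
print(registration_status(5, action="CONVERTED TO PARQUET"))
```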


# Create Parquet Files from TSV Table

The query below uses a `CREATE TABLE AS SELECT` (CTAS) statement to convert the `crime` TSV table into Parquet. The Parquet data is partitioned by `law_cat_cd` and `borough`, which is why those two columns are listed last in the `SELECT` clause.

In [50]:
ingest_create_athena_table_parquet_passed = False

In [51]:
%store -r ingest_create_athena_table_tsv_passed

In [52]:
try:
    ingest_create_athena_table_tsv_passed
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN ALL PREVIOUS NOTEBOOKS.  You did not register the TSV Data.")
    print("++++++++++++++++++++++++++++++++++++++++++++++")

In [53]:
print(ingest_create_athena_table_tsv_passed)

True


In [54]:
if not ingest_create_athena_table_tsv_passed:
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN ALL PREVIOUS NOTEBOOKS.  You did not register the TSV Data.")
    print("++++++++++++++++++++++++++++++++++++++++++++++")
else:
    print("[OK]")

[OK]


In [55]:
# Set S3 path to Parquet data
cri_pqt_s3_data_path = f"s3://{def_bucket}/columnar"

# Execute Statement
_This can take a few minutes.  Please be patient._

In [56]:
cri_pqt_tbl_name = 'crime_pqt'
drop_pqt_tbl_stmnt = f"""DROP TABLE IF EXISTS {database_name}.{cri_pqt_tbl_name}"""

# SQL statement to execute
create_pqt_tble_stmnt = f"""
CREATE TABLE IF NOT EXISTS {database_name}.{cri_pqt_tbl_name}
WITH (
    format = 'PARQUET',
    external_location = '{cri_pqt_s3_data_path}',
    partitioned_by = ARRAY['law_cat_cd', 'borough']
    )
AS
SELECT
    cmplnt_num,
    cmplnt_fr_dt,
    cmplnt_fr_tm,
    cmplnt_to_dt,
    cmplnt_to_tm,
    addr_pct_cd,
    rpt_dt,
    ky_cd,
    ofns_desc,
    pd_cd,
    pd_desc,
    crm_atpt_cptd_cd,
    loc_of_occur_desc,
    prem_typ_desc,
    juris_desc,
    jurisdiction_code,
    parks_nm,
    hadevelopt,
    housing_psa,
    x_coord_cd,
    y_coord_cd,
    susp_age_group,
    susp_race,
    susp_sex,
    transit_district,
    latitude,
    longitude,
    lat_lon,
    patrol_boro,
    station_name,
    vic_age_group,
    vic_race,
    vic_sex,
    law_cat_cd,
    borough
FROM {database_name}.{cri_tsv_tbl_name}
"""

print(f'Create table statement:\n{create_pqt_tble_stmnt}')

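# Drop any earlier table definition first. This removes the catalog entry,
# not the Parquet files already written to S3.
pd.read_sql(drop_pqt_tbl_stmnt,
            conn)

# Run the CTAS statement that writes the partitioned Parquet data.
pd.read_sql(create_pqt_tble_stmnt,
            conn)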

Create table statement:

CREATE TABLE IF NOT EXISTS ads508_t8.crime_pqt
WITH (
    format = 'PARQUET',
    external_location = 's3://sagemaker-us-east-1-657724983756/columnar',
    partitioned_by = ARRAY['law_cat_cd', 'borough']
    )
AS
SELECT
    cmplnt_num,
    cmplnt_fr_dt,
    cmplnt_fr_tm,
    cmplnt_to_dt,
    cmplnt_to_tm,
    addr_pct_cd,
    rpt_dt,
    ky_cd,
    ofns_desc,
    pd_cd,
    pd_desc,
    crm_atpt_cptd_cd,
    loc_of_occur_desc,
    prem_typ_desc,
    juris_desc,
    jurisdiction_code,
    parks_nm,
    hadevelopt,
    housing_psa,
    x_coord_cd,
    y_coord_cd,
    susp_age_group,
    susp_race,
    susp_sex,
    transit_district,
    latitude,
    longitude,
    lat_lon,
    patrol_boro,
    station_name,
    vic_age_group,
    vic_race,
    vic_sex,
    law_cat_cd,
    borough
FROM ads508_t8.crime



Unnamed: 0,rows


# Load partitions by running `MSCK REPAIR TABLE`

As a last step, we need to load the Parquet partitions into the table's metadata. To do so, issue the following SQL command:

In [57]:
partition_pqt_stmnt = f"MSCK REPAIR TABLE {database_name}.{cri_pqt_tbl_name}"

print(partition_pqt_stmnt)

MSCK REPAIR TABLE ads508_t8.crime_pqt


In [58]:
cri_df02 = pd.read_sql(partition_pqt_stmnt,
                       conn)
cri_df02.head(17)

# Show the Partitions

In [59]:
show_part_stmnt = f"SHOW PARTITIONS {database_name}.{cri_pqt_tbl_name}"

print(show_part_stmnt)

SHOW PARTITIONS ads508_t8.crime_pqt


In [60]:
cri_df02_part = pd.read_sql(show_part_stmnt,
                            conn)
cri_df02_part.head(31)

Unnamed: 0,partition
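
`SHOW PARTITIONS` returns each partition as a single `key=value/key=value` string. A minimal sketch of splitting such strings back into their partition columns follows. `parse_partition` is a hypothetical helper, and the sample rows are made-up illustrations rather than output from this notebook, whose partition listing above is empty.

```python
def parse_partition(spec):
    """Split a 'key=value/key=value' partition string into a dict."""
    parts = {}
    for segment in spec.strip().split("/"):
        # Split on the first '=' only, in case a value contains '='.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


# Hypothetical partition strings, for illustration only.
sample_rows = [
    "law_cat_cd=FELONY/borough=BRONX",
    "law_cat_cd=VIOLATION/borough=QUEENS",
]
parsed = [parse_partition(row) for row in sample_rows]
print(parsed)
```

With real results, the same function could be applied to the `partition` column, for example `cri_df02_part["partition"].map(parse_partition)`.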


# Show the Tables

In [61]:
show_tbl_stmnt = f"SHOW TABLES in {database_name}"

In [62]:
df_tables = pd.read_sql(show_tbl_stmnt,
                        conn)
df_tables.head(17)

Unnamed: 0,tab_name
0,abt
1,census
2,census_block
3,crime
4,crime_pqt
5,evictions
6,grad_outcomes
7,hs_info
8,jobs


In [63]:
if cri_pqt_tbl_name in df_tables.values:
    ingest_create_athena_table_parquet_passed = True

In [64]:
%store ingest_create_athena_table_parquet_passed

Stored 'ingest_create_athena_table_parquet_passed' (bool)


# Run Sample Query

In [65]:
cri_select_dbn_stmnt02 = f"""
    SELECT * FROM {database_name}.{cri_pqt_tbl_name}
    WHERE law_cat_cd = '{cri_law_cat_cd01}' AND borough = '{cri_borough01}'
    LIMIT 17
    """

print(cri_select_dbn_stmnt02)


    SELECT * FROM ads508_t8.crime_pqt
    WHERE law_cat_cd = 'misdemeanor' AND borough = 'bronx'
    LIMIT 17
    


In [66]:
cri_df02_s01 = pd.read_sql(cri_select_dbn_stmnt02,
                           conn)
cri_df02_s01.head(17)

Unnamed: 0,cmplnt_num,cmplnt_fr_dt,cmplnt_fr_tm,cmplnt_to_dt,cmplnt_to_tm,addr_pct_cd,rpt_dt,ky_cd,ofns_desc,pd_cd,...,latitude,longitude,lat_lon,patrol_boro,station_name,vic_age_group,vic_race,vic_sex,law_cat_cd,borough


In [67]:
if not cri_df02_s01.empty:
    print("[OK]")
else:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOUR DATA HAS NOT BEEN CONVERTED TO PARQUET. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++")

++++++++++++++++++++++++++++++++++++++++++++++++++++++
[ERROR] YOUR DATA HAS NOT BEEN CONVERTED TO PARQUET. LOOK IN PREVIOUS CELLS TO FIND THE ISSUE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++


## Review the New Athena Table in the Glue Catalog

In [68]:
display(
    HTML(
        f'<b>Review <a target="top" href="https://console.aws.amazon.com/glue/home?region={region}#">AWS Glue Catalog</a></b>'
    )
)

## Store Variables for the Next Notebooks

In [69]:
%store

Stored variables and their in-db values:
balanced_bias_data_jsonlines_s3_uri                   -> 's3://sagemaker-us-east-1-657724983756/bias-detect
balanced_bias_data_s3_uri                             -> 's3://sagemaker-us-east-1-657724983756/bias-detect
bias_data_s3_uri                                      -> 's3://sagemaker-us-east-1-657724983756/bias-detect
ingest_create_athena_db_passed                        -> True
ingest_create_athena_table_parquet_passed             -> True
ingest_create_athena_table_tsv_passed                 -> True
s3_private_path_tsv                                   -> 's3://sagemaker-us-east-1-657724983756/team_8_data
s3_public_path_tsv                                    -> 's3://sagemaker-us-east-ads508-sp23-t8'
setup_dependencies_passed                             -> True
setup_iam_roles_passed                                -> True
setup_instance_check_passed                           -> True
setup_s3_bucket_passed                                -> T

## Release Resources

In [70]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [71]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}

<IPython.core.display.Javascript object>