# ANT404 Lab #3: Unload to Parquet

### Understand the choice of data lake export format
The external data you used in Lab #1 was stored in text file format. The external data you used in Lab #2 was stored in JSON format. 

These file formats are widely used, easy to create, and easy to share between different applications and contexts. However, their structure is not optimal for data that you will query frequently through external tables.

For text and JSON files, Redshift Spectrum must read every column of every row it scans in order to answer your query, even if your query only uses one or two columns. Reading this extra data costs you both time and money: extra time because Spectrum has to retrieve and then discard the unused data, and extra money because you are charged based on the amount of S3 data your queries scan.
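To make the cost argument concrete, the Python sketch below estimates the bytes a query scans when every column must be read versus when only the referenced columns are read. The column names come from the CloudTrail view used later in this lab, but the per-value sizes, the row count, and any per-terabyte price you pass in are made-up assumptions, and the model ignores compression and file metadata.

```python
# Hypothetical sketch: bytes scanned for a row-oriented format (every column
# is read) versus a columnar format (only referenced columns are read).
# All sizes are illustrative assumptions, not measured Spectrum figures.

# Assumed average bytes per value for a few CloudTrail columns.
COLUMN_BYTES = {
    "event_time": 20,
    "event_name": 24,
    "user_identity_principalid": 40,
    "user_agent": 120,
    "credentials_session_token": 800,
}

def bytes_scanned(row_count, referenced, columnar):
    """Return bytes scanned for a query that references `referenced` columns."""
    cols = referenced if columnar else COLUMN_BYTES.keys()
    return row_count * sum(COLUMN_BYTES[c] for c in cols)

def scan_cost(num_bytes, price_per_tb):
    """Cost at an assumed price per terabyte (treated here as 10**12 bytes)."""
    return num_bytes / 10**12 * price_per_tb

rows = 10_000_000
query_cols = ["event_name", "user_identity_principalid"]
row_bytes = bytes_scanned(rows, query_cols, columnar=False)
col_bytes = bytes_scanned(rows, query_cols, columnar=True)
print(row_bytes, col_bytes)  # -> 10040000000 640000000
```

Under these assumptions the columnar layout scans roughly 16 times less data for a two-column query; the real ratio depends entirely on your column widths.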

In Redshift local tables, your data is stored in columnar format, meaning each column of data is stored separately on disk. When you query data in Redshift, only the columns referenced by your query are retrieved from disk. This is one of the ways that Redshift is able to deliver exceptionally fast query performance.

Apache Parquet is a columnar data format that is widely used across the big data ecosystem. Parquet files store data in a similar way to Redshift's internal storage. When you define a Redshift Spectrum external table on Parquet data, Redshift Spectrum is able to retrieve only the columns referenced by your query. This can significantly reduce the cost and execution time of queries on external data.
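A toy Python illustration of column pruning: pivot row-oriented records into one list per column, then read only the columns a query references. The helper names `to_columns` and `read_columns` are hypothetical, and this is not how Parquet actually encodes data on disk (real files add row groups, encodings, compression, and statistics).

```python
# Hypothetical sketch of column pruning on a column-oriented layout.
# Assumes every record has the same keys, so the column lists stay aligned.

rows = [
    {"event_name": "AssumeRole", "event_source": "sts.amazonaws.com", "read_only": None},
    {"event_name": "RunTask", "event_source": "ecs.amazonaws.com", "read_only": None},
    {"event_name": "AssumeRole", "event_source": "sts.amazonaws.com", "read_only": "False"},
]

def to_columns(records):
    """Convert a list of dicts (rows) into a dict of lists (columns)."""
    columns = {}
    for rec in records:
        for name, value in rec.items():
            columns.setdefault(name, []).append(value)
    return columns

def read_columns(columns, referenced):
    """Return only the referenced columns and how many values were touched."""
    selected = {name: columns[name] for name in referenced}
    touched = sum(len(values) for values in selected.values())
    return selected, touched

columns = to_columns(rows)
selected, touched = read_columns(columns, ["event_name"])
print(selected, touched)
# -> {'event_name': ['AssumeRole', 'RunTask', 'AssumeRole']} 3
# A row-oriented reader would touch all 3 rows x 3 columns = 9 values.
```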

When you are creating a centralized Data Lake to be shared across multiple applications, Parquet is a good choice for data that will be accessed on a regular basis. Redshift is now able to export data directly into Parquet format with the `UNLOAD` command.

## 1. Check for credentials file
Check for the credentials created in the `START_HERE` notebook.

In [1]:
%%bash
cat ant404-lab.creds

{
  "user_name": "ant404",
  "password": "Pp-86feeb76",
  "host_name": "10.0.54.136",
  "port_num": "5439",
  "db_name": "dev"
}


## 2. Set local variables from credentials file
Run this `cell` to import the credentials created in the `START_HERE` notebook into this notebook. Later cells rely on these variables.

In [2]:
import simplejson
with open("ant404-lab.creds") as fh:
    creds = simplejson.loads(fh.read())
username=creds["user_name"]
password=creds["password"]
host_name=creds["host_name"]
port_num=creds["port_num"]
db_name=creds["db_name"]

# Example Account, Region, and Cluster values for this lab
log_account=123456789101
region="us-east-1"
cluster_name="reporting-cluster"

# Default date values used to get sample files
audit_year=2019
audit_month=11
audit_day=10

%set_env username={username}

env: username=ant404
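If you prefer not to depend on `simplejson`, a minimal sketch using the standard-library `json` module could load the same file and check that it contains the keys this notebook reads. `load_creds` and `missing_keys` are hypothetical helpers, not part of the lab.

```python
# Hypothetical sketch: load the lab credentials with the stdlib json module
# and fail early if the START_HERE notebook did not write an expected key.
import json

# Keys read by the cell above.
REQUIRED_KEYS = ("user_name", "password", "host_name", "port_num", "db_name")

def missing_keys(creds):
    """Return the required keys that are absent from a credentials dict."""
    return [key for key in REQUIRED_KEYS if key not in creds]

def load_creds(path):
    """Read the credentials JSON file and raise KeyError if a key is missing."""
    with open(path) as fh:
        creds = json.load(fh)
    missing = missing_keys(creds)
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return creds
```

For example, `creds = load_creds("ant404-lab.creds")` followed by the same variable assignments as the cell above.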


## 3. Connect to your Redshift cluster

You will use the `sqlalchemy` and `ipython-sql` Python libraries to manage the Redshift connection. 

This cell loads the `%sql` magic and opens a connection that the other cells in this notebook reuse.

-------
_**Note:** Please ignore the pink warning message that says: "UserWarning: The psycopg2 wheel package will be renamed from release 2.8"_. **Look for** `'Connected: ant404@dev'` in the `Out [ ]` section below the warning.

In [3]:
import sqlalchemy
import psycopg2
import simplejson

%reload_ext sql
%config SqlMagic.displaylimit = 25

connect_to_db = 'postgresql+psycopg2://'+username+':'+password+'@'+host_name+':'+port_num+'/'+db_name
%sql $connect_to_db

  """)


'Connected: ant404@dev'
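The cell above builds the connection string by concatenation, which breaks if the password contains URL-reserved characters such as `@`, `:` or `/`. A sketch of a safer alternative escapes the user name and password with `urllib.parse.quote_plus`, the approach described in the SQLAlchemy documentation for special characters in passwords. `redshift_url` is a hypothetical helper name.

```python
# Hypothetical sketch: build the SQLAlchemy URL with URL-encoded credentials.
from urllib.parse import quote_plus

def redshift_url(user, password, host, port, db):
    """Return a postgresql+psycopg2 URL with the user and password escaped."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )

print(redshift_url("ant404", "p@ss/word", "10.0.54.136", "5439", "dev"))
# -> postgresql+psycopg2://ant404:p%40ss%2Fword@10.0.54.136:5439/dev
```

You could then connect with `connect_to_db = redshift_url(username, password, host_name, port_num, db_name)` followed by `%sql $connect_to_db`; for a password like the one in this lab, the result is identical to the concatenated string.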

## 4. Review your tables in the AWS Glue data catalog
Redshift Spectrum uses AWS Glue as the external data catalog by default.

In [4]:
%%bash
aws glue get-tables --database-name logdata --query 'TableList[].{DbName:DatabaseName,TableName:Name}'

[
    {
        "DbName": "logdata",
        "TableName": "cloudtrail"
    },
    {
        "DbName": "logdata",
        "TableName": "connectionlog"
    },
    {
        "DbName": "logdata",
        "TableName": "useractivitylog"
    }
]
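The CLI output above is JSON, so it can also be checked programmatically. This sketch (hypothetical helper `missing_tables`) parses output in the shape produced by the `--query` expression above and reports which of the three lab tables are absent; the `sample` string is a shortened stand-in, not real output.

```python
# Hypothetical sketch: verify that the expected tables appear in the output of
# aws glue get-tables ... --query 'TableList[].{DbName:DatabaseName,TableName:Name}'
import json

EXPECTED_TABLES = {"cloudtrail", "connectionlog", "useractivitylog"}

def missing_tables(cli_output, expected=EXPECTED_TABLES):
    """Return the expected table names that do not appear in the CLI output."""
    found = {entry["TableName"] for entry in json.loads(cli_output)}
    return sorted(expected - found)

sample = '[{"DbName": "logdata", "TableName": "cloudtrail"}]'
print(missing_tables(sample))  # -> ['connectionlog', 'useractivitylog']
```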


## 5. List the external tables in Redshift and compare
This query should now return the tables created in Lab #1 and Lab #2.

In [5]:
%%sql
SELECT * 
FROM svv_external_tables    
WHERE schemaname = 'rawlogs'   
AND tablename IN ('cloudtrail','connectionlog','useractivitylog') 
ORDER BY schemaname, tablename 
;

 * postgresql+psycopg2://ant404:***@10.0.54.136:5439/dev
3 rows affected.


schemaname,tablename,location,input_format,output_format,serialization_lib,serde_parameters,compressed,parameters
rawlogs,cloudtrail,s3://redshift-managed-spectrum-datasets-us-east-1/cloudtrail,org.apache.hadoop.mapred.TextInputFormat,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,org.openx.data.jsonserde.JsonSerDe,"{""serialization.format"":""1""}",0,"{""EXTERNAL"":""TRUE"",""compression_type"":""gzip"",""transient_lastDdlTime"":""1575504573""}"
rawlogs,connectionlog,s3://ant404-lab-86feeb76/manifests/connectionlog,org.apache.hadoop.mapred.TextInputFormat,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,"{""field.delim"":""|"",""serialization.format"":""|""}",0,"{""EXTERNAL"":""TRUE"",""transient_lastDdlTime"":""1575503846""}"
rawlogs,useractivitylog,s3://ant404-lab-86feeb76/manifests/useractivitylog,org.apache.hadoop.mapred.TextInputFormat,org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,org.apache.hadoop.hive.serde2.RegexSerDe,"{""input.regex"":""(')(.*) UTC \\[ db=(\\w+ )user=(\\w+) pid=(\\d+) userid=(\\d+) xid=(\\d+) \\]' LOG: (.*)$"",""serialization.format",0,"{""EXTERNAL"":""TRUE"",""transient_lastDdlTime"":""1575504055""}"


## 6. Review the columns in the external tables

In [6]:
%%sql
SELECT * 
FROM svv_external_columns   
WHERE schemaname = 'rawlogs'   
AND tablename IN ('cloudtrail','connectionlog','useractivitylog') 
ORDER BY schemaname, tablename, columnnum 
;

 * postgresql+psycopg2://ant404:***@10.0.54.136:5439/dev
37 rows affected.


schemaname,tablename,columnname,external_type,columnnum,part_key
rawlogs,cloudtrail,records,"array<struct<eventversion:varchar(8),useridentity:struct<type:varchar(16),principalid:varchar(128),arn:varchar(256),accountid:va",1,0
rawlogs,cloudtrail,region,varchar(32),2,1
rawlogs,cloudtrail,log_year,int,3,2
rawlogs,cloudtrail,log_month,int,4,3
rawlogs,cloudtrail,log_day,int,5,4
rawlogs,connectionlog,event,varchar(64),1,0
rawlogs,connectionlog,recordtime,varchar(32),2,0
rawlogs,connectionlog,remotehost,varchar(64),3,0
rawlogs,connectionlog,remoteport,int,4,0
rawlogs,connectionlog,pid,int,5,0


## 7. Compose a view that "flattens" the `cloudtrail` data

In [7]:
%%sql
/* -- Escape autocommit with */END;/* -- */
CREATE OR REPLACE VIEW public.v_export_cloudtrail AS
SELECT log.region
     , log.log_year
     , log.log_month
     , log.log_day
     , rec.eventversion              AS event_version
     , rec.userIdentity.type         AS user_identity_type
     , rec.userIdentity.principalId  AS user_identity_principalId
     , rec.userIdentity.arn          AS user_identity_arn
     , rec.userIdentity.accountId    AS user_identity_accountId
     , rec.userIdentity.invokedBy    AS user_identity_invokedBy
     , rec.userIdentity.accessKeyId  AS user_identity_accessKeyId
     , rec.userIdentity.userName     AS user_identity_userName
     , rec.userIdentity.sessionContext.attributes.mfaAuthenticated AS session_context_mfa_authenticated
     , rec.userIdentity.sessionContext.attributes.creationDate     AS session_context_creation_date
     , rec.userIdentity.sessionContext.sessionIssuer.type          AS session_issuer_type
     , rec.userIdentity.sessionContext.sessionIssuer.principalId   AS session_issuer_principal_id
     , rec.userIdentity.sessionContext.sessionIssuer.arn           AS session_issuer_arn
     , rec.userIdentity.sessionContext.sessionIssuer.accountId     AS session_issuer_account_id
     , rec.userIdentity.sessionContext.sessionIssuer.userName      AS session_issuer_user_name
     , rec.eventtime                 AS event_time
     , rec.eventsource               AS event_source
     , rec.eventname                 AS event_name
     , rec.awsregion                 AS aws_region
     , rec.sourceipaddress           AS source_ipaddress
     , rec.useragent                 AS user_agent
     , rec.errorcode                 AS error_code
     , rec.errormessage              AS error_message
     , rec.requestparameters.durationseconds  AS request_param_duration_seconds
     , rec.requestparameters.rolearn          AS request_param_role_arn
     , rec.requestparameters.rolesessionname  AS request_param_role_session_name
     , rec.requestparameters.databaseName     AS request_param_database_name
     , rec.requestparameters.tableName        AS request_param_table_name
     , rec.responseelements.assumedRoleUser.arn           AS assumed_role_user_arn
     , rec.responseelements.assumedRoleUser.assumedRoleId AS assumed_role_user_assumed_role_id
     , rec.responseelements."credentials".accessKeyId     AS credentials_access_key_id
     , rec.responseelements."credentials".expiration      AS credentials_expiration
     , rec.responseelements."credentials".sessionToken    AS credentials_session_token
     , rec.additionaleventdata.lakeFormationPrincipal AS lake_formation_principal
     , rec.requestid                 AS request_id
     , rec.eventid                   AS event_id
     , arn.arn                       AS resource_arn
     , arn.accountid                 AS resource_accountid
     , arn.type                      AS resource_type
     , rec.eventtype                 AS event_type
     , rec.apiversion                AS api_version
     , rec.readonly                  AS read_only
     , rec.recipientaccountid        AS recipient_account_id
     , rec.serviceeventdetails       AS service_event_details
     , rec.sharedeventid             AS shared_event_id
     , rec.vpcendpointid             AS vpc_endpoint_id
FROM rawlogs.cloudtrail log 
JOIN log.records rec   ON true     -- # Top level inline "Records" array
LEFT JOIN rec.resources arn ON true -- # "Resources" inline table array
WITH NO SCHEMA BINDING

 * postgresql+psycopg2://ant404:***@10.0.54.136:5439/dev
Done.
Done.


[]
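The two joins at the end of the view unnest nested data: `JOIN log.records rec ON true` produces one row per element of the `records` array, and `LEFT JOIN rec.resources arn ON true` produces one row per resource while still keeping records that have no resources. A rough Python analogy of those semantics, using a hypothetical `flatten` function and a hand-made log entry with only a few of the view's columns:

```python
# Hypothetical sketch of the view's unnesting joins using plain dicts:
#   JOIN log.records rec ON true        -> one row per element of records
#   LEFT JOIN rec.resources arn ON true -> one row per resource, or a single
#                                          row with None if there are none

def flatten(log):
    """Yield one flat dict per (record, resource) pair with left-join semantics."""
    for rec in log["records"]:                     # inner join: no records, no rows
        resources = rec.get("resources") or [None]  # left join: keep the record
        for res in resources:
            yield {
                "region": log["region"],
                "event_name": rec.get("eventname"),
                "user_identity_type": (rec.get("useridentity") or {}).get("type"),
                "resource_arn": res.get("arn") if res else None,
            }

log = {
    "region": "us-east-1",
    "records": [
        {"eventname": "AssumeRole", "useridentity": {"type": "AWSService"},
         "resources": [{"arn": "arn:aws:iam::123456789101:role/A"},
                       {"arn": "arn:aws:iam::123456789101:role/B"}]},
        {"eventname": "RunTask", "useridentity": {"type": "AssumedRole"}},
    ],
}
rows = list(flatten(log))
print(len(rows))  # -> 3
```

With a plain `JOIN` in place of the `LEFT JOIN`, the `RunTask` record (which has no resources) would be dropped from the view.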

## 8. Confirm that the view works
**Note:** Please be patient. Querying the view over the raw JSON data may take 90 seconds or more.

The query must scan the raw JSON files and then return every column of the matching rows to the notebook, so it takes some time.

In [8]:
%%sql
SELECT * 
FROM  public.v_export_cloudtrail
WHERE log_year = 2019 
  AND log_month = 11 
  AND log_day = 11 
LIMIT 10
;

 * postgresql+psycopg2://ant404:***@10.0.54.136:5439/dev
10 rows affected.


region,log_year,log_month,log_day,event_version,user_identity_type,user_identity_principalid,user_identity_arn,user_identity_accountid,user_identity_invokedby,user_identity_accesskeyid,user_identity_username,session_context_mfa_authenticated,session_context_creation_date,session_issuer_type,session_issuer_principal_id,session_issuer_arn,session_issuer_account_id,session_issuer_user_name,event_time,event_source,event_name,aws_region,source_ipaddress,user_agent,error_code,error_message,request_param_duration_seconds,request_param_role_arn,request_param_role_session_name,request_param_database_name,request_param_table_name,assumed_role_user_arn,assumed_role_user_assumed_role_id,credentials_access_key_id,credentials_expiration,credentials_session_token,lake_formation_principal,request_id,event_id,resource_arn,resource_accountid,resource_type,event_type,api_version,read_only,recipient_account_id,service_event_details,shared_event_id,vpc_endpoint_id
us-east-1,2019,11,11,1.05,AWSService,,,,application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:56:40Z,sts.amazonaws.com,AssumeRole,us-east-1,application-autoscaling.amazonaws.com,application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/DynamoDBAutoscaleRole/AutoScaling-RetrieveCurrentCapacity,AROAJ4CAOIWWKI4XXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVEOXXXXXX,"Nov 11, 2019 12:11:40 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIQDGRoJNnQgCyqam41UQeDX5eDERAP537JOy3H0mT0UcPQIgCoocTtp9LtOk7O4fB/A9abQpq1TWSiI2kmPhZkBZiEkqpwII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDCxx+sJiCt0CYfuSHCr7AXUYL6DcLykmBhL6AqyKEoD0/ybJXYYxkxjqkwY4ts56ICAYrCi63ukcqibOI9KwMzB9jiPnlbwYo6ceA3HbSdV39V7PXhwb4uyQFYp0OiIe8uKRlW29tnZiPxb9CUubM8/MEg4gEVsEMz0+sHhXOhqZyieKXg2m7VpOFImsOFVd76AmyJNypiOnpE3R02lljumNPYUDv0SK7N2yR+qWQ2G+a7vshk1h4cBhSbm6PUiAdLUI7tUMSgSXMbip+9r//cgyvCljNF1chC6bVRDLlrj/9r18uZoTgoc+QHaWEaAjfoVENgXKx88zLi6m18NOwvfLD5pG+6DvmHmAMLjHou4FOqkCD5BCwIL3W2ZxMzRi542yptb0vLXzysAIxkbFH0w6YIRSLwbIXMOjRNvdyreMItUsK0udU3M30hUNbifdrYseF/9s2rgKotVx+9j/YvnuERdgNPOqORUBl6v8LEYNMBLnE5WeDd4shMy4Dj2fUn3EJOL12/2/4o+qVFDenCuWpEStdoA608HKRvuDboBQLeBpvpmFutg3q3sKZiD2yAqX01m3Au6Lv6NJLjLnNtcEj2UQ6y0qW1TkJDeqj47sP+1FqT0iFY1NSLZtUR9Jg2/gz3nZcMrsJckWOxdURcFWUQLvHDYA03p/v56oYKAG9grv8j9vA8cex5vUBaDKBCpsIy5zj8TS3nLXdwv+L/s0Je1cDCKERGzajaFNQnEZfuyQIyVu43L9rGkD,,bd807c69-0415-11ea-b23b-b196aafbf859,2dce96af-0259-434c-9197-f0acd0b9b710,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,55b7e1c3-ce97-4490-9af8-cd0801b3b5d5,
us-east-1,2019,11,11,1.05,AWSService,,,,vpc-flow-logs.amazonaws.com,,,,,,,,,,2019-11-10T23:56:26Z,sts.amazonaws.com,AssumeRole,us-east-1,vpc-flow-logs.amazonaws.com,vpc-flow-logs.amazonaws.com,,,3600.0,arn:aws:iam::123456789101:role/VPCStack-VPCFlowLogsRole-8NX0DQGRDX3C,vpc-flow-logging+123456789101,,,arn:aws:sts::123456789101:assumed-role/VPCStack-VPCFlowLogsRole-8NX0DQGRDX3C/vpc-flow-logging+123456789101,AROAJGJZMNODK6XXXXXXX:vpc-flow-logging+123456789101,ASIA53Z5YAVVCCXXXXXX,"Nov 11, 2019 12:56:26 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIQC+muBCyAH/rhHJvfLyMfigIXeutcEIUyVG6yfmYXoPmAIgAnvoAI4dSWv0nSomnre6jjM+viUN39wVO6b6sB7Ft4kq2gEI4f//////////ARABGgw5NTMwNzUzNjkzMjIiDLvT2nCtjak1dWYGcyquAY598D1TcbbjYA05ZSsWNwNbm6MlqbLDmgp55P1Vv9g7YH5IiEJo2nu2QQxUIsFu42msRg17BPXA8Z4TImhLkEUAuMcqbe9wDBTl62+eigfACIbZpzss7Zu1BBLgLSAxv0vrx9kIERIJ1K34yy9eUycthscF1IDJ4zQ9ed4QElgnIZAFO/E8ge6E0rRenaKew/KQ+YRp6+s8bYgm63bJvx++msL6LeNgvg6k+3q0gzCqx6LuBTrVAR/mvzGbMm+YQW5A//Zz616+eQ7FACMjyhZlsrq/YSpxJa4PrhAx7YdDTIc4GoeuQfux2YwsnQt4Qm4SH3N1p3DrD5BGPsu2t1IjnD6x75ygtqV2Eh5kuTmKL0WMsuJDsGaqrx5xy8CW5D7uw0/Ao0+lAfiYZcQCLoLtsmNJUopi1/Hf+nsTDeL0GB5+waRc5xNm51zsRtHZN5QvuG3vl/a4sOEcp/8aPm//cxU2iw8BHWieo+K8fZoMgOjIdl8SkXh2pecbNkWWFMtxcQLjnH9eYLTb3Q==,,b4bb9ce3-0415-11ea-aaaa-9f1bb2b12734,11f814bb-4447-4855-9fd9-62425a6371c6,arn:aws:iam::123456789101:role/VPCStack-VPCFlowLogsRole-8NX0DQGRDX3C,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,9b58941d-8a99-4fe2-825d-736d6f1cf4ba,
us-east-1,2019,11,11,1.05,AssumedRole,AROAJ3TBAFTZPL2XXXXXX:aws-batch,arn:aws:sts::123456789101:assumed-role/AWSBatchServiceRole/aws-batch,123456789101.0,batch.amazonaws.com,ASIA53Z5YAVVBTXXXXXX,,False,2019-11-10T23:55:23Z,Role,AROAJ3TBAFTZPL2XXXXXX,arn:aws:iam::123456789101:role/service-role/AWSBatchServiceRole,123456789101.0,AWSBatchServiceRole,2019-11-10T23:55:23Z,ecs.amazonaws.com,RunTask,us-east-1,batch.amazonaws.com,batch.amazonaws.com,InvalidParameterException,No Container Instances were found in your cluster.,,,,,,,,,,,,8528d094-d67f-4107-88d8-4a7b03399115,b2fce0b0-e135-4a83-8b67-be946727985f,,,,AwsApiCall,,,123456789101,,,
us-east-1,2019,11,11,1.05,AssumedRole,AROAJAIHYEBTJTIXXXXXX:i-0184714a0a6b2165b,arn:aws:sts::123456789101:assumed-role/DBEngSweepETLAssistant-InstanceRole-1V82VUMCEQZMU/i-0184714a0a6b2165b,123456789101.0,,ASIA53Z5YAVVGVXXXXXX,,False,2019-11-10T19:59:47Z,Role,AROAJAIHYEBTJTIXXXXXX,arn:aws:iam::123456789101:role/ETLAssistantEC2Role/DBEngSweepETLAssistant-InstanceRole-1V82VUMCEQZMU,123456789101.0,DBEngSweepETLAssistant-InstanceRole-1V82VUMCEQZMU,2019-11-10T23:55:45Z,ssm.amazonaws.com,UpdateInstanceInformation,us-east-1,18.232.205.78,aws-sdk-go/1.12.20 (go1.10.3; linux; amd64) amazon-ssm-agent/,,,,,,,,,,,,,,28b4e333-da39-4d5d-bc23-53a887d40308,d64dcaaf-447c-45ea-9567-f8337fdd073f,,,,AwsApiCall,,False,123456789101,,,
us-east-1,2019,11,11,1.05,AWSService,,,,application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:56:08Z,sts.amazonaws.com,AssumeRole,us-east-1,application-autoscaling.amazonaws.com,application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/DynamoDBAutoscaleRole/AutoScaling-RetrieveCurrentCapacity,AROAJ4CAOIWWKI4XXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVH5XXXXXX,"Nov 11, 2019 12:11:08 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIGJo28IhwzR0zyor85DD5ljeBBh2M0jOoQTlIJ58abV4AiEA/uSIQ/nXVGRdwdyi12Je6zguAqOpidIR1WCy22m0B2IqpwII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDIlPBLKMhP6tpMZO0ir7AYzPDu/00iRlbtmzPRX13d6LrYIMMBDQEhm0jtW6/aptLyxULrVlsy9HYMblrEoZKuGO0RYxm7rwOcaSuDecoBEmcoPdn/bZgzQu0EF8QSHcnJQZ+kgc7CtL6mb+cetZv9WOvxZZUczwEW3rSp5l0gFWrqZrbIiyxwq01Yv29CYy65zvv72HtAXtlWefu8+didpWX+dUtnWBQlycCK8m9xRJW7bYWpiFEpeeP/+dIYkOnZXyyRAYFtt1h8bhG1oYYGyCYZGllMZoLR6W2SKfPcJ8Q2wHD8exebQrvbIsbyPX1dEyBEiCtfcOI3YrtVCRzpRhtPniuFrmtp/lMJjHou4FOqkCFHfjN9qojBPFvWi9k4hz+1MtZaL1NnTWPySdXBqjDZvVe0JS7yCZZlFazlmEfQOX1ZUb3ZuCX0EaC+zJbVlcR3xTpaEw5KxU1H6dzzimZU2CMOcAPbtghqYaBNox4hETysw+w3StwxqKv7Wm6cWX7fGITdguegiha8LDEA8KO4wsJn+403ALgo5fzjZK5vgsS9HxvRj07ZNYnjrbts5Lc/GRweEAq2X3VTzV6ackQae2BsxWv6a2CNoGFTxNSXv169dAZiZs/yJsL6M7DANoBYctA37ncGsAeNwFPQOoBz1YtgeVJRuU6BZvnT1NcYPW9Z7HyVtXe6MychlKWlj3gHS7HSSybb3SaECveHb87YtJZ6tppp6LXnuZpoVnxnEsDIkLcCvYnMlF,,aa3ed416-0415-11ea-ba01-d906bc6a041a,82027758-5ed0-4858-9001-f46cf6818079,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,969ef05a-d1ea-4c3e-818f-222d9a57d0d5,
us-east-1,2019,11,11,1.05,AWSService,,,,dynamodb.application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:55:38Z,sts.amazonaws.com,AssumeRole,us-east-1,dynamodb.application-autoscaling.amazonaws.com,dynamodb.application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable/AutoScaling-RetrieveCurrentCapacity,AROAJTJBAEN3ETJXXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVDMXXXXXX,"Nov 11, 2019 12:10:38 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIQDe9BxerUcOnV47X4Q3Fj9VkcZci9wP/Hzpjca6BYyItwIgBBVw7zr80C7Tm5gSUi9wMTezKV8X+mSums4cbRI+T5cqsAII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDCHTA16/kJ4j0GlcDSqEAr3dXZwUz/d4R1DOEilMYrtwgRXJr9pfrQMKhZfEfANBMRenHp5qa6/NAWUXd27SGdrWtsBdch4iUcohCd/5t32Hc2Tv1pwWkRsViptbe+fAa3giixb7MkPWp9Qvu0NDaszKmXO/H6STBpvH6RGn1TiPQoFv6KVapXduG4SSl3fjIMMZL99/3nsmEcYgOlJZwAL0R5pD1Y9Kfo+tIfCtsqnbHMh5b1WcGY2psL5KNdzNg3R7IuSbcY2oo2oFwraLfbb+/h7Wb57cSUFotjxDac29lT2urqMzIXfgW/spj4ZM2KwRqLe4BR+UME5yrkHfBir8e6hcbHP3R6wfNuE0PMBw2yp9MPrGou4FOrICov3InkHeIDzj5PALHSif6pN87fO75xAEjnBsT3mNJjVZaL96sb+oAoQdWGwdenl3xb+hxl0/hrWaOJ9VRIP+pO+0MG/+s3XEl9d9N0cSyQSWR+lMqaeXy8Usha+3T0x++v9++UNPR94c72+QSVod/I4PZuNFHZSiF13I6kP9p0acL5nCz3MagtHAq2dm9d/5hT6SC3XMlXjCIWSDQIF2fcITrFXB8+agS0KMwi0E2zc91sT54qLF6lDBi2DRIEFSCNIXEdswvV/UpbKfTjzI4OsDpmT10n5SwSbHP4/Ekq1LGzn0kROe6rt+1u+b42qVO5pLblEKaly0qReplEV+NFU1FtWkML16lIF7YsGXmC9mIauJb1DK1a7PCMQBwInhUWQZNiOS9Vo3KTTFbSqXcSZe,,98453cad-0415-11ea-a02c-5b651fd2f855,58e2ddff-1e83-4dd8-829e-263f8d752306,arn:aws:iam::123456789101:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,e916e11f-20b1-4726-92ff-f362b820e991,
us-east-1,2019,11,11,1.05,AWSService,,,,application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:55:40Z,sts.amazonaws.com,AssumeRole,us-east-1,application-autoscaling.amazonaws.com,application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/DynamoDBAutoscaleRole/AutoScaling-RetrieveCurrentCapacity,AROAJ4CAOIWWKI4XXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVHRXXXXXX,"Nov 11, 2019 12:10:40 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIC8HM8+zo7NxMLDLIn1Z+sMjl2vsLoxbEvT9MrDYJs0aAiEA+4XrCsqNE5u6qhNgeAh6Wn7d/k1fv/AZpbHM0iwlr0kqpwII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDNQQ/oxnjZqoHZNIaCr7Aeu5Mg54YZ1OT1koHrpnHkDjxRyT0XyMrWzvkVz777qsPiUlG3k+GeqLIOGOSav1PRVAlgzfanlxDgayt7c2M9UImsrAHPstYqLaFrHNYFQdyDvXTajZSTyXp7eVNiW8oDi5aj66cRYputWMrctEz8PJSCwvO+zbj++im7138V4g6Bx3sBfZhyxw3OMtLKGQED2MkfKtr6H3BZjIXldQJDKja5cYqhe16j3iX+6yQ+7C6bPO+NruMOO4A3fUDsxGCY/217rIRIi0YKrWYYDR4A5iH7n1Abvq3j8BiwL3Zu59UaZDOgS/4ig5yLnCYplo4FzNrHfbweYZ2k7RMPzGou4FOqkClKUesL9QtuRGoB7fzJizxoLrI6Dy8P1PEDNUxRkJsgBjOzj3vzKwxLZ6QGRisP6/rke3tTel8wliGwPfCkakWpJSpYqv92uglXTAQq512CK/yGIp5z85288vqxbMsGVNjQNTtXLEJm5fDrrotmfTvFlnoaI0wALJz4TGRcsR88VgjxBC4FKrjxW7j1SEeHRQYNGkQZwuR+J1oAny9YdietqBOLr0shrXYiFyVPX7kbH4iHBxGBq+HzL6OBLggLNwNQ4+uYqV1IxhdICk6AVGBj8IdpuP+6oGGD9DC5PAUb/iVjVp83Ub+KsMLI6Ey+gTkBQmcTxHwwgwyhWNc1nGV9wzK+JXnrJqRVMXSLYU+6BgvwjPbnEgA5wtPkGSCR0pGDQtzTS3mLSa,,99be9698-0415-11ea-83a9-656141379f7e,afe0793a-5356-41e0-a72d-d840418979be,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,04ffc6a6-2472-4201-95df-0eb696478e16,
us-east-1,2019,11,11,1.05,AWSService,,,,application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:55:08Z,sts.amazonaws.com,AssumeRole,us-east-1,application-autoscaling.amazonaws.com,application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/DynamoDBAutoscaleRole/AutoScaling-RetrieveCurrentCapacity,AROAJ4CAOIWWKI4XXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVPHXXXXXX,"Nov 11, 2019 12:10:08 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIQDl5rc+swcAi5hUuvcXsrE2Mvv8i0rdD3cb1tEFpZjOMwIgTvRS6OMfbq+EhjFXB+TtcZaaFLpujOOFawpg5RpvYicqpwII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDF5l/5vD28BHH8l/Oyr7Adv7GT4z1cyXeXWdzjoOU5WocdhdIcEADbN/IRNi6qLcPisHMbdpSR/F7Nirq/oJAKagkVvnNR7aYiS0bARsF/tdw2QM6DSmC3ACl6Y9QLATH/j+auVFuo6oWfr6swT45ry4TCBHh05SEocH6nClPY62Le20UsIbDEmvLpyEOb9DS8+a7AXEIw32a5Hje9ctiHx2bwo35hEO78XjWJjGZnXTNrJ3KLDWCIEtwDboHqMS7pX3YfeXzeKqdzZ2cWzm8ekOpNaN4HmcN7dGttC3tiQuJ0Xz/Fz2QfXTAIMIcxokG/Rlt9vN5A6qW3viGtioF7VEHY4R6tfzaiGVMNzGou4FOqkCu3cksWirHjfk//j6lNDrjQ/InXDhPUH5Aef0R533TlzhutoSXKLsybYrG1W9ULhUOLfL46A1JryaGDjWWOyOjNTuPz9XbjOyLs5jVhrqUnC6sGJ7IKvydCHwULAIYxyWCji001M+SOP4XhJDT82GSSCIWjT9T2h/EdE4/GOPV83UIG9ju9KUo+4pi/6QD5dhgaJomcl2XPwHf1iKa7TBd1ZVRwoJYlhwiPvK0Y8o+ZzFl2RbBCqmU4g/Y1XQr/t+XJ1kHMWTd4jBAPvwFfE7XfvjXk2GDj08VxnwrSWK8okYyAF2y83ByjQ4wZRvL4CW3ku96nVkUq5Q6ZSItrfGyOh1D1/bxVhXQs7BCF116FV3X0RLKMlbSWTC5ICl4wlbaZuvj4/0RNIH,,86815aa2-0415-11ea-a9c6-d180199e4f0c,ffe8356a-316a-4a4d-a4bd-88845962bae5,arn:aws:iam::123456789101:role/service-role/DynamoDBAutoscaleRole,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,f9fd4649-c226-4536-b921-4558e5460632,
us-east-1,2019,11,11,1.05,AWSService,,,,ec2.amazonaws.com,,,,,,,,,,2019-11-10T23:55:28Z,sts.amazonaws.com,AssumeRole,us-east-1,ec2.amazonaws.com,ec2.amazonaws.com,,,,arn:aws:iam::123456789101:role/BastionHostEC2Role/BastionStack-InstanceRole-1PZCPD0Q7AHM5,i-0fbae95f86dbe75d1,,,,,ASIA53Z5YAVVAZXXXXXX,"Nov 11, 2019 6:20:21 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIQC5OQxMxqurFYSA/3Q9cknF4+tojnxHgsl7gXFB+0Up9gIgJPb1JIc16vXo7DbbwgAUbGyGDZofAblqqqZNEAo8pLUq2gII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDImIKYU6f0pc7C/ZPSquAkZAWfNQRsKt1dxvQLQZxNhbOnzTrON/Z1x0AY6lGsqVV4PsSJYch2STV6JwvreIIIfs+a4Iiz4cTIK3RbYS0UyT4S6TkACdcP4/XRnsjiwdmaUyemKCpUyA2kB5F3/iSci0UmG78jAkUwuTV+ZEQfdYoBw2KH6BMw9G3FSWGzjD+tXLqMAYFIscvbdos4IgJmj/St5iNqB3TT8NZMKLj0PZ8R7iBTWEWhFE2XliEzD9yuClge5CGhmVb8q6rrLDGKzPpQsTht3GwE1X/42OuEVE8JtoJ7m1vEV2Gj4PUD1g4BS1ZjIVwwsbH4YmpfXMW9EHlWsy1IPw+U2z5ZDHAXBvQsrTQeGQYFeM1Rg2Ey3toMD/NQGhHakhADS2YIC2oMhTdYz7qiiH6utE2p9YMPDGou4FOs4CbwDb9ejqHO0NBWay8XIbZ9OnDEpYalXZoPGIiR4e0zEtIxKfoH90LW6DKEOCcPFx8Ey/3yKjq2QJosazzbLyG7mnD1gi2f3yr1JYwIkQ2QGOh9legaNUwPZuRheP76bN27XHnK6VPB2rLS4VyrHFzsFYeJFbcvNXK22K1obDy/7aImT2VPAAfW7hgG5/CC31qlVC94n+DS4pCMlrt0avSP2Y4OrXEgeod+SulIj6i7PlZ8arKnnAVW+bKbh8ZCHs7mjq9Qer5HwCb1FZsxUBfs22UkiCDstljQy0Sk4wlu/CavozOLhcXrEs19qaq3e389dqhlcdPyjhDkr0+WgU4d2YwhXaFo17lUY4IFAzZO+LULpunqEy1K2xlG7iTXeVIzOiVJZABf9oUVlEYjjMAGuoJeqQCW2DoryyBwn95Jnppeq5wMmyyeXQ5mibYw==,,5c373bc0-b066-4d58-96db-57aa44e8531b,3b9060d9-52ed-41fd-8632-ef3efe6ef0fd,arn:aws:iam::123456789101:role/BastionHostEC2Role/BastionStack-InstanceRole-1PZCPD0Q7AHM5,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,bee4421c-e2e1-40d1-9abd-af0dde42ce35,
us-east-1,2019,11,11,1.05,AWSService,,,,dynamodb.application-autoscaling.amazonaws.com,,,,,,,,,,2019-11-10T23:54:20Z,sts.amazonaws.com,AssumeRole,us-east-1,dynamodb.application-autoscaling.amazonaws.com,dynamodb.application-autoscaling.amazonaws.com,,,900.0,arn:aws:iam::123456789101:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable,AutoScaling-RetrieveCurrentCapacity,,,arn:aws:sts::123456789101:assumed-role/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable/AutoScaling-RetrieveCurrentCapacity,AROAJTJBAEN3ETJXXXXXX:AutoScaling-RetrieveCurrentCapacity,ASIA53Z5YAVVJYXXXXXX,"Nov 11, 2019 12:09:20 AM",IQoJb3JpZ2luX2VjEMj//////////wEaCXVzLWVhc3QtMSJHMEUCIEcN7Svyjs4Hz300NhihzTyR0iDEjLaHgsb+aA4oKEJBAiEAhg4LBGNxUYDBU9Rw3m+jhoB6x6wGLUeysJaUTu04fl0qsAII4f//////////ARABGgw5NTMwNzUzNjkzMjIiDCcwViSEORN6QUw6pyqEAp3bZpfJCAk2a0RBqnp7tpapAkzUBEdW59VeF+p+5LURREDvO0CWkHvD0UinAPGzLqJ4VRskE6nDS9RFp4a6HaD+L3ofo6xDeHG4e1g/Wvm70+05QYoVPz03l9agKIN855QGFdGObwlp/uSKN22E4AUcjq2O1E+Z8vtFbAmcdPdNAR3HjRvETCigRh80zZKLbXqlVFu6SwXUrc+VoKTlsEbuCP5rkpDKMbG/0bz8HGW/pSHwDfkLapjTiPqbrJaMGSLR9akch/V/orCUGk+zASza2/+wGzxvBsx0O/sWvO6uPsegu6hNSyD7qhECjM33tEDTilXGf7PJH3urAUaiRKuIhEPgMKzGou4FOrIC4rn4DUQL1WJO0GX37c9Gc4QTcLlEWNYEI/Ymxs3QiGXlEaZzkyY6ouWXp5z7vuKICIvUVTsOmUoiNc7pAjCqEhB7P+h3CM0nI3AAWz81omdeBnvFTBWibbb2tmntOYQ54p1H5FNn93/iU84GqmF4Fk1GZeWX3CCFVUs0NJPL8auR3pSHcijQD0t4CiGHffsmrTBnnxiEeYPy7p//NnyjYLhZZd8hsb19Fnwn4LNd9CylzaHUz+y7OXW6BXCpL4MRRuWj/CnXoqdsttwDGdEJfww1P9riQOETzIU8oaUflaAUiosiKuof7s7bbtMdbvhqwcoB+2O/QGIipZboaizBtrdwkrc5i6IBMTV8G2JJmRU0CN42RRvfVOSTui/CUazoM2bwc7MXXd1WfRhTM2OvrOGj,,69af45b8-0415-11ea-853b-f7540ec4e636,9373f077-c268-4081-b5fc-07de71a72811,arn:aws:iam::123456789101:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable,123456789101.0,AWS::IAM::Role,AwsApiCall,,,123456789101,,1f084614-bebd-49ac-9f89-191b56addff1,


## 9. Export tables in Parquet format using Redshift `UNLOAD`

`PARTITION BY` writes the output into a separate folder for each distinct combination of values in the listed columns. Note that the `UNLOAD` below is partitioned at the `month` level rather than the `day` level. Monthly partitioning is used so that you get slightly bigger Parquet files for this dataset; using a large number of tiny Parquet files creates overhead that can slow down your queries.
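A sketch of the folder layout this produces: the Python below (hypothetical helpers `partition_prefix` and `distinct_prefixes`) builds `key=value/` prefixes in the same form as the partition locations registered later in this lab, and counts how many partitions one month of single-region data falls into when partitioned by month versus by day. The rows are illustrative; with daily partitioning the same data would be split across 30 partitions instead of 1, which tends to mean more, smaller files.

```python
# Hypothetical sketch: Hive-style partition prefixes for
# PARTITION BY (region, log_year, log_month) versus adding log_day.

def partition_prefix(row, keys):
    """Return a prefix such as 'region=us-east-1/log_year=2019/log_month=11/'."""
    return "".join(f"{key}={row[key]}/" for key in keys)

def distinct_prefixes(rows, keys):
    """Return the set of partition prefixes the rows would be written under."""
    return {partition_prefix(row, keys) for row in rows}

# One illustrative row per day of November 2019 in a single region.
rows = [{"region": "us-east-1", "log_year": 2019, "log_month": 11, "log_day": d}
        for d in range(1, 31)]

monthly = distinct_prefixes(rows, ["region", "log_year", "log_month"])
daily = distinct_prefixes(rows, ["region", "log_year", "log_month", "log_day"])
print(sorted(monthly))  # -> ['region=us-east-1/log_year=2019/log_month=11/']
print(len(daily))       # -> 30
```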

### 9.1. `cloudtrail`

**Note:** Please be patient. It may take a few minutes to unload the raw JSON data.

In [None]:
%%sql
UNLOAD ('SELECT * FROM public.v_export_cloudtrail WHERE log_year = 2019 AND log_month = 11')
TO 's3://ant404-lab-86feeb76/data_lake/table=cloudtrail/'
IAM_ROLE 'arn:aws:iam::080945919444:role/mod-27c4c61fae3b42fe-RedshiftClusterRole-1GBP75PRR61RG'
FORMAT PARQUET
ALLOWOVERWRITE
PARTITION BY (region, log_year, log_month )
;

 * postgresql+psycopg2://ant404:***@10.0.54.136:5439/dev


### 9.2. `connectionlog`

In [None]:
%%sql
UNLOAD ('SELECT * FROM rawlogs.connectionlog WHERE log_year = 2019 AND log_month = 11')
TO 's3://ant404-lab-86feeb76/data_lake/table=connectionlog/'
IAM_ROLE 'arn:aws:iam::080945919444:role/mod-27c4c61fae3b42fe-RedshiftClusterRole-1GBP75PRR61RG'
FORMAT PARQUET
ALLOWOVERWRITE
PARTITION BY (region, log_year, log_month )
;

### 9.3. `useractivitylog`

In [None]:
%%sql
UNLOAD ('SELECT * FROM rawlogs.useractivitylog WHERE log_year = 2019 AND log_month = 11')
TO 's3://ant404-lab-86feeb76/data_lake/table=useractivitylog/'
IAM_ROLE 'arn:aws:iam::080945919444:role/mod-27c4c61fae3b42fe-RedshiftClusterRole-1GBP75PRR61RG'
FORMAT PARQUET
ALLOWOVERWRITE
PARTITION BY (region, log_year, log_month )
;

## 10. Create a Data Lake schema and database

You are creating a separate external schema and external database so that you can grant access to the data lake tables separately from the raw source tables. For instance, you may want to grant only administrators access to the raw data but allow analysts to query the data lake.

In [None]:
%%sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS datalake
FROM DATA CATALOG
DATABASE 'datalake'
IAM_ROLE 'arn:aws:iam::080945919444:role/mod-27c4c61fae3b42fe-RedshiftClusterRole-1GBP75PRR61RG'
CREATE EXTERNAL DATABASE IF NOT EXISTS
;
SELECT * FROM svv_external_schemas WHERE schemaname = 'datalake';

## 11. Define external tables on the Parquet data
Note that the tables are partitioned by month to match the layout of the exported data, so `log_day` is defined as a regular column rather than a partition column.

### 11.1. `cloudtrail`

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
DROP TABLE IF EXISTS datalake.cloudtrail;
CREATE EXTERNAL TABLE datalake.cloudtrail (
      event_version                      VARCHAR(8)
    , user_identity_type                 VARCHAR(16)
    , user_identity_principalId          VARCHAR(128)
    , user_identity_arn                  VARCHAR(256)
    , user_identity_accountId            VARCHAR(16)
    , user_identity_invokedBy            VARCHAR(64)
    , user_identity_accessKeyId          VARCHAR(32)
    , user_identity_userName             VARCHAR(32)
    , session_context_mfa_authenticated  VARCHAR(8)
    , session_context_creation_date      VARCHAR(32)
    , session_issuer_type                VARCHAR(8)
    , session_issuer_principal_id        VARCHAR(32)
    , session_issuer_arn                 VARCHAR(256)
    , session_issuer_account_id          VARCHAR(16)
    , session_issuer_user_name           VARCHAR(64)
    , event_time                         VARCHAR(32)
    , event_source                       VARCHAR(64)
    , event_name                         VARCHAR(64)
    , aws_region                         VARCHAR(16)
    , source_ipaddress                   VARCHAR(64)
    , user_agent                         VARCHAR(256)
    , error_code                         VARCHAR(64)
    , error_message                      VARCHAR(512)
    , request_param_duration_seconds     INTEGER
    , request_param_role_arn             VARCHAR(256)
    , request_param_role_session_name    VARCHAR(64)
    , request_param_database_name        VARCHAR(16)
    , request_param_table_name           VARCHAR(64)
    , assumed_role_user_arn              VARCHAR(128)
    , assumed_role_user_assumed_role_id  VARCHAR(64)
    , credentials_access_key_id          VARCHAR(32)
    , credentials_expiration             VARCHAR(32)
    , credentials_session_token          VARCHAR(2048)
    , lake_formation_principal           VARCHAR(128)
    , request_id                         VARCHAR(64)
    , event_id                           VARCHAR(64)
    , resource_arn                       VARCHAR(256)
    , resource_accountid                 VARCHAR(16)
    , resource_type                      VARCHAR(32)
    , event_type                         VARCHAR(32)
    , api_version                        VARCHAR(16)
    , read_only                          VARCHAR(8)
    , recipient_account_id               VARCHAR(16)
    , service_event_details              VARCHAR(1024)
    , shared_event_id                    VARCHAR(64)
    , vpc_endpoint_id                    VARCHAR(16)
    , log_day                            INTEGER
) 
PARTITIONED BY (
      region VARCHAR(32)
    , log_year INT
    , log_month INT
    -- , log_day INT
)
STORED AS PARQUET
LOCATION 's3://ant404-lab-86feeb76/data_lake/table=cloudtrail/'
;

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
ALTER TABLE datalake.cloudtrail
ADD IF NOT EXISTS
    PARTITION (region='us-east-1', log_year=2019, log_month=11) LOCATION 's3://ant404-lab-86feeb76/data_lake/table=cloudtrail/region=us-east-1/log_year=2019/log_month=11/'
;

### 11.2. `connectionlog`

In [None]:
%%sql 
/* -- Escape autocommit with */END;/* -- */
DROP TABLE IF EXISTS datalake.connectionlog;
CREATE EXTERNAL TABLE datalake.connectionlog (
       event            VARCHAR(64)
     , recordtime       VARCHAR(32)
     , remotehost       VARCHAR(64)
     , remoteport       INTEGER
     , pid              INTEGER
     , dbname           VARCHAR(64)
     , username         VARCHAR(64)
     , authmethod       VARCHAR(64)
     , duration         BIGINT
     , sslversion       VARCHAR(32)
     , sslcipher        VARCHAR(32)
     , mtu              INTEGER
     , sslcompression   VARCHAR(16)
     , sslexpansion     VARCHAR(16)
     , iamauthguid      VARCHAR(64)
     , application_name VARCHAR(64)
     , log_day          INTEGER
 )
PARTITIONED BY (
      region VARCHAR(32)
    , log_year INT
    , log_month INT
    -- , log_day INT
)
STORED AS PARQUET
LOCATION 's3://ant404-lab-86feeb76/data_lake/table=connectionlog/'
;

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
ALTER TABLE datalake.connectionlog
ADD IF NOT EXISTS
    PARTITION (region='us-east-1', log_year=2019, log_month=11) LOCATION 's3://ant404-lab-86feeb76/data_lake/table=connectionlog/region=us-east-1/log_year=2019/log_month=11/'
;

### 11.3. `useractivitylog`

In [None]:
%%sql 
/* -- Escape autocommit with */END;/* -- */
DROP TABLE IF EXISTS datalake.useractivitylog;
CREATE EXTERNAL TABLE datalake.useractivitylog (
       recordtime  VARCHAR(32)
     , db          VARCHAR(64)
     , username    VARCHAR(64)
     , pid         BIGINT
     , userid      INTEGER
     , xid         BIGINT
     , query       VARCHAR(MAX)
     , log_day     INTEGER
 )
PARTITIONED BY (
      region VARCHAR(32)
    , log_year INT
    , log_month INT
    -- , log_day INT
)
STORED AS PARQUET
LOCATION 's3://ant404-lab-86feeb76/data_lake/table=useractivitylog/'
;

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
ALTER TABLE datalake.useractivitylog
ADD IF NOT EXISTS
    PARTITION (region='us-east-1', log_year=2019, log_month=11) LOCATION 's3://ant404-lab-86feeb76/data_lake/table=useractivitylog/region=us-east-1/log_year=2019/log_month=11/'
;
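The three `ALTER TABLE` cells above are meant to differ only in the table name, which makes copy-and-paste mistakes easy (for example, registering one table's location on another table). A small sketch that generates all three statements from one template, so the table name in `ALTER TABLE` and in the `LOCATION` path always agree. `add_partition_sql` is a hypothetical helper; you would paste its output into a `%%sql` cell that starts with the same `END;` line used above.

```python
# Hypothetical sketch: generate ALTER TABLE ... ADD IF NOT EXISTS PARTITION
# statements from one template so table name and S3 location always match.
BUCKET = "s3://ant404-lab-86feeb76/data_lake"  # data lake prefix used in this lab

def add_partition_sql(table, region, year, month, bucket=BUCKET):
    """Return the statement that registers one monthly partition of datalake.<table>."""
    location = (f"{bucket}/table={table}/region={region}"
                f"/log_year={year}/log_month={month}/")
    return (f"ALTER TABLE datalake.{table}\n"
            f"ADD IF NOT EXISTS\n"
            f"    PARTITION (region='{region}', log_year={year}, log_month={month}) "
            f"LOCATION '{location}'\n;")

for table in ("cloudtrail", "connectionlog", "useractivitylog"):
    print(add_partition_sql(table, "us-east-1", 2019, 11))
```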

## 12. Compare performance of original tables to new Data Lake extracts

### Test the exported Data Lake table
This query should take only a few seconds on the Parquet version in the data lake.

In [None]:
%%sql
SELECT user_identity_principalid, COUNT(*)
FROM  datalake.cloudtrail
WHERE log_year = 2019 
  AND log_month = 11 
  AND log_day = 11 
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
;

In [None]:
%%sql
-- # Get run time of previous query
SELECT 'Parquet format table' tbl_type
     , ROUND(DATEDIFF(ms, q.starttime, q.endtime)::NUMERIC/1000,2) secs
FROM stl_query q
WHERE query = pg_last_query_id()
;

### Test the original JSON data table
This query can take up to 4 minutes on the raw JSON version.

In [None]:
%%sql
SELECT user_identity_principalid, COUNT(*)
FROM  public.v_export_cloudtrail
WHERE log_year = 2019 
  AND log_month = 11 
  AND log_day = 11 
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
;

In [None]:
%%sql
-- # Get run time of previous query
SELECT 'Raw data table' tbl_type
     , ROUND(DATEDIFF(ms, q.starttime, q.endtime)::NUMERIC/1000,2) secs
FROM stl_query q
WHERE query = pg_last_query_id()
;
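To compare the two timings, a small sketch that mirrors the `ROUND(DATEDIFF(ms, starttime, endtime)::NUMERIC/1000, 2)` arithmetic from the cells above and computes a speedup ratio. The function names and the example timestamps are hypothetical, not measurements; substitute the values you observe. Python's millisecond arithmetic only approximates how Redshift's `DATEDIFF` counts millisecond boundaries.

```python
# Hypothetical sketch: elapsed seconds as in the stl_query cells, plus speedup.
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

def elapsed_secs(start, end):
    """Approximate ROUND(DATEDIFF(ms, start, end)::NUMERIC/1000, 2)."""
    ms = (end - start) // timedelta(milliseconds=1)  # whole milliseconds
    return (Decimal(ms) / 1000).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def speedup(raw_secs, parquet_secs):
    """How many times faster the Parquet query ran than the raw JSON query."""
    return float(raw_secs) / float(parquet_secs)

# Illustrative timestamps only.
start = datetime(2019, 11, 11, 0, 0, 0)
end = datetime(2019, 11, 11, 0, 0, 2, 125000)
print(elapsed_secs(start, end))  # -> 2.13
```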
