# ANT404 Lab #1 - Query Redshift Audit Logs

In this lab you will use Redshift Spectrum to query Redshift Audit logs. A bucket of example logs has been shared with your test account. 


### Background on Redshift Audit Logs
Audit logging is not enabled by default in Amazon Redshift. When you enable logging on your cluster, Amazon Redshift creates and uploads logs to Amazon S3 that capture data from the creation of the cluster to the present time. Each logging update is a continuation of the information that was already logged. 

Once enabled, Amazon Redshift logs information in the following log files:
* **Connection log** : authentication attempts, and connections and disconnections.
* **User log** : information about changes to database user definitions.  [Not used in this lab]
* **User activity log** : logs each query before it is run on the database.

Redshift Audit logs use multiple formats:
* **Connection log** files use a pipe delimited flat file format. 
* **User activity log** files do not have explicit delimiters, so you will use a regular expression to define the fields. 



## 1. Check for credentials file
Check for the credentials created in the `START_HERE` notebook.

In [None]:
%%bash
cat ant404-lab.creds

## 2. Set local variables from credentials file
Run this `cell` to import the credentials created in the `START_HERE` notebook into this notebook. Later cells rely on these variables.

In [None]:
import simplejson
with open("ant404-lab.creds") as fh:
    creds = simplejson.loads(fh.read())
username=creds["user_name"]
password=creds["password"]
host_name=creds["host_name"]
port_num=creds["port_num"]
db_name=creds["db_name"]

# Example Account, Region, and Cluster values for this lab
log_account=123456789101
region="us-east-1"
cluster_name="reporting-cluster"

# Default date values used to get sample files
audit_year=2019
audit_month=11
audit_day=10 
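
Because later cells rely on every one of these values, it can help to confirm that the credentials file contains all the expected keys before continuing. The following is a minimal sketch using only the standard library; the helper name `missing_credential_keys` and the sample JSON are hypothetical, and the key list is taken from the cell above.

```python
import json

# Keys that the cell above reads from the credentials file.
REQUIRED_KEYS = ("user_name", "password", "host_name", "port_num", "db_name")


def missing_credential_keys(creds):
    """Return the required keys that are absent from a credentials dict."""
    return [key for key in REQUIRED_KEYS if key not in creds]


# Made-up, incomplete credentials used only to demonstrate the check.
sample_creds = json.loads('{"user_name": "ant404", "db_name": "dev"}')
missing = missing_credential_keys(sample_creds)
print("Missing keys:", missing)
```

An empty list means the file has everything the rest of the notebook needs.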

## 3. Set `env` shell variables with Audit log location elements
Run the `cell` to set these variables in the local shell. Do not quote the `%set_env` variable strings.

In [None]:
%set_env username={username}
%set_env log_account={log_account}

# Default date value used to get sample files
%set_env audit_date={audit_year}-{audit_month}-{audit_day}
%set_env audit_date_path=year={audit_year}/month={audit_month}/day={audit_day}

# S3 bucket for logs 
%set_env log_bucket=redshift-managed-spectrum-datasets-{region}

# S3 prefix path for logs
%set_env audit_prefix=dataset=auditlog/region={region}

# Log file name excluding date
%set_env audit_file={log_account}_redshift_{region}_{cluster_name}_
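
To see how these pieces fit together, the following sketch assembles the full S3 prefix for one log type in plain Python. The helper name `audit_log_prefix` is hypothetical, and zero-padding the month and day is an assumption based on the `seq -w` day values used later in this lab.

```python
# Values copied from the cells above.
log_account = 123456789101
region = "us-east-1"
cluster_name = "reporting-cluster"
log_bucket = f"redshift-managed-spectrum-datasets-{region}"
audit_prefix = f"dataset=auditlog/region={region}"


def audit_log_prefix(log_type, year, month, day):
    """Return the S3 prefix shared by one day's logs of a given type."""
    # Assumption: month and day are zero-padded in the S3 path.
    date_path = f"year={year}/month={month:02d}/day={day:02d}"
    file_stem = f"{log_account}_redshift_{region}_{cluster_name}_{log_type}_"
    return f"s3://{log_bucket}/{audit_prefix}/{date_path}/{file_stem}"


print(audit_log_prefix("connectionlog", 2019, 11, 10))
```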

## 4. List the Audit Logs of each type
Run the following cell to list the Audit Logs of each type, along with a summary of their count and total size.

Notice that all Audit logs are in one S3 prefix. Only the file name tells us the log type.

In [None]:
%%bash 
audit_log_full_prefix=$log_bucket/$audit_prefix/$audit_date_path/$audit_file

## Count and List the logs with the AWS CLI
echo "ConnectionLog files: "
echo "s3://$audit_log_full_prefix"
echo "-----------------------------"
aws s3 ls s3://$audit_log_full_prefix"connectionlog_" --summarize

echo ""; echo "UserActivityLog files: "
echo "s3://$audit_log_full_prefix"
echo "-----------------------------"
aws s3 ls s3://$audit_log_full_prefix"useractivitylog_" --summarize
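
Since the log type appears only in the file name, any tooling that processes these files has to parse it out. The sketch below shows one way to do that in Python; the function name `log_type_from_key` is hypothetical, and the pattern assumes that only the three log types described in this lab occur.

```python
import re

# Matches names like "<account>_redshift_<region>_<cluster>_<logtype>_<timestamp>.gz".
# Assumption: only the three log types described in this lab appear.
LOG_TYPE_PATTERN = re.compile(r"_(connectionlog|userlog|useractivitylog)_")


def log_type_from_key(key):
    """Return the audit log type embedded in an S3 key, or None if absent."""
    file_name = key.rsplit("/", 1)[-1]
    match = LOG_TYPE_PATTERN.search(file_name)
    return match.group(1) if match else None


sample_key = (
    "dataset=auditlog/region=us-east-1/year=2019/month=11/day=10/"
    "123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019-11-10T00:27.gz"
)
print(log_type_from_key(sample_key))
```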

## 5. Download and preview a `ConnectionLog` file

In [None]:
%%bash
file="${audit_file}connectionlog_${audit_date}T00:27"
# Download with the AWS CLI
aws s3 cp s3://$log_bucket/$audit_prefix/$audit_date_path/${file}.gz ${file}.gz
gzip -df ${file}.gz  # Unzip
head -15 ${file}     # Print 15 lines
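
The same preview can be done without unzipping the file on disk. This is a minimal sketch using Python's `gzip` module; the function name `preview_gzip` is hypothetical, and the demonstration writes a small stand-in file with made-up content rather than reading a real audit log.

```python
import gzip
import itertools
import os
import tempfile


def preview_gzip(path, max_lines=15):
    """Return up to max_lines decoded lines from a gzip-compressed text file."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in itertools.islice(fh, max_lines)]


# Demonstration with a small stand-in file; the line content is made up.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample_connectionlog.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
        fh.write("line one\nline two\nline three\n")
    preview = preview_gzip(sample_path, max_lines=2)

print(preview)
```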

## 6. Create manifests to refer to the separate Audit log file types
The `LOCATION` parameter for a Redshift Spectrum external table must refer to a prefix "folder" ending with a slash `/`. When different file types are in the same prefix you must use a manifest file to tell Redshift which files are part of the table.

The 3 types of Audit Logs are written to the same prefix. Here you will use the AWS CLI to create 1 manifest per day for each table.

In [None]:
%%bash
# Make a folder for the manifests
if [ ! -d manifests ]; then mkdir -p manifests; fi
# This function does the work
# > List the files in each prefix
# > Convert them into the manifest JSON 
# > Write them to the local folder
create_manifest(){ 
  aws s3api list-objects-v2 --bucket $1 --prefix ${2}/${3}/${4}${5} \
      --output json --query 'Contents[].{Key: Key, Size: Size}' | \
    jq '{entries: [ .[] | ({url: ("s3://'${1}'/" + .Key), meta: {content_length: .Size}})]}' \
    > manifests/${4}${5}_${6}.manifest
}
# Register the function
export -f create_manifest 
declare -a log_types=("connectionlog" "useractivitylog")
for type in "${log_types[@]}"; do
# Run this loop for all the days in the data
    for day in $(seq -w 1 22); do
        manifest_date="2019_11_$day"
        audit_date_path="year=2019/month=11/day=$day"
# Echo the arguments for each day
        echo $log_bucket $audit_prefix $audit_date_path $audit_file $type $manifest_date
    done
# xargs passes each group of 6 arguments to the function and runs up to 11 calls in parallel
done | xargs -n 6 -P 11 bash -c 'create_manifest "$@"' _
ls manifests/
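
The JSON that the `jq` expression above produces has a simple shape: an `entries` list in which each item has a `url` and a `meta.content_length`. The following sketch builds the same structure in Python so the format is easier to see. The function name `build_manifest` and the object sizes are hypothetical.

```python
import json


def build_manifest(bucket, objects):
    """Build a manifest dict from (key, size) pairs, mirroring the jq output above."""
    return {
        "entries": [
            {"url": f"s3://{bucket}/{key}", "meta": {"content_length": size}}
            for key, size in objects
        ]
    }


# Hypothetical listing: the key follows the lab's naming scheme, the size is made up.
sample_objects = [
    (
        "dataset=auditlog/region=us-east-1/year=2019/month=11/day=10/"
        "123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019-11-10T00:27.gz",
        1024,
    ),
]
manifest = build_manifest("redshift-managed-spectrum-datasets-us-east-1", sample_objects)
print(json.dumps(manifest, indent=2))
```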

## 7.  Upload the manifests to an S3 bucket
A bucket has already been created in your lab account for this purpose.

In [None]:
%%bash
# Upload all manifests with the AWS CLI
aws s3 cp ./manifests/ s3://ant404-lab-86feeb76/manifests/ --recursive

## 8. Connect to your Redshift cluster

You will use the `sqlalchemy` and `ipython-sql` Python libraries to manage the Redshift connection. 

This cell creates a `%sql` element so we can use the connection in other cells in the notebook.

-------
**Note:** _Please ignore the pink error message that says: "UserWarning: The psycopg2 wheel package will be renamed from release 2.8"_   
**Look for** 'Connected: ant404@dev' in the 'Out [ ]' section below the warning.

In [None]:
import sqlalchemy
import psycopg2
import simplejson

%reload_ext sql
%config SqlMagic.displaylimit = 25

# str() keeps the concatenation working whether port_num was stored as a string or a number
connect_to_db = 'postgresql+psycopg2://'+username+':'+password+'@'+host_name+':'+str(port_num)+'/'+db_name
%sql $connect_to_db
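
Plain string concatenation works as long as the user name and password contain no characters that are special in a URL. The sketch below shows a more defensive way to build the same connection string by percent-encoding the credentials with `urllib.parse.quote_plus`. The function name `build_connection_url`, the host name, and the password are hypothetical, and port 5439 is assumed here only because it is the Redshift default.

```python
from urllib.parse import quote_plus


def build_connection_url(username, password, host_name, port_num, db_name):
    """Build a SQLAlchemy-style PostgreSQL URL with percent-encoded credentials."""
    # Encoding keeps characters such as "@" or "/" in a password from breaking the URL.
    return (
        "postgresql+psycopg2://"
        f"{quote_plus(username)}:{quote_plus(password)}"
        f"@{host_name}:{port_num}/{db_name}"
    )


# Made-up values for illustration only.
url = build_connection_url("ant404", "p@ss/word", "example-cluster.example.com", 5439, "dev")
print(url)
```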

## 9. Check for existing external `database`/`schema`/`table`
These queries should return no rows for now because you have not created any external data resources yet. 

In [None]:
%sql SELECT * FROM svv_external_databases WHERE databasename = 'logdata';

In [None]:
%sql SELECT * FROM svv_external_schemas WHERE schemaname = 'rawlogs';

In [None]:
%sql SELECT * FROM svv_external_tables WHERE schemaname = 'rawlogs' ORDER BY schemaname, tablename;

## 10. Create a new external `schema` and external `database`
This SQL references the IAM role provisioned for your lab cluster.

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
CREATE EXTERNAL SCHEMA IF NOT EXISTS rawlogs
FROM DATA CATALOG
DATABASE 'logdata'
IAM_ROLE 'arn:aws:iam::080945919444:role/mod-27c4c61fae3b42fe-RedshiftClusterRole-1GBP75PRR61RG'
CREATE EXTERNAL DATABASE IF NOT EXISTS
;
SELECT * FROM svv_external_schemas WHERE schemaname = 'rawlogs';

## 11. Connection Log - Create external table
The Connection Log records authentication attempts, connections, and disconnections. It contains the following fields:
* `event`: Connection or authentication event.
* `recordtime`: Time the event occurred.
* `remotehost`: Name or IP address of remote host.
* `remoteport`: Port number for remote host.
* `pid`: Process ID associated with the statement.
* `dbname`: Database name.
* `username`: User name.
* `authmethod`: Authentication method.
* `duration`: Duration of connection in microseconds.
* `sslversion`: Secure Sockets Layer (SSL) version.
* `sslcipher`: SSL cipher.
* `mtu`: Maximum transmission unit (MTU).
* `sslcompression`: SSL compression type.
* `sslexpansion`: SSL expansion type.
* `iamauthguid`: The IAM authentication ID for the CloudTrail request.
* `application_name`: The initial or updated name of the application for a session.
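
To make the field order concrete, the following sketch splits one pipe-delimited line into the fields listed above. The function name `parse_connection_log_line` is hypothetical and the sample line is made up to resemble a connection log entry; real entries may differ in their padding and values.

```python
# Field names in the order listed above.
CONNECTION_LOG_FIELDS = [
    "event", "recordtime", "remotehost", "remoteport", "pid", "dbname",
    "username", "authmethod", "duration", "sslversion", "sslcipher", "mtu",
    "sslcompression", "sslexpansion", "iamauthguid", "application_name",
]


def parse_connection_log_line(line):
    """Split a pipe-delimited line and pair each trimmed value with its field name."""
    values = [value.strip() for value in line.rstrip("\n").split("|")]
    return dict(zip(CONNECTION_LOG_FIELDS, values))


# Made-up sample line with 16 pipe-separated values.
sample_line = (
    "authenticated |Sun, 10 Nov 2019 00:27:01:123|10.0.0.1 |54321 |1234|dev |"
    "ant404 |password |0|TLSv1.2 |ECDHE-RSA-AES256-GCM-SHA384 |0| | | |psql"
)
record = parse_connection_log_line(sample_line)
print(record["event"], record["username"], record["application_name"])
```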

Use the provided DDL to create an external table for this data set.

In [None]:
%%sql 
/* -- Escape autocommit with */END;/* -- */
DROP TABLE IF EXISTS rawlogs.connectionlog;
CREATE EXTERNAL TABLE rawlogs.connectionlog (
       event            VARCHAR(64)
     , recordtime       VARCHAR(32)
     , remotehost       VARCHAR(64)
     , remoteport       INTEGER
     , pid              INTEGER
     , dbname           VARCHAR(64)
     , username         VARCHAR(64)
     , authmethod       VARCHAR(64)
     , duration         BIGINT
     , sslversion       VARCHAR(32)
     , sslcipher        VARCHAR(32)
     , mtu              INTEGER
     , sslcompression   VARCHAR(16)
     , sslexpansion     VARCHAR(16)
     , iamauthguid      VARCHAR(64)
     , application_name VARCHAR(64)
 )
PARTITIONED BY (
      region VARCHAR(32)
    , log_year INT
    , log_month INT
    , log_day INT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|' 
LOCATION 's3://ant404-lab-86feeb76/manifests/connectionlog/'
;
SELECT * FROM svv_external_tables WHERE schemaname = 'rawlogs' AND tablename = 'connectionlog';

## 12. Connection Log - Add partitions for each day
There is no data associated with a partitioned table until at least one partition is added.

In [None]:
%%sql 
/* -- Escape autocommit with */END;/* -- */
ALTER TABLE rawlogs.connectionlog 
ADD IF NOT EXISTS
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=1 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_01.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=2 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_02.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=3 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_03.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=4 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_04.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=5 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_05.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=6 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_06.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=7 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_07.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=8 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_08.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=9 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_09.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=10) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_10.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=11) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_11.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=12) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_12.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=13) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_13.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=14) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_14.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=15) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_15.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=16) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_16.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=17) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_17.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=18) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_18.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=19) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_19.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=20) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_20.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=21) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_21.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=22) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_connectionlog_2019_11_22.manifest'
;
SELECT * FROM svv_external_partitions WHERE schemaname = 'rawlogs' AND tablename = 'connectionlog' ;
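
The partition clauses above follow a fixed pattern, so they can be generated rather than typed by hand. This is a minimal sketch that rebuilds the same kind of statement in Python. The function name `add_partitions_sql` is hypothetical, and the bucket name and file stem are copied from the statement above.

```python
def add_partitions_sql(table, log_type, days, bucket="ant404-lab-86feeb76"):
    """Return an ALTER TABLE statement with one PARTITION clause per day."""
    file_stem = "123456789101_redshift_us-east-1_reporting-cluster"
    clauses = [
        f"    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day={day}) "
        f"LOCATION 's3://{bucket}/manifests/{file_stem}_{log_type}_2019_11_{day:02d}.manifest'"
        for day in days
    ]
    return f"ALTER TABLE {table}\nADD IF NOT EXISTS\n" + "\n".join(clauses) + "\n;"


# Days 1 through 22, matching the sample data in this lab.
statement = add_partitions_sql("rawlogs.connectionlog", "connectionlog", range(1, 23))
print(statement)
```

Passing a different table and log type to the same function would produce the equivalent statement for the other audit log table.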

## 13. Connection Log - Verify table works


In [None]:
%%sql
SELECT * 
FROM rawlogs.connectionlog
WHERE log_year = 2019 AND log_month = 11 AND log_day = 9
LIMIT 10;

> _If your query returns no results, refer to the **Spectrum Troubleshooting** section in the `START_HERE` notebook._

## 14. ConnectionLog - Try more queries
You can query external tables with the same SQL you use for local Redshift tables.

In [None]:
%%sql
-- This example finds the max length of each column
SELECT MAX(LEN(TRIM(event)))            AS event
     , MAX(LEN(TRIM(recordtime)))       AS recordtime
     , MAX(LEN(TRIM(remotehost)))       AS remotehost
     , MAX(LEN(TRIM(remoteport)))       AS remoteport
     , MAX(LEN(TRIM(pid)))              AS pid
     , MAX(LEN(TRIM(dbname)))           AS dbname
     , MAX(LEN(TRIM(username)))         AS username
     , MAX(LEN(TRIM(authmethod)))       AS authmethod
     , MAX(LEN(TRIM(duration)))         AS duration
     , MAX(LEN(TRIM(sslversion)))       AS sslversion
     , MAX(LEN(TRIM(sslcipher)))        AS sslcipher
     , MAX(LEN(TRIM(mtu)))              AS mtu
     , MAX(LEN(TRIM(sslcompression)))   AS sslcompression
     , MAX(LEN(TRIM(sslexpansion)))     AS sslexpansion
     , MAX(LEN(TRIM(iamauthguid)))      AS iamauthguid
     , MAX(LEN(TRIM(application_name))) AS application_name
     , MAX(LEN(TRIM(region)))           AS region
     , MAX(LEN(TRIM(log_year)))         AS log_year 
     , MAX(LEN(TRIM(log_month)))        AS log_month  
     , MAX(LEN(TRIM(log_day)))          AS log_day
FROM rawlogs.connectionlog 
;

## 15. UserActivity Log - Create external table
The User Activity Log records each query before it is run on the database. It contains the following fields:

* `recordtime`: Time the event occurred.
* `db`: Database name.
* `user`: User name.
* `pid`: Process ID associated with the statement.
* `userid`: User ID.
* `xid`: Transaction ID.
* `query`: A prefix of LOG: followed by the text of the query, including newlines.

Use the provided DDL to create an external table for this data set.  
**Note that this DDL uses a regular expression to define the fields.**

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
DROP TABLE IF EXISTS rawlogs.useractivitylog;
CREATE EXTERNAL TABLE rawlogs.useractivitylog (
       quote       CHAR(1)
     , recordtime  VARCHAR(32)
     , db          VARCHAR(64)
     , username    VARCHAR(64)
     , pid         BIGINT
     , userid      INTEGER
     , xid         BIGINT
     , query       VARCHAR(MAX)
 )
PARTITIONED BY (
      region VARCHAR(32)
    , log_year INT
    , log_month INT
    , log_day INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex'='('')(.*) UTC \\[ db=(\\w+ )user=(\\w+) pid=(\\d+) userid=(\\d+) xid=(\\d+) \\]'' LOG: (.*)$')
LOCATION 's3://ant404-lab-86feeb76/manifests/useractivitylog/'
;
SELECT * FROM svv_external_tables WHERE schemaname = 'rawlogs' AND tablename = 'useractivitylog';
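
To see what the regular expression in the DDL above captures, the sketch below applies an equivalent pattern in Python. The pattern is my reading of the SQL string after its quote and backslash escapes are resolved, so treat it as an assumption. The function name `parse_user_activity_line` is hypothetical and the sample line is made up.

```python
import re

# Assumed equivalent of the SerDe 'input.regex' once SQL escaping is removed.
USER_ACTIVITY_PATTERN = re.compile(
    r"(')(.*) UTC \[ db=(\w+ )user=(\w+) pid=(\d+) userid=(\d+) xid=(\d+) \]' LOG: (.*)$"
)
# One column per capture group, in the order of the table definition.
COLUMNS = ["quote", "recordtime", "db", "username", "pid", "userid", "xid", "query"]


def parse_user_activity_line(line):
    """Return a dict of captured fields, or None if the line does not match."""
    match = USER_ACTIVITY_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# Made-up sample line in the shape the pattern expects.
sample_line = (
    "'2019-11-10T00:27:01Z UTC [ db=dev user=ant404 pid=1234 userid=100 xid=5678 ]' "
    "LOG: SELECT 1;"
)
activity = parse_user_activity_line(sample_line)
print(activity)
```

Note that the `db` capture group includes the trailing space, which is one reason the later example queries wrap columns in `TRIM`.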

## 16. UserActivity Log - Add partitions for each day
There is no data associated with a partitioned table until at least one partition is added.  

In [None]:
%%sql
/* -- Escape autocommit with */END;/* -- */
ALTER TABLE rawlogs.useractivitylog 
ADD IF NOT EXISTS
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=1 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_01.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=2 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_02.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=3 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_03.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=4 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_04.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=5 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_05.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=6 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_06.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=7 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_07.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=8 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_08.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=9 ) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_09.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=10) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_10.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=11) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_11.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=12) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_12.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=13) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_13.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=14) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_14.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=15) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_15.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=16) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_16.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=17) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_17.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=18) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_18.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=19) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_19.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=20) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_20.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=21) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_21.manifest'
    PARTITION (region='us-east-1', log_year=2019, log_month=11, log_day=22) LOCATION 's3://ant404-lab-86feeb76/manifests/123456789101_redshift_us-east-1_reporting-cluster_useractivitylog_2019_11_22.manifest'
;
SELECT * FROM svv_external_partitions WHERE schemaname = 'rawlogs' AND tablename = 'useractivitylog' ;

## 17. UserActivity Log - Verify table works 

In [None]:
%%sql
SELECT recordtime
     , db 
     , username 
     , pid 
     , userid 
     , xid 
     , query 
FROM rawlogs.useractivitylog
WHERE userid > 1 -- # System user is 1
  AND log_year = 2019 
  AND log_month = 11 
ORDER BY username DESC
LIMIT 25;

## 18. UserActivity Log - Try more queries on `useractivitylog`
You can query external tables with the same SQL you use for local Redshift tables.

In [None]:
%%sql
-- This example finds the max length of each column
SELECT MAX(LEN(TRIM(recordtime))) AS recordtime
     , MAX(LEN(TRIM(db        ))) AS db        
     , MAX(LEN(TRIM(username  ))) AS username  
     , MAX(LEN(TRIM(pid       ))) AS pid       
     , MAX(LEN(TRIM(userid    ))) AS userid    
     , MAX(LEN(TRIM(xid       ))) AS xid       
     , MAX(LEN(TRIM(query     ))) AS query
     , MAX(LEN(TRIM(region    ))) AS region
     , MAX(LEN(TRIM(log_year  ))) AS log_year 
     , MAX(LEN(TRIM(log_month ))) AS log_month  
     , MAX(LEN(TRIM(log_day   ))) AS log_day
FROM rawlogs.useractivitylog 
;

## 19. You can also graph your SQL results

In [None]:
%%sql result << 
SELECT CASE WHEN username = '' THEN '<NONE>' ELSE username END username  
     , COUNT(*) log_entries
FROM rawlogs.useractivitylog 
WHERE recordtime <> ''
GROUP BY 1
ORDER BY 2 DESC
;

In [None]:
%matplotlib inline

In [None]:
result.bar()

In [None]:
result.pie()

### Further Info on Redshift Audit Logs
* Redshift Documentation: ["Database Audit Logging"](https://docs.aws.amazon.com/redshift/latest/mgmt/db-auditing.html)
* AWS Documentation: [How do I use logs to track activity in my Amazon Redshift database cluster?](https://aws.amazon.com/premiumsupport/knowledge-center/logs-redshift-database-cluster/)
* AWS Blog: [Analyze Database Audit Logs for Security and Compliance Using Amazon Redshift Spectrum](https://aws.amazon.com/blogs/big-data/analyze-database-audit-logs-for-security-and-compliance-using-amazon-redshift-spectrum/)