# Data Warehouse with AWS Redshift - Test Notebook

This notebook tests the ETL pipeline built on an AWS Redshift database. The pipeline uses Python scripts that automatically create the service and load the data into the data warehouse.

## Prerequisites

 - Create an AWS account
 - Using AWS Identity and Access Management (IAM), create a role with these permissions:
 
   - AmazonRedshiftReadOnlyAccess
   - AmazonRedshiftQueryEditor
   - AmazonRedshiftFullAccess
   - AdministratorAccess
   - AmazonS3ReadOnlyAccess
   - AmazonRedshiftDataFullAccess

 - Create the Redshift cluster on AWS (verify that the cluster is in the same region as the S3 bucket)
 - Copy the database name, user, password, and endpoint into `dwh.cfg`
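
The layout of `dwh.cfg` implied by the keys this notebook reads can be sketched as follows. The section and key names are the ones used in the configuration cell, and port 5439 matches the connection output shown later. Every other value is a placeholder assumption, and the output file name `dwh.example.cfg` and the helper `write_config_template` are hypothetical, chosen so that a real `dwh.cfg` is not overwritten.

```python
import configparser

# Section and key names match what this notebook reads; values are placeholders.
TEMPLATE = {
    "CLUSTER": {
        "HOST": "<cluster-endpoint>",
        "DB_NAME": "<database-name>",
        "DB_USER": "<user>",
        "DB_PASSWORD": "<password>",
        "DB_PORT": "5439",
    },
    "IAM_ROLE": {"ARN": "<iam-role-arn>"},
    "S3": {
        "LOG_DATA": "<s3-path-to-log-data>",
        "LOG_JSONPATH": "<s3-path-to-log-jsonpath>",
        "SONG_DATA": "<s3-path-to-song-data>",
    },
}


def write_config_template(path="dwh.example.cfg"):
    """Write a placeholder config file and return its path."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names upper-case in the written file
    config.read_dict(TEMPLATE)
    with open(path, "w") as f:
        config.write(f)
    return path
```

After filling in real values, the file would be saved as `dwh.cfg` next to this notebook and kept out of version control, since it contains the database password.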

## Load Libraries

In [1]:
import boto3
import json
import configparser

## Load configuration parameters from the dwh.cfg file

In [2]:
config = configparser.ConfigParser()
# Use a context manager so the file handle is closed after reading
with open('dwh.cfg') as f:
    config.read_file(f)

#CLUSTER
HOST = config.get('CLUSTER','HOST')
DB_NAME = config.get('CLUSTER','DB_NAME')
DB_USER = config.get('CLUSTER','DB_USER')
DB_PASSWORD = config.get('CLUSTER','DB_PASSWORD')
DB_PORT = config.get('CLUSTER','DB_PORT')

#IAM_ROLE
ARN = config.get("IAM_ROLE","ARN")

#S3
LOG_DATA = config.get("S3","LOG_DATA")
LOG_JSONPATH = config.get("S3","LOG_JSONPATH")
SONG_DATA = config.get("S3","SONG_DATA")

## STEP 1: Connect to the cluster

In [3]:
%load_ext sql

In [4]:
conn_string="postgresql://{}:{}@{}:{}/{}".format(DB_USER, DB_PASSWORD, HOST, DB_PORT, DB_NAME)
%sql $conn_string

'Connected: awsuser@dev'
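
If the password contains characters such as `@`, `/` or `:`, the plain `format` call above produces a URL that may be parsed incorrectly. A hedged sketch of a safer variant follows, assuming the connection string is interpreted as a standard URL; `build_conn_string` is a hypothetical helper name.

```python
from urllib.parse import quote


def build_conn_string(user, password, host, port, db_name):
    """Build a postgresql:// URL with the user and password percent-encoded."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),      # encode every reserved character
        quote(password, safe=""),
        host,
        port,
        db_name,
    )
```

In the notebook this would replace the `format` call, for example `conn_string = build_conn_string(DB_USER, DB_PASSWORD, HOST, DB_PORT, DB_NAME)`.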

## STEP 2: Test staging tables loading

In [5]:
# Count number of rows table staging_events
%sql SELECT COUNT(*) FROM staging_events;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
8056


In [6]:
# Query the first row from staging_events table
%sql SELECT * FROM staging_events LIMIT 1;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


artist,auth,firstname,gender,iteminsession,lastname,length,level,location,method,page,registration,sessionid,song,status,ts,useragent,userid
,Logged In,Walter,M,0,Frye,,free,"San Francisco-Oakland-Hayward, CA",GET,Home,1540919166796,38,,200,1541105830796,"""Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36""",39


In [7]:
# Number of items in staging_songs table
%sql SELECT COUNT(*) FROM staging_songs;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
14896


In [8]:
# Query the first row from staging_events table
%sql SELECT * FROM staging_events LIMIT 1;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


artist,auth,firstname,gender,iteminsession,lastname,length,level,location,method,page,registration,sessionid,song,status,ts,useragent,userid
,Logged In,Walter,M,0,Frye,,free,"San Francisco-Oakland-Hayward, CA",GET,Home,1540919166796,38,,200,1541105830796,"""Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36""",39


## STEP 3: Test DW fact and dimension tables

In [9]:
# Number of rows in dimension users table
%sql SELECT COUNT(*) FROM users;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
104


In [10]:
# Number of rows in dimension songs table
%sql SELECT COUNT(*) FROM songs;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
14896


In [11]:
# Number of rows in dimension artists table
%sql SELECT COUNT(*) FROM artists;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
10025


In [12]:
# Number of rows in dimension time table
%sql SELECT COUNT(*) FROM time;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
6820


In [13]:
# Number of rows in fact table songplay
%sql SELECT COUNT(*) FROM songplay;

 * postgresql://awsuser:***@redshift-udacity.cieqlaqdxwnt.us-west-2.redshift.amazonaws.com:5439/dev
1 rows affected.


count
9957
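
The per-table counts above can also be collected in one pass. The sketch below assumes a DB-API 2.0 connection (for instance one opened with `psycopg2`, which is an assumption and not something this notebook uses); `table_row_counts` and `empty_tables` are hypothetical helper names.

```python
import re


def table_row_counts(conn, tables):
    """Return {table: row_count} using any DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            # Table names cannot be bound as query parameters, so validate them.
            if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
                raise ValueError(f"unexpected table name: {table!r}")
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def empty_tables(counts):
    """Return the names of tables whose row count is zero."""
    return [table for table, n in counts.items() if n == 0]
```

Passing the table names `["users", "songs", "artists", "time", "songplay"]` and checking that `empty_tables` returns an empty list gives a quick signal that every load step inserted rows.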


## STEP 4: Clean up your resources

Don't forget to delete the resources you created on AWS, such as the Redshift cluster, to avoid further charges.
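
The cluster can be deleted from the AWS console or programmatically. The following is a minimal sketch using the boto3 Redshift client's `delete_cluster` call; the helper name `delete_redshift_cluster` and the cluster identifier are hypothetical, and skipping the final snapshot is an assumption that suits a throwaway test cluster.

```python
def delete_redshift_cluster(redshift_client, cluster_identifier):
    """Request deletion of a cluster without keeping a final snapshot."""
    return redshift_client.delete_cluster(
        ClusterIdentifier=cluster_identifier,
        SkipFinalClusterSnapshot=True,  # assumption: no backup is needed
    )


# Example usage (requires AWS credentials; the identifier is a placeholder):
# import boto3
# redshift = boto3.client("redshift", region_name="us-west-2")
# delete_redshift_cluster(redshift, "<cluster-identifier>")
```

Deletion is asynchronous, so the cluster stays visible in the console for a few minutes while it shuts down.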