# Procedure Notebook
This is a notebook for exploring and prototyping processes before implementing them in Python code. It is helpful for debugging and for trying to improve a process before adjusting the production code.

## Overall Process Steps
1. Create fact and dimension tables using a star schema for Redshift
2. Load data into S3
3. Load data from S3 into Redshift staging tables
4. ETL data from Redshift staging tables to analytic tables
5. Load data from Redshift into dashboard for analytics team

In [3]:
!pip install jupyter_nbextensions_configurator
!jupyter nbextensions_configurator enable --user

Collecting jupyter_nbextensions_configurator
  Downloading https://files.pythonhosted.org/packages/51/a3/d72d5f2dc10c5ccf5a6f4c79f636bf071a5ce462dedd07af2f70384db6cb/jupyter_nbextensions_configurator-0.4.1.tar.gz (479kB)
Collecting jupyter_contrib_core>=0.3.3 (from jupyter_nbextensions_configurator)
  Downloading https://files.pythonhosted.org/packages/e6/8f/04a752a8b66a66e7092c035e5d87d2502ac7ec07f9fb6059059b6c0dc272/jupyter_contrib_core-0.3.3-py2.py3-none-any.whl
Building wheels for collected packages: jupyter-nbextensions-configurator
  Running setup.py bdist_wheel for jupyter-nbextensions-configurator ... done
  Stored in directory: /root/.cache/pip/wheels/15/df/fe/2a74fe34709e7fdc5ae153a768675d9fda93cc7d5133ed1fb0
Successfully built jupyter-nbextensions-configurator
Installing collected packages: jupyter-contrib-core, jupyter-nbextensions-configurator
Successfully installed jupyter-con

In [None]:
#!pip install jupyter_contrib_nbextensions
#!jupyter contrib nbextension install --user
#!pip install jupyter_nbextensions_configurator
#!jupyter nbextensions_configurator enable --user

In [None]:
import os
import boto3

### 1. Create fact and dimension tables using a star schema for Redshift

#### 1. Setup AWS Manual Way
Initially, set up your IAM roles, security groups, users, etc. manually before doing it programmatically.


##### Create Amazon IAM role
+ [Create an IAM role](https://console.aws.amazon.com/iam/home#/home)
+ Ensure the role has administrator access to Redshift, EC2, S3, and any other services it needs.
                     
##### Create Amazon Security Group
+ [Create an Amazon Security Group](https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#SecurityGroups:)
+ Amazon Redshift needs inbound TCP port 5439 open (port range = 5439)

##### Launch a Redshift Cluster
+ [Launch Redshift Cluster](https://console.aws.amazon.com/redshift/)

##### Create IAM User
+ [Create IAM User](https://console.aws.amazon.com/iam/)
+ Ensure the user has programmatic access
+ Attach policies for Redshift, S3, and any other necessary services

##### Create an S3 Bucket
+ [Create an S3 Bucket](https://s3.console.aws.amazon.com/s3/home?region=us-west-2#)

##### Create PostgreSQL RDS
+ [Create a PostgreSQL RDS](https://us-west-2.console.aws.amazon.com/rds/home?region=us-west-2)

#### 2. Programmatic Access Way to Add and Delete
After you have set up the initial AWS structure manually, you can create new Identity and Access Management (IAM) roles and the other resources below programmatically with Python.

##### 2a. Create a configuration file called dwh.cfg and add the parameters to it

In [None]:
# load configuration file

from configparser import ConfigParser

config = ConfigParser()
with open('dwh.cfg') as cfg_file:  # close the file handle once it is read
    config.read_file(cfg_file)

KEY=config.get('AWS','key')
SECRET= config.get('AWS','secret')
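
The cells in this notebook read an `AWS` section and a `CLUSTER` section from `dwh.cfg`. The sketch below shows one layout that would satisfy those lookups. The option names are the ones the notebook reads; every value is a made-up placeholder, and the name `sample_config` is used only so the example does not overwrite `config`.

```python
from configparser import ConfigParser

# Hypothetical dwh.cfg contents: all values are placeholders, not real settings
SAMPLE_CFG = """
[AWS]
key = YOUR_ACCESS_KEY_ID
secret = YOUR_SECRET_ACCESS_KEY

[CLUSTER]
DB_ROLE_NAME = exampleRole
DB_CLUSTER_TYPE = multi-node
DB_NUM_NODES = 4
DB_NODE_TYPE = dc2.large
HOST = example-cluster
DB_NAME = exampledb
DB_USER = exampleuser
DB_PASSWORD = change-me
DB_PORT = 5439
"""

sample_config = ConfigParser()
sample_config.read_string(SAMPLE_CFG)

# Option names are case-insensitive by default; section names are not
print(sample_config.get("AWS", "key"))
print(sample_config.getint("CLUSTER", "DB_PORT"))
```

Keep the real `dwh.cfg` out of version control, since it holds the access key and the database password.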

##### 2b. Create IAM Client

In [None]:
from boto3 import client

iam = client('iam',aws_access_key_id=KEY,
                     aws_secret_access_key=SECRET,
                     region_name='us-west-2'
                  )

##### 2c. Create IAM Role for Redshift to have ReadOnly access to S3

In [None]:
# Important: the role needs a high level of access, such as administrator access

from json import dumps
from botocore.exceptions import ClientError

DB_ROLE_NAME = config.get("CLUSTER", "DB_ROLE_NAME")

try:
    print("Creating new IAM Role")
    dwhRole = iam.create_role(
        Path='/',
        RoleName = DB_ROLE_NAME,
        Description = "Allows Redshift clusters to call AWS services on your behalf.",
        AssumeRolePolicyDocument=dumps(
            {'Statement':[{'Action': 'sts:AssumeRole',
                          'Effect':'Allow',
                          'Principal': {'Service': 'redshift.amazonaws.com'}}],
                         'Version':'2012-10-17'})
    )
except Exception as e:
    print(e)


##### 2d. Attach Necessary Policies to Role

In [None]:
# Attach the AmazonS3ReadOnlyAccess managed policy to the role

print('Attaching Policy')
iam.attach_role_policy(RoleName=DB_ROLE_NAME,
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess")['ResponseMetadata']['HTTPStatusCode']

##### 2e. Get Amazon Resource Names (ARN) 

In [None]:
# The roleArn variable is used when creating the Redshift cluster
roleArn = iam.get_role(RoleName=DB_ROLE_NAME)['Role']['Arn']
print(roleArn)

##### 2f. Create Amazon Redshift Cluster

In [None]:
# Create Redshift client
from boto3 import client

redshift = client('redshift',
                       region_name="us-west-2",
                       aws_access_key_id=KEY,
                       aws_secret_access_key=SECRET)

In [None]:
# Load data variables to create database from configuration file

DB_CLUSTER_TYPE       = config.get("CLUSTER","DB_CLUSTER_TYPE")
DB_NUM_NODES          = config.get("CLUSTER","DB_NUM_NODES")
DB_NODE_TYPE          = config.get("CLUSTER","DB_NODE_TYPE")

DB_HOST               = config.get("CLUSTER","HOST")
DB_NAME               = config.get("CLUSTER","DB_NAME")
DB_USER               = config.get("CLUSTER","DB_USER")
DB_PASSWORD           = config.get("CLUSTER","DB_PASSWORD")
DB_PORT               = config.get("CLUSTER","DB_PORT")

In [None]:
# Create Redshift Database

try:
    response = redshift.create_cluster(        
        # Parameters for hardware
        ClusterType=DB_CLUSTER_TYPE,
        NodeType=DB_NODE_TYPE,
        NumberOfNodes=int(DB_NUM_NODES),
        
        # Parameters for identifiers & credentials
        DBName=DB_NAME,
        ClusterIdentifier=DB_HOST,
        MasterUsername=DB_USER,
        MasterUserPassword=DB_PASSWORD,
        
        # Parameter for role s3 access
        IamRoles=[roleArn]
    )
except Exception as e:
    print(e)

In [None]:
# Check on progress of creation and Redshift database type
import pandas as pd

def prettyRedshiftProps(props):
    """Return the most useful cluster properties as a two-column DataFrame."""
    # None disables column truncation (-1 is no longer accepted by recent pandas)
    pd.set_option('display.max_colwidth', None)
    keysToShow = ["ClusterIdentifier", "NodeType", "ClusterStatus", "MasterUsername", "DBName", "Endpoint", "NumberOfNodes", 'VpcId']
    x = [(k, v) for k,v in props.items() if k in keysToShow]
    return pd.DataFrame(data=x, columns=["Key", "Value"])

myClusterProps = redshift.describe_clusters(ClusterIdentifier=DB_HOST)['Clusters'][0]
prettyRedshiftProps(myClusterProps)

In [None]:
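import time

# Poll once a minute until the cluster leaves the 'creating' state
while redshift.describe_clusters(ClusterIdentifier=DB_HOST)['Clusters'][0]['ClusterStatus'] == 'creating':
    print("Creating Redshift")
    time.sleep(60)

# Refresh the properties so that 'Endpoint' is populated
myClusterProps = redshift.describe_clusters(ClusterIdentifier=DB_HOST)['Clusters'][0]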
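
The polling loop above has no upper bound, so it would wait forever if the cluster never left the `creating` state. Below is a minimal sketch of the same idea with a timeout. The helper `wait_for_status` and its parameters are hypothetical names introduced here, and the status lookup is passed in as a function so the example runs without AWS access.

```python
import time

def wait_for_status(get_status, pending="creating", interval=60.0,
                    timeout=1800.0, sleep=time.sleep):
    """Poll get_status() until it stops returning `pending` or the timeout passes."""
    waited = 0.0
    status = get_status()
    while status == pending:
        if waited >= timeout:
            raise TimeoutError("still '{}' after {:.0f}s".format(pending, waited))
        sleep(interval)
        waited += interval
        status = get_status()
    return status

# Stand-in for the real lookup, which in this notebook would be something like:
# lambda: redshift.describe_clusters(ClusterIdentifier=DB_HOST)['Clusters'][0]['ClusterStatus']
fake_statuses = iter(["creating", "creating", "available"])
final_status = wait_for_status(lambda: next(fake_statuses), interval=0.0)
print(final_status)
```

Passing `sleep` as a parameter is a design choice that keeps the helper easy to test without real waiting.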


##### 2g. Save DB Endpoint and DB ROLE ARN 

In [None]:
DB_ENDPOINT = myClusterProps['Endpoint']['Address']
DB_ROLE_ARN = myClusterProps['IamRoles'][0]['IamRoleArn']

print("DB_ENDPOINT :: ", DB_ENDPOINT)
print("DB_ROLE_ARN :: ", DB_ROLE_ARN)


##### 2h. Open incoming TCP Port to Access Endpoint if Not Done Already

In [None]:
#Create EC2 Resource
from boto3 import resource

ec2 = resource('ec2',aws_access_key_id=KEY,
                     aws_secret_access_key=SECRET,
                     region_name='us-west-2'
                  )

In [None]:
try:
    vpc = ec2.Vpc(id=myClusterProps['VpcId'])
    defaultSg = list(vpc.security_groups.all())[0]
    print(defaultSg)
    
    defaultSg.authorize_ingress(
        GroupName=defaultSg.group_name,
        CidrIp='0.0.0.0/0',  # open to any IP address; narrow this range outside of testing
        IpProtocol='TCP',
        FromPort=int(DB_PORT),
        ToPort=int(DB_PORT)
    )
except Exception as e:
    print(e)

##### 2i. Verify Connection to Cluster

In [None]:
# more basic method not using psycopg2 if necessary
%load_ext sql

conn_string="postgresql://{}:{}@{}:{}/{}".format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT,DB_NAME)
print(conn_string)
%sql $conn_string

In [None]:
# Library for viewing the data from S3 in pandas
!pip install s3fs

#### 3. Explore S3 Staging Data
Get familiar with data used in S3 to create the schema for the SQL tables

##### Song Data

In [None]:
file_path = 's3://udacity-dend/song_data/A/B/C/TRABCEI128F424C983.json'
song_df = pd.read_json(file_path, lines=True)

In [None]:
song_df.head()

In [None]:
song_df.info()

##### Log Data

In [None]:
file_path = 's3://udacity-dend/log_data/2018/11/2018-11-12-events.json'
log_df = pd.read_json(file_path, lines=True)

In [None]:
log_df.head(2)

In [None]:
log_df.info()

#### 4. Create Tables

Staging Tables Schema

<img src="images/staging_tables1.png" width="50%" />

Star Schema

<img src="images/star_schema4.png" width="100%" />

In [None]:
##To Do
# Add references to tables
# Add sorting keys and optimization for redshift

In [None]:
%%sql 
CREATE SCHEMA IF NOT EXISTS sparkify;
SET search_path TO sparkify;

DROP TABLE IF EXISTS event_stage;
DROP TABLE IF EXISTS song_stage;
DROP TABLE IF EXISTS fact_songplay;
DROP TABLE IF EXISTS dim_user;
DROP TABLE IF EXISTS dim_song;
DROP TABLE IF EXISTS dim_artist;
DROP TABLE IF EXISTS dim_time;

CREATE TABLE "event_stage" (
  "artist" varchar,
  "auth" varchar,
  "firstName" varchar,
  "gender" varchar(4),
  "itemInSession" varchar,
  "lastName" varchar,
  "length" varchar,
  "level" varchar,
  "location" varchar,
  "method" varchar,
  "page" varchar,
  "registration" varchar,
  "sessionId" varchar,
  "song" varchar,
  "status" varchar,
  "ts" varchar,
  "userAgent" varchar,
  "userId" varchar
);


CREATE TABLE "song_stage" (
  "artist_id" varchar,
  "artist_latitude" varchar,
  "artist_location" varchar,
  "artist_longitude" varchar,
  "artist_name" varchar,
  "duration" varchar,
  "num_songs" varchar,
  "song_id" varchar,
  "title" varchar,
  "year" varchar
);


CREATE TABLE "fact_songplay" (
  "songplay_id" varchar PRIMARY KEY,
  "start_time" bigint NOT NULL,
  "user_id" int NOT NULL,
  "level" varchar NOT NULL,
  "song_id" varchar,
  "artist_id" varchar,
  "session_id" int,
  "location" varchar,
  "user_agent" varchar
);


CREATE TABLE "dim_user" (
  "user_id" int PRIMARY KEY NOT NULL,
  "first_name" varchar,
  "last_name" varchar,
  "gender" varchar,
  "level" varchar NOT NULL
);


CREATE TABLE "dim_song" (
  "song_id" varchar PRIMARY KEY NOT NULL,
  "title" varchar NOT NULL,
  "artist_id" varchar,
  "year" int,
  "duration" float8
);


CREATE TABLE "dim_artist" (
  "artist_id" varchar PRIMARY KEY NOT NULL,
  "name" varchar NOT NULL,
  "location" varchar,
  "latitude" float8,
  "longitude" float8
);


CREATE TABLE "dim_time" (
  "time_key" bigint PRIMARY KEY NOT NULL,  
  "start_time" timestamp NOT NULL,
  "hour" int NOT NULL,
  "day" int NOT NULL,
  "week" int NOT NULL,
  "month" int NOT NULL,
  "year" int NOT NULL,
  "weekday" varchar NOT NULL
);

### 2. Load data into S3

In [None]:
# !pip install s3fs

In [None]:
import pandas as pd
import boto3
import os

In [None]:
# load configuration file

from configparser import ConfigParser

config = ConfigParser()
with open('dwh.cfg') as cfg_file:  # close the file handle once it is read
    config.read_file(cfg_file)

KEY=config.get('AWS','key')
SECRET= config.get('AWS','secret')

Udacity has already loaded the data into S3:
+ song data path: s3://udacity-dend/song_data
    + song_data/A/B/C/TRABCEI128F424C983.json
    + song_data/A/A/B/TRAABJL12903CDCF1A.json
+ log data JSON path file
    + s3://udacity-dend/log_json_path.json
+ log data path: s3://udacity-dend/log_data
    + log_data/2018/11/2018-11-12-events.json
    + log_data/2018/11/2018-11-13-events.json

In [None]:
from boto3 import resource

s3 = boto3.resource('s3',
                       region_name="us-west-2",
                       aws_access_key_id=KEY,
                       aws_secret_access_key=SECRET)



In [None]:
# View log files
from os import path
log_bucket = s3.Bucket(name="udacity-dend")

log_bucket_list = []
for obj in log_bucket.objects.filter(Prefix="log_data"):
    file_path = path.join(obj.bucket_name, obj.key)
    log_bucket_list.append(file_path)
    
print("File count: ", len(log_bucket_list))
print(log_bucket_list[1:3])
print(log_bucket_list[-3:])

In [None]:
# Re-load a sample log file to view the data again
log_df = pd.read_json('s3://udacity-dend/log_data/2018/11/2018-11-01-events.json', lines=True)
log_df.head(3)

In [None]:
# Re-load a sample song file to view the data again
song_df = pd.read_json('s3://udacity-dend/song_data/A/B/C/TRABCEI128F424C983.json', lines=True)
song_df

In [None]:
jsonpath_df = pd.read_json('s3://udacity-dend/log_json_path.json')
jsonpath_df

### 3. Load data from S3 into Redshift staging tables

In [None]:
staging_events_copy = ("""
copy event_stage
from 's3://udacity-dend/log_data/2018/11/2018'
credentials 'aws_iam_role={}'
format as json 's3://udacity-dend/log_json_path.json';
""").format(DB_ROLE_ARN)


In [None]:
print(staging_events_copy)

In [None]:
%sql $staging_events_copy

In [None]:
%%sql
SET search_path TO sparkify;
SELECT count(*)
FROM event_stage;

In [None]:
%%sql
SET search_path TO sparkify;
SELECT *
FROM event_stage
LIMIT 5;

In [None]:
staging_songs_copy = ("""
copy song_stage
from 's3://udacity-dend/song_data/'
credentials 'aws_iam_role={}'
format as json 'auto';
""").format(DB_ROLE_ARN)

In [None]:
print(staging_songs_copy)

In [None]:
%sql $staging_songs_copy
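
The two COPY statements above differ only in the target table, the S3 source, and the JSON option (a JSONPaths file for the events, `'auto'` for the songs). A small helper can make that explicit. This is a sketch only: `build_copy_sql` is a hypothetical name, and the ARN is a placeholder rather than a real role.

```python
def build_copy_sql(table, s3_path, role_arn, json_option="auto"):
    """Return a Redshift COPY statement for JSON files under s3_path."""
    return (
        "copy {}\n"
        "from '{}'\n"
        "credentials 'aws_iam_role={}'\n"
        "format as json '{}';"
    ).format(table, s3_path, role_arn, json_option)

# Placeholder ARN for illustration only
example_arn = "arn:aws:iam::123456789012:role/exampleRole"

events_sql = build_copy_sql(
    "event_stage",
    "s3://udacity-dend/log_data/2018/11/2018",
    example_arn,
    json_option="s3://udacity-dend/log_json_path.json",
)
songs_sql = build_copy_sql("song_stage", "s3://udacity-dend/song_data/", example_arn)
print(events_sql)
print(songs_sql)
```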

In [None]:
%%sql
SET search_path TO sparkify;
SELECT count(*)
FROM song_stage;

In [None]:
%%sql
SET search_path TO sparkify;
SELECT *
FROM song_stage
limit 10;

### 4. ETL data from Redshift staging tables to analytic tables

#### dim_user table

In [None]:
# To drop and reload dim_user for testing, convert this cell to code

In [None]:
#create dim_user table
#user_id, first_name, last_name, gender, level

In [None]:
%%sql
SET search_path TO sparkify;

INSERT INTO dim_user (user_id, first_name, last_name, gender, level)
SELECT DISTINCT
    CAST(e.userID as integer) AS user_id,
    e.firstName AS first_name,
    e.lastName AS last_name,
    e.gender AS gender,
    e.level AS level
FROM event_stage e
WHERE e.page = 'NextSong'


In [None]:
# Check for duplication

In [None]:
%%sql
SELECT user_id, first_name, last_name, COUNT(*)
FROM dim_user
GROUP BY user_id, first_name, last_name
ORDER BY COUNT(*) DESC
LIMIT 10;

In [None]:
%%sql
SELECT *
FROM dim_user
WHERE user_id = 80

**Note**: Some users appear with both a free and a paid level, so `level` has to be taken into account when filtering or de-duplicating users.
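
One way to handle users that appear with more than one level is to keep only the level from each user's most recent event. The sketch below shows that idea with Python's built-in `sqlite3` and a few made-up rows. It is not Redshift syntax tested against the cluster, and the table is trimmed to the columns needed.

```python
import sqlite3

level_db = sqlite3.connect(":memory:")
level_db.executescript("""
CREATE TABLE event_stage (userId TEXT, level TEXT, ts INTEGER, page TEXT);
-- Made-up sample rows: user 80 switches from free to paid
INSERT INTO event_stage VALUES
    ('80', 'free', 1541000000000, 'NextSong'),
    ('80', 'paid', 1541100000000, 'NextSong'),
    ('26', 'free', 1541050000000, 'NextSong');
""")

latest_level_sql = """
SELECT CAST(e.userId AS INTEGER) AS user_id, e.level
FROM event_stage e
JOIN (
    SELECT userId, MAX(ts) AS last_ts
    FROM event_stage
    WHERE page = 'NextSong'
    GROUP BY userId
) m ON m.userId = e.userId AND m.last_ts = e.ts
WHERE e.page = 'NextSong'
ORDER BY user_id;
"""
latest_levels = level_db.execute(latest_level_sql).fetchall()
print(latest_levels)
```

Each user comes back once, with the level of the latest `NextSong` event.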

#### dim_song table

In [None]:
# To drop and reload dim_song for testing, convert this cell to code

In [None]:
#create dim_song table
#song_id, title, artist_id, year, duration

In [None]:
%%sql
SET search_path TO sparkify;
INSERT INTO dim_song (song_id, title, artist_id, year, duration)
SELECT DISTINCT
    s.song_id AS song_id,
    s.title AS title,
    s.artist_id AS artist_id,
    CAST(s.year as integer) AS year,
    CAST(s.duration as decimal(8,2)) AS duration
FROM song_stage s

In [None]:
# View first five rows

In [None]:
%%sql
SELECT *
FROM dim_song
LIMIT 5;

In [None]:
# Check for duplication

In [None]:
%%sql
SELECT
    s.song_id,
    s.title,
    COUNT(*)
FROM dim_song s
GROUP BY s.song_id, s.title
ORDER BY COUNT(*) DESC
LIMIT 10;

#### dim_artist table

In [None]:
# To drop and reload dim_artist for testing, convert this cell to code

In [None]:
#create dim_artist
#artist_id, name, location, latitude, longitude

In [None]:
%%sql
SET search_path TO sparkify;

INSERT INTO dim_artist (artist_id, name, location, latitude, longitude)
SELECT DISTINCT
    s.artist_id AS artist_id,
    s.artist_name AS name,
    s.artist_location AS location,
    CONVERT(float, s.artist_latitude) AS latitude,
    CONVERT(float, s.artist_longitude) AS longitude
FROM song_stage s
JOIN event_stage e
    ON (e.artist = s.artist_name AND e.song = s.title)
WHERE e.page = 'NextSong'



In [None]:
%%sql

SELECT *
FROM dim_artist
LIMIT 5;

In [None]:
%%sql
SELECT
    a.artist_id,
    a.name,
    COUNT(*)
FROM dim_artist a
GROUP BY a.artist_id, a.name
ORDER BY COUNT(*) DESC
LIMIT 10;

In [None]:
%%sql
SELECT *
FROM dim_artist a
WHERE a.name = 'Dwight Yoakam'

In [None]:
#create dim_time
#time_key, start_time, hour, day, week, month, year, weekday
# NOTE: see if ts can be ingested as an integer in the staging table

#### dim_time table

In [None]:
%%sql
DROP TABLE IF EXISTS dim_time

In [None]:
%%sql
SET search_path TO sparkify;
CREATE TABLE "dim_time" (
  "time_key" bigint PRIMARY KEY NOT NULL,  
  "start_time" timestamp NOT NULL,
  "hour" int NOT NULL,
  "day" int NOT NULL,
  "week" int NOT NULL,
  "month" int NOT NULL,
  "year" int NOT NULL,
  "weekday" varchar NOT NULL
);

In [None]:
%%sql
SET search_path TO sparkify;
INSERT INTO dim_time(time_key, start_time, hour, day, week, month, year, weekday)
SELECT DISTINCT
    CAST(e.ts as bigint) AS time_key,
    TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second' as start_time,
    EXTRACT(hour from TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') AS hour,
    CAST(DATE_PART(day, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second')  as Integer) AS day,
    CAST(DATE_PART(week, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') as Integer) AS week,
    CAST(DATE_PART(month, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') as Integer) AS month,
    CAST(DATE_PART(year, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') as Integer) AS year,
    CASE
        WHEN(
                DATE_PART(dayofweek, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') = 0.0
                OR
                DATE_PART(dayofweek, TIMESTAMP 'epoch' + e.ts/1000 *INTERVAL '1 second') = 6.0
            )
        THEN 'no'
        ELSE 'yes'
        END
        AS weekday
FROM event_stage e
WHERE e.page = 'NextSong'
ORDER BY time_key ASC;
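
The `ts` column holds epoch time in milliseconds, which is why the query divides by 1000 before adding it to the epoch. The same breakdown can be checked locally in Python. This is a sketch under a few assumptions: times are treated as UTC, `week` uses the ISO week number, and `time_parts` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def time_parts(ts_ms):
    """Split an epoch timestamp in milliseconds into dim_time-style fields (UTC)."""
    # Integer division drops the milliseconds, like ts/1000 on an integer column
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    return {
        "time_key": ts_ms,
        "start_time": dt.replace(tzinfo=None),
        "hour": dt.hour,
        "day": dt.day,
        "week": dt.isocalendar()[1],
        "month": dt.month,
        "year": dt.year,
        # Monday is 0 and Sunday is 6 in Python, so 5 and 6 are the weekend
        "weekday": "no" if dt.weekday() >= 5 else "yes",
    }

parts = time_parts(1541106106796)
print(parts)
```

The sample value 1541106106796 corresponds to 2018-11-01 21:01:46 UTC, a Thursday, so `weekday` is `'yes'`.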



In [None]:
# view first 5 rows

In [None]:
%%sql
SET search_path TO sparkify;
SELECT *
FROM dim_time
LIMIT 5;

In [None]:
# check for duplicates

In [None]:
%%sql
SET search_path TO sparkify;
SELECT
    t.time_key,
    COUNT(*)
FROM dim_time t
GROUP BY t.time_key
ORDER BY COUNT(*) DESC
LIMIT 10;

#### fact_songplay table

In [None]:
#time_key, user_id, level, song_id, artist_id, session_id, location, user_agent

**Note**: 
+ Many songs listed in the events table have no matching song in the songs table, for example "You Gotta Be" by Des'ree. That is because the song data is only a 14,896-row subset of the Million Song Dataset, so it is fine if some simulated events have no matching song.

In [None]:
%%sql
SELECT *
FROM song_stage
WHERE title like 'You Gotta Be'

In [None]:
%%sql
SELECT Count(*)
FROM song_stage

In [None]:
#1541106106796

In [None]:
%%sql
DROP TABLE IF EXISTS "fact_songplay"

In [None]:
%%sql
CREATE TABLE "fact_songplay" (
  "songplay_id" varchar PRIMARY KEY,
  "start_time" bigint NOT NULL,
  "user_id" int NOT NULL,
  "level" varchar NOT NULL,
  "song_id" varchar,
  "artist_id" varchar,
  "session_id" int,
  "location" varchar,
  "user_agent" varchar
);

In [None]:
%%sql
SET search_path TO sparkify;
INSERT INTO fact_songplay(songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent)
SELECT
    e.userId || st.song_id || e.itemInSession as songplay_id,
    CAST(e.ts as bigint) as start_time,
    CAST(e.userId as int) as user_id,
    e.level as level,
    st.song_id,
    st.artist_id,
    CAST(e.sessionId as int) as session_id,
    e.location as location,
    e.userAgent as user_agent
FROM (select * from event_stage where page = 'NextSong') as e
LEFT JOIN song_stage st
    ON (e.artist = st.artist_name AND e.song = st.title)
WHERE song_id <> 'None'
ORDER BY start_time ASC

In [None]:
%%sql
SET search_path TO sparkify;
SELECT *
FROM fact_songplay
LIMIT 3;

### Quality Checks using psycopg2

In [None]:

import psycopg2
import pandas as pd
host = config['CLUSTER']['DB_ENDPOINT']
dbname = config['CLUSTER']['DB_NAME']
user = config['CLUSTER']['DB_USER']
password = config['CLUSTER']['DB_PASSWORD']
port = config['CLUSTER']['DB_PORT']

psy_conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(host, dbname, user, password, port))
psy_cur = psy_conn.cursor()

In [None]:
user_table_dups = ("""
SELECT user_id, first_name, last_name, level, COUNT(*)
FROM sparkify.dim_user
GROUP BY user_id, first_name, last_name, level
ORDER BY COUNT(*) DESC
LIMIT 5
;
""")
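song_table_dups = ("""
SELECT
    s.song_id,
    s.title,
    COUNT(*)
FROM sparkify.dim_song s
GROUP BY s.song_id, s.title
ORDER BY COUNT(*) DESC
LIMIT 10;
""")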

artist_table_dups =("""
SELECT
    a.artist_id,
    a.name,
    COUNT(*)
FROM sparkify.dim_artist a
GROUP BY a.artist_id, a.name
ORDER BY COUNT(*) DESC
LIMIT 10;
""")

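time_table_dups = ("""
SELECT
    t.time_key,
    COUNT(*)
FROM sparkify.dim_time t
GROUP BY t.time_key
ORDER BY COUNT(*) DESC
LIMIT 10;
""")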

In [None]:
def check_duplicates(query, conn, table):
    """Print whether any grouped row in the query result has a count above 1."""
    count_df = pd.read_sql(query, conn)
    # Each row is one key combination; a count above 1 means that key is duplicated
    duplicate_groups = int((count_df['count'] > 1).sum())

    if duplicate_groups > 0:
        print("Rows duplicated in table {}".format(table))
    else:
        print("No rows duplicated in table {}".format(table))
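
An alternative to counting in pandas is to let SQL return only the duplicated keys with `HAVING COUNT(*) > 1`. The sketch below shows the idea with Python's built-in `sqlite3` and made-up rows. `duplicated_keys` is a hypothetical helper, and the table and column names are interpolated directly, so they must come from trusted code rather than user input.

```python
import sqlite3

def duplicated_keys(conn, table, key_columns):
    """Return (key..., count) rows for key combinations that appear more than once."""
    cols = ", ".join(key_columns)
    sql = (
        "SELECT {cols}, COUNT(*) AS n FROM {table} "
        "GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY n DESC"
    ).format(cols=cols, table=table)
    return conn.execute(sql).fetchall()

dup_db = sqlite3.connect(":memory:")
dup_db.executescript("""
CREATE TABLE dim_user (user_id INTEGER, level TEXT);
-- Made-up sample rows: user 80 appears once per level
INSERT INTO dim_user VALUES (80, 'free'), (80, 'paid'), (26, 'free');
""")

user_dups = duplicated_keys(dup_db, "dim_user", ["user_id"])
print(user_dups)
```

An empty result means the key is unique in that table.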

In [None]:
check_duplicates(artist_table_dups, psy_conn, table='artist')

In [None]:
count_df = pd.read_sql(user_table_dups, psy_conn)

query = 'user_table_dups'
# Each row is one key combination; a count above 1 means that key is duplicated
duplicate_groups = int((count_df['count'] > 1).sum())

if duplicate_groups > 0:
    print("Rows duplicated in {}".format(query))
else:
    print("No rows duplicated in {}".format(query))


In [None]:
psy_cur.execute(user_table_dups)

### Delete Resources Once Done

In [None]:
# delete Redshift resources
redshift.delete_cluster( ClusterIdentifier=DB_HOST,  SkipFinalClusterSnapshot=True)

In [None]:
# verify deletion
myClusterProps = redshift.describe_clusters(ClusterIdentifier=DB_HOST)['Clusters'][0]
prettyRedshiftProps(myClusterProps)

In [None]:
#remove roles and policies
iam.detach_role_policy(RoleName=DB_ROLE_NAME, PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess")
iam.delete_role(RoleName=DB_ROLE_NAME)