### Notebook to create 2 Redshift tables


### Process employed for deciding variable types:

1. Looked at a sample of 1000 files from the S3 bucket, which translated to 48593 rows of data in the **impressions** dataframe and 554 rows in the **clicks** dataframe
2. Checked individual columns for unique values and byte-size range to select the variable types
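
The column check in step 2 can be sketched as below. This is a minimal illustration rather than the code actually used: the `profile_columns` helper and the `sample` dataframe are hypothetical. It measures byte length instead of character count because Redshift `varchar(n)` lengths are in bytes.

```python
import pandas as pd

def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column: dtype, distinct values, nulls, longest UTF-8 byte length."""
    rows = []
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        # Byte length of each value's string form; 0 if the column is entirely null
        max_bytes = max((len(str(v).encode("utf-8")) for v in non_null), default=0)
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "n_unique": int(non_null.nunique()),
            "n_null": int(col.isna().sum()),
            "max_bytes": max_bytes,
        })
    return pd.DataFrame(rows).set_index("column")

# Hypothetical three-row sample standing in for the real dataframe
sample = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "ecpm": [0, 12, 12],
    "url": ["http://x.test/é", None, "http://y.test"],
})
profile = profile_columns(sample)
print(profile)
```

A column whose `max_bytes` stays well below a power of two suggests a `varchar` width, while `n_unique` of 2 on a true/false column suggests `boolean`.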
  
### Some observations and assumptions from the data:    
  
1. Primary Key for **clicks** table is `id`  
2. Primary Key for **impressions** table is also `id` 
3. In rows where the `impression_id` in the **clicks** table matched the `id` in the **impressions** table, the entries in all columns common to both tables were identical. This indicates that `impression_id` is the foreign key in the **clicks** table that references the **impressions** table's primary key
4. The **meta_schema** column may indicate the name of the database schema, but this is not clear from the table. So I've assumed **ll_schema** as the name of the schema
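
Observations 1 to 3 could be checked along the following lines. This is a sketch under assumptions: the two small dataframes are hypothetical stand-ins for the sampled data, and only the column names `id`, `impression_id`, `brand_id` and `site_id` come from the tables defined later in the notebook.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the sampled dataframes
impressions = pd.DataFrame({
    "id": ["i1", "i2", "i3"],
    "brand_id": [10, 20, 30],
    "site_id": [1, 1, 2],
})
clicks = pd.DataFrame({
    "id": ["c1", "c2"],
    "impression_id": ["i1", "i3"],
    "brand_id": [10, 30],
    "site_id": [1, 2],
})

# Observations 1 and 2: a primary-key candidate must be unique and non-null
pk_ok = {
    name: bool(df["id"].is_unique and df["id"].notna().all())
    for name, df in [("impressions", impressions), ("clicks", clicks)]
}

# Observation 3a: clicks whose impression_id has no matching impressions.id
orphans = clicks[~clicks["impression_id"].isin(impressions["id"])]

# Observation 3b: for matched rows, count disagreements in each shared column
# (nulls on both sides would count as a mismatch with this simple comparison)
common = [c for c in clicks.columns if c in impressions.columns and c != "id"]
joined = clicks.merge(
    impressions, left_on="impression_id", right_on="id",
    suffixes=("_click", "_imp"),
)
mismatched = {
    c: int((joined[f"{c}_click"] != joined[f"{c}_imp"]).sum()) for c in common
}

print(pk_ok, len(orphans), mismatched)
```

On a sample of files, some orphans may simply be clicks whose impression falls outside the sample, so a non-empty `orphans` frame does not by itself rule out the foreign-key relationship.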



<a id='Pre'></a>

     
### Pre-requisites to run notebook  

1. [Enter AWS Credentials](#AWSCred)




### Steps implemented in the notebook:

1. [Creates table **impressions**](#Imp)
2. [Creates table **clicks**](#Click)


In [None]:
import boto3
import psycopg2
import pandas as pd

<a id='AWSCred'></a>
#### Enter AWS Credentials

In [None]:
# Enter the Redshift DB credentials below

Dbname= 'Enter the dbname'
Port= 'Enter port'
User ='Enter user'
Password ='Enter password'
Host ='Enter host'

[Back](#Pre)

In [None]:
# Make connection to Redshift DB

try:
    con = psycopg2.connect(
          host=Host,
          user=User,
          port=Port,
          password=Password,
          dbname=Dbname)
    print("Connection Successful")
except psycopg2.Error as err:
    # Stop here: without a connection, no cursor can be created
    print("Unable to connect to Redshift:", err)
    raise

cur = con.cursor()
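
The `CREATE TABLE` statements in this notebook assume that the schema **ll_schema** already exists. If it might not, a small helper such as the one below could create it first. This is a sketch: `ensure_schema` is a hypothetical name, and it accepts any DB-API cursor such as the `cur` created above.

```python
def ensure_schema(cur, schema_name="ll_schema"):
    """Create the schema if it is missing. `cur` is any DB-API cursor."""
    # Identifiers cannot be passed as bound query parameters,
    # so validate the name before interpolating it into the SQL text.
    if not schema_name.isidentifier():
        raise ValueError(f"Unsafe schema name: {schema_name!r}")
    cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name};")
```

Usage would be `ensure_schema(cur)` followed by `con.commit()`, run before the table-creation cells.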


<a id='Imp'></a>

#### Create Table: impressions

In [None]:
# string for sql create table query

sql_create_impres = '''CREATE TABLE if not exists ll_schema.impressions( 
                         meta_schema varchar(64),
                         meta_version numeric(4,2),
                         gdpr_computed boolean,
                         gdpr_source varchar(64),
                         remote_i_p varchar(64),
                         user_agent varchar(64),
                         ecpm smallint,
                         datacenter boolean,
                         burn_in boolean,
                         is_valid_u_a boolean,
                         user_key integer not null,
                         impression_count smallint, 
                         id varchar(128) not null,
                         decision_id varchar(128) not null,
                         decision_idx smallint,
                         created_on timestamp,
                         event_created_on timestamp,
                         impression_created_on timestamp,
                         ad_type_id smallint,
                         auction_bids smallint,
                         brand_id integer not null,
                         campaign_id integer,
                         categories varchar(32),
                         channel_id integer,
                         creative_id bigint,
                         creative_pass_id integer,
                         delivery_mode smallint,
                         first_channel_id integer,
                         is_no_track boolean,
                         is_tracking_cookie_events boolean,
                         is_publisher_payout_exempt boolean,
                         keywords varchar(256),
                         matching_keywords varchar(256),
                         network_id integer,
                         pass_id bigint,
                         phantom_creative_pass_id smallint,
                         placement_name varchar(128),
                         phantom_pass_id smallint,
                         priority_id integer,
                         price double precision,
                         rate_type smallint,
                         relevancy_score smallint,
                         revenue double precision,
                         net_revenue double precision,
                         gross_revenue double precision,
                         served_by varchar(128),
                         served_by_pid integer,
                         served_by_asg varchar(128),
                         site_id smallint,
                         url varchar(256),
                         zone_id integer,
                         user_is_new boolean,
                         device_brand_name varchar(64),
                         device_model_name varchar(64),
                         device_os_raw_version integer,
                         device_os_major_version smallint,
                         device_os_minor_version smallint,
                         device_browser varchar(128),
                         device_browser_raw_version numeric(4,2),
                         device_browser_major_version smallint,
                         device_browser_minor_version smallint,
                         device_form_factor varchar(64),
                         primary key(id));'''   


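# Attempt to create impressions table
try:
    cur.execute(sql_create_impres)
    con.commit()
    print("Created impressions table")
except psycopg2.Error as err:
    # Roll back the aborted transaction so later statements can still run
    con.rollback()
    print("Failed to create impressions table:", err)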



[Back](#Pre)

<a id='Click'></a>

#### Create Table: clicks

In [None]:
# string for sql create table query

sql_create_clicks = '''CREATE TABLE if not exists ll_schema.clicks( 
                         meta_schema varchar(64),
                         meta_version numeric(4,2),
                         gdpr_computed boolean,
                         gdpr_source varchar(64),
                         remote_i_p varchar(64),
                         user_agent varchar(64),
                         ecpm smallint,
                         datacenter boolean,
                         burn_in boolean,
                         is_valid_u_a boolean,
                         user_key integer not null,
                         click_count smallint,
                         id varchar(128) not null,
                         created_on timestamp,
                         event_created_on timestamp,
                         impression_created_on timestamp,
                         ad_type_id smallint,
                         brand_id integer not null,
                         campaign_id integer,
                         categories varchar(32),
                         channel_id integer,
                         creative_id bigint,
                         creative_pass_id integer,
                         delivery_mode smallint,
                         first_channel_id integer,
                         impression_id varchar(128) not null,
                         decision_id varchar(128) not null,
                         is_no_track boolean,
                         is_tracking_cookie_events boolean,
                         is_publisher_payout_exempt boolean,
                         keywords varchar(256),
                         matching_keywords varchar(256),
                         network_id integer,
                         pass_id bigint,
                         phantom_creative_pass_id smallint,
                         placement_name varchar(128),
                         phantom_pass_id smallint,
                         priority_id integer,
                         price double precision,
                         rate_type smallint,
                         revenue double precision,
                         served_by varchar(128),
                         served_by_pid integer,
                         served_by_asg varchar(128),
                         site_id smallint,
                         url varchar(256),
                         zone_id integer,
                         user_is_new boolean,
                         device_brand_name varchar(64),
                         device_model_name varchar(64),
                         device_os_raw_version integer,
                         device_os_major_version smallint,
                         device_os_minor_version smallint,
                         device_browser varchar(128),
                         device_browser_raw_version numeric(4,2),
                         device_browser_major_version smallint,
                         device_browser_minor_version smallint,
                         device_form_factor varchar(64),
                         primary key(id),
                         foreign key(impression_id) references ll_schema.impressions(id));'''   


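# Attempt to create clicks table
try:
    cur.execute(sql_create_clicks)
    con.commit()
    print("Created clicks table")
except psycopg2.Error as err:
    # Roll back the aborted transaction so later statements can still run
    con.rollback()
    print("Failed to create clicks table:", err)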
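
Redshift treats primary key and foreign key constraints as informational and does not enforce them, so duplicate `id` values can still be loaded into either table. A check like the one below could be run after loading data. It is a sketch only, and `duplicate_key_query` is a hypothetical helper that just builds the SQL text.

```python
def duplicate_key_query(table: str, key: str = "id") -> str:
    """Build a query listing key values that occur more than once in `table`."""
    return (
        f"SELECT {key}, COUNT(*) AS n "
        f"FROM {table} "
        f"GROUP BY {key} "
        f"HAVING COUNT(*) > 1;"
    )

# Example: print the query for the clicks table
print(duplicate_key_query("ll_schema.clicks"))
```

With an open cursor, `cur.execute(duplicate_key_query("ll_schema.impressions"))` followed by `cur.fetchall()` would return an empty list when `id` is unique.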


In [None]:
# Close the cursor and the connection

cur.close()
con.close()

[Back](#Pre)