## Jupyter Notebook Description

##### This is the code from the technical walkthrough I did in this video: https://youtu.be/AX-TlukZL0c. To run it in your own environment, set your AWS profile name, your Glue connection name, and your SQL query. It is meant as an example to get you up and running querying Redshift data with the AWS SDK for pandas (awswrangler) library.

In [1]:
import awswrangler
import boto3
import os

## Create Boto3 Session

In [2]:
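# Read the AWS profile name from the 'profile_name' environment variable.
# If it is unset, profile_name is None and boto3 falls back to its default
# credential chain (for example the AWS_PROFILE variable or the default profile).
profile_name = os.environ.get('profile_name')
boto3_session = boto3.Session(profile_name=profile_name, region_name='us-east-1')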

## Create Redshift connection

In [3]:
# Name of the AWS Glue connection that stores the Redshift cluster details
glue_database_connection = 'adriano_redshift_cluster'
redshift_connection = awswrangler.redshift.connect(connection=glue_database_connection,
                                                   boto3_session=boto3_session)
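The connection above stays open for the rest of the notebook. A minimal sketch of one way to guarantee it is closed after a query, using the standard library's `contextlib.closing`. `query_with_connection` is a hypothetical helper, not part of awswrangler; it assumes only that the object returned by `connect` has a `.close()` method, which the connection returned by `awswrangler.redshift.connect` provides.

```python
from contextlib import closing


def query_with_connection(connect, read_sql_query, sql, params=None):
    """Open a connection, run one query, and always close the connection.

    `connect` is a zero-argument callable that returns an object with .close().
    `read_sql_query` is called as read_sql_query(sql=..., con=..., params=...).
    The connection is closed even if the query raises an exception.
    """
    with closing(connect()) as con:
        return read_sql_query(sql=sql, con=con, params=params)


# Hypothetical usage in this notebook:
# df = query_with_connection(
#     lambda: awswrangler.redshift.connect(connection=glue_database_connection,
#                                          boto3_session=boto3_session),
#     awswrangler.redshift.read_sql_query,
#     "select * from public.category_1",
# )
```

Passing the connect and read functions in as arguments keeps the helper independent of AWS, so it can be exercised with stand-in objects before pointing it at a real cluster.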

## Query an entire Redshift table

In [4]:
sql_query = """select * from public.category_1
"""
df = awswrangler.redshift.read_sql_query(sql=sql_query,con=redshift_connection)
print(df)

    catid  catgroup        catname  \
0       2    Sports            NHL   
1       5    Sports            MLS   
2       3    Sports            NFL   
3       8     Shows          Opera   
4       9  Concerts    Country pop   
5       4    Sports            NBA   
6       7     Shows          Plays   
7      10  Concerts           Jazz   
8       1    Sports            MLB   
9       6     Shows       Musicals   
10     11  Concerts      Classical   
11     13  Concerts          House   
12     12  Concerts  Electro Swing   

                                           catdesc date_modified  
0                           National Hockey League    2021-03-11  
1                              Major League Soccer    2021-03-11  
2                         National Football League    2021-03-11  
3                        All opera and light opera    2021-03-11  
4    a fusion genre of country music and pop music    2023-03-05  
5                  National Basketball Association    2021-03-11 
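`select *` returns every column. When only a few are needed, naming them keeps the result smaller. Below is a minimal sketch of a hypothetical `build_select` helper that writes the column list for you; it wraps each identifier in double quotes and doubles any embedded quote, the standard SQL rule for delimited identifiers. The column names in the example are taken from the output above.

```python
def build_select(table, columns, schema="public"):
    """Return a SELECT statement that names its columns explicitly.

    Each identifier is wrapped in double quotes, with any embedded double
    quote doubled, so unusual names cannot break out of the identifier.
    """
    if not columns:
        raise ValueError("columns must not be empty")

    def quote(name):
        return '"' + name.replace('"', '""') + '"'

    cols = ", ".join(quote(c) for c in columns)
    return f"select {cols} from {quote(schema)}.{quote(table)}"


# Hypothetical usage with the notebook's connection:
# sql_query = build_select("category_1", ["catid", "catname"])
# df = awswrangler.redshift.read_sql_query(sql=sql_query, con=redshift_connection)
```

The generated string is passed as `sql=` exactly like the hand-written queries in this notebook.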

## Query a single table with a WHERE clause

In [5]:
sql_query = """select * from public.category_1
where catid = 6
"""
df = awswrangler.redshift.read_sql_query(sql=sql_query,con=redshift_connection)
print(df)

   catid catgroup   catname          catdesc date_modified
0      6    Shows  Musicals  Musical theatre    2021-03-11


## Query a single table with parameters

In [6]:
cat_id = 6  # value bound to the %s placeholder below
# A plain string, not an f-string: the driver fills %s from params
sql_query = """select * from public.category_1
where catid = %s
"""
df = awswrangler.redshift.read_sql_query(sql=sql_query,con=redshift_connection,params=[cat_id])
print(df)

   catid catgroup   catname          catdesc date_modified
0      6    Shows  Musicals  Musical theatre    2021-03-11
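Here `params=[cat_id]` fills the `%s` placeholder, so the driver binds the value instead of the value being pasted into the SQL string. Filtering on several ids works the same way once there is one `%s` per value. A minimal sketch with a hypothetical `in_clause` helper that builds the placeholders and returns the matching params list:

```python
def in_clause(column, values):
    """Return ('<column> in (%s, %s, ...)', params) for a list of values.

    Only placeholders go into the SQL text; the values are returned separately
    so they can be passed as params=... to read_sql_query.
    `column` is inserted verbatim, so it must be a trusted identifier.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} in ({placeholders})", values


# Hypothetical usage with the notebook's connection:
# clause, params = in_clause("catid", [6, 7])
# sql_query = "select * from public.category_1 where " + clause
# df = awswrangler.redshift.read_sql_query(sql=sql_query, con=redshift_connection, params=params)
```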


## Query data from multiple tables with a SQL JOIN

In [7]:
cat_id = 6  # value bound to the %s placeholder below
sql_query = """select
eventid,
eventname,
category_1.catid,
catname,
starttime
from event
inner join public.category_1 on public.category_1.catid = event.catid
where category_1.catid = %s
"""
df = awswrangler.redshift.read_sql_query(sql=sql_query,con=redshift_connection,params=[cat_id])
print(df)

      eventid          eventname  catid   catname           starttime
0        1334     The King and I      6  Musicals 2008-01-01 14:30:00
1        1376     The King and I      6  Musicals 2008-01-01 14:30:00
2        1766           Spamalot      6  Musicals 2008-01-02 15:00:00
3         744        Jersey Boys      6  Musicals 2008-01-03 20:00:00
4        1191           Spamalot      6  Musicals 2008-01-03 20:00:00
...       ...                ...    ...       ...                 ...
1295     1341  Shrek the Musical      6  Musicals 2008-12-30 15:00:00
1296     1545       Beatles LOVE      6  Musicals 2008-12-30 19:30:00
1297      816         Mamma Mia!      6  Musicals 2008-12-31 14:30:00
1298      914     Legally Blonde      6  Musicals 2008-12-31 19:00:00
1299     1489         Mamma Mia!      6  Musicals 2008-12-31 19:30:00

[1300 rows x 5 columns]
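To experiment with the join logic without a cluster, the same query shape can be run against an in-memory SQLite database from Python's standard library. This is a swapped-in engine, not Redshift: SQLite has no `public` schema and uses `?` placeholders where the Redshift queries in this notebook use `%s`. The sample rows are a few taken from the outputs above, and the simplified table definitions are assumptions; `events_for_category` is a hypothetical helper.

```python
import sqlite3


def events_for_category(con, cat_id):
    """Join event to category_1 and keep only events in one category."""
    sql = """
        select e.eventid, e.eventname, c.catid, c.catname, e.starttime
        from event as e
        inner join category_1 as c on c.catid = e.catid
        where c.catid = ?
        order by e.starttime
    """
    # The category id is bound as a parameter, never formatted into the SQL
    return con.execute(sql, (cat_id,)).fetchall()


con = sqlite3.connect(":memory:")
con.execute("create table category_1 (catid integer, catgroup text, catname text)")
con.execute("create table event (eventid integer, eventname text, catid integer, starttime text)")
con.executemany(
    "insert into category_1 values (?, ?, ?)",
    [(6, "Shows", "Musicals"), (7, "Shows", "Plays")],
)
con.executemany(
    "insert into event values (?, ?, ?, ?)",
    [
        (1334, "The King and I", 6, "2008-01-01 14:30:00"),
        (1766, "Spamalot", 6, "2008-01-02 15:00:00"),
    ],
)

rows = events_for_category(con, 6)
```

Because the filter value is bound rather than concatenated, an input such as `"6 or 1=1"` is compared as a literal and matches nothing instead of changing the query.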
