[![AWS Data Wrangler](_static/logo.png "AWS Data Wrangler")](https://github.com/awslabs/aws-data-wrangler)

# 8 - Redshift - COPY & UNLOAD

`Amazon Redshift` has two SQL commands that help load and unload large amounts of data, staging it on `Amazon S3`:

1 - [COPY](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html)

2 - [UNLOAD](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)

Let's take a look at how Wrangler can use them.
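As a rough illustration of what these two commands look like, the sketch below builds a COPY and an UNLOAD statement as plain strings. The helper names (`build_copy_sql`, `build_unload_sql`), the bucket, and the role ARN are hypothetical placeholders, Parquet is assumed as the staging format, and this is not necessarily the exact SQL that Wrangler issues internally.

```python
def build_copy_sql(schema: str, table: str, s3_path: str, iam_role: str) -> str:
    """Return a COPY statement that loads Parquet files staged under s3_path."""
    return (
        f"COPY {schema}.{table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET"
    )


def build_unload_sql(query: str, s3_path: str, iam_role: str) -> str:
    """Return an UNLOAD statement that writes a query result to s3_path as Parquet."""
    # Assumption: the query contains no single quotes. A query with quotes
    # would need escaping before being embedded in the UNLOAD ('...') literal.
    if "'" in query:
        raise ValueError("queries with single quotes are not handled in this sketch")
    return (
        f"UNLOAD ('{query}')\n"
        f"TO '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET"
    )


# Placeholder values for illustration only.
example_role = "arn:aws:iam::123456789012:role/example-redshift-role"
example_path = "s3://example-bucket/stage/"

copy_sql = build_copy_sql("public", "commands", example_path, example_role)
unload_sql = build_unload_sql("SELECT * FROM public.commands", example_path, example_role)

print(copy_sql)
print(unload_sql)
```

In both cases the data moves between Redshift and S3 directly, which is why an S3 staging path and an IAM role are requested in the next cells.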

In [1]:
import awswrangler as wr

engine = wr.catalog.get_engine("aws-data-wrangler-redshift")

## Enter your bucket name:

In [2]:
import getpass
bucket = getpass.getpass()
path = f"s3://{bucket}/stage/"

 ···········································


## Enter your IAM role ARN:

In [3]:
iam_role = getpass.getpass()

 ····················································································
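Because the role ARN is typed in by hand, a quick format check can catch typos before a Redshift call fails. The sketch below is a hypothetical helper (`looks_like_role_arn` is not part of Wrangler) that assumes the usual `arn:<partition>:iam::<12-digit account id>:role/<name>` layout. It checks the shape only, not whether the role exists or is attached to the cluster.

```python
import re

# Rough pattern for an IAM role ARN: partition, empty region field,
# 12-digit account id, then "role/" followed by an optional path and the name.
_ROLE_ARN_RE = re.compile(r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$")


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the general shape of an IAM role ARN."""
    return _ROLE_ARN_RE.match(value.strip()) is not None


# Placeholder ARN for illustration only.
print(looks_like_role_arn("arn:aws:iam::123456789012:role/example-redshift-role"))
print(looks_like_role_arn("example-redshift-role"))
```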


### Creating a DataFrame from NOAA's CSV files

[Reference](https://registry.opendata.aws/noaa-ghcn/)

In [4]:
cols = ["id", "dt", "element", "value", "m_flag", "q_flag", "s_flag", "obs_time"]

df = wr.s3.read_csv(
    path="s3://noaa-ghcn-pds/csv/1897.csv",
    names=cols,
    parse_dates=["dt", "obs_time"])  # ~127MB, ~4MM rows

df

| | id | dt | element | value | m_flag | q_flag | s_flag | obs_time |
|---|---|---|---|---|---|---|---|---|
| 0 | AG000060590 | 1897-01-01 | TMAX | 170 | | | E | |
| 1 | AG000060590 | 1897-01-01 | TMIN | -14 | | | E | |
| 2 | AG000060590 | 1897-01-01 | PRCP | 0 | | | E | |
| 3 | AGE00135039 | 1897-01-01 | TMAX | 140 | | | E | |
| 4 | AGE00135039 | 1897-01-01 | TMIN | 40 | | | E | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3898086 | UZM00038457 | 1897-12-31 | TMIN | -145 | | | r | |
| 3898087 | UZM00038457 | 1897-12-31 | PRCP | 4 | | | r | |
| 3898088 | UZM00038457 | 1897-12-31 | TAVG | -95 | | | r | |
| 3898089 | UZM00038618 | 1897-12-31 | PRCP | 66 | | | r | |


## Load and Unload with the regular functions (to_sql and read_sql_query)

In [5]:
%%time

wr.db.to_sql(
    df,
    engine,
    schema="public",
    name="regular",
    if_exists="replace",
    index=False
)

CPU times: user 1min 5s, sys: 2.62 s, total: 1min 8s
Wall time: 4min 29s


In [6]:
%%time

wr.db.read_sql_query("SELECT * FROM public.regular", con=engine)

CPU times: user 15.3 s, sys: 2.01 s, total: 17.3 s
Wall time: 27.3 s


| | id | dt | element | value | m_flag | q_flag | s_flag | obs_time |
|---|---|---|---|---|---|---|---|---|
| 0 | AG000060590 | 1897-01-01 | TMIN | -14 | | | E | |
| 1 | AGE00135039 | 1897-01-01 | TMAX | 140 | | | E | |
| 2 | AGE00135039 | 1897-01-01 | PRCP | 0 | | | E | |
| 3 | AGE00147705 | 1897-01-01 | TMIN | 98 | | | E | |
| 4 | AGE00147708 | 1897-01-01 | TMAX | 170 | | | E | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3898086 | USW00094967 | 1897-12-31 | TMAX | -144 | | | 6 | |
| 3898087 | USW00094967 | 1897-12-31 | PRCP | 0 | P | | 6 | |
| 3898088 | UZM00038457 | 1897-12-31 | TMAX | -49 | | | r | |
| 3898089 | UZM00038457 | 1897-12-31 | PRCP | 4 | | | r | |


## Load and Unload with COPY and UNLOAD commands

In [7]:
%%time

wr.db.copy_to_redshift(
    df=df,
    path=path,
    con=engine,
    schema="public",
    table="commands",
    mode="overwrite",
    iam_role=iam_role,
)

CPU times: user 2.23 s, sys: 201 ms, total: 2.43 s
Wall time: 9.51 s


In [8]:
%%time

wr.db.unload_redshift(
    sql="SELECT * FROM public.commands",
    con=engine,
    iam_role=iam_role,
    path=path,
    keep_files=True,
)

CPU times: user 3.65 s, sys: 671 ms, total: 4.32 s
Wall time: 12.5 s


| | id | dt | element | value | m_flag | q_flag | s_flag | obs_time |
|---|---|---|---|---|---|---|---|---|
| 0 | AG000060590 | 1897-01-01 | TMAX | 170 | | | E | |
| 1 | AG000060590 | 1897-01-01 | PRCP | 0 | | | E | |
| 2 | AGE00135039 | 1897-01-01 | TMIN | 40 | | | E | |
| 3 | AGE00147705 | 1897-01-01 | TMAX | 164 | | | E | |
| 4 | AGE00147705 | 1897-01-01 | PRCP | 0 | | | E | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3898086 | USW00094967 | 1897-12-31 | TMAX | -144 | | | 6 | |
| 3898087 | USW00094967 | 1897-12-31 | PRCP | 0 | P | | 6 | |
| 3898088 | UZM00038457 | 1897-12-31 | TMAX | -49 | | | r | |
| 3898089 | UZM00038457 | 1897-12-31 | PRCP | 4 | | | r | |
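
To put the timings above side by side, the short calculation below converts the wall times reported by each cell to seconds and computes the ratios. The numbers are copied from the cell outputs in this notebook and come from a single run, so they should be read as indicative only.

```python
# Wall times reported by the cells above, in seconds.
wall_times = {
    "to_sql": 4 * 60 + 29,     # 4min 29s
    "read_sql_query": 27.3,
    "copy_to_redshift": 9.51,
    "unload_redshift": 12.5,
}

# Ratio of the regular function's wall time to the COPY/UNLOAD-based one.
load_speedup = wall_times["to_sql"] / wall_times["copy_to_redshift"]
unload_speedup = wall_times["read_sql_query"] / wall_times["unload_redshift"]

print(f"load:   {load_speedup:.1f}x faster with COPY")      # about 28.3x
print(f"unload: {unload_speedup:.1f}x faster with UNLOAD")  # about 2.2x
```

For this dataset the gain is much larger on the load side than on the unload side.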
