[![AWS Data Wrangler](_static/logo.png "AWS Data Wrangler")](https://github.com/awslabs/aws-data-wrangler)

# 8 - Redshift COPY & UNLOAD

`Amazon Redshift` has two SQL commands that load and unload large amounts of data by staging it on `Amazon S3`:

1 - [COPY](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html)

2 - [UNLOAD](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)

Let's take a look at how Wrangler can use them.
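To make the two commands concrete before handing them to Wrangler, here is a minimal sketch of the kind of SQL statements involved. The helper names (`build_copy_sql`, `build_unload_sql`), the bucket, and the role ARN are hypothetical placeholders, and Parquet as the staging format is an assumption; the statements Wrangler actually issues may differ.

```python
def _quote(literal: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + literal.replace("'", "''") + "'"


def build_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    """Build a COPY statement that loads staged Parquet files from S3 into a table."""
    return (
        f"COPY {table} FROM {_quote(s3_path)} "
        f"IAM_ROLE {_quote(iam_role)} FORMAT AS PARQUET"
    )


def build_unload_sql(query: str, s3_path: str, iam_role: str) -> str:
    """Build an UNLOAD statement that writes a query result to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so quotes inside it are doubled.
    return (
        f"UNLOAD ({_quote(query)}) TO {_quote(s3_path)} "
        f"IAM_ROLE {_quote(iam_role)} FORMAT AS PARQUET"
    )


# Placeholder values for illustration only (not real resources).
example_path = "s3://my-bucket/stage/"
example_role = "arn:aws:iam::123456789012:role/MyRedshiftRole"

copy_sql = build_copy_sql("public.commands", example_path, example_role)
unload_sql = build_unload_sql("SELECT * FROM public.commands", example_path, example_role)

print(copy_sql)
print(unload_sql)
```

In both directions the rows travel through files on S3 rather than through the SQL connection, which is why each command needs an S3 path and an IAM role that Redshift can assume.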

In [1]:
import awswrangler as wr

engine = wr.catalog.get_engine("aws-data-wrangler-redshift")

## Enter your bucket name:

In [2]:
import getpass
bucket = getpass.getpass()
path = f"s3://{bucket}/stage/"

 ··········································


## Enter your IAM ROLE ARN:

In [3]:
iam_role = getpass.getpass()

 ···············································································


### Creating a DataFrame from NOAA's CSV files

[Reference](https://registry.opendata.aws/noaa-ghcn/)

In [4]:
cols = ["id", "dt", "element", "value", "m_flag", "q_flag", "s_flag", "obs_time"]

df = wr.s3.read_csv(
    path="s3://noaa-ghcn-pds/csv/1897.csv",
    names=cols,
    parse_dates=["dt", "obs_time"])  # ~127 MB, ~3.9 million rows

df

index,id,dt,element,value,m_flag,q_flag,s_flag,obs_time
0,USC00300379,1897-01-01,TMAX,50,,,6,
1,USC00300379,1897-01-01,TMIN,-61,,,6,
2,USC00300379,1897-01-01,PRCP,0,T,,6,
3,USC00300379,1897-01-01,SNOW,0,,,6,
4,ASN00070200,1897-01-01,PRCP,0,,,a,
...,...,...,...,...,...,...,...,...
3898086,USC00212698,1897-12-31,SNOW,0,,,6,
3898087,ASN00035059,1897-12-31,PRCP,0,,,a,
3898088,ASN00061000,1897-12-31,PRCP,38,,,a,
3898089,ASN00048117,1897-12-31,PRCP,0,,,a,


## Load and Unload with the regular functions (to_sql and read_sql_query)

In [5]:
%%time

wr.db.to_sql(
    df,
    engine,
    schema="public",
    name="regular",
    if_exists="replace",
    index=False
)

CPU times: user 1min 2s, sys: 2.83 s, total: 1min 4s
Wall time: 4min 26s


In [6]:
%%time

wr.db.read_sql_query("SELECT * FROM public.regular", con=engine)

CPU times: user 14.8 s, sys: 2.01 s, total: 16.9 s
Wall time: 24.9 s


index,id,dt,element,value,m_flag,q_flag,s_flag,obs_time
0,USC00300379,1897-01-01,TMAX,50,,,6,
1,USC00300379,1897-01-01,PRCP,0,T,,6,
2,ASN00070200,1897-01-01,PRCP,0,,,a,
3,USC00332067,1897-01-01,TMIN,72,,,6,
4,USC00332067,1897-01-01,SNOW,0,,,6,
...,...,...,...,...,...,...,...,...
3898086,ASN00047031,1897-12-31,PRCP,0,,,a,
3898087,USC00212698,1897-12-31,TMIN,-167,,,6,
3898088,USC00212698,1897-12-31,SNOW,0,,,6,
3898089,ASN00061000,1897-12-31,PRCP,38,,,a,


## Load and Unload with COPY and UNLOAD commands

In [7]:
%%time

wr.db.copy_to_redshift(
    df=df,
    path=path,
    con=engine,
    schema="public",
    table="commands",
    mode="overwrite",
    iam_role=iam_role,
)

CPU times: user 3.13 s, sys: 380 ms, total: 3.51 s
Wall time: 9.95 s


In [8]:
%%time

wr.db.unload_redshift(
    sql="SELECT * FROM public.commands",
    con=engine,
    iam_role=iam_role,
    path=path,
    keep_files=True,
)

CPU times: user 2.8 s, sys: 605 ms, total: 3.41 s
Wall time: 10 s


index,id,dt,element,value,m_flag,q_flag,s_flag,obs_time
0,USC00300379,1897-01-01,TMAX,50,,,6,
1,USC00300379,1897-01-01,PRCP,0,T,,6,
2,ASN00070200,1897-01-01,PRCP,0,,,a,
3,USC00332067,1897-01-01,TMIN,72,,,6,
4,USC00332067,1897-01-01,SNOW,0,,,6,
...,...,...,...,...,...,...,...,...
3898086,USC00183150,1897-12-31,SNOW,0,,,6,
3898087,USC00212698,1897-12-31,TMAX,-122,,I,6,
3898088,USC00212698,1897-12-31,PRCP,0,P,,6,
3898089,ASN00035059,1897-12-31,PRCP,0,,,a,
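
The wall times reported by the four `%%time` cells can be compared directly. The short sketch below only does that arithmetic; the numbers are copied from the outputs above, come from a single run each, and will vary with cluster size and network conditions. The dictionary layout and labels are illustrative.

```python
# Wall times in seconds, copied from the %%time outputs in this notebook.
timings = {
    "load": {"regular": 4 * 60 + 26, "staged": 9.95},   # to_sql vs copy_to_redshift
    "unload": {"regular": 24.9, "staged": 10.0},        # read_sql_query vs unload_redshift
}

# Speedup factor = regular wall time / S3-staged wall time.
speedups = {step: t["regular"] / t["staged"] for step, t in timings.items()}

for step, factor in speedups.items():
    print(f"{step}: {factor:.1f}x faster via S3 staging")
```

In this run the staged load is roughly 27 times faster than the regular one and the staged unload roughly 2.5 times faster.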
