![AWS Data Wrangler](_static/logo.png "AWS Data Wrangler")

# AWS S3 write modes

[AWS Data Wrangler](https://github.com/awslabs/aws-data-wrangler) provides three write modes for storing data on AWS S3.

* **append**
    - Adds the new files only; nothing already in the target path is deleted.
* **overwrite**
    - Deletes everything in the target directory, then adds the new files.
* **overwrite_partitions (Partition Upsert)**
    - Deletes only the paths of the partitions being updated, then writes the new rows for those partitions. It works like a partition-level upsert.

This tutorial illustrates all three modes.
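Before touching S3, the semantics of the three modes can be sketched on a plain dictionary. This is only an illustration, not how the library is implemented: `Store`, `simulate_write`, and `read_all` are hypothetical names, object keys are random, and the `col=value/` prefix layout is an assumption of the sketch.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical stand-in for an S3 prefix: object key -> rows stored in that file.
Store = Dict[str, List[dict]]


def simulate_write(store: Store, rows: List[dict], mode: str,
                   partition_col: Optional[str] = None) -> None:
    """Apply one write to `store` following the semantics of the given mode."""
    if mode not in ("append", "overwrite", "overwrite_partitions"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "overwrite_partitions" and partition_col is None:
        # Sketch choice: a partition upsert needs a partition column.
        raise ValueError("overwrite_partitions requires partition_col")

    # Group incoming rows by partition prefix ("" when unpartitioned).
    groups = defaultdict(list)
    for row in rows:
        prefix = f"{partition_col}={row[partition_col]}/" if partition_col else ""
        groups[prefix].append(row)

    if mode == "overwrite":
        store.clear()  # delete everything under the target path
    elif mode == "overwrite_partitions":
        # Delete only files that live under a partition present in the new data.
        stale = [key for key in store if key.startswith(tuple(groups))]
        for key in stale:
            del store[key]

    # Every mode finishes by adding new files; "append" does nothing else.
    for prefix, group in groups.items():
        store[f"{prefix}{uuid.uuid4().hex}.parquet"] = group


def read_all(store: Store) -> List[dict]:
    """Return every row in the store, sorted by id for a stable view."""
    return sorted((row for rows in store.values() for row in rows),
                  key=lambda r: r["id"])


store: Store = {}
simulate_write(store, [{"id": 1, "value": "foo", "date": "2020-01-01"},
                       {"id": 2, "value": "boo", "date": "2020-01-02"}],
               mode="overwrite", partition_col="date")
simulate_write(store, [{"id": 2, "value": "xoo", "date": "2020-01-02"},
                       {"id": 3, "value": "bar", "date": "2020-01-03"}],
               mode="overwrite_partitions", partition_col="date")
print([(r["id"], r["value"]) for r in read_all(store)])
# prints [(1, 'foo'), (2, 'xoo'), (3, 'bar')]
```

The sketch sorts rows by `id` when reading so the result is stable; a real read over several files may return rows in a different order.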

In [1]:
from datetime import date

import awswrangler as wr
import pandas as pd

S3_BUCKET = "BUCKET_NAME"  # placeholder: replace with your own bucket name
S3_PATH = f"s3://{S3_BUCKET}/tutorial"

## Mocking a Dataset

In [2]:
df = pd.DataFrame({
    "id": [1, 2],
    "value": ["foo", "boo"],
    "date": [date(2020, 1, 1), date(2020, 1, 2)]
})

wr.pandas.to_parquet(
    dataframe=df,
    path=S3_PATH,
    mode="overwrite",
    preserve_index=False
)

wr.pandas.read_parquet(path=S3_PATH)

|   | id | value | date       |
|---|----|-------|------------|
| 0 | 2  | boo   | 2020-01-02 |
| 1 | 1  | foo   | 2020-01-01 |


## Appending

In [3]:
df = pd.DataFrame({
    "id": [3],
    "value": ["bar"],
    "date": [date(2020, 1, 3)]
})

wr.pandas.to_parquet(
    dataframe=df,
    path=S3_PATH,
    mode="append",
    preserve_index=False
)

wr.pandas.read_parquet(path=S3_PATH)

|   | id | value | date       |
|---|----|-------|------------|
| 0 | 2  | boo   | 2020-01-02 |
| 1 | 3  | bar   | 2020-01-03 |
| 2 | 1  | foo   | 2020-01-01 |


## Overwriting

In [4]:
df = pd.DataFrame({
    "id": [3],
    "value": ["bar"],
    "date": [date(2020, 1, 3)]
})

wr.pandas.to_parquet(
    dataframe=df,
    path=S3_PATH,
    mode="overwrite",
    preserve_index=False
)

wr.pandas.read_parquet(path=S3_PATH)

|   | id | value | date       |
|---|----|-------|------------|
| 0 | 3  | bar   | 2020-01-03 |


## Mocking a partitioned Dataset

In [5]:
df = pd.DataFrame({
    "id": [1, 2],
    "value": ["foo", "boo"],
    "date": [date(2020, 1, 1), date(2020, 1, 2)]
})

wr.pandas.to_parquet(
    dataframe=df,
    path=S3_PATH,
    mode="overwrite",
    preserve_index=False,
    partition_cols=["date"]
)

wr.pandas.read_parquet(path=S3_PATH)

|   | id | value | date       |
|---|----|-------|------------|
| 0 | 1  | foo   | 2020-01-01 |
| 1 | 2  | boo   | 2020-01-02 |
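To picture what `partition_cols=["date"]` does to the layout, the sketch below builds the prefix each row would land under, assuming a Hive-style `column=value/` directory per partition. The helper `partition_prefix` is hypothetical and the exact object names inside each prefix are not modelled.

```python
from datetime import date
from typing import List, Mapping


def partition_prefix(base_path: str, partition_cols: List[str],
                     row: Mapping[str, object]) -> str:
    """Build the Hive-style prefix (col=value/...) a row's file would live under."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([base_path.rstrip("/")] + parts) + "/"


rows = [
    {"id": 1, "value": "foo", "date": date(2020, 1, 1)},
    {"id": 2, "value": "boo", "date": date(2020, 1, 2)},
]
prefixes = [partition_prefix("s3://BUCKET_NAME/tutorial", ["date"], row) for row in rows]
for prefix in prefixes:
    print(prefix)
# prints:
# s3://BUCKET_NAME/tutorial/date=2020-01-01/
# s3://BUCKET_NAME/tutorial/date=2020-01-02/
```

Under this assumption each distinct `date` value gets its own prefix, which is what lets a later write replace one date without touching the others.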


## Upserting partitions (overwrite_partitions)

In [6]:
df = pd.DataFrame({
    "id": [2, 3],
    "value": ["xoo", "bar"],
    "date": [date(2020, 1, 2), date(2020, 1, 3)]
})

wr.pandas.to_parquet(
    dataframe=df,
    path=S3_PATH,
    mode="overwrite_partitions",
    preserve_index=False,
    partition_cols=["date"]
)

wr.pandas.read_parquet(path=S3_PATH)

|   | id | value | date       |
|---|----|-------|------------|
| 0 | 1  | foo   | 2020-01-01 |
| 1 | 2  | xoo   | 2020-01-02 |
| 2 | 3  | bar   | 2020-01-03 |
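The result above follows from comparing the partition values already stored with those in the new DataFrame. A minimal sketch of that comparison, using a hypothetical helper `plan_partition_upsert` that works on plain sets and does not call the library:

```python
from datetime import date
from typing import Dict, Iterable, List


def plan_partition_upsert(existing: Iterable[date],
                          incoming: Iterable[date]) -> Dict[str, List[date]]:
    """Classify partitions by what an overwrite_partitions write would do to them."""
    existing_set, incoming_set = set(existing), set(incoming)
    return {
        "replaced": sorted(existing_set & incoming_set),   # deleted, then rewritten
        "added": sorted(incoming_set - existing_set),      # new partitions
        "untouched": sorted(existing_set - incoming_set),  # left as they were
    }


plan = plan_partition_upsert(
    existing=[date(2020, 1, 1), date(2020, 1, 2)],
    incoming=[date(2020, 1, 2), date(2020, 1, 3)],
)
for action, partitions in plan.items():
    print(action, [str(p) for p in partitions])
# prints:
# replaced ['2020-01-02']
# added ['2020-01-03']
# untouched ['2020-01-01']
```

This matches the table: `2020-01-01` keeps `foo`, `2020-01-02` now holds `xoo` instead of `boo`, and `2020-01-03` is new.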
