In [None]:
!pip install getdaft

# Feature of the Week: Input/Output Configurations (IOConfig)

`IOConfig` is Daft's mechanism for controlling the behavior of data input/output from storage. It is useful for:

1. **Providing credentials** for authenticating with cloud storage services
2. **Tuning performance** or reducing load on storage services

For a deeper look at `IOConfig`, see: [IOConfig Documentation](https://www.getdaft.io/projects/docs/en/latest/api_docs/doc_gen/io_configs/daft.io.IOConfig.html?highlight=IOConfig)


## Default IOConfig Behavior

By default, Daft automatically detects credentials on the machine(s) that perform the I/O, so no explicit `IOConfig` is required.

In [None]:
import daft

# By default, calls to AWS S3 will use credentials retrieved from the machine(s) that they are called from
#
# For AWS S3 services, the default mechanism is to look through a chain of possible "providers":
# https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials
df = daft.read_csv("s3://daft-public-data/file.csv")
df.collect()

## Overriding the IOConfig
### Setting a Global Override

Often you will want Daft to use one particular configuration by default whenever it accesses storage such as S3, GCS, or Azure Blob Storage.

> **Example:**
>
> A very common use case is to create a set of temporary credentials once and share it across every data access that Daft performs.
>
> The example below demonstrates this for AWS S3 using `boto3`, the AWS SDK for Python.

In [None]:
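# Use boto3 to look up the current session's credentials for S3 access.
# These are temporary when they come from a role or SSO login, in which case `creds.token` is set.
import boto3

session = boto3.session.Session()
creds = session.get_credentials()  # returns None if boto3 cannot find any credentials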

# Attach temporary credentials to a Daft IOConfig object
MY_IO_CONFIG = daft.io.IOConfig(
    s3=daft.io.S3Config(
        key_id=creds.access_key,
        access_key=creds.secret_key,
        session_token=creds.token,
    )
)

# Set the default config to `MY_IO_CONFIG` so that it is used in the absence of any overrides
daft.set_planning_config(default_io_config=MY_IO_CONFIG)

### Overriding IOConfigs Per API Call

Daft also allows for more granular per-call overrides through the use of keyword arguments.

This is flexible enough to let you, for example, read from two different locations with two different sets of credentials.

Here we use `daft.read_csv` as an example, but the same `io_config=...` keyword argument also exists for other I/O-related functionality such as:

1. `daft.read_parquet`
2. `daft.read_json`
3. `Expression.url.download()`
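
The order in which these settings apply can be summarized as: a per-call `io_config` wins over the global default, and the global default wins over auto-detected credentials. The sketch below only models that precedence in plain Python. It is not Daft's implementation: `resolve_io_config`, `set_default_io_config`, and the dictionaries standing in for `IOConfig` objects are hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the credentials Daft would auto-detect on the machine.
AUTO_DETECTED = {"s3": "credentials discovered from the environment"}

# Models the value stored by a global override such as
# daft.set_planning_config(default_io_config=...). None means "not set".
_global_default: Optional[dict] = None


def set_default_io_config(config: Optional[dict]) -> None:
    """Hypothetical helper: record (or clear) the global default config."""
    global _global_default
    _global_default = config


def resolve_io_config(per_call: Optional[dict] = None) -> dict:
    """Hypothetical helper: choose the config a single read would use."""
    if per_call is not None:
        return per_call          # 1. explicit io_config=... on the call
    if _global_default is not None:
        return _global_default   # 2. global default, if one was set
    return AUTO_DETECTED         # 3. fall back to auto-detection


# No overrides at all: auto-detected credentials are used.
print(resolve_io_config())

# After a global override, calls without io_config pick it up.
set_default_io_config({"s3": "temporary credentials"})
print(resolve_io_config())

# A per-call config takes priority over the global default.
print(resolve_io_config({"s3": "anonymous"}))
```

The real API has the same shape: `io_config=None` on a read means "use the default", which is why setting a global default does not prevent individual calls from using a different config.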

In [None]:
# An "Anonymous" IOConfig will access storage **without credentials**, and can only access fully public data
MY_ANONYMOUS_IO_CONFIG = daft.io.IOConfig(s3=daft.io.S3Config(anonymous=True))

# Read this file using `MY_ANONYMOUS_IO_CONFIG` instead of the global default `MY_IO_CONFIG` set earlier
df1 = daft.read_csv("s3://daft-public-data/melbourne-airbnb/melbourne_airbnb.csv", io_config=MY_ANONYMOUS_IO_CONFIG)