[![AWS SDK for pandas](_static/logo.png "AWS SDK for pandas")](https://github.com/aws/aws-sdk-pandas)

Athena supports read, time travel, write, and DDL queries for Apache Iceberg tables that use the Apache Parquet format for data and the AWS Glue catalog for their metastore. See the [Athena User Guide](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html) for details.

### Create Iceberg table

In [43]:
import awswrangler as wr

glue_database = "aws_sdk_pandas"
glue_table = "iceberg_test"
path = "s3://.../iceberg_test/"

# Delete the table if it already exists so that CREATE TABLE starts from a clean state
wr.catalog.delete_table_if_exists(database=glue_database, table=glue_table)

create_sql = (
    f"CREATE TABLE {glue_table} (id int, name string) "
    f"LOCATION '{path}' "
    "TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet')"
)

query_execution_id = wr.athena.start_query_execution(
    sql=create_sql,
    database=glue_database,
    wait=True,
)
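
The statement above hard-codes two columns. As a minimal sketch, the same kind of `CREATE TABLE` statement could be assembled from a column mapping. `build_iceberg_create_sql` is a hypothetical helper name (not part of awswrangler), and the bucket in the example is a placeholder.

```python
def build_iceberg_create_sql(table, columns, location):
    """Return an Athena CREATE TABLE statement for an Iceberg table.

    `columns` maps column names to Athena type names.
    Hypothetical helper for illustration; not an awswrangler function.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} ({column_list}) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet')"
    )


example_create_sql = build_iceberg_create_sql(
    "iceberg_test",
    {"id": "int", "name": "string"},
    "s3://example-bucket/iceberg_test/",  # placeholder location
)
print(example_create_sql)
```

The resulting string can be passed as `sql=` to `wr.athena.start_query_execution`, exactly as `create_sql` is above.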

### Insert data

In [44]:
query_execution_id = wr.athena.start_query_execution(
    sql=f"INSERT INTO {glue_table} VALUES (1,'John'), (2, 'Lily'), (3, 'Richard')",
    database=glue_database,
    wait=True,
)

In [45]:
query_execution_id = wr.athena.start_query_execution(
    sql=f"INSERT INTO {glue_table} VALUES (4,'Anne'), (5, 'Jacob'), (6, 'Leon')",
    database=glue_database,
    wait=True,
)
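
The two inserts above write their `VALUES` lists by hand. A minimal sketch of generating such a statement from Python rows follows, assuming only integers, strings, and `None` need to be rendered. `sql_literal` and `build_insert_sql` are hypothetical helpers, and the row containing an apostrophe is made-up data used to show quote escaping.

```python
def sql_literal(value):
    """Render an int, str, or None as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def build_insert_sql(table, rows):
    """Return an INSERT INTO ... VALUES statement for the given rows."""
    if not rows:
        raise ValueError("at least one row is required")
    values = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES {values}"


# Made-up rows for illustration only.
example_insert_sql = build_insert_sql("iceberg_test", [(1, "John"), (2, "O'Neil")])
print(example_insert_sql)
```

Building SQL by string formatting is only reasonable for trusted input such as this tutorial's literals.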

### Read query metadata

In a `SELECT` query, you can append one of the following suffixes to the table name to query Iceberg table metadata:

- `$files`: shows a table's current data files
- `$manifests`: shows a table's current file manifests
- `$history`: shows a table's history
- `$partitions`: shows a table's current partitions
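
The four metadata queries in this section differ only in the suffix. A minimal sketch of a helper that builds them is shown here. `metadata_query` and `ICEBERG_METADATA_SUFFIXES` are hypothetical names, and the tuple covers only the four suffixes listed above, not necessarily everything Athena supports.

```python
# Only the suffixes listed in this tutorial; hypothetical constant name.
ICEBERG_METADATA_SUFFIXES = ("files", "manifests", "history", "partitions")


def metadata_query(table, suffix):
    """Return a SELECT over an Iceberg metadata table such as "table$files"."""
    if suffix not in ICEBERG_METADATA_SUFFIXES:
        raise ValueError(f"unknown metadata suffix: {suffix!r}")
    # The identifier is double-quoted because it contains a `$`.
    return f'SELECT * FROM "{table}${suffix}"'


for suffix in ICEBERG_METADATA_SUFFIXES:
    print(metadata_query("iceberg_test", suffix))
```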

In [14]:
wr.athena.read_sql_query(
    sql=f'SELECT * FROM "{glue_table}$files"',
    database=glue_database,
    ctas_approach=False,
    unload_approach=False,
)

|   | content | file_path | file_format | record_count | file_size_in_bytes | column_sizes | value_counts | null_value_counts | nan_value_counts | lower_bounds | upper_bounds | key_metadata | split_offsets | equality_ids |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | s3://.../iceberg_test/01/data/2... | PARQUET | 3 | 355 | {1=48, 2=61} | {1=3, 2=3} | {1=0, 2=0} | {} | {1=4, 2=Anne} | {1=6, 2=Leon} |  |  |  |
| 1 | 0 | s3://.../iceberg_test/01/data/3... | PARQUET | 3 | 360 | {1=48, 2=63} | {1=3, 2=3} | {1=0, 2=0} | {} | {1=1, 2=John} | {1=3, 2=Richard} |  |  |  |


In [17]:
wr.athena.read_sql_query(
    sql=f'SELECT * FROM "{glue_table}$manifests"',
    database=glue_database,
    ctas_approach=False,
    unload_approach=False,
)

|   | path | length | partition_spec_id | added_snapshot_id | added_data_files_count | added_rows_count | existing_data_files_count | existing_rows_count | deleted_data_files_count | deleted_rows_count | partitions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s3://.../iceberg_test/01/metada... | 6546 | 0 | 1445575484465379918 | 1 | 3 | 0 | 0 | 0 | 0 | [] |
| 1 | s3://.../iceberg_test/01/metada... | 6550 | 0 | 1863224087207392743 | 1 | 3 | 0 | 0 | 0 | 0 | [] |


In [26]:
df = wr.athena.read_sql_query(
    sql=f'SELECT * FROM "{glue_table}$history"',
    database=glue_database,
    ctas_approach=False,
    unload_approach=False,
)

# Keep the ID of the first snapshot in the history for the version travel query later on
snapshot_id = df.snapshot_id[0]

df

|   | made_current_at | snapshot_id | parent_id | is_current_ancestor |
| --- | --- | --- | --- | --- |
| 0 | 2023-03-15 14:52:42.286000+00:00 | 1863224087207392743 |  | True |
| 1 | 2023-03-15 14:52:53.843000+00:00 | 1445575484465379918 | 1.8632240872073928e+18 | True |
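
In the output above, `parent_id` is displayed as a float while `snapshot_id` is an integer, presumably because the missing parent of the first snapshot forces a floating-point column. A minimal sketch, using only the ID shown in row 0, of why a float cannot carry the exact snapshot ID:

```python
first_snapshot_id = 1863224087207392743  # snapshot_id of row 0 above

# A 64-bit float has a 53-bit significand, so integers above 2**53
# are not all representable and the ID is rounded to a nearby value.
as_float = float(first_snapshot_id)
round_tripped = int(as_float)

print(first_snapshot_id > 2**53)           # True
print(round_tripped == first_snapshot_id)  # False
```

When an exact ID is needed, take it from the integer `snapshot_id` column, as this notebook does.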


In [20]:
wr.athena.read_sql_query(
    sql=f'SELECT * FROM "{glue_table}$partitions"',
    database=glue_database,
    ctas_approach=False,
    unload_approach=False,
)

|   | record_count | file_count | total_size | data |
| --- | --- | --- | --- | --- |
| 0 | 6 | 2 | 715 | {id={min=1, max=6, null_count=0, nan_count=nul... |


### Time travel queries

In [36]:
wr.athena.read_sql_query(
    sql=f"SELECT * FROM {glue_table} FOR TIMESTAMP AS OF (current_timestamp - interval '5' second)",
    database=glue_database,
)

|   | id | name |
| --- | --- | --- |
| 0 | 1 | John |
| 1 | 4 | Anne |
| 2 | 2 | Lily |
| 3 | 3 | Richard |
| 4 | 5 | Jacob |
| 5 | 6 | Leon |
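
The query above travels to a point relative to `current_timestamp`. A minimal sketch of building the same kind of query for a fixed point in time follows, assuming Athena accepts a `TIMESTAMP '... UTC'` literal after `FOR TIMESTAMP AS OF`, as in the examples in the Athena User Guide. `time_travel_sql` is a hypothetical helper, and the example time was chosen to fall between the two snapshots in the `$history` output.

```python
from datetime import datetime, timezone


def time_travel_sql(table, as_of):
    """Return a SELECT that reads `table` as of a timezone-aware datetime."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC and drop sub-second precision for the literal.
    literal = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S") + " UTC"
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{literal}'"


example_time_travel_sql = time_travel_sql(
    "iceberg_test",
    datetime(2023, 3, 15, 14, 52, 45, tzinfo=timezone.utc),
)
print(example_time_travel_sql)
```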


### Version travel queries


In [34]:
wr.athena.read_sql_query(
    sql=f"SELECT * FROM {glue_table} FOR VERSION AS OF {snapshot_id}",
    database=glue_database,
)

|   | id | name |
| --- | --- | --- |
| 0 | 1 | John |
| 1 | 2 | Lily |
| 2 | 3 | Richard |
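
The notebook picks the snapshot by position in the `$history` result. As a minimal sketch, a snapshot could instead be chosen by time: take the latest snapshot made current at or before a given moment, then build the `FOR VERSION AS OF` query from its ID. `snapshot_as_of` is a hypothetical helper, and the history pairs are copied from the `$history` output earlier in this notebook.

```python
from datetime import datetime, timezone

# (made_current_at, snapshot_id) pairs copied from the $history output.
history = [
    (datetime(2023, 3, 15, 14, 52, 42, 286000, tzinfo=timezone.utc), 1863224087207392743),
    (datetime(2023, 3, 15, 14, 52, 53, 843000, tzinfo=timezone.utc), 1445575484465379918),
]


def snapshot_as_of(history, when):
    """Return the ID of the latest snapshot made current at or before `when`."""
    candidates = [(ts, sid) for ts, sid in history if ts <= when]
    if not candidates:
        raise LookupError("no snapshot exists at or before the requested time")
    return max(candidates, key=lambda pair: pair[0])[1]


chosen_id = snapshot_as_of(
    history, datetime(2023, 3, 15, 14, 52, 45, tzinfo=timezone.utc)
)
version_sql = f"SELECT * FROM iceberg_test FOR VERSION AS OF {chosen_id}"
print(version_sql)
```

With the example time, the helper selects the first snapshot, which is the same ID the notebook stored in `snapshot_id`.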
