feat(glue-alpha): support Apache Iceberg tables by ksco92 · Pull Request #37988 · aws/aws-cdk

ksco92 · 2026-05-24T14:38:47Z

Issue # (if applicable)

Closes #29660.

Reason for this change

CloudFormation's `AWS::Glue::Table` supports Apache Iceberg via `OpenTableFormatInput`, but the shape that survives `UpdateTable` is not the one most documentation shows. Declaring columns under `tableInput.storageDescriptor.columns` alongside `openTableFormatInput.icebergInput` creates an Iceberg table on CREATE, but the first UPDATE silently strips `table_type=ICEBERG` and `metadata_location` from the table's Glue parameters. Athena queries against the table then fail with `HIVE_UNSUPPORTED_FORMAT`.

The working shape places the schema, partition spec, sort order, and properties under `openTableFormatInput.icebergInput.icebergTableInput` and omits `tableInput` entirely. No L2 in `@aws-cdk/aws-glue-alpha` emits this shape today, so users either reach for the L1 escape hatch or end up with the broken shape.
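The failure mode above can be checked mechanically. The following TypeScript sketch is a hypothetical helper, not part of this PR: it scans a synthesized template (for example, the object returned by `Template.fromStack(stack).toJSON()` in `aws-cdk-lib/assertions`) for `AWS::Glue::Table` resources that carry both `TableInput` and `OpenTableFormatInput.IcebergInput`. It assumes CloudFormation's PascalCase property names and takes `IcebergTableInput` as the nested key name from the description above.

```typescript
// Only the keys this check reads are modeled. `IcebergTableInput` is assumed
// to be the key name, following the PR description.
interface GlueTableProperties {
  DatabaseName?: string;
  TableInput?: Record<string, unknown>;
  OpenTableFormatInput?: {
    IcebergInput?: {
      MetadataOperation?: string;
      IcebergTableInput?: Record<string, unknown>;
    };
  };
}

interface CfnResource {
  Type: string;
  Properties?: GlueTableProperties;
}

interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

/**
 * Hypothetical synth-time guard: returns the logical IDs of Glue tables that
 * declare an Iceberg input next to a `TableInput` sibling, i.e. the shape that
 * loses `table_type=ICEBERG` on the first UpdateTable.
 */
function findUnsafeIcebergTables(template: CfnTemplate): string[] {
  const resources: Record<string, CfnResource> = template.Resources ?? {};
  return Object.keys(resources).filter((logicalId) => {
    const resource = resources[logicalId];
    if (resource.Type !== "AWS::Glue::Table") return false;
    const props: GlueTableProperties = resource.Properties ?? {};
    return props.OpenTableFormatInput?.IcebergInput !== undefined && props.TableInput !== undefined;
  });
}

// Usage: one table in the broken shape, one in the safe shape.
const template: CfnTemplate = {
  Resources: {
    BrokenTable: {
      Type: "AWS::Glue::Table",
      Properties: {
        DatabaseName: "analytics",
        TableInput: { StorageDescriptor: { Columns: [] } },
        OpenTableFormatInput: { IcebergInput: { MetadataOperation: "CREATE" } },
      },
    },
    SafeTable: {
      Type: "AWS::Glue::Table",
      Properties: {
        DatabaseName: "analytics",
        OpenTableFormatInput: {
          IcebergInput: { MetadataOperation: "CREATE", IcebergTableInput: {} },
        },
      },
    },
  },
};

const unsafe = findUnsafeIcebergTables(template); // returns ["BrokenTable"]
```

A check like this only flags the dangerous combination; it says nothing about whether the safe shape's contents are valid, which is what the construct's own synth-time validation covers.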

Description of changes

Adds an `IcebergTable` L2 construct to `@aws-cdk/aws-glue-alpha`, plus the supporting types (`IcebergType`, `IcebergPartitionTransform`, `IcebergDataFormat`, `IcebergFormatVersion`, `IcebergSortDirection`, `IcebergNullOrder`, `IcebergColumn`, `IcebergPartitionField`, `IcebergSortField`, `IIcebergTable`).

The construct:

- emits only the safe `openTableFormatInput.icebergInput.icebergTableInput` shape and never publishes a `tableInput` sibling
- validates partition transforms against source-column types at synth time (`day` / `month` / `year` require `date` or `timestamp`; `hour` requires `timestamp`; `bucket` and `truncate` accept only their respective allowlists of source types)
- validates `tableProperties` against the codec / write-mode / format-version matrix (rejecting, e.g., `merge-on-read` on v1, `bzip2` on Parquet, and non-positive numeric values)
- resolves `identifierFieldNames` to field ids, refusing floating-point columns per the Iceberg spec
- threads a single id counter through nested types so every id in the schema is globally unique
- exposes `grantRead` / `grantWrite` / `grantReadWrite`, each rendering four separate IAM statements: Glue actions on the Glue table ARN; `s3:ListBucket` on the bucket ARN with an `s3:prefix` condition limiting the grantee to the table's own prefix; `s3:GetBucketLocation` / `s3:ListBucketMultipartUploads` on the bucket ARN unconditionally (these actions do not support `s3:prefix` and would be silently denied if conditioned); and S3 object-level actions on `bucket/prefix*`
- honors optional per-column id pinning so users can add, remove, and reorder columns across deploys without breaking the Iceberg invariant that ids are unique within a table schema
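The partition-transform and identifier-field checks in the list above can be sketched as plain lookups. This is an illustrative TypeScript sketch, not the construct's code: `TransformKind`, `ALLOWED_SOURCES`, `validatePartitions`, and `resolveIdentifierFieldIds` are hypothetical names, the transform is modeled as a string literal rather than the PR's `IcebergPartitionTransform` class, and the allowlists follow the Iceberg spec's partition-transform table (with `timestamptz` treated as a timestamp).

```typescript
// Simplified Iceberg primitive type names (Iceberg spec, "Primitive Types").
type IcebergPrimitive =
  | "boolean" | "int" | "long" | "float" | "double" | "decimal"
  | "date" | "time" | "timestamp" | "timestamptz"
  | "string" | "uuid" | "fixed" | "binary";

// bucket(N) / truncate(W) parameters are omitted; only source types are checked.
type TransformKind = "identity" | "void" | "year" | "month" | "day" | "hour" | "bucket" | "truncate";

// Source types each transform accepts, per the Iceberg spec's transform table.
const ALLOWED_SOURCES: Record<TransformKind, readonly IcebergPrimitive[] | "any"> = {
  identity: "any",
  void: "any",
  year: ["date", "timestamp", "timestamptz"],
  month: ["date", "timestamp", "timestamptz"],
  day: ["date", "timestamp", "timestamptz"],
  hour: ["timestamp", "timestamptz"],
  bucket: ["int", "long", "decimal", "date", "time", "timestamp", "timestamptz",
    "string", "uuid", "fixed", "binary"],
  truncate: ["int", "long", "decimal", "string", "binary"],
};

interface Column { name: string; type: IcebergPrimitive; id: number }
interface PartitionField { sourceColumn: string; transform: TransformKind }

/** Returns one message per invalid partition field (an empty array means valid). */
function validatePartitions(columns: Column[], fields: PartitionField[]): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    const column = columns.find((c) => c.name === field.sourceColumn);
    if (column === undefined) {
      errors.push(`partition source '${field.sourceColumn}' is not a column`);
      continue;
    }
    const allowed = ALLOWED_SOURCES[field.transform];
    if (allowed !== "any" && allowed.indexOf(column.type) < 0) {
      errors.push(`${field.transform} cannot be applied to '${column.name}' (${column.type})`);
    }
  }
  return errors;
}

/**
 * Maps identifier column names to field ids. Only the floating-point rule is
 * modeled; the spec's other identifier rules (required fields, no list/map
 * nesting) are not checked here.
 */
function resolveIdentifierFieldIds(columns: Column[], names: string[]): number[] {
  return names.map((name) => {
    const column = columns.find((c) => c.name === name);
    if (column === undefined) throw new Error(`identifier field '${name}' is not a column`);
    if (column.type === "float" || column.type === "double") {
      throw new Error(`identifier field '${name}' cannot be ${column.type}`);
    }
    return column.id;
  });
}
```

For example, `day` on a `date` column and `bucket` on a `long` column pass, while `hour` on a `date` column or `truncate` on a `double` column each yield one error. Collecting messages instead of throwing on the first failure lets a synth-time validator report every problem at once.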

The type model is intentionally jsii-friendly: `IcebergType` and `IcebergPartitionTransform` are single concrete classes discriminated by a public `kind` enum, with no private constructors and no function-typed fields.
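A minimal sketch of what a kind-discriminated type class and a single shared id counter might look like follows. Everything here (`IcebergTypeKind`, `IcebergTypeProps`, `FieldIdCounter`, `renderType`, `renderSchema`) is hypothetical and not the PR's actual API. The JSON keys (`element-id`, `key-id`, `value-id`, `schema-id`) follow the Iceberg spec's schema serialization; the id order (a struct's own fields first, then nested types) is an assumption.

```typescript
// One concrete class, a public `kind` discriminator, static factories,
// a public constructor, and no function-typed fields.
enum IcebergTypeKind {
  PRIMITIVE = "PRIMITIVE",
  LIST = "LIST",
  MAP = "MAP",
  STRUCT = "STRUCT",
}

interface IcebergField {
  readonly name: string;
  readonly type: IcebergType;
  readonly required?: boolean;
}

interface IcebergTypeProps {
  readonly kind: IcebergTypeKind;
  readonly primitive?: string;      // PRIMITIVE
  readonly element?: IcebergType;   // LIST
  readonly key?: IcebergType;       // MAP
  readonly value?: IcebergType;     // MAP
  readonly fields?: IcebergField[]; // STRUCT
}

class IcebergType {
  static primitive(name: string): IcebergType {
    return new IcebergType({ kind: IcebergTypeKind.PRIMITIVE, primitive: name });
  }
  static list(element: IcebergType): IcebergType {
    return new IcebergType({ kind: IcebergTypeKind.LIST, element });
  }
  static map(key: IcebergType, value: IcebergType): IcebergType {
    return new IcebergType({ kind: IcebergTypeKind.MAP, key, value });
  }
  static struct(fields: IcebergField[]): IcebergType {
    return new IcebergType({ kind: IcebergTypeKind.STRUCT, fields });
  }

  readonly kind: IcebergTypeKind;
  readonly props: IcebergTypeProps;
  constructor(props: IcebergTypeProps) {
    this.kind = props.kind;
    this.props = props;
  }
}

/** Hands out field ids; one instance is shared across the whole schema. */
class FieldIdCounter {
  private nextId: number;
  constructor(start = 1) {
    this.nextId = start;
  }
  next(): number {
    return this.nextId++;
  }
}

type IcebergJson = string | { [key: string]: unknown };

function renderFields(fields: IcebergField[], ids: FieldIdCounter): Array<{ [key: string]: unknown }> {
  // Reserve this level's ids first, then descend into nested types.
  const fieldIds = fields.map(() => ids.next());
  return fields.map((f, i) => ({
    id: fieldIds[i], name: f.name, required: f.required ?? false, type: renderType(f.type, ids),
  }));
}

function renderType(type: IcebergType, ids: FieldIdCounter): IcebergJson {
  const p = type.props;
  switch (type.kind) {
    case IcebergTypeKind.PRIMITIVE:
      return p.primitive!;
    case IcebergTypeKind.LIST: {
      const elementId = ids.next();
      // This sketch always emits optional list elements and map values.
      return { type: "list", "element-id": elementId, element: renderType(p.element!, ids), "element-required": false };
    }
    case IcebergTypeKind.MAP: {
      const keyId = ids.next();
      const valueId = ids.next();
      return {
        type: "map", "key-id": keyId, key: renderType(p.key!, ids),
        "value-id": valueId, value: renderType(p.value!, ids), "value-required": false,
      };
    }
    case IcebergTypeKind.STRUCT:
      return { type: "struct", fields: renderFields(p.fields!, ids) };
    default:
      throw new Error("unsupported IcebergType kind");
  }
}

/** Renders a top-level schema; every nested id comes from the same counter. */
function renderSchema(columns: IcebergField[]): { [key: string]: unknown } {
  return { type: "struct", "schema-id": 0, fields: renderFields(columns, new FieldIdCounter(1)) };
}
```

With four top-level columns (a `long`, a `list<string>`, a `map<string, string>`, and a two-field struct), this sketch assigns ids 1 to 4 to the columns and 5 to 9 to the list element, map key and value, and struct fields, so no id repeats anywhere in the schema.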

Description of how you validated changes

- `test/iceberg-table.test.ts`: 27 unit tests exercising happy paths, defaults, partition / sort rendering, identifier resolution, pinned column ids, every validation failure path, and the four-statement grant shape.
- `test/integ.iceberg-table.ts` plus its committed snapshot (`test/integ.iceberg-table.js.snapshot/`): provisions a Glue database, a warehouse bucket, and two Iceberg tables (`orders` with day + bucket partitions, a sort order, identifier-field-ids, and merge-on-read; `events` with hourly partitioning). The integ-runner deployed and verified it end-to-end in us-east-1 in 81s.
- The README in `@aws-cdk/aws-glue-alpha` gained an "Iceberg Tables" section with rosetta-compilable snippets and an explicit limitations callout: CFN's CREATE-only restriction on `metadataOperation`, the cross-deploy field-id reuse that the construct cannot detect, and the Iceberg `void` intermediate that CFN cannot express when dropping a partition-source column.
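The cross-deploy field-id limitation follows from the fact that id assignment only sees the current deploy's columns. The TypeScript sketch below is a hypothetical allocator, not the construct's code (the "highest pinned id plus one" strategy is an assumption), and it shows how an unpinned column added after a drop can silently inherit a retired id.

```typescript
interface ColumnSpec {
  readonly name: string;
  readonly id?: number; // optional pinned id
}

interface AssignedColumn {
  readonly name: string;
  readonly id: number;
}

/**
 * Hypothetical allocator: pinned ids are kept as-is (they must be unique
 * positive integers); unpinned columns get fresh ids above the highest pinned
 * id. It has no memory of ids retired in earlier deploys.
 */
function assignColumnIds(columns: ColumnSpec[]): AssignedColumn[] {
  const pinned = new Set<number>();
  for (const column of columns) {
    if (column.id === undefined) continue;
    if (!Number.isInteger(column.id) || column.id <= 0) {
      throw new Error(`column '${column.name}': pinned id must be a positive integer`);
    }
    if (pinned.has(column.id)) {
      throw new Error(`column '${column.name}': id ${column.id} is pinned more than once`);
    }
    pinned.add(column.id);
  }
  let nextId = Math.max(0, ...Array.from(pinned)) + 1;
  return columns.map((column) =>
    column.id !== undefined ? { name: column.name, id: column.id } : { name: column.name, id: nextId++ },
  );
}

// Deploy 1: nothing pinned, so the ids are 1, 2, 3.
const deploy1 = assignColumnIds([{ name: "order_id" }, { name: "amount" }, { name: "coupon" }]);
// Deploy 2: `coupon` dropped and `region` added, still unpinned.
// `region` receives id 3, the id `coupon` used to own: a silent reuse.
const deploy2 = assignColumnIds([{ name: "order_id" }, { name: "amount" }, { name: "region" }]);
// Pinning every column, with `region` on a never-used id, keeps id 3 retired.
const deploy2Pinned = assignColumnIds([
  { name: "order_id", id: 1 },
  { name: "amount", id: 2 },
  { name: "region", id: 4 },
]);
```

Pinning only the surviving columns is not enough in this sketch: an unpinned `region` would still land on id 3. That is why the README advice is to pin ids explicitly on tables you intend to evolve and to treat dropped ids as retired.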

Checklist

My code adheres to the CONTRIBUTING GUIDE and DESIGN GUIDELINES

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Adds `IcebergTable`, an L2 construct that creates Apache Iceberg tables in the AWS Glue Data Catalog via the working `AWS::Glue::Table.OpenTableFormatInput.IcebergInput.IcebergTableInput` shape. The construct supports CREATE / UPDATE / DELETE through `cdk deploy` like any other resource.

The motivating issue documents that the obvious shape (`tableInput.storageDescriptor.columns` + `openTableFormatInput.icebergInput`) silently strips `table_type=ICEBERG` from the Glue parameters on the first UPDATE, making the table unqueryable in Athena. `IcebergTable` emits only the safe shape and refuses to publish `tableInput` next to `openTableFormatInput`.

Surface:

- `IcebergType` primitives plus `decimal(p, s)` / `fixed(L)` / `list` / `map` / `struct` factories. Nested types thread a single id counter through the schema so every field, element, key, and value gets a globally unique id, per the Iceberg spec.
- `IcebergPartitionTransform` with source-type validation: `IDENTITY`, `YEAR`, `MONTH`, `DAY`, `HOUR`, `VOID`, `bucket(N)`, `truncate(W)`.
- Sort orders with `IcebergSortDirection` (asc/desc) and `IcebergNullOrder` (nulls-first/nulls-last).
- Identifier-field-ids resolved from column names; floating-point columns rejected per the spec.
- Optional per-column `id` pinning for safe schema evolution across deploys.
- `IcebergDataFormat` (parquet/orc/avro, default parquet) and `IcebergFormatVersion` (v1/v2, default v2).
- `tableProperties` validator: rejects bad codec/format/write-mode combinations, `merge-on-read` on a v1 table, non-positive numeric values, and non-boolean values for boolean properties, all at synth time.
- `grantRead` / `grantWrite` / `grantReadWrite` and a `fromIcebergTableAttributes` import shim. Grants are split across four IAM statements so `s3:ListBucket` is scoped to the table's prefix while `s3:GetBucketLocation` and `s3:ListBucketMultipartUploads` (which do not support the `s3:prefix` condition) are granted on the bucket ARN unconditionally.

Tests:

- 27 unit tests in `test/iceberg-table.test.ts`.
- `test/integ.iceberg-table.ts` provisions a database, a warehouse bucket, and two Iceberg tables (one with partitions + sort + identifier ids + merge-on-read; one with hourly partitioning). Snapshot committed under `test/integ.iceberg-table.js.snapshot/`. The integ-runner deploy ran end-to-end in us-east-1 (81s) and verified the resulting Glue / S3 state.

Documented limitations:

- `OpenTableFormatInput.IcebergInput.metadataOperation` only accepts `CREATE` in CFN; subsequent deploys flow through Glue's `UpdateTable` path.
- The construct does not detect cross-deploy field-id reuse. Pin column ids explicitly on tables you intend to evolve and treat dropped ids as retired forever.
- Dropping a partition-source column requires an Iceberg `void` transform intermediate that CFN cannot express. The construct accepts the change, but Athena queries against the result will fail; drop the partition first, then the column in a subsequent deploy.

fixes aws#29660
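The four-statement grant layout described above can be sketched as raw IAM policy JSON. This TypeScript is illustrative only: `icebergGrantStatements`, `IcebergTableLocation`, and the action lists are assumptions, not the construct's exact sets. It emits only the Glue table ARN because that is what the description mentions; Glue generally also evaluates catalog and database ARNs for table-level actions, so a real grant may need those as well. Each object could be turned into a CDK statement with `iam.PolicyStatement.fromJson`.

```typescript
interface PolicyStatementJson {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
  Condition?: { [operator: string]: { [key: string]: string[] } };
}

interface IcebergTableLocation {
  partition: string; // e.g. "aws"
  region: string;
  account: string;
  database: string;
  table: string;
  bucket: string;
  prefix: string; // the table's key prefix in the warehouse bucket, ending in "/"
}

// Assumed action sets for illustration; the construct's exact lists may differ.
const GLUE_READ_ACTIONS = ["glue:GetTable", "glue:GetTables", "glue:GetPartitions"];
const GLUE_WRITE_ACTIONS = ["glue:UpdateTable"];
const S3_READ_ACTIONS = ["s3:GetObject"];
const S3_WRITE_ACTIONS = ["s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload"];

function icebergGrantStatements(
  loc: IcebergTableLocation,
  read: boolean,
  write: boolean,
): PolicyStatementJson[] {
  const tableArn = `arn:${loc.partition}:glue:${loc.region}:${loc.account}:table/${loc.database}/${loc.table}`;
  const bucketArn = `arn:${loc.partition}:s3:::${loc.bucket}`;
  const glueActions = [...(read ? GLUE_READ_ACTIONS : []), ...(write ? GLUE_WRITE_ACTIONS : [])];
  const objectActions = [...(read ? S3_READ_ACTIONS : []), ...(write ? S3_WRITE_ACTIONS : [])];
  return [
    // 1. Glue actions on the Glue table ARN.
    { Effect: "Allow", Action: glueActions, Resource: [tableArn] },
    // 2. Listing is limited to keys under the table's own prefix.
    {
      Effect: "Allow",
      Action: ["s3:ListBucket"],
      Resource: [bucketArn],
      Condition: { StringLike: { "s3:prefix": [`${loc.prefix}*`] } },
    },
    // 3. Bucket-level actions that do not support s3:prefix stay unconditioned.
    {
      Effect: "Allow",
      Action: ["s3:GetBucketLocation", "s3:ListBucketMultipartUploads"],
      Resource: [bucketArn],
    },
    // 4. Object-level actions on bucket/prefix*.
    { Effect: "Allow", Action: objectActions, Resource: [`${bucketArn}/${loc.prefix}*`] },
  ];
}
```

Keeping statement 3 separate is the point of the split: if `s3:GetBucketLocation` were folded into statement 2, the `s3:prefix` condition would never match for it and the call would be denied without any obvious error at deploy time.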

ksco92 wants to merge 1 commit into aws:main from ksco92:feat/glue-alpha-iceberg-table

ksco92 commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ksco92 commented May 24, 2026

Issue # (if applicable)

Reason for this change

Description of changes

Description of how you validated changes

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant