Iceberg REST catalog should not hardcode S3 properties at init #4987

@mengw15

Description

Task Summary

Both the Scala (IcebergUtil.createRestCatalog in common/workflow-core/)
and Python (iceberg_utils.create_rest_catalog in amber/src/main/python/)
helpers pass hardcoded s3.endpoint, s3.region, s3.path-style-access,
s3.access-key-id and s3.secret-access-key (sourced from
StorageConfig.s3* / StorageConfig.S3_*) when initializing the REST
catalog client. These entries are inert at runtime: the REST catalog server
returns the same settings per-table in the loadTable config response,
and the Iceberg / pyiceberg client ignores any locally-set credentials
whenever the server's response includes s3.remote-signing-enabled: true
(Lakekeeper's default).

They also block per-warehouse storage isolation, where each warehouse may
point to a different bucket / region / credentials.
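To make the problem concrete, here is a hypothetical sketch of the current-style property map; the function name and parameters are illustrative (not the project's actual identifiers), but the `s3.*` keys mirror the ones the issue describes as hardcoded from `StorageConfig`:

```python
def create_rest_catalog_props_current(uri: str, warehouse: str,
                                      s3_endpoint: str, s3_region: str,
                                      s3_access_key: str, s3_secret_key: str) -> dict:
    """Illustrative 'before' shape of the REST catalog client properties."""
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
        # Inert at runtime: when the server's loadTable config response
        # includes s3.remote-signing-enabled: true (Lakekeeper's default),
        # the Iceberg / pyiceberg client ignores these locally-set values.
        "s3.endpoint": s3_endpoint,
        "s3.region": s3_region,
        "s3.path-style-access": "true",
        "s3.access-key-id": s3_access_key,
        "s3.secret-access-key": s3_secret_key,
    }
```

Because the per-table `loadTable` response wins, these entries also pin every warehouse to one bucket / region / credential set at client init, which is what blocks per-warehouse isolation.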

Proposed change

Drop the hardcoded s3.* entries from both helpers' property maps; pass
only warehouse, the catalog uri, and (Scala only) the FileIO
implementation hint — the catalog server stays the source of truth for S3
settings. Update the Python caller in iceberg_catalog_instance.py to drop
the now-unused arguments.
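A minimal sketch of the Python side after the change (names are illustrative, not the project's actual identifiers); the map carries only the catalog endpoint and warehouse, and the resulting dict is what would be handed to pyiceberg's `load_catalog`:

```python
def create_rest_catalog_props(uri: str, warehouse: str) -> dict:
    """Illustrative 'after' shape: no s3.* entries at client init.

    The REST catalog server remains the source of truth for S3 settings,
    returning them per-table in its loadTable config response.
    """
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
    }

# e.g. pyiceberg.catalog.load_catalog("texera", **create_rest_catalog_props(...))
```

The Scala helper would additionally keep its FileIO implementation hint; everything else drops out the same way.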

S3 settings remain on StorageConfig and continue to flow into Python via
PythonWorkflowWorker; they're still consumed by the unrelated
pytexera/storage/large_binary_manager.py boto3 client, which writes to a
non-Iceberg bucket and is out of scope for this task.

Task Type

Refactor / Cleanup
