Task Summary
Both the Scala (IcebergUtil.createRestCatalog in common/workflow-core/)
and Python (iceberg_utils.create_rest_catalog in amber/src/main/python/)
helpers pass hardcoded s3.endpoint, s3.region, s3.path-style-access,
s3.access-key-id and s3.secret-access-key (sourced from
StorageConfig.s3* / StorageConfig.S3_*) when initializing the REST
catalog client. These entries are inert at runtime: the REST catalog server
returns the same settings per table in the loadTable config response,
and the Iceberg / pyiceberg client ignores any locally set credentials
whenever the server's response includes s3.remote-signing-enabled: true
(Lakekeeper's default).
They also block per-warehouse storage isolation, where each warehouse may
point to a different bucket, region, or set of credentials.
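For concreteness, a minimal sketch of the current shape of the Python helper, assuming it builds on pyiceberg's load_catalog; the function signature and parameter names here are illustrative, not the actual ones in iceberg_utils. The s3.* property keys are the ones named above:

```python
# Illustrative sketch only: the signature and parameter names are assumed,
# not taken verbatim from iceberg_utils.create_rest_catalog.
from pyiceberg.catalog import load_catalog

def create_rest_catalog(uri, warehouse, endpoint, region,
                        access_key_id, secret_access_key):
    return load_catalog(
        "rest",
        **{
            "uri": uri,
            "warehouse": warehouse,
            # Inert at runtime: the REST server returns its own s3.*
            # settings per table, and with s3.remote-signing-enabled: true
            # (Lakekeeper's default) these local credentials are ignored.
            "s3.endpoint": endpoint,
            "s3.region": region,
            "s3.path-style-access": "true",
            "s3.access-key-id": access_key_id,
            "s3.secret-access-key": secret_access_key,
        },
    )
```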
Proposed Change
Drop the hardcoded s3.* entries from both helpers' property maps; pass
only the warehouse, the catalog uri, and (Scala only) the FileIO
implementation hint, so that the catalog server stays the single source
of truth for S3 settings. Update the Python caller in
iceberg_catalog_instance.py to drop the now-unused arguments.
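A minimal sketch of the trimmed Python helper under the same assumptions as above (pyiceberg's load_catalog, illustrative signature):

```python
from pyiceberg.catalog import load_catalog

def create_rest_catalog(uri: str, warehouse: str):
    # Only the catalog endpoint and the warehouse are supplied; every
    # s3.* setting now comes from the server's config responses, which
    # also allows each warehouse to use its own bucket, region, and
    # credentials.
    return load_catalog("rest", **{"uri": uri, "warehouse": warehouse})
```

The Scala helper would shrink analogously, keeping only the warehouse, the catalog uri, and the FileIO implementation property.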
S3 settings remain on StorageConfig and continue to flow into Python via
PythonWorkflowWorker; they're still consumed by the unrelated
pytexera/storage/large_binary_manager.py boto3 client, which writes to a
non-Iceberg bucket and is out of scope for this task.
Task Type
Refactor / Cleanup