
fix: Iceberg warehouse path mismatch between Python and Java/Scala catalogs #4409

Merged: aglinxinyuan merged 5 commits into main from xinyuan-fix-python-warehouse on Apr 20, 2026


@aglinxinyuan aglinxinyuan commented Apr 18, 2026

What changes were proposed in this PR?

Iceberg tables created via the Python API could not be read back on the Java/Scala side because the two runtimes registered the Postgres JDBC catalog with different warehouse values, and PyIceberg persists its warehouse value into the table metadata.

The Python side (create_postgres_catalog in amber/src/main/python/core/storage/iceberg/iceberg_utils.py) prefixed the warehouse path with file://, so tables created by Python UDFs were registered under file:///... while Scala-side lookups expected the un-prefixed path.

As a result, subsequent reads of Python-written Iceberg tables failed because the metadata pointer carried a wrong or unresolvable warehouse path.

This PR drops the file:// prefix in create_postgres_catalog so that Python matches the Scala catalog's warehouse value exactly. PyIceberg accepts a plain local path here and treats it as a local filesystem warehouse, consistent with the Scala JdbcCatalog configuration.
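
A minimal sketch of the mismatch and the fix, not the actual code in iceberg_utils.py. The helper name `build_catalog_properties`, the `legacy_prefix` flag, the example warehouse path, and the connection URI are all hypothetical; only the `uri` and `warehouse` property keys correspond to what PyIceberg's SqlCatalog reads.

```python
def build_catalog_properties(warehouse_path: str, legacy_prefix: bool = False) -> dict:
    """Build properties for a JDBC-backed Iceberg catalog (hypothetical helper).

    legacy_prefix=True reproduces the old behavior of prepending file://.
    """
    warehouse = f"file://{warehouse_path}" if legacy_prefix else warehouse_path
    return {
        # Placeholder connection string, not the project's real configuration.
        "uri": "postgresql+psycopg2://user:password@localhost:5432/catalog_db",
        "warehouse": warehouse,
    }


# Assumed value that the Scala JdbcCatalog would be configured with.
scala_side_warehouse = "/data/iceberg/warehouse"

before = build_catalog_properties(scala_side_warehouse, legacy_prefix=True)
after = build_catalog_properties(scala_side_warehouse)

print(before["warehouse"] == scala_side_warehouse)  # False: file:// prefix differs
print(after["warehouse"] == scala_side_warehouse)   # True: strings match exactly
```

In a real setup these properties would be passed along the lines of `SqlCatalog(name, **properties)`; the point of the sketch is only that both runtimes must end up with byte-identical warehouse strings.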

Any related issues, documentation, discussions?

Closes #4408

How was this PR tested?

Added a test case and tested manually:

  1. Create an Iceberg table from a Python UDF operator and confirm it can be read back from the Scala/Java engine in the same workflow.
  2. Re-run existing Iceberg-backed workflows (Python-write → Python-read and Python-write → Scala-read) and confirm no regressions.
  3. Verify on Windows that the warehouse path passed in (with colon stripped) still resolves correctly from Python.
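
For illustration, a regression check of this kind could look like the sketch below. It is not the test added in this PR: `RecordingCatalog` and `create_catalog_stand_in` are hypothetical stand-ins (the real signature of create_postgres_catalog is not shown here), and the path and URI are placeholders. The idea is to capture the properties handed to the catalog constructor without needing a running Postgres instance.

```python
class RecordingCatalog:
    """Hypothetical fake catalog that records the properties it receives."""

    def __init__(self, name: str, **properties: str) -> None:
        self.name = name
        self.properties = properties


def create_catalog_stand_in(catalog_cls, name: str, warehouse_path: str, uri: str):
    """Stand-in for the catalog factory: passes the warehouse path unmodified."""
    return catalog_cls(name, uri=uri, warehouse=warehouse_path)


def test_warehouse_has_no_file_prefix() -> None:
    warehouse_path = "/data/iceberg/warehouse"  # placeholder path
    catalog = create_catalog_stand_in(
        RecordingCatalog,
        "test_catalog",
        warehouse_path,
        "postgresql+psycopg2://user:password@localhost:5432/catalog_db",  # placeholder
    )
    # The warehouse string must be exactly what the other runtime expects.
    assert catalog.properties["warehouse"] == warehouse_path
    assert not catalog.properties["warehouse"].startswith("file://")


test_warehouse_has_no_file_prefix()
```

A check like this only guards the string passed to the catalog; the cross-engine read in step 1 still has to be verified end to end.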

Was this PR authored or co-authored using generative AI tooling?

No.

@aglinxinyuan aglinxinyuan self-assigned this Apr 18, 2026
Copilot AI review requested due to automatic review settings April 18, 2026 04:47

Copilot AI left a comment


Pull request overview

Fixes an Iceberg interoperability bug where Python-created tables could not be read by the Java/Scala engine due to a mismatched Postgres JDBC catalog warehouse value.

Changes:

  • Align PyIceberg SqlCatalog warehouse configuration with the Java/Scala JdbcCatalog by removing the file:// prefix.
  • Ensure Iceberg table metadata written from Python uses the same warehouse string the Scala side expects.


Comment thread amber/src/main/python/core/storage/iceberg/iceberg_utils.py
Comment thread amber/src/main/python/core/storage/iceberg/iceberg_utils.py

@Xiao-zhen-Liu Xiao-zhen-Liu left a comment


LGTM

@aglinxinyuan aglinxinyuan merged commit 71ed5aa into main Apr 20, 2026
11 checks passed
@aglinxinyuan aglinxinyuan deleted the xinyuan-fix-python-warehouse branch April 20, 2026 20:29

Development

Successfully merging this pull request may close these issues.

Iceberg document created by python API cannot be read due to wrong warehouse path

3 participants