fix: Iceberg warehouse path mismatch between Python and Java/Scala catalogs by aglinxinyuan · Pull Request #4409 · apache/texera

aglinxinyuan · 2026-04-18T04:47:40Z

What changes were proposed in this PR?

Iceberg tables created via the Python API could not be read back on the Java/Scala side because the two runtimes were registering the Postgres JDBC catalog with different warehouse values, which PyIceberg persists into the table metadata.

The Python side (create_postgres_catalog in amber/src/main/python/core/storage/iceberg/iceberg_utils.py) was prefixing the warehouse path with file://, so tables created by Python UDFs were registered under file:///... while Scala-side lookups expected the un-prefixed path.

As a result, subsequent reads of Python-written Iceberg tables failed, because the metadata pointer carried a wrong or unresolvable warehouse path.

Drop the file:// prefix in create_postgres_catalog so Python matches the Scala catalog's warehouse value exactly. PyIceberg accepts a plain local path here and will treat it as a local filesystem warehouse, consistent with the Scala JdbcCatalog configuration.
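
For illustration, the mismatch and the fix can be sketched as follows. This is a minimal sketch and not the actual Texera code: catalog_properties, the add_file_scheme flag, the example path, and the SQLAlchemy-style URI are hypothetical. The uri and warehouse keys follow PyIceberg's SqlCatalog properties, and the file:// prefix is the one described above.

```python
from typing import Dict


def catalog_properties(
    warehouse_path: str, jdbc_uri: str, add_file_scheme: bool = False
) -> Dict[str, str]:
    """Build the property dict a SQL-backed catalog would be registered with.

    add_file_scheme=True mimics the old Python behaviour (file:// prefix);
    the default mimics the fixed behaviour (path passed through unchanged).
    """
    warehouse = f"file://{warehouse_path}" if add_file_scheme else warehouse_path
    return {"uri": jdbc_uri, "warehouse": warehouse}


# Hypothetical values, for illustration only.
path = "/tmp/texera/warehouse"
uri = "postgresql+psycopg2://user:pass@localhost:5432/iceberg_catalog"

# The Scala side is described as using the un-prefixed path.
scala_side = path
before = catalog_properties(path, uri, add_file_scheme=True)["warehouse"]
after = catalog_properties(path, uri)["warehouse"]

print(before == scala_side)  # False: "file:///tmp/texera/warehouse" vs "/tmp/texera/warehouse"
print(after == scala_side)   # True: both runtimes now register the same string
```

The comparison is a plain string equality on purpose: the bug described here is about the two runtimes registering different warehouse strings, not about whether either string is a valid location on its own.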

Any related issues, documentation, discussions?

Closes #4408

How was this PR tested?

Added a test case and tested manually:

Create an Iceberg table from a Python UDF operator and confirm it can be read back from the Scala/Java engine in the same workflow.
Re-run existing Iceberg-backed workflows (Python-write → Python-read and Python-write → Scala-read) and confirm no regressions.
Verify on Windows that the warehouse path passed in (with colon stripped) still resolves correctly from Python.
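
A check along these lines could back the first step at the unit level. It is only a sketch under stated assumptions: the test added in this PR is not shown here, and create_catalog, the injected catalog_cls, the catalog name, and the example values are hypothetical stand-ins rather than the real create_postgres_catalog signature.

```python
from unittest.mock import MagicMock


def create_catalog(catalog_cls, name, warehouse_path, uri):
    """Hypothetical stand-in for create_postgres_catalog.

    The catalog class is injected so the constructor call can be inspected
    without a running Postgres instance.
    """
    return catalog_cls(name, **{"uri": uri, "warehouse": warehouse_path})


def warehouse_passed_to(catalog_cls):
    """Return the warehouse keyword recorded on the mocked constructor call."""
    _, kwargs = catalog_cls.call_args
    return kwargs["warehouse"]


# Record the constructor call on a mock instead of building a real catalog.
fake_cls = MagicMock()
create_catalog(
    fake_cls,
    "texera_iceberg",  # hypothetical catalog name
    "/tmp/texera/warehouse",  # hypothetical warehouse path
    "postgresql+psycopg2://user:pass@localhost:5432/iceberg_catalog",
)

passed = warehouse_passed_to(fake_cls)
assert passed == "/tmp/texera/warehouse"
assert not passed.startswith("file://")
```

Asserting on the constructor arguments keeps the check fast and independent of the database, at the cost of not exercising the cross-runtime read itself, which the manual steps above cover.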

Was this PR authored or co-authored using generative AI tooling?

No.

Copilot

Pull request overview

Fixes an Iceberg interoperability bug where Python-created tables could not be read by the Java/Scala engine due to a mismatched Postgres JDBC catalog warehouse value.

Changes:

Align PyIceberg SqlCatalog warehouse configuration with the Java/Scala JdbcCatalog by removing the file:// prefix.
Ensure Iceberg table metadata written from Python uses the same warehouse string the Scala side expects.

Xiao-zhen-Liu

LGTM

init

14f25ed

aglinxinyuan requested a review from Xiao-zhen-Liu April 18, 2026 04:47

aglinxinyuan self-assigned this Apr 18, 2026

Copilot AI review requested due to automatic review settings April 18, 2026 04:47

github-actions Bot added engine python labels Apr 18, 2026

Copilot started reviewing on behalf of aglinxinyuan April 18, 2026 04:48

Copilot AI reviewed Apr 18, 2026

Comment thread amber/src/main/python/core/storage/iceberg/iceberg_utils.py

Comment thread amber/src/main/python/core/storage/iceberg/iceberg_utils.py

aglinxinyuan added 3 commits April 17, 2026 23:18

add unit test

85ef8d2

fix fmt

ff672ad

Merge branch 'main' into xinyuan-fix-python-warehouse

cdea138

Xiao-zhen-Liu approved these changes Apr 20, 2026

Merge branch 'main' into xinyuan-fix-python-warehouse

9b63a96

aglinxinyuan merged commit 71ed5aa into main Apr 20, 2026
11 checks passed

aglinxinyuan deleted the xinyuan-fix-python-warehouse branch April 20, 2026 20:29

aglinxinyuan merged 5 commits into main from xinyuan-fix-python-warehouse

3 participants
