fix: reduce catalog round-trips in `IcebergDocument.hasNext()` to improve result read performance by kunwp1 · Pull Request #4293 · apache/texera

kunwp1 · 2026-03-13T21:49:55Z

What changes were proposed in this PR?

This PR addresses #4289 by optimizing IcebergDocument.hasNext() to minimize redundant catalog round-trips. By introducing a guard condition, we ensure seekToUsableFile() and its subsequent catalog calls are only triggered when the current record iterator is fully exhausted.

If the current file has more records, return true immediately.
Only if the current file is exhausted, check usableFileIterator.
Only if usableFileIterator is also empty, call seekToUsableFile().
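
The three-step check above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual `IcebergDocument` code: `CountingCatalog`, `GuardedReader`, and `loadNewFiles` are hypothetical names, and files are modeled as plain in-memory lists so that the number of simulated catalog round-trips can be counted. Only `hasNext`, `seekToUsableFile`, and `usableFileIterator` correspond to names mentioned in this PR.

```scala
// Hypothetical stand-in for the catalog: counts how often it is consulted.
final class CountingCatalog(files: List[List[Int]]) {
  var roundTrips: Int = 0
  private var served = false

  // Simulates an expensive catalog call that lists the readable files.
  // In this sketch, only the first call returns any files.
  def loadNewFiles(): Iterator[List[Int]] = {
    roundTrips += 1
    if (served) Iterator.empty
    else { served = true; files.iterator }
  }
}

// Hypothetical reader illustrating the guard order described above.
final class GuardedReader(catalog: CountingCatalog) extends Iterator[Int] {
  private var usableFileIterator: Iterator[List[Int]] = Iterator.empty
  private var currentRecordIterator: Iterator[Int] = Iterator.empty

  // The expensive step: asks the catalog for more files.
  private def seekToUsableFile(): Unit =
    usableFileIterator = catalog.loadNewFiles()

  override def hasNext: Boolean =
    // Step 1: the current file still has records, so no catalog call is needed.
    if (currentRecordIterator.hasNext) true
    else advanceToNextFile()

  @scala.annotation.tailrec
  private def advanceToNextFile(): Boolean = {
    // Step 3: contact the catalog only when no already-listed file remains.
    if (!usableFileIterator.hasNext) seekToUsableFile()
    if (!usableFileIterator.hasNext) false
    else {
      // Step 2: move on to the next already-listed file, skipping empty ones.
      currentRecordIterator = usableFileIterator.next().iterator
      if (currentRecordIterator.hasNext) true else advanceToNextFile()
    }
  }

  override def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    currentRecordIterator.next()
  }
}
```

In this sketch, draining a reader costs one simulated catalog call to list the files and one more to confirm that nothing is left, regardless of how many records each file holds. Without the guard in step 1, every `hasNext` call would reach the catalog.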

Any related issues, documentation, discussions?

Fix #4289

How was this PR tested?

Import and use Untitled workflow (9).json.
Use a CSV file containing 1M records.
Set storage.iceberg.table.commit.batch-size to 1M (matching the total record count).
Compare the performance before and after the fix. For me, it was 2m 45s before vs. 36s after.

Was this PR authored or co-authored using generative AI tooling?

No.

chenlica · 2026-03-14T03:17:32Z

@Xiao-zhen-Liu @bobbai00 Please review it.

bobbai00

Scala side LGTM! Can you also check the python side's iceberg document's corresponding logic, change it and test it?

kunwp1 · 2026-03-16T19:09:37Z

> Scala side LGTM! Can you also check the python side's iceberg document's corresponding logic, change it and test it?

I checked the python side, but it seems we don't have such corresponding logic. Can you confirm?

bobbai00 · 2026-03-16T19:42:59Z

> > Scala side LGTM! Can you also check the python side's iceberg document's corresponding logic, change it and test it?

> I checked the python side, but it seems we don't have such corresponding logic. Can you confirm?

Under this folder: https://github.com/apache/texera/tree/main/amber/src/main/python/core/storage/iceberg

For example:
https://github.com/apache/texera/blob/main/amber/src/main/python/core/storage/iceberg/iceberg_document.py and

kunwp1 · 2026-03-16T19:45:34Z

> > > Scala side LGTM! Can you also check the python side's iceberg document's corresponding logic, change it and test it?

> > I checked the python side, but it seems we don't have such corresponding logic. Can you confirm?

> Under this folder: https://github.com/apache/texera/tree/main/amber/src/main/python/core/storage/iceberg

> For example: https://github.com/apache/texera/blob/main/amber/src/main/python/core/storage/iceberg/iceberg_document.py and

I checked those files, and they don't have the problematic logic. It seems the implementation on the python side is different, so we don't have to fix it.

Fix

594eccd

kunwp1 requested a review from bobbai00 March 13, 2026 21:49

kunwp1 self-assigned this Mar 13, 2026

Merge branch 'main' into chris-fix-4289

f52a098

github-actions bot added the common label Mar 13, 2026

chenlica requested a review from Xiao-zhen-Liu March 14, 2026 03:17

Merge branch 'main' into chris-fix-4289

62befde

bobbai00 changed the title ~~fix: Optimized IcebergDocument.hasNext() to reduce catalog round-trips~~ fix: reduce catalog round-trips in IcebergDocument.hasNext() to improve result writes performance Mar 14, 2026

bobbai00 requested changes Mar 14, 2026

kunwp1 changed the title ~~fix: reduce catalog round-trips in IcebergDocument.hasNext() to improve result writes performance~~ fix: reduce catalog round-trips in IcebergDocument.hasNext() to improve result read performance Mar 16, 2026

Merge branch 'main' into chris-fix-4289

ae46fb9

kunwp1 wants to merge 4 commits into apache:main from kunwp1:chris-fix-4289

3 participants