[WIP] Spark: Make Spark readers function asynchronously for many small files [issue #15287] by varun-lakhyani · Pull Request #15341 · apache/iceberg

varun-lakhyani · 2026-02-16T20:21:04Z

GitHub issue #15287
Adds AsyncTaskOpener, which opens tasks asynchronously and stores the opened iterators in a queue. The base reader takes iterators directly from that queue instead of opening each task itself when it reaches it.

For many small files, where the IO/open overhead is comparable to the processing time, opening files in parallel could make reads faster.
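To make the idea concrete, here is a minimal sketch of opening tasks ahead of the reader. It is not the code in this PR: the class name `AsyncOpenSketch`, the `readAll` helper, and the `open` function are hypothetical, and the queue holds `Future`s (an assumption made here so that task order is preserved and open failures reach the reader).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; names and structure are assumptions, not the PR's classes.
class AsyncOpenSketch {

  // Submits every open() call to a worker pool up front, then consumes the
  // opened iterators in task order while later opens continue in the background.
  static <T, R> List<R> readAll(List<T> tasks, Function<T, Iterator<R>> open, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      Queue<Future<Iterator<R>>> pending = new ArrayDeque<>();
      for (T task : tasks) {
        Callable<Iterator<R>> job = () -> open.apply(task);
        pending.add(pool.submit(job));
      }

      List<R> rows = new ArrayList<>();
      while (!pending.isEmpty()) {
        // get() blocks only if this particular task has not finished opening yet;
        // an exception thrown by open() surfaces here as an ExecutionException.
        Iterator<R> opened = pending.remove().get();
        while (opened.hasNext()) {
          rows.add(opened.next());
        }
      }
      return rows;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With a synchronous reader the open cost is paid once per file, one after another. In this sketch the opens overlap with each other and with row processing, which is where any gain for many small files would come from.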

This work starts from one specific test (compaction of 10 small files):

iceberg/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java

Line 194 in a2802c4

public void testRewriteDataFilesOnNonPartitionTable() {

[WIP] High-level design and rough implementation are done.

Pending: deciding when to take the async path; the exact implementation, including edge cases such as guaranteeing that the ALL_TASKS_COMPLETE marker can never be reproduced by a user; tests; benchmarks against the synchronous flow to verify that the async path is actually faster; and any flow or design changes that follow from those.
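One possible way to keep an end-of-stream marker from colliding with user data is to make it a private object instance and compare it by reference rather than by value. The sketch below illustrates that idea under stated assumptions: `SentinelQueueSketch`, `offerOpened`, `finish`, and `drain` are hypothetical names, and only the marker's role is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the PR's implementation.
class SentinelQueueSketch<R> {

  // A fresh anonymous instance, never exposed outside this class. A shared
  // singleton such as Collections.emptyIterator() would be unsafe here,
  // because a task that opens an empty file could return that same object.
  private final Iterator<R> allTasksComplete = new Iterator<R>() {
    @Override
    public boolean hasNext() {
      return false;
    }

    @Override
    public R next() {
      throw new NoSuchElementException();
    }
  };

  private final BlockingQueue<Iterator<R>> queue = new LinkedBlockingQueue<>();

  // Called by the opener for each task it has opened.
  void offerOpened(Iterator<R> opened) {
    queue.add(opened);
  }

  // Called by the opener once, after the last task has been queued.
  void finish() {
    queue.add(allTasksComplete);
  }

  // Called by the reader: consumes iterators until the marker is seen.
  List<R> drain() throws InterruptedException {
    List<R> rows = new ArrayList<>();
    // Reference comparison (!=) on purpose: only the private instance ends the loop.
    for (Iterator<R> it = queue.take(); it != allTasksComplete; it = queue.take()) {
      while (it.hasNext()) {
        rows.add(it.next());
      }
    }
    return rows;
  }
}
```

Because the marker is matched by identity, an empty iterator produced by a real task is drained as a normal (empty) result instead of ending the stream early.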

High level changes

c0c3b34

github-actions bot added the spark label Feb 16, 2026

varun-lakhyani added 2 commits February 17, 2026 02:21

final check

4d8be4c

minor typo

e581035

varun-lakhyani changed the title ~~[WIP] Spark: Make Spark readers function asynchronously for many small files [Issue - 15287]~~ [WIP] Spark: Make Spark readers function asynchronously for many small files [#15287] Feb 16, 2026

varun-lakhyani changed the title ~~[WIP] Spark: Make Spark readers function asynchronously for many small files [#15287]~~ [WIP] Spark: Make Spark readers function asynchronously for many small files [issue #15287] Feb 16, 2026

varun-lakhyani marked this pull request as draft February 16, 2026 22:16

varun-lakhyani mentioned this pull request Feb 16, 2026

Make Spark readers function asynchronously for many small files. #15287

Open

1 task

varun-lakhyani wants to merge 3 commits into apache:main from varun-lakhyani:spark-readers
