[WIP] Spark: Make Spark readers function asynchronously for many small files [issue #15287] #15341

Draft
varun-lakhyani wants to merge 3 commits into apache:main from varun-lakhyani:spark-readers

@varun-lakhyani varun-lakhyani commented Feb 16, 2026

GitHub issue #15287
Adds AsyncTaskOpener, which opens tasks asynchronously and stores the resulting iterators in a queue. The base reader can then take ready iterators directly from that queue instead of opening each task at the moment it is needed.

For a large number of small files, where I/O and open overhead is comparable to processing time, opening files in parallel could make reads faster.
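The idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `AsyncOpenSketch`, its `open` function parameter, and `readAll` are hypothetical names, and it assumes that the order in which files are consumed does not matter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not the PR's AsyncTaskOpener: tasks are opened on a
// thread pool and the opened iterators are handed to one consumer via a queue.
class AsyncOpenSketch<T, R> {
  // Sentinel compared by reference (==); signals that no more iterators will arrive.
  private static final Object ALL_TASKS_COMPLETE = new Object();

  private final BlockingQueue<Object> opened = new LinkedBlockingQueue<>();

  AsyncOpenSketch(List<T> tasks, Function<T, Iterator<R>> open, int threads) {
    if (tasks.isEmpty()) {
      opened.add(ALL_TASKS_COMPLETE);
      return;
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger remaining = new AtomicInteger(tasks.size());
    for (T task : tasks) {
      pool.submit(() -> {
        try {
          opened.add(open.apply(task)); // the potentially slow open happens off the reader thread
        } catch (RuntimeException e) {
          opened.add(e); // hand the failure to the consumer instead of losing it
        } finally {
          // The last task to finish enqueues the sentinel after every iterator.
          if (remaining.decrementAndGet() == 0) {
            opened.add(ALL_TASKS_COMPLETE);
          }
        }
      });
    }
    pool.shutdown(); // already-submitted tasks still run to completion
  }

  // Consumer side: take whichever iterator is ready next and drain it.
  List<R> readAll() {
    List<R> rows = new ArrayList<>();
    try {
      while (true) {
        Object next = opened.take();
        if (next == ALL_TASKS_COMPLETE) {
          return rows;
        }
        if (next instanceof RuntimeException) {
          throw (RuntimeException) next;
        }
        @SuppressWarnings("unchecked")
        Iterator<R> rowsOfTask = (Iterator<R>) next;
        rowsOfTask.forEachRemaining(rows::add);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for opened tasks", e);
    }
  }
}
```

In this sketch the queue is unbounded, so all files may be open at once; a real reader would likely need a bound on in-flight opens and would have to close iterators on failure.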

I started with one specific test in mind: compaction of 10 small files.

[WIP] High-level design and rough implementation are done.

Pending: deciding when to take the async path; the exact implementation, including edge cases such as ensuring the ALL_TASKS_COMPLETE marker can never be reproduced by user data; testing; benchmarking against the synchronous flow to verify that it is actually faster; and any flow or design changes that follow from that.
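One possible way to handle the marker edge case is to make the marker a private object and compare it by reference rather than by value. The sketch below only illustrates that idea; `SentinelQueueSketch` and its methods are hypothetical and do not reflect how this PR implements ALL_TASKS_COMPLETE.

```java
import java.util.ArrayDeque;
import java.util.Objects;
import java.util.Queue;

// Hypothetical sketch: an end-of-stream marker that user data cannot imitate.
final class SentinelQueueSketch {
  // Private instance: no caller can obtain a reference to it, so nothing a
  // caller enqueues can ever be == to the marker.
  private static final Object ALL_TASKS_COMPLETE = new Object();

  private final Queue<Object> queue = new ArrayDeque<>();

  void put(Object item) {
    queue.add(Objects.requireNonNull(item, "item"));
  }

  void complete() {
    queue.add(ALL_TASKS_COMPLETE);
  }

  // True only when the next element is the marker itself (identity, not equals()).
  boolean atEnd() {
    return queue.peek() == ALL_TASKS_COMPLETE;
  }

  Object poll() {
    return queue.poll();
  }
}
```

Because the check uses `==`, even a value that looks like the marker, such as the string "ALL_TASKS_COMPLETE", is treated as ordinary data.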

@github-actions github-actions bot added the spark label Feb 16, 2026
@varun-lakhyani varun-lakhyani changed the title from "[WIP] Spark: Make Spark readers function asynchronously for many small files [Issue - 15287]" to "[WIP] Spark: Make Spark readers function asynchronously for many small files [#15287]" Feb 16, 2026
@varun-lakhyani varun-lakhyani changed the title from "[WIP] Spark: Make Spark readers function asynchronously for many small files [#15287]" to "[WIP] Spark: Make Spark readers function asynchronously for many small files [issue #15287]" Feb 16, 2026
@varun-lakhyani varun-lakhyani marked this pull request as draft February 16, 2026 22:16