feat: use java virtual threads for virtual storage load on demand thread pool by default#19396

Merged
clintropolis merged 3 commits into apache:master from clintropolis:virtual-threads-segment-cache-load-on-demand on May 6, 2026
Conversation

@clintropolis
Member

Description

This PR switches the SegmentLocalCacheManager on-demand load executor (used in virtual-storage mode) from a fixed platform-thread pool to one virtual thread per task, with a Semaphore for backpressure. It also converts SegmentCacheEntry from synchronized to ReentrantLock so that virtual threads park rather than pinning their carrier during the long-running mount path.

The on-demand load work is overwhelmingly socket wait against deep storage, with a small CPU portion at the end (factorize/mmap setup). Virtual threads let us fan out hundreds of in-flight loads cheaply, with the Semaphore providing the same kind of backpressure the fixed pool size used to give implicitly. This becomes especially relevant once partial loading lands and the pool starts handling many smaller per-internal-segment-file range reads instead of one big mount per segment.
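A minimal sketch of the executor pattern described above (class and method names here are hypothetical illustrations, not the actual Druid code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class OnDemandLoader
{
  // One virtual thread per task (Java 21+); the Semaphore caps in-flight
  // loads, replacing the implicit backpressure a fixed platform-thread
  // pool size used to provide.
  private final ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor();
  private final Semaphore permits;

  public OnDemandLoader(int maxConcurrentLoads)
  {
    this.permits = new Semaphore(maxConcurrentLoads);
  }

  public void submitLoad(Runnable mountTask)
  {
    exec.submit(() -> {
      permits.acquireUninterruptibly();
      try {
        mountTask.run(); // mostly socket wait against deep storage
      }
      finally {
        permits.release();
      }
    });
  }
}
```

Spawning a virtual thread is cheap enough that there is no need to pool them; the Semaphore alone bounds concurrency.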

The ReentrantLock conversion in SegmentCacheEntry is what makes the virtual-threads switch actually pay off on Java 21. Without it, the entire mount() body runs inside synchronized (this), pinning the carrier thread and effectively capping concurrent mounts at the carrier-pool size regardless of how many virtual threads are spawned. ReentrantLock parks the virtual thread properly. Once Druid's minimum is Java 24+, JEP 491 makes this conversion redundant, but it is a mechanical, minimum-risk change in the meantime.
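The shape of the conversion can be sketched as follows (a simplified illustration, not the real SegmentCacheEntry; the field and helper names are hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

public class SegmentCacheEntrySketch
{
  // Before: the whole mount body ran under synchronized (this), which on
  // Java 21 (pre-JEP 491) pins the virtual thread's carrier for the full
  // download. With ReentrantLock, a blocked virtual thread parks and the
  // carrier is freed to run other virtual threads.
  private final ReentrantLock lock = new ReentrantLock();
  private boolean mounted = false;

  public void mount()
  {
    lock.lock();
    try {
      if (!mounted) {
        downloadAndMmap(); // long socket wait; virtual thread parks here
        mounted = true;
      }
    }
    finally {
      lock.unlock();
    }
  }

  public boolean isMounted()
  {
    lock.lock();
    try {
      return mounted;
    }
    finally {
      lock.unlock();
    }
  }

  private void downloadAndMmap()
  {
    // placeholder for the deep-storage fetch + factorize/mmap setup
  }
}
```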

changes:

  • switch the default SegmentLocalCacheManager.virtualStorageLoadOnDemandExec to virtual threads with a Semaphore for backpressure
  • add a SegmentLoaderConfig.virtualStorageUseVirtualThreads (druid.segmentCache.virtualStorageUseVirtualThreads) config that defaults to true but allows opt-out by setting it to false
  • raise the SegmentLoaderConfig.virtualStorageLoadThreads default to Math.max(32, 4 * cores), sized as roughly 4x lookahead per processing thread
  • convert SegmentCacheEntry from synchronized to ReentrantLock so virtual threads park instead of pinning the carrier during mount

@clintropolis changed the title from "use java virtual threads for virtual storage load on demand thread pool by default" to "feat: use java virtual threads for virtual storage load on demand thread pool by default" on May 1, 2026
@jtuglu1
Contributor

jtuglu1 commented May 1, 2026

Any benchmarks?

@clintropolis
Member Author

Any benchmarks?

No, not yet; we have the option to go back to the previous behavior if it turns out to be worse for any reason.

@gianm
Contributor

gianm commented May 2, 2026

Once Druid's minimum is Java 24+, JEP 491 makes this conversion redundant, but it's a mechanical minimum-risk change in the meantime.

A comment about this in the code would be nice. Also: does running on a Java 25 runtime get this benefit now, or do we need to target Java 25 too?

Contributor

@capistrant left a comment


I think the changes look good, and I appreciate the emergency outlet to use the legacy path. One testing nit:

  • Can we parameterize the tests that exercise this code to run with both the new default and the legacy thread model? Maybe overkill since the only difference is in the construction of the executor... which is why it feels nitty

Member

@FrankChen021 left a comment


Severity | Findings
---------|---------
P0       | 0
P1       | 0
P2       | 1
P3       | 0
Total    | 1

This is an automated review by Codex GPT-5

    if (config.isVirtualStorageUseVirtualThreads()) {
      log.info(
          "Using virtual storage mode with virtual threads - max concurrent on demand loads: [%d].",
          config.getVirtualStorageLoadThreads()


P2 Validate virtualStorageLoadThreads before using it as a semaphore limit

With virtualStorageUseVirtualThreads=true, setting druid.segmentCache.virtualStorageLoadThreads to 0 now creates a Semaphore with zero permits. Every on-demand load task then blocks forever in acquireUninterruptibly(), so queries wait until timeout instead of failing fast at startup. The old fixed-thread-pool path rejected 0 via Executors.newFixedThreadPool(0). Please validate virtualStorageLoadThreads > 0 for both modes, or otherwise preserve fail-fast behavior.
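A hedged sketch of the fail-fast validation the reviewer suggests (the helper class and method are hypothetical, not code from the PR):

```java
import java.util.concurrent.Semaphore;

public final class LoadThreadsValidator
{
  // Reject a non-positive thread count at startup rather than creating a
  // zero-permit Semaphore that makes every on-demand load block forever
  // in acquireUninterruptibly(). This mirrors the fail-fast behavior of
  // the old Executors.newFixedThreadPool(0) path, which throws.
  public static Semaphore createLoadPermits(int virtualStorageLoadThreads)
  {
    if (virtualStorageLoadThreads <= 0) {
      throw new IllegalArgumentException(
          "druid.segmentCache.virtualStorageLoadThreads must be > 0, got ["
          + virtualStorageLoadThreads + "]"
      );
    }
    return new Semaphore(virtualStorageLoadThreads);
  }
}
```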

@clintropolis
Member Author

Also: does running on a Java 25 runtime get this benefit now, or do we need to target Java 25 too?

I'm not sure whether we would have needed to target Java 25 to get the benefit if this PR had only switched the pool to virtual threads while leaving the mount path on synchronized; since this PR stops using synchronized, I didn't look into it. Are you asking whether we should consider keeping synchronized, or just curious in general?

Member

@FrankChen021 left a comment


I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.


This is an automated review by Codex GPT-5


@clintropolis merged commit a9ca2da into apache:master May 6, 2026
87 of 90 checks passed
@clintropolis deleted the virtual-threads-segment-cache-load-on-demand branch May 6, 2026 22:19
@github-actions added this to the 38.0.0 milestone May 6, 2026