
feat: partial segment cache infrastructure #19496

Merged
clintropolis merged 9 commits into apache:master from clintropolis:partial-load-segment-cache-entries
May 23, 2026

@clintropolis
Member

Description

This PR adds the cache-layer primitives that will allow mounting a V10 segment as multiple entries on one storage location. The model is that there will be one always-resident metadata entry that holds the parsed header and file mapper (PartialSegmentMetadataCacheEntry), plus one entry per group of containers associated with a 'bundle' of files in the V10 segment (PartialSegmentBundleCacheEntry). Each bundle is independently mountable and, importantly, evictable. The entries share one on-disk layout with the existing eager-load path. There is no partial-aware Segment or CursorFactory yet; those will be added in a follow-up PR.
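As a rough illustration of the model described above, the following sketch groups a segment's containers by bundle name, treating containers without an explicit bundle as members of the root bundle. The class names, the `groupByBundle` helper, and the `"__root"` value are hypothetical stand-ins chosen for this example; they are not the classes or the constant value added by this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a container's metadata; not the real SegmentFileContainerMetadata.
final class ContainerSketch {
  final String containerName;
  final String bundle; // may be null when no bundle was declared

  ContainerSketch(String containerName, String bundle) {
    this.containerName = containerName;
    this.bundle = bundle;
  }
}

final class BundleGroupingSketch {
  // Assumed placeholder value; the real ROOT_BUNDLE_NAME constant may differ.
  static final String ROOT_BUNDLE_NAME = "__root";

  // Groups container names by bundle, mapping a missing bundle to the root bundle.
  // Each resulting group corresponds to one independently evictable bundle entry.
  static Map<String, List<String>> groupByBundle(List<ContainerSketch> containers) {
    Map<String, List<String>> bundles = new LinkedHashMap<>();
    for (ContainerSketch c : containers) {
      String bundle = c.bundle == null ? ROOT_BUNDLE_NAME : c.bundle;
      bundles.computeIfAbsent(bundle, k -> new ArrayList<>()).add(c.containerName);
    }
    return bundles;
  }
}
```

In this sketch, a segment whose containers declare no bundle at all collapses into a single root group, which matches the "one bundle" behavior expected for older segments.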

changes:

  • add PartialSegmentMetadataCacheEntry, a CacheEntry that range-reads the V10 header on mount, constructs PartialSegmentFileMapperV10, and shrinks its reservation to the actual on-disk size
  • add PartialSegmentBundleCacheEntry and PartialSegmentBundleCacheEntryIdentifier, a CacheEntry (and its identifier) for each file bundle of a V10 segment that sparse-allocates and evicts its containers as a unit; it places holds on the metadata entry and on transitive parent bundle entries via the StorageLocation methods (weak reference holds on the parent cache entries) and takes reference-counted usage references
  • add PartialSegmentCacheBootstrap, a helper that restores partial-format entries from the on-disk layout on historical startup (not wired up yet) and cleans orphaned bundles
  • add ResizableCacheEntry interface and StorageLocation.adjustReservation (shrink-only) so the metadata entry can tighten its reservation post-mount
  • rename SegmentFileBuilder.startFileGroup → startFileBundle; introduce ROOT_BUNDLE_NAME as the default bundle for containers written without an explicit declaration
  • rename JSON field SegmentFileContainerMetadata.fileGroup → bundle; it is now non-null via the getter, normalizes to ROOT_BUNDLE_NAME in the constructor, and the default value is omitted from JSON using a custom JsonInclude filter
  • extract shared DirectoryBackedRangeReader and CountingRangeReader test helpers; consolidate duplicates across processing + server tests

Member

@FrankChen021 FrankChen021 left a comment


Severity Findings
P0 0
P1 1
P2 1
P3 0
Total 2

Reviewed 24 of 24 changed files.


This is an automated review by Codex GPT-5.5

  @JsonProperty("startOffset") long startOffset,
  @JsonProperty("size") long size,
- @JsonProperty("fileGroup") @Nullable String fileGroup
+ @JsonProperty("bundle") @Nullable String bundle
Member


[P1] Preserve old fileGroup metadata on read

This constructor now only binds the new bundle JSON property, so V10 metadata already written by current master with "fileGroup":"projA" is read as null and normalized to ROOT_BUNDLE_NAME. After upgrade, grouped containers from existing segments are no longer discoverable as their original projection/base bundle, so partial bootstrap/acquire paths will either restore the wrong root bundle or fail to find the requested bundle. Please accept both names, for example with a fileGroup alias/backcompat creator path, while writing only bundle if desired.
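One way to picture the suggested read-side compatibility is the sketch below, which resolves the bundle name from already-parsed properties: it prefers the new `bundle` key, falls back to the legacy `fileGroup` key, and only then normalizes to the root bundle. The `resolveBundle` helper, the map-based input, and the `"__root"` value are assumptions made for illustration. In actual Jackson-bound code, an alias on the creator parameter (for example Jackson's `@JsonAlias`) would be one possible way to get the same effect.

```java
import java.util.Map;

final class BundleNameCompatSketch {
  // Assumed placeholder value; the real ROOT_BUNDLE_NAME constant may differ.
  static final String ROOT_BUNDLE_NAME = "__root";

  // Resolves the bundle name from parsed container metadata properties.
  // Hypothetical helper: the real class binds JSON through a constructor instead.
  static String resolveBundle(Map<String, String> props) {
    String bundle = props.get("bundle");
    if (bundle == null) {
      // Legacy name, written before the fileGroup -> bundle rename.
      bundle = props.get("fileGroup");
    }
    // Normalize a missing value to the root bundle so callers never see null.
    return bundle == null ? ROOT_BUNDLE_NAME : bundle;
  }
}
```

With this shape, metadata containing `"fileGroup":"projA"` resolves to `projA` rather than the root bundle, while writers can still emit only `bundle`.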

Member Author


I added this a few days ago in #19468, so it hasn't been officially released yet; I'm just renaming it here for more consistent naming. Also, if any segments written with fileGroup happen to exist, they will still work more or less properly: they will just fall back to behaving like older segments written before that change, where all containers belong to a single 'bundle' when used with partial downloads, which I think is OK (V10 is not on by default yet).

return;
}

((ResizableCacheEntry) entry).resizeReservation(newSize);
Member


[P2] Update held weak-byte accounting when shrinking

addWeakReservationHold records currHoldBytes using the weak entry's size at hold acquisition. If that held weak entry later calls adjustReservation, this code shrinks currWeakSizeBytes and the entry size but leaves currHoldBytes at the old value; when the hold closes, trackWeakRelease subtracts only the new smaller size, permanently inflating VirtualStorageLocationStats.getHoldBytes(). This affects the new partial metadata flow whenever a weak-reserved metadata entry shrinks while its bootstrap/acquire hold is active. The shrink path needs to adjust hold bytes for active holds, and the tests should cover held weak entries.
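The accounting issue can be sketched with a toy model. The classes below are hypothetical and much simpler than the real `StorageLocation`: they assume hold bytes are counted once per entry while at least one hold is active, and they show a shrink path that subtracts the freed bytes from both the weak size total and, for a held entry, the hold total, so that the later release balances.

```java
// Hypothetical weak cache entry: just a size and a count of active holds.
final class WeakEntrySketch {
  long sizeBytes;
  int activeHolds;

  WeakEntrySketch(long sizeBytes) {
    this.sizeBytes = sizeBytes;
  }
}

// Toy accounting model; field names echo the review comment but the logic is an assumption.
final class HoldAccountingSketch {
  long currWeakSizeBytes;
  long currHoldBytes;

  synchronized void reserve(WeakEntrySketch e) {
    currWeakSizeBytes += e.sizeBytes;
  }

  synchronized void addHold(WeakEntrySketch e) {
    // Count the entry's bytes as held only when the first hold is taken.
    if (e.activeHolds++ == 0) {
      currHoldBytes += e.sizeBytes;
    }
  }

  synchronized void releaseHold(WeakEntrySketch e) {
    if (e.activeHolds <= 0) {
      throw new IllegalStateException("no active hold");
    }
    // Subtract the entry's current size when the last hold closes.
    if (--e.activeHolds == 0) {
      currHoldBytes -= e.sizeBytes;
    }
  }

  // Shrink-only resize that keeps hold bytes consistent for held entries.
  synchronized void shrink(WeakEntrySketch e, long newSize) {
    if (newSize < 0 || newSize > e.sizeBytes) {
      throw new IllegalArgumentException("reservation can only shrink");
    }
    long freed = e.sizeBytes - newSize;
    currWeakSizeBytes -= freed;
    if (e.activeHolds > 0) {
      currHoldBytes -= freed; // without this line, hold bytes stay inflated after release
    }
    e.sizeBytes = newSize;
  }
}
```

In this model, reserving 100 bytes, taking a hold, shrinking to 40, and then releasing the hold leaves both counters consistent, with hold bytes back at zero.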

Member

@FrankChen021 FrankChen021 left a comment


Handled the follow-up on fileGroup to bundle compatibility; no further inline reply is needed.


);
}
if (containerFile.exists()) {
containerFile.delete();
Contributor


delete() returns false when it fails. Is ignoring it (as is done here) good, or should we throw an exception or do something else?

Member Author


Oops, I think it should log.warn. We could throw, but it would later be caught and logged, and we would leave a messier mid-eviction state than if we power through and do as much as we can. We should definitely log it though, so I'll add that.
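A minimal sketch of the "warn and keep going" approach described here might look like the following. It uses `java.util.logging` as a stand-in for the project's own logger, and the `deleteAll` helper and its return value are hypothetical, added only so the outcome is observable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

final class BestEffortDeleteSketch {
  // Stand-in logger; the real code would use the project's logging facade.
  private static final Logger LOG = Logger.getLogger(BestEffortDeleteSketch.class.getName());

  // Attempts to delete every file, logging a warning on failure instead of throwing,
  // so that one failed delete does not abort the rest of the cleanup.
  // Returns the files that could not be deleted (hypothetical, for illustration).
  static List<File> deleteAll(List<File> files) {
    List<File> failed = new ArrayList<>();
    for (File f : files) {
      // File.delete() reports failure via its boolean return value, not an exception.
      if (f.exists() && !f.delete()) {
        LOG.warning("Failed to delete [" + f + "], continuing with remaining files");
        failed.add(f);
      }
    }
    return failed;
  }
}
```

If the reason for a failure matters, `java.nio.file.Files.delete` is an alternative that throws an `IOException` describing the cause, which could be caught and logged in the same loop.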

* @param jsonMapper used to parse the header
* @param location the storage location these entries belong to; the metadata entry is registered as
* static and bundle entries are registered as weak
* @throws IllegalStateException if the expected header file is missing or unreadable
Contributor


seems to actually throw DruidException

continue;
}
final String fileName = entry.getKey();
if (downloadedFiles.remove(fileName)) {
Contributor


Can evictContainer race with mapFile? Does something prevent them from running concurrently with each other?

I ask because it looks like this function (evictContainer) is removing the container from containers first and then removes the internal files from downloadedFiles. On the other hand, mapFile checks downloadedFiles first, and then if the file is there, it pulls the container from containers and dereferences it. Maybe it'd be null at that point if a concurrent evictContainer was happening?

Member Author


Well, I think technically it can if callers are doing ad hoc things with a file mapper. In practice, the only way to get hold of this object in a way that can call mapFile is via the cache entry's acquire-reference mechanism, which prevents evictContainer from being called while a reference is held. I should add javadocs to this method stating that callers are responsible for guarding access when unloading from the partial mapper and for ensuring that this cannot happen.
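The guard described in this reply can be sketched as a small reference-counted wrapper in which the unmount callback runs only after the owner has requested unmount and every acquired reference has been closed. This is a simplified illustration under assumed names (`RefCountedUnmountSketch`, `Ref`, `unmount`); it is not the cache entry implementation from this PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedUnmountSketch {
  // Handle returned to readers; close() is idempotent and throws no checked exception.
  interface Ref extends AutoCloseable {
    @Override
    void close();
  }

  // Starts at 1: the owner's base reference, dropped by unmount().
  private final AtomicInteger refs = new AtomicInteger(1);
  private final AtomicBoolean unmountRequested = new AtomicBoolean(false);
  private final Runnable doActualUnmount;

  RefCountedUnmountSketch(Runnable doActualUnmount) {
    this.doActualUnmount = doActualUnmount;
  }

  // Returns a reference that keeps the entry mounted, or null if unmount has begun.
  Ref acquireReference() {
    while (true) {
      int n = refs.get();
      if (n == 0 || unmountRequested.get()) {
        return null;
      }
      if (refs.compareAndSet(n, n + 1)) {
        AtomicBoolean closed = new AtomicBoolean(false);
        return () -> {
          if (closed.compareAndSet(false, true)) {
            release();
          }
        };
      }
    }
  }

  // Owner requests unmount; the callback is deferred until all references are closed.
  void unmount() {
    if (unmountRequested.compareAndSet(false, true)) {
      release();
    }
  }

  private void release() {
    // Only the release that brings the count to zero runs the unmount callback.
    if (refs.decrementAndGet() == 0) {
      doActualUnmount.run();
    }
  }
}
```

Readers would wrap their work in try-with-resources on the returned `Ref`, so anything like an eviction placed in the callback cannot overlap with a read that still holds a reference.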

* {@link #ensureFilesAvailable}) call is in flight for any file in this container. This is enforced one layer up
* by the cache-entry refcount: {@code PartialSegmentBundleCacheEntry} only invokes {@code evictContainer} from its
* {@code doActualUnmount} callback, which fires only after every reference acquired via
* {@code acquireReference()} has been closed. Bypassing that gate is dangerous — {@link ByteBufferUtils#unmap}
Contributor


I am approving this PR but wouldn't mind if this comment was tidied up a bit prior to merge. It's a bit verbose.

@clintropolis clintropolis merged commit 215f415 into apache:master May 23, 2026
88 of 90 checks passed
@clintropolis clintropolis deleted the partial-load-segment-cache-entries branch May 23, 2026 20:55
@github-actions github-actions Bot added this to the 38.0.0 milestone May 23, 2026