feat: partial segment cache infrastructure#19496
Conversation
changes: * add `PartialSegmentMetadataCacheEntry` a `CacheEntry` that range-reads the V10 header on mount, constructs `PartialSegmentFileMapperV10`, and shrinks its reservation to actual on-disk size * add `PartialSegmentBundleCacheEntry` and `PartialSegmentBundleCacheEntryIdentifier` are `CacheEntry` associated with each file bundle of a v10 segment that sparse-allocates and evicts its containers as a unit; places holds metadata and transitive parent bundle entries holds via the `StorageLocation` methods (weak reference holds on the parent cache entries) and reference-counted usage references * add `PartialSegmentCacheBootstrap` a helper that restores partial-format entries from on-disk layout on historical startup (not wired up yet); cleans orphaned bundles * add `ResizableCacheEntry` interface and `StorageLocation.adjustReservation` (shrink-only) so the metadata entry can tighten its reservation post-mount * rename `SegmentFileBuilder.startFileGroup` → `startFileBundle`; introduce `ROOT_BUNDLE_NAME` as the default bundle for containers written without an explicit declaration * rename json field `SegmentFileContainerMetadata.fileGroup` → `bundle`; now non-null via getter, normalizes to `ROOT_BUNDLE_NAME` in the constructor, default value omitted from JSON using a custom `JsonInclude` filter * Extract shared `DirectoryBackedRangeReader` and `CountingRangeReader` test helpers; consolidate duplicates across processing + server tests
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 1 |
| P2 | 1 |
| P3 | 0 |
| Total | 2 |
Reviewed 24 of 24 changed files.
This is an automated review by Codex GPT-5.5
| @JsonProperty("startOffset") long startOffset, | ||
| @JsonProperty("size") long size, | ||
| @JsonProperty("fileGroup") @Nullable String fileGroup | ||
| @JsonProperty("bundle") @Nullable String bundle |
There was a problem hiding this comment.
[P1] Preserve old fileGroup metadata on read
This constructor now only binds the new bundle JSON property, so V10 metadata already written by current master with "fileGroup":"projA" is read as null and normalized to ROOT_BUNDLE_NAME. After upgrade, grouped containers from existing segments are no longer discoverable as their original projection/base bundle, so partial bootstrap/acquire paths will either restore the wrong root bundle or fail to find the requested bundle. Please accept both names, for example with a fileGroup alias/backcompat creator path, while writing only bundle if desired.
There was a problem hiding this comment.
I added this a few days ago in #19468, so it hasn't been officially released yet, just renaming it here to have more consistent naming between stuff. Also, if any segments written with fileGroup happen to exist, they will still work more or less properly, they will just fallback to behaving as older segments written prior to that, where all containers will just have a single 'bundle' when used with partial downloads, which I think is ok (V10 is not on by default yet).
| return; | ||
| } | ||
|
|
||
| ((ResizableCacheEntry) entry).resizeReservation(newSize); |
There was a problem hiding this comment.
[P2] Update held weak-byte accounting when shrinking
addWeakReservationHold records currHoldBytes using the weak entry's size at hold acquisition. If that held weak entry later calls adjustReservation, this code shrinks currWeakSizeBytes and the entry size but leaves currHoldBytes at the old value; when the hold closes, trackWeakRelease subtracts only the new smaller size, permanently inflating VirtualStorageLocationStats.getHoldBytes(). This affects the new partial metadata flow whenever a weak-reserved metadata entry shrinks while its bootstrap/acquire hold is active. The shrink path needs to adjust hold bytes for active holds, and the tests should cover held weak entries.
FrankChen021
left a comment
There was a problem hiding this comment.
Handled the follow-up on fileGroup to bundle compatibility; no further inline reply is needed.
Reviewed 24 of 24 changed files.
This is an automated review by Codex GPT-5.5
| ); | ||
| } | ||
| if (containerFile.exists()) { | ||
| containerFile.delete(); |
There was a problem hiding this comment.
delete() returns false when it fails. Is ignoring it (as is done here) good, or should we throw an exception or do something else?
There was a problem hiding this comment.
oops, I think it should log.warn. We could throw, it would later be eaten and logged and we would leave a messier mid-eviction state than if we power through it and do as much as we can. We should definitely log about it though, so I'll add that.
| * @param jsonMapper used to parse the header | ||
| * @param location the storage location these entries belong to; the metadata entry is registered as | ||
| * static and bundle entries are registered as weak | ||
| * @throws IllegalStateException if the expected header file is missing or unreadable |
There was a problem hiding this comment.
seems to actually throw DruidException
| continue; | ||
| } | ||
| final String fileName = entry.getKey(); | ||
| if (downloadedFiles.remove(fileName)) { |
There was a problem hiding this comment.
Can evictContainer race with mapFile? Does something prevent them from running concurrently with each other?
I ask because it looks like this function (evictContainer) is removing the container from containers first and then removes the internal files from downloadedFiles. On the other hand, mapFile checks downloadedFiles first, and then if the file is there, it pulls the container from containers and dereferences it. Maybe it'd be null at that point if a concurrent evictContainer was happening?
There was a problem hiding this comment.
Well, i think technically it can if callers are just ad-hoc doing stuff with a file-mapper. In practice, the only way to get a hold of this thing in a way that can call mapFile is via the cache entry acquire-reference stuff, which then blocks the callers of evictContainer from trying to call it. I should add some javadocs to this method that callers are responsible for guarding access to unloading stuff from the partial mapper and ensuring that this cannot happen.
| * {@link #ensureFilesAvailable}) call is in flight for any file in this container. This is enforced one layer up | ||
| * by the cache-entry refcount: {@code PartialSegmentBundleCacheEntry} only invokes {@code evictContainer} from its | ||
| * {@code doActualUnmount} callback, which fires only after every reference acquired via | ||
| * {@code acquireReference()} has been closed. Bypassing that gate is dangerous — {@link ByteBufferUtils#unmap} |
There was a problem hiding this comment.
I am approving this PR but wouldn't mind if this comment was tidied up a bit prior to merge. It's a bit verbose.
Description
This PR adds the cache-layer primitives that will allow mounting a V10 segment as multiple entries on one storage location. The models is that there will be one always-resident metadata entry that holds the parsed header + file mapper (
PartialSegmentMetadataCacheEntry), plus one entry per group of containers associated with a 'bundle' of files in the v10 segment (PartialSegmentBundleCacheEntry). Each bundle is independently mountable and so importantly, evictable. The entries share one on-disk layout with the existing eager-load path. No partial-awareSegmentorCursorFactoryyet, those will be added in a follow-up PR.changes:
PartialSegmentMetadataCacheEntryaCacheEntrythat range-reads the V10 header on mount, constructsPartialSegmentFileMapperV10, and shrinks its reservation to actual on-disk sizePartialSegmentBundleCacheEntryandPartialSegmentBundleCacheEntryIdentifierareCacheEntryassociated with each file bundle of a v10 segment that sparse-allocates and evicts its containers as a unit; places holds metadata and transitive parent bundle entries holds via theStorageLocationmethods (weak reference holds on the parent cache entries) and reference-counted usage referencesPartialSegmentCacheBootstrapa helper that restores partial-format entries from on-disk layout on historical startup (not wired up yet); cleans orphaned bundlesResizableCacheEntryinterface andStorageLocation.adjustReservation(shrink-only) so the metadata entry can tighten its reservation post-mountSegmentFileBuilder.startFileGroup→startFileBundle; introduceROOT_BUNDLE_NAMEas the default bundle for containers written without an explicit declaration * rename json fieldSegmentFileContainerMetadata.fileGroup→bundle; now non-null via getter, normalizes toROOT_BUNDLE_NAMEin the constructor, default value omitted from JSON using a customJsonIncludefilterDirectoryBackedRangeReaderandCountingRangeReadertest helpers; consolidate duplicates across processing + server tests