[core] Add manifest sort compaction#7826
Conversation
0b3134e to
cfe50a5
Compare
cfe50a5 to
9540939
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
Review: [core] Add manifest sort compaction
The analysis covers CoreOptions (new options), FileStoreCommitImpl (integration), ManifestFileMerger (core algorithm), and related tests.
Key Findings
1. Stale entries in createdManifestsForAbort
cleanUnusedCreatedManifestFiles deletes intermediate manifest files from the filesystem but does not remove their entries from newFilesForAbort. If an error later triggers abort cleanup, cleanUpNoReuseTmpManifests will call deleteQuietly on already-deleted filenames. This is benign today (because ObjectsFile.delete uses deleteQuietly), but fragile if that implementation ever changes. Recommend removing stale entries from the list.
2. pickRuns heuristic
The sort-by-size-then-pick-smallest strategy is reasonable but has no inline comment explaining why smallest-first is chosen. A one-liner would help future readers.
3. rewriteFileNames.size() <= 1 guard
Because each group in rewriteGroups has size > 1, the set will always be either 0 or >= 2. The condition is effectively isEmpty(). Minor clarity point.
4. Potential NPE on partitionStats()
minPartition() / maxPartition() dereference file.partitionStats() without a null check. If legacy manifests could have null stats, a defensive filter in compactCandidates would be safer.
5. Scope widening in compactManifestOnce
The 3-param mergeManifests (called from compactManifestOnce) will now also trigger sort compaction when the option is enabled. Worth documenting this behavior expansion.
Tests
Good coverage of happy-path, partial-target-run rewrite, single-partition skip, and delete-manifest skip scenarios. The sortEntriesByPartition comparator ordering (partition > bucket > level > filename) ensures deterministic output.
Overall: correct design, sound algorithm, good test coverage. The stale-list concern (#1) is the most actionable item.
Summary
Validation