[core] Add manifest sort compaction#7826

Open
leaves12138 wants to merge 4 commits into apache:master from leaves12138:codex/manifest-sort-compaction

@leaves12138

Summary

  • add manifest-sort.enabled and manifest-sort.rewrite-manifest-count options
  • add independent manifest sort compaction that groups ManifestFileMeta by partition-range sorted runs
  • wire the sort compaction after existing manifest merge during snapshot/manifest compact commits
  • add focused manifest metadata tests for partial target-run rewrite and delete-manifest skip behavior
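
Grouping by partition-range sorted runs can be pictured as packing intervals into runs whose members never overlap. The following is a minimal sketch of that idea only, assuming integer partition bounds and a hypothetical `Range` class; it is not Paimon's `ManifestFileMeta` API, and the grouping in this PR may differ in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedRunSketch {
    /** Hypothetical stand-in for a manifest's partition range (assumption, not a Paimon type). */
    static final class Range {
        final String name;
        final int min;
        final int max;

        Range(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** Greedily packs ranges into runs so that ranges inside one run never overlap. */
    static List<List<Range>> toSortedRuns(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.<Range>comparingInt(r -> r.min).thenComparingInt(r -> r.max));

        List<List<Range>> runs = new ArrayList<>();
        for (Range r : sorted) {
            List<Range> target = null;
            for (List<Range> run : runs) {
                // A run can accept r only if its last range ends before r starts.
                if (run.get(run.size() - 1).max < r.min) {
                    target = run;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                runs.add(target);
            }
            target.add(r);
        }
        return runs;
    }
}
```

With the ranges [1,3], [2,5], [4,6], [7,9], this sketch produces two runs: {[1,3], [4,6], [7,9]} and {[2,5]}.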

Validation

  • mvn -pl paimon-core -am -DfailIfNoTests=false -Dtest=ManifestFileMetaTest test
  • mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=FileStoreCommitTest#testManifestCompact test

@leaves12138 changed the title from "[codex] Add manifest sort compaction" to "[core] Add manifest sort compaction" on May 12, 2026
@leaves12138 force-pushed the codex/manifest-sort-compaction branch 4 times, most recently from 0b3134e to cfe50a5, on May 12, 2026 10:06
@leaves12138 force-pushed the codex/manifest-sort-compaction branch from cfe50a5 to 9540939 on May 12, 2026 10:23
@leaves12138 marked this pull request as ready for review on May 12, 2026 12:11

@JingsongLi left a comment

Review: [core] Add manifest sort compaction

The analysis covers CoreOptions (new options), FileStoreCommitImpl (integration), ManifestFileMerger (core algorithm), and related tests.

Key Findings

1. Stale entries in createdManifestsForAbort

cleanUnusedCreatedManifestFiles deletes intermediate manifest files from the filesystem but does not remove their entries from newFilesForAbort. If an error later triggers abort cleanup, cleanUpNoReuseTmpManifests will call deleteQuietly on already-deleted filenames. This is benign today (because ObjectsFile.delete uses deleteQuietly), but fragile if that implementation ever changes. Recommend removing stale entries from the list.
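
A minimal sketch of the suggested fix: delete the intermediate files and drop them from the abort list in the same step. The class, method, and parameter names here are hypothetical, and an in-memory `Set` stands in for the filesystem; this is not the actual `FileStoreCommitImpl` code.

```java
import java.util.List;
import java.util.Set;

final class AbortListSketch {
    /**
     * Deletes unused intermediate files and forgets them in the abort list,
     * so a later abort never revisits names that are already gone.
     */
    static void cleanUnused(Set<String> fileSystem, List<String> trackedForAbort, Set<String> unused) {
        for (String name : unused) {
            fileSystem.remove(name); // stand-in for the real file deletion
        }
        // Keep the abort list in sync with what still exists.
        trackedForAbort.removeIf(unused::contains);
    }
}
```

Keeping both updates in one helper means the abort path no longer relies on deletion being quiet for missing files.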

2. pickRuns heuristic

The sort-by-size-then-pick-smallest strategy is reasonable but has no inline comment explaining why smallest-first is chosen. A one-liner would help future readers.
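
For illustration, a sketch of a smallest-first pick with the kind of inline comment being requested. Each run is modeled as a list of file sizes, and the `limit` parameter and the stated rationale are assumptions, not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PickRunsSketch {
    /** Sums the file sizes of one run. */
    static long totalSize(List<Long> run) {
        long sum = 0;
        for (long size : run) {
            sum += size;
        }
        return sum;
    }

    /** Returns up to {@code limit} runs, smallest total size first. */
    static List<List<Long>> pickSmallest(List<List<Long>> runs, int limit) {
        List<List<Long>> sorted = new ArrayList<>(runs);
        // Smallest-first (assumed rationale): rewriting small runs costs the
        // least I/O per run removed, so each commit stays cheap.
        sorted.sort(Comparator.comparingLong(PickRunsSketch::totalSize));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```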

3. rewriteFileNames.size() <= 1 guard

Because each group in rewriteGroups has size > 1, the size of the rewriteFileNames set will always be either 0 or >= 2, so the condition is effectively isEmpty(). Minor clarity point.

4. Potential NPE on partitionStats()

minPartition() / maxPartition() dereference file.partitionStats() without a null check. If legacy manifests could have null stats, a defensive filter in compactCandidates would be safer.
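
A sketch of the defensive filter, using a hypothetical `Meta` class in place of `ManifestFileMeta` and assuming that manifests without stats should simply be left out of sort compaction:

```java
import java.util.ArrayList;
import java.util.List;

final class NullStatsFilterSketch {
    /** Hypothetical stand-in for ManifestFileMeta; partitionStats may be null for legacy files. */
    static final class Meta {
        final String fileName;
        final Object partitionStats;

        Meta(String fileName, Object partitionStats) {
            this.fileName = fileName;
            this.partitionStats = partitionStats;
        }
    }

    /** Keeps only manifests whose partition stats are present, preserving input order. */
    static List<Meta> compactCandidates(List<Meta> all) {
        List<Meta> candidates = new ArrayList<>();
        for (Meta meta : all) {
            if (meta.partitionStats != null) {
                candidates.add(meta);
            }
        }
        return candidates;
    }
}
```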

5. Scope widening in compactManifestOnce

The 3-param mergeManifests (called from compactManifestOnce) will now also trigger sort compaction when the option is enabled. Worth documenting this behavior expansion.

Tests

Good coverage of happy-path, partial-target-run rewrite, single-partition skip, and delete-manifest skip scenarios. The sortEntriesByPartition comparator ordering (partition > bucket > level > filename) ensures deterministic output.
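
The ordering described above can be expressed as a chained comparator. This sketch uses a hypothetical `Entry` class with a `String` partition as a stand-in for the real partition type; it is not the test helper from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EntryOrderSketch {
    /** Hypothetical stand-in for a manifest entry. */
    static final class Entry {
        final String partition;
        final int bucket;
        final int level;
        final String fileName;

        Entry(String partition, int bucket, int level, String fileName) {
            this.partition = partition;
            this.bucket = bucket;
            this.level = level;
            this.fileName = fileName;
        }
    }

    /** Orders by partition, then bucket, then level, then file name. */
    static final Comparator<Entry> ORDER =
            Comparator.<Entry, String>comparing(e -> e.partition)
                    .thenComparingInt(e -> e.bucket)
                    .thenComparingInt(e -> e.level)
                    .thenComparing(e -> e.fileName);

    /** Returns a sorted copy; the file name tie-breaker makes the result deterministic. */
    static List<Entry> sorted(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(ORDER);
        return copy;
    }
}
```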

Overall: correct design, sound algorithm, good test coverage. The stale-list concern (#1) is the most actionable item.
