feat(cluster): manifest-vs-storage reconciliation — Phase 5 anti-entropy#412
Conversation
Periodically detects and repairs drift between the Raft-replicated cluster file manifest and physical storage. Two kinds of drift:

- Orphan manifest entries (manifest references a missing file)
- Orphan storage files (file exists but no manifest entry)

Druid-style posture: off by default; once enabled, cron auto-acts on drift older than a 24h grace window with a 10k per-run blast cap. Mirrors the retention chunked-batch + abort-on-Raft-error pattern at retention.go:818-879.

Cluster gate is dual-mode by backend topology:

- Shared storage (S3/Azure/MinIO): one node sweeps via IsActiveCompactor
- Local storage: every node walks its own disk, filtered by OriginNodeID
- Standalone: defensive guard refuses to act (no manifest to reconcile)

Manifest-before-storage invariant upheld at the cycle level: the orphan-manifest sweep runs first, and the orphan-storage sweep runs only if the first half completes. Re-checks before every delete catch concurrent writer/retention/compaction races. Sorted output makes cap-bounded runs deterministic.

API: GET /status (RequireRead), POST /trigger (RequireAdmin, defaults dry_run=true; act mode requires both dry_run=false AND act=true), GET /runs/:id (RequireAdmin). Audit events emitted per batch with bounded sample paths.
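A minimal sketch of the two safety rails described above — the grace window and the sorted, cap-bounded delete list. The helper names (`pastGraceWindow`, `boundToCap`) are illustrative assumptions, not the PR's actual functions; only the decision logic mirrors the description.

```go
package sketch

import (
	"sort"
	"time"
)

// pastGraceWindow reports whether drift is old enough to act on: only files
// whose last modification is older than the grace window (24h by default)
// are eligible, so a file that is merely mid-write is never touched.
func pastGraceWindow(lastModified, now time.Time, graceWindow time.Duration) bool {
	return now.Sub(lastModified) > graceWindow
}

// boundToCap sorts candidates and trims to the per-run blast cap; sorting
// first is what makes a cap-bounded run deterministic about which orphans
// get cleaned on each tick.
func boundToCap(candidates []string, maxDeletesPerRun int) []string {
	sort.Strings(candidates)
	if maxDeletesPerRun > 0 && len(candidates) > maxDeletesPerRun {
		return candidates[:maxDeletesPerRun]
	}
	return candidates
}
```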
/gemini review @gemini-code-assist this PR adds Phase 5 manifest-vs-storage reconciliation (anti-entropy) for Arc Enterprise clusters. Please pay particular attention to:
Three internal staff/principal-level review agents already ran; their blocking findings are addressed in this PR. Looking for fresh eyes.
Code Review
This pull request introduces a periodic reconciler to detect and repair drift between the Raft-replicated cluster file manifest and physical storage, specifically addressing orphan manifest entries and orphan storage files. The implementation includes a core reconciliation engine, a cron-based scheduler, and a new API handler for status reporting and manual triggers. Feedback focuses on a potential data loss risk where storage backends lacking ObjectLister support bypass the grace window due to zeroed modification times. Additionally, there are concerns regarding the robustness of error classification using substring matching and the potential for excessive I/O during the root walk discovery process in clusters with many databases.
Code Review
This pull request introduces a manifest-vs-storage reconciliation system for Enterprise clusters to repair drift between Raft metadata and physical storage. The implementation includes a periodic reconciler with safety features like grace windows and blast caps, a cron-based scheduler, and a new API for status monitoring and manual triggers. Review feedback suggests several optimizations, including increasing initial map capacities for better performance, using strings.Join for efficient path sampling in audit logs, improving the robustness of error classification, and explicitly clearing large manifest slices to reduce memory pressure during runs.
Round 1 (HIGH + 2 MEDIUMs):

- HIGH: zero-mtime files now treated as YOUNG (protected) instead of eligible for deletion. The List+StatFile fallback path (used when a backend doesn't implement ObjectLister) produces zero LastModified values; the previous behavior would have made those files immediately eligible, bypassing the grace window. Conservative default prevents data loss; emit a Warn so operators know the safety net engaged. (walk.go, diff.go)
- MEDIUM: replace substring-match isRaftError with errors.Is against a new raft.ErrManifestApply sentinel. Coordinator now wraps every manifest-mutation error (RegisterFile / DeleteFile / BatchFileOps / UpdateFile, leader and forwarded paths) via errors.Join so the classification is robust to future log-string renames. Sentinel lives in the leaf raft package to avoid an import cycle.
- MEDIUM: bound derivePrefixes' root walk via cfg.MaxRootWalkDatabases (default 1000). On a sprawling cluster or shared-bucket leak, the unbounded inner ListDirectories loop could trigger thousands of metadata calls per tick. Cap=0 disables the root walk entirely.

Round 2 (3 MEDIUMs):

- MEDIUM: bump derivePrefixes' set map initial capacity from 32 to 128 to cover typical production schemas without re-allocs.
- MEDIUM: replace joinSample loop concatenation with strings.Join.
- MEDIUM: nil out the full manifest slice after reducing to ObjectKeys so ~40 MB of pinned FileEntry data (SHA256, SizeBytes, CreatedAt) becomes GC-eligible during the rest of a 200k-entry run.

Refs PR #412.
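A hedged sketch of the sentinel-based classification described in the round-1 MEDIUM above. The sentinel name matches the commit text; the wrapper and classifier names are illustrative assumptions (the real sentinel lives in the leaf raft package, and the coordinator wraps every manifest-mutation path).

```go
package sketch

import "errors"

// ErrManifestApply marks any failure to apply a manifest mutation via Raft.
var ErrManifestApply = errors.New("manifest apply failed")

// wrapManifestMutation shows the coordinator-side shape: failures from
// RegisterFile / DeleteFile / BatchFileOps / UpdateFile (leader and forwarded
// paths) are joined with the sentinel, so classification no longer depends
// on log-string contents.
func wrapManifestMutation(err error) error {
	if err == nil {
		return nil
	}
	return errors.Join(ErrManifestApply, err)
}

// isRaftError is the reconciler-side check that replaces the old substring match.
func isRaftError(err error) bool {
	return errors.Is(err, ErrManifestApply)
}
```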
/gemini review @gemini-code-assist addressed your findings in commit a04b862. Summary: Round 1 (HIGH + 2 MEDIUMs)
Round 2 (3 MEDIUMs)
New tests added: zero-mtime-protected, root-walk-cap-bounds-listdirectories, root-walk-cap-zero-disables. Looking for fresh eyes on the typed-error wiring (verify no other manifest-write callers were missed) and any additional findings.
Code Review
This pull request introduces a new manifest-vs-storage reconciliation feature to detect and repair drift between the Raft-replicated manifest and physical storage. The implementation includes a periodic reconciler, a scheduler, and API endpoints for status and manual triggering. I have identified several issues in the root walk logic where database-level checks were ineffective, and I have suggested improvements for log management to prevent log flooding during transient storage failures.
- HIGH: derivePrefixes was checking the database-only key against the database/measurement set, which always missed — every known database consumed MaxRootWalkDatabases budget and starved the discovery of truly-unknown databases. Build a separate dbSet keyed by "database/" so the membership check is correct. Regression test TestReconcile_RootWalkCapDoesNotCountKnownDatabases pins the fix.
- MEDIUM: rate-limit storage.Exists failure logs in sweepOrphanManifest. A transient backend hiccup that fails Exists across an entire chunk produced N Warns, drowning operator dashboards. Per-file detail now goes into run.Errors (capped at 32) and a single summary Warn fires at the end of each batch.
- MEDIUM: same pattern for storage.Delete failures in applyStorageDeletes — already captured in run.Errors, now emits one summary Warn per batch instead of one per file.
- MEDIUM: hoist the ObjectLister-missing Warn to once-per-run in walkStorage instead of once-per-prefix in listPrefix. On a cluster with 1000 measurements this drops the warning count from 1000 to 1.

Refs PR #412.
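A hedged sketch of the "collect per-file detail, emit one summary Warn per batch" pattern from the two log-flood fixes above. The `Run` shape, the cap constant, and the function name are assumptions; the zerolog call chain mirrors the logger style visible elsewhere in this thread.

```go
package sketch

import (
	"fmt"

	"github.com/rs/zerolog"
)

// Run carries only the field this sketch needs; the real Run type has more.
type Run struct {
	Errors []string
}

const runErrorCap = 32 // assumed cap; the commit says per-run detail is capped at 32

// recordChunkFailures stores bounded per-file detail on the run and emits a
// single summary Warn for the whole chunk, instead of one Warn per file.
func recordChunkFailures(run *Run, logger zerolog.Logger, op string, failures map[string]error) {
	if len(failures) == 0 {
		return
	}
	for path, err := range failures {
		if len(run.Errors) >= runErrorCap {
			break
		}
		run.Errors = append(run.Errors, fmt.Sprintf("%s %s: %v", op, path, err))
	}
	logger.Warn().
		Str("op", op).
		Int("failed_files", len(failures)).
		Msg("Reconciliation: storage calls failed for part of this chunk; samples recorded on the run")
}
```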
/gemini review @gemini-code-assist round 3 fixes pushed in commit 96802d6:
Per-file failure detail still flows into run.Errors (capped at 32); only the operator-facing warning volume changed.
Code Review
This pull request introduces the Phase 5 Manifest-vs-Storage Reconciliation feature, which periodically identifies and resolves drift between the cluster's Raft manifest and physical storage. The implementation includes a new reconciliation package, a cron-based scheduler, and an HTTP API for status monitoring and manual triggers. Review feedback highlights several improvement opportunities: bounding the initial directory listing in the root walk to prevent potential memory issues, parallelizing synchronous existence checks during the manifest sweep to improve performance on remote backends, and utilizing a circular buffer for the run history to avoid inefficient slice prepending.
All three round-4 findings addressed (no defers):
- MEDIUM: bound the initial ListDirectories("") iteration in
derivePrefixes to 4 × MaxRootWalkDatabases. The previous bound only
capped the inspected counter; a shared bucket with thousands of
unrelated top-level prefixes could still pin megabytes of dirnames
and burn CPU on the dbSet lookup loop. Headroom of 4× covers the
case where most top-level entries are known databases that the
dbSet check skips. Operator-visible Warn fires when the limit
trims the returned list.
- MEDIUM: parallel storage.Exists during the manifest sweep.
Sequential HEAD-per-file on S3/Azure was the slow path; with chunks
up to BatchSize=1000 a manifest sweep could take minutes per
network RTT. New parallelExists helper runs a bounded worker pool
(default 8 via RecheckConcurrency, 1 forces sequential) feeding
into a fixed-index result slice so the existing sequential
ops-build loop is unchanged. Conservative default stays under
cloud per-second rate limits. Two new tests pin the concurrency
bound and the sequential-when-1 mode.
- MEDIUM: ring buffer for runHistory. The previous
append([]*Run{run}, r.runHistory...) prepend was O(N); the new
fixed-size slice + head index + size counter is O(1). Read methods
walk the ring newest-first; behavior of RecentRuns and FindRun is
unchanged from the caller's perspective. Regression test pins
newest-first ordering and FIFO eviction across wrap-around.
Refs PR #412.
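A hedged sketch of the bounded worker pool described in the round-4 parallel-Exists bullet above: results land in a fixed-index slice so the caller's sequential ops-build loop stays unchanged. The `exists` callback stands in for the backend's existence check; the function name and signature are assumptions.

```go
package sketch

import (
	"context"
	"sync"
)

// parallelExists runs existence re-checks with bounded concurrency.
// concurrency <= 1 degenerates to effectively sequential work.
func parallelExists(ctx context.Context, exists func(context.Context, string) (bool, error), keys []string, concurrency int) ([]bool, []error) {
	if concurrency < 1 {
		concurrency = 1
	}
	found := make([]bool, len(keys))
	errs := make([]error, len(keys))
	sem := make(chan struct{}, concurrency) // bounds in-flight HEAD requests
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot before spawning
		go func(i int, key string) {
			defer wg.Done()
			defer func() { <-sem }()
			found[i], errs[i] = exists(ctx, key) // fixed index: no reordering of results
		}(i, key)
	}
	wg.Wait()
	return found, errs
}
```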
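And a hedged sketch of the O(1) run-history ring buffer from the last round-4 bullet above — fixed-size slice plus head index plus size counter, newest-first reads, FIFO eviction on wrap-around. Field and method names are illustrative.

```go
package sketch

// Run is trimmed to an ID for this sketch; the real type carries full run state.
type Run struct {
	ID string
}

type runRing struct {
	buf  []*Run
	head int // slot of the most recent run
	size int // populated slots, <= len(buf)
}

func newRunRing(capacity int) *runRing {
	if capacity < 1 {
		capacity = 1
	}
	return &runRing{buf: make([]*Run, capacity)}
}

func (r *runRing) add(run *Run) {
	r.head = (r.head + 1) % len(r.buf) // O(1) insert, no prepend copy
	r.buf[r.head] = run
	if r.size < len(r.buf) {
		r.size++
	}
}

// recent returns up to n runs, newest first.
func (r *runRing) recent(n int) []*Run {
	if n > r.size {
		n = r.size
	}
	out := make([]*Run, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, r.buf[(r.head-i+len(r.buf))%len(r.buf)])
	}
	return out
}
```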
/gemini review @gemini-code-assist round 4 fixes pushed in commit 86b69fd. All three findings addressed (no defers):
This should be the final pass — looking for any remaining findings before merge.
…dditions

The original release-notes entry for the reconciler reflected commit ab0d4ce; rounds 1-4 of the gemini review added behavior the operator docs hadn't caught up with:

- Standalone-mode safety guard (orphan-storage sweep refuses to act)
- AbortReason taxonomy via raft.ErrManifestApply sentinel (raft_quorum_loss / lease_lost / unknown distinct on dashboards)
- Parallel storage.Exists re-checks (RecheckConcurrency, default 8)
- Root-walk fan-out cap (MaxRootWalkDatabases, default 1000)
- Strict 7-segment looksLikeManagedPath when DeletePreManifestOrphans=false
- Conservative protection on un-mtime'd backends (zero LastModified -> YOUNG)
- WalkPartial flag + per-prefix errors in run summary
- Auth wiring on all 3 routes (RequireRead on /status, RequireAdmin on /trigger and /runs/:id)

Two new config knobs added to the table: max_root_walk_databases, recheck_concurrency.
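For illustration, a hedged sketch of the strict 7-segment rule named in the looksLikeManagedPath bullet above (db/measurement/yyyy/mm/dd/hh/file with `.`/`..` rejected). The exact checks in the PR may be stricter (for example digit-only date segments or a `.parquet` suffix); this only shows the segment-count and relative-segment guards.

```go
package sketch

import "strings"

// looksLikeManagedPath reports whether a storage key matches the Arc-managed
// layout closely enough to be a deletion candidate in strict mode.
func looksLikeManagedPath(p string) bool {
	segs := strings.Split(p, "/")
	if len(segs) != 7 {
		return false // anything shallower or deeper is not the managed layout
	}
	for _, s := range segs {
		if s == "" || s == "." || s == ".." {
			return false // reject empty and relative segments outright
		}
	}
	return true
}
```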
Code Review
This pull request introduces a periodic reconciler to detect and repair drift between the Raft-replicated file manifest and physical storage, addressing both orphan manifest entries and storage files. The implementation includes a new internal/reconciliation package, a cron-based scheduler, and administrative API endpoints for status and manual triggers. Review feedback suggests improving code organization by centralizing utility functions, clarifying release note descriptions for path-based deletion rules, and optimizing the scheduler by caching parsed cron schedules to avoid redundant parsing.
Combined gemini round-5 (3 items) + 4 parallel internal staff reviews
(correctness / code-quality / security / perf-observability) into a
single commit so the next gemini pass lands on a near-clean diff.
HIGH (correctness — internal):
- CapHit lost when a manifest-sweep chunk is all-recheck-skipped at the
cap boundary: continue skipped the audit + return paths so the
reconciler kept doing recheck I/O on subsequent chunks. Same bug in
storage sweep where len(toDelete)==0 used continue instead of return.
- WalkPartial and CapHit not in run_completed/run_aborted audit detail.
SIEM consumers had no way to tell "0 orphans, all clean" from "blind
walk produced 0 candidates" or "we hit the cap and stopped early."
- Inner ListDirectories failures in derivePrefixes silently swallowed —
now surfaced via prefixDiscovery.errors and Run.WalkPartial.
- ctx.Err() not checked inside the root-walk loop after my round-4
scan-limit change; a misbehaving backend could burn the whole budget
after cancel. Added per-iteration check.
HIGH (gemini + code-quality):
- appendBounded was in sweep_storage.go but used cross-file. Moved all
four cap/sample helpers (appendBounded, joinSample, sampleStrings,
sampleCandidatePaths, boolStr) into helpers.go. Renamed `cap` shadow
to `limit`. Collapsed sampleStrings/sampleCandidatePaths into a
generic sampleSlice.
- scheduler.tick used direct == against sentinel errors (silently
breaks if anyone wraps them). Switched to errors.Is.
MEDIUM (security):
- License re-validation middleware on /trigger and /runs/:id (per
CLAUDE.md "License middleware goes after auth middleware"). New
License.CanUseClustering() / Client.CanUseClustering() and a
requireClusteringLicense fiber middleware mirroring the
requireTieringLicense pattern. /status remains license-free so
expired clusters can still report disabled state.
- applyDefaults clamps operator overrides: MaxRootWalkDatabases at
1_000_000 (defends against int overflow in the × 4 scan-limit
arithmetic), RecheckConcurrency at 64 (cloud rate-limit safety),
negative MaxDeletesPerRun / GraceWindow / etc default to safe values
(no negative-slice-index panics).
MEDIUM (perf):
- ErrGateRevoked sentinel + errors.Is replaces the substring-match
isLeaseLostError. Both sweeps wrap %w ErrGateRevoked.
- Cached parsed cron.Schedule on the Scheduler struct + module-level
cronParser. nextRun no longer re-parses on every Status() request.
- Replaced fmt.Sprintf("%d", n) hot-path calls with strconv.Itoa /
strconv.FormatInt in the audit-emit and cap-hit paths.
- computeDiff now uses a single map[string]bool (manifestSeen) instead
of two separate maps. Saves ~16 MB of transient allocation per run
on a 200k-entry manifest.
MEDIUM (code quality):
- backend_kind threaded into run_completed and run_aborted audit
details. Operators inspecting an audit row can now tell shared-vs-
local without correlating to status.
- skipped_recheck added to run_completed audit.
- Dropped unreachable LocalNodeID=="" branch in manifestToKeys
(NewReconciler already rejects that combination).
MEDIUM (release notes):
- Tightened delete_pre_manifest_orphans description per gemini's
feedback — explicitly calls out the 7-segment layout and `.`/`..`
rejection rather than the vague "database/measurement/..." wording.
New tests:
- TestApplyDefaults_ClampsOperatorOverridesToCeilings — pins the new
MaxRootWalkDatabases / RecheckConcurrency / MaxDeletesPerRun clamp
semantics.
CLAUDE.md updated locally (not committed — gitignored): post-
implementation review section now mandates a 4-agent parallel review
including a perf/observability/operational-hygiene lens explicitly
charged with catching the line-level items gemini flags.
Refs PR #412.
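A hedged sketch of the single-map diff mentioned in the perf items above (`manifestSeen`): one map answers both "is this storage object in the manifest?" and "which manifest entries were never observed in storage?", instead of holding two separate key sets at once. Function and parameter names are illustrative, not the exact source.

```go
package sketch

import "sort"

func computeDiff(manifestKeys, storageKeys []string) (orphanManifest, orphanStorage []string) {
	manifestSeen := make(map[string]bool, len(manifestKeys))
	for _, k := range manifestKeys {
		manifestSeen[k] = false
	}
	for _, k := range storageKeys {
		if _, ok := manifestSeen[k]; ok {
			manifestSeen[k] = true // present in both: not drift
		} else {
			orphanStorage = append(orphanStorage, k) // in storage, not in manifest
		}
	}
	for k, seen := range manifestSeen {
		if !seen {
			orphanManifest = append(orphanManifest, k) // in manifest, missing from storage
		}
	}
	sort.Strings(orphanManifest) // sorted output keeps cap-bounded runs deterministic
	sort.Strings(orphanStorage)
	return orphanManifest, orphanStorage
}
```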
/gemini review @gemini-code-assist round 5 fixes pushed in commit 3f41088. Combined gemini's 3 round-5 findings + 4 parallel internal staff/principal reviews into one commit so this lands closer to a clean diff. HIGH (correctness, internal review):
HIGH (gemini + code-quality):
MEDIUM (security):
MEDIUM (perf):
MEDIUM (code quality):
MEDIUM (release notes):
New regression test: TestApplyDefaults_ClampsOperatorOverridesToCeilings. Looking for any final findings before merge.
Code Review
This pull request introduces a periodic reconciler to detect and repair drift between the Raft-replicated cluster file manifest and physical storage, addressing both orphan manifest entries and orphan storage files. The implementation includes a core reconciliation engine, a cron-based scheduler, an HTTP API for status and manual triggers, and integration with the existing clustering and storage layers. Feedback highlights a logic error in applyDefaults that prevents disabling the root walk by setting MaxRootWalkDatabases to 0. Additionally, it is suggested to change the default for delete_pre_manifest_orphans to false to prevent accidental data loss in shared storage environments.
Gemini round 6 flagged two items; a pre-emptive same-shape audit confirmed gemini's findings and surfaced no additional fields with the same bug shape. Combined into a single commit so the next gemini pass lands on a clean diff.

HIGH (gemini): config-driven MaxRootWalkDatabases=0 was silently overridden to 1000 by applyDefaults, preventing operators from reaching the disabled state via arc.toml. Convention now: `< 0` means "operator did not set this, fill default 1000"; `0` means "explicitly disabled" and passes through to the downstream guard at walk.go:derivePrefixes (which gates the root walk on `> 0`); positive values are used as given (clamped at maxRootWalkDatabasesCap to defend against the × 4 overflow). All other fields remain on `<= 0` "fill default" semantics — overloading every numeric knob with the zero-means-disabled convention would diverge from the existing ManifestOnlyDryRun behavioral knob and surprise operators in unrelated config paths.

MEDIUM (gemini): delete_pre_manifest_orphans default flipped from true to false. Shared-bucket deployments where stray non-Arc files co-exist with Arc data must NOT see those files deleted on the first cron run. Operators who deliberately want pre-Phase-1 / migration-residual cleanup now opt in explicitly. Mirrors Iceberg's remove_orphan_files / Delta VACUUM secure-by-default posture.

MEDIUM (internal review): manifest_only_dry_run default flipped from false to true. Even though the feature is opt-in via reconciliation.enabled, flipping enabled=true alone should not auto-delete on the very first cron tick. Operators review dry-run audits, then explicitly flip manifest_only_dry_run=false to allow real deletes. Druid's coordinator kill task and Iceberg's remove_orphan_files ship the same posture.

Test infrastructure:

- newReconciler test helper sets MaxRootWalkDatabases=-1 when the caller passes 0 so tests don't accidentally land in the disabled state; tests that DO want disabled set 0 explicitly.
- Two integration tests using NewReconciler directly add MaxRootWalkDatabases:-1 to opt into the root walk.
- New TestApplyDefaults_MaxRootWalkDatabasesZeroFromConfigDisablesRootWalk pins the gemini-flagged convention.

Release notes updated: "off by default AND report-only on first run" prose, plus the two flipped default rows in the config table.

Refs PR #412.
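A hedged sketch of the round-6 convention for this one field; the constant names are assumptions, and other numeric knobs keep the original "<= 0 means fill the default" semantics.

```go
package sketch

const (
	defaultMaxRootWalkDatabases = 1000
	maxRootWalkDatabasesCap     = 1_000_000 // protects the downstream ×4 scan-limit arithmetic
)

// applyRootWalkDefault:
//   < 0 -> operator did not set it, fill the default
//     0 -> explicitly disabled; derivePrefixes skips the root walk (it gates on > 0)
//   > 0 -> used as given, clamped to a safe ceiling
func applyRootWalkDefault(v int) int {
	switch {
	case v < 0:
		return defaultMaxRootWalkDatabases
	case v == 0:
		return 0
	case v > maxRootWalkDatabasesCap:
		return maxRootWalkDatabasesCap
	default:
		return v
	}
}
```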
/gemini review @gemini-code-assist round 6 fixes pushed in commit 38da2a2. HIGH (your finding):
MEDIUM (your finding) — secure-by-default flips:
Process note: I ran a pre-emptive same-shape audit on the package looking for other fields with the config-defaults-vs-disabled confusion you spotted. Confirmed there were no other instances — only MaxRootWalkDatabases carries the new convention. Looking for any final findings before merge.
Code Review
This pull request introduces a periodic reconciler to detect and repair drift between the Raft-replicated cluster file manifest and physical storage, addressing both orphan manifest entries and orphan storage files. The implementation includes a core reconciliation engine, a cron-based scheduler, and API endpoints for status and manual triggers. Feedback suggests refining the DryRun logic in the core engine to allow manual overrides via the API even when the global dry-run configuration is enabled. Additionally, the check for excluding compaction state files should be made more specific to avoid incorrectly ignoring valid user data that might contain the reserved string in its path.
…ction-state filter

Gemini round 7 flagged two MEDIUMs; a pre-emptive same-shape audit escalated #1 to HIGH (operator-intent suppression with no diagnostic) and confirmed no other instances of either shape elsewhere in the package.

HIGH (gemini, escalated): Reconcile previously did `DryRun = dryRun || cfg.ManifestOnlyDryRun`. With the round-6 secure-by-default flip of ManifestOnlyDryRun=true, this silently forced every Reconcile call into dry-run, including the manual API path. Operators hitting POST /trigger?dry_run=false&act=true (which already encodes the explicit double confirmation) would be silently denied the act mode they asked for. The cron path still passes cfg.ManifestOnlyDryRun directly via scheduler.tick, so dropping the OR preserves cron safety while honoring legitimate manual overrides. Two regression tests pin both behaviors:

- TestReconcile_ManualOverrideHonorsDryRunFalseEvenWithCfgDryRunOnly
- TestReconcile_CronPathDryRunStillRespectsCfg

MEDIUM (gemini): isParquetCandidate used `strings.Contains(p, "_compaction_state/")`, which is a substring match, not segment-aware. A measurement whose name merely ends in that substring — e.g. "db/m_compaction_state/..." — would have all its valid data files silently filtered out of orphan-storage candidacy. Replaced with an explicit `strings.Split` + `segment == "_compaction_state"` check, mirroring the segment-aware shape used by looksLikeManagedPath. Added three regression cases to the TestIsParquetCandidate table covering substring-but-not-segment positions where false positives would previously trigger.

Refs PR #412.
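A hedged sketch of the round-7 dry-run precedence; names are illustrative. Reconcile no longer ORs the config knob into its parameter: the cron path injects it, and the manual API path passes the operator's explicit choice straight through.

```go
package sketch

// dryRunForCron is what scheduler.tick passes: enabling the feature alone
// stays report-only until the operator flips manifest_only_dry_run=false.
func dryRunForCron(cfgManifestOnlyDryRun bool) bool {
	return cfgManifestOnlyDryRun
}

// dryRunForManualTrigger honors the API's double confirmation: real deletes
// only when the caller sent dry_run=false AND act=true.
func dryRunForManualTrigger(dryRunParam, actParam bool) bool {
	return !(actParam && !dryRunParam)
}
```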
/gemini review @gemini-code-assist round 7 fixes pushed in commit ba98d58. MEDIUM (your finding, internally re-graded HIGH) — manual API override now honored:
MEDIUM (your finding) — segment-aware compaction-state filter:
Pre-emptive same-shape audit: ran a parallel staff-engineer audit before fixing, looking for other belt-and-braces overrides that suppress operator intent and other path-shape substring matches. Confirmed no other instances of either bug shape in the package. Looking for any final findings before merge.
Code Review
This pull request introduces a periodic reconciliation system for Enterprise clusters to detect and repair drift between the Raft-replicated file manifest and physical storage. It addresses orphan manifest entries and orphan storage files with safety features like dry-run modes, grace windows, and blast caps. Feedback suggests improving the root-walk discovery to cover orphan measurements in known databases, removing redundant deduplication logic in the storage walk, and optimizing path segment checks to reduce allocations.
    if _, exists := dbSet[key]; exists {
        // Already covered by the manifest-derived dbSet;
        // don't count toward the inspection cap.
        continue
    }
The root-walk fallback currently skips any database already present in the manifest (dbSet). This optimization prevents the reconciler from discovering orphan measurements (measurements that exist in storage but not in the manifest) within known databases. While this saves on ListDirectories calls, it creates a gap in anti-entropy coverage. Consider if a more thorough discovery (perhaps sampled or periodic) is needed for known databases.
    res := walkResult{
        records:      make([]objectRecord, 0, len(manifest)),
        partial:      discovery.partial,
        prefixErrors: append([]string{}, discovery.errors...),
    }
    seen := make(map[string]struct{}, len(manifest))

    for _, prefix := range prefixes {
        if ctx.Err() != nil {
            return res, ctx.Err()
        }
        records, err := r.listPrefix(ctx, prefix)
        if err != nil {
            // A per-prefix list failure is logged and the prefix is
            // skipped; the run continues against the remaining prefixes
            // so a single transient backend hiccup doesn't kill the
            // whole cycle. Errors are reported on the Run summary so
            // operators can spot "0 orphans found" runs that were
            // actually blind to most of the bucket.
            res.partial = true
            res.prefixErrors = append(res.prefixErrors, fmt.Sprintf("prefix %q: %v", prefix, err))
            r.logger.Warn().
                Err(err).
                Str("prefix", prefix).
                Msg("Reconciliation: prefix list failed; skipping")
            continue
        }
        for _, rec := range records {
            if _, dup := seen[rec.path]; dup {
                continue
            }
            seen[rec.path] = struct{}{}
            res.records = append(res.records, rec)
        }
    }

    return res, nil
}

// prefixDiscovery is the result of derivePrefixes. partial=true and/or
The seen map used for deduplication in walkStorage is redundant because the prefixes generated by derivePrefixes are disjoint by construction (each is a unique database/measurement/ pair). Removing this map and using a direct append will reduce memory overhead and improve performance, especially for large manifests.
res := walkResult{
records: make([]objectRecord, 0, len(manifest)),
partial: discovery.partial,
prefixErrors: append([]string{}, discovery.errors...),
}
for _, prefix := range prefixes {
if ctx.Err() != nil {
return res, ctx.Err()
}
records, err := r.listPrefix(ctx, prefix)
if err != nil {
// A per-prefix list failure is logged and the prefix is
// skipped; the run continues against the remaining prefixes
// so a single transient backend hiccup doesn't kill the
// whole cycle. Errors are reported on the Run summary so
// operators can spot "0 orphans found" runs that were
// actually blind to most of the bucket.
res.partial = true
res.prefixErrors = append(res.prefixErrors, fmt.Sprintf("prefix %q: %v", prefix, err))
r.logger.Warn().
Err(err).
Str("prefix", prefix).
Msg("Reconciliation: prefix list failed; skipping")
continue
}
res.records = append(res.records, records...)
}
    return res, nil

    for _, seg := range strings.Split(p, "/") {
        if seg == "_compaction_state" {
            return false
        }
    }
Using strings.Split here for every file found in storage can lead to significant allocations during large reconciliation runs. Since we only need to check if _compaction_state is a path segment, and we already know the path ends in .parquet, we can use more efficient string checks.
if strings.HasPrefix(p, "_compaction_state/") || strings.Contains(p, "/_compaction_state/") {
return false
}
Summary
Detects and repairs two kinds of drift between the Raft-replicated manifest and physical storage: orphan manifest entries (the manifest references a file that is missing from storage) and orphan storage files (a file exists with no manifest entry — e.g. a failure between `storage.Write` and the file-registrar Raft propose, or pre-Phase-1 history).

Design
Cluster gate is dual-mode by backend topology:
- Shared storage (S3/Azure/MinIO): one node sweeps via `IsActiveCompactor` — reuses the failover-managed compactor lease as a single-sweeper election with no new state.
- Local storage: every node walks its own disk, filtered by `OriginNodeID == self`.
- Standalone: `sweepOrphanStorage` refuses to act (no manifest to reconcile against).

Manifest-before-storage invariant upheld at the cycle level: the orphan-manifest sweep runs first; orphan-storage runs only if the first half completes cleanly. Mirrors the retention chunked-batch pattern at retention.go:818-879 — same chunk size, same manifest-failure abort semantics, same per-file marshal safety.
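For orientation, a minimal sketch of how the dual-mode gate could be expressed. `BackendKind`, the trimmed `Coordinator` interface, and the function shape are illustrative assumptions — the real wiring lives in cmd/arc/main.go as `reconciliationClusterGate`; only the decision logic mirrors the list above.

```go
package sketch

type BackendKind int

const (
	BackendShared BackendKind = iota // S3 / Azure / MinIO
	BackendLocal                     // node-local disk
)

// Coordinator is trimmed to the two calls this sketch needs.
type Coordinator interface {
	IsActiveCompactor() bool // failover-managed compactor lease, reused as the single-sweeper election
	LocalNodeID() string
}

// clusterGate reports whether this node may act in the current cycle.
func clusterGate(kind BackendKind, coord Coordinator) (act bool, reason string) {
	if coord == nil {
		return false, "standalone: no manifest to reconcile against"
	}
	switch kind {
	case BackendShared:
		if coord.IsActiveCompactor() {
			return true, "holds the compactor lease; sole sweeper of the shared bucket"
		}
		return false, "another node holds the compactor lease"
	case BackendLocal:
		// Every node sweeps its own disk; candidates are later filtered by
		// OriginNodeID == coord.LocalNodeID() so nodes never touch each other's files.
		return true, "local backend: self-scoped sweep"
	default:
		return false, "unknown backend kind"
	}
}
```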
Re-checks before every delete catch concurrent writer/retention/compaction races. Sorted output (`computeDiff`) makes cap-bounded runs deterministic about which orphans get cleaned. Per-chunk gate re-evaluation ensures lease handoff aborts the run promptly.

API

- `GET /api/v1/reconciliation/status` — `RequireRead`
- `POST /api/v1/reconciliation/trigger` — `RequireAdmin`; act mode requires both `dry_run=false` AND `act=true` (defense against accidental curl)
- `GET /api/v1/reconciliation/runs/:id` — `RequireAdmin`
Configuration (`ARC_RECONCILIATION_*`)

Defaults lean conservative: `enabled=false`, `schedule="17 4 * * *"`, `grace_window=24h`, `max_deletes_per_run=10000`, `max_manifest_size=200000`. `manifest_only_dry_run` exists as a safety bridge for operators who want a few weeks of report-only runs before flipping to full auto.
Audit

Per-batch (not per-file, to keep the audit log proportional): `reconcile.run_started`, `reconcile.run_completed`, `reconcile.run_aborted`, `reconcile.manifest_batch_deleted`, `reconcile.storage_batch_deleted`, `reconcile.cap_hit`. The `Run.WalkPartial` flag plus `Run.Errors` surface per-prefix list failures so an operator can disambiguate "0 orphans found" from "we couldn't see most of the bucket".
License

Rides on `FeatureClustering` at coordinator construction. Standalone Arc has no manifest, so reconciliation is naturally Enterprise-only — no separate flag (avoiding InfluxDB-style fragmentation).
Scope of changes

- `internal/reconciliation/` (engine, scheduler, gate, walk, diff, two sweeps, tests)
- `internal/api/reconciliation.go` (handler with auth-wrapped routes)
- `cmd/arc/main.go` (constructor + dual-mode `reconciliationClusterGate`)
- `internal/config/config.go` (`ReconciliationConfig` + viper defaults)
- `RELEASE_NOTES_2026.05.1.md`

No FSM changes. No new Raft commands. No `storage.Backend` interface changes. Reuses retention's `BatchFileOpsInManifest` chunking pattern verbatim.
Test plan

- `LocalBackend` integration coverage (real fs walk, real `Chtimes`, real `Exists`)
- Scheduler/API coverage (`TriggerNow`, status fields)
- `go build ./cmd/... ./internal/...` clean
- `go vet` clean
Reviewer notes

- The `Coordinator` interface in `reconciler.go` is the only cross-package surface; defined in the consumer package to break the cluster→reconciliation→cluster cycle.
- `markAborted` defaults unknown errors to `AbortUnknown` (not `AbortRaftQuorumLoss`) to avoid polluting Raft dashboards.
- `Run.WalkPartial` is set when any per-prefix list failed; per-prefix errors are written into `Run.Errors`.
- `looksLikeManagedPath` requires the full 7-segment Arc layout (db/m/yyyy/mm/dd/hh/file) and rejects `.`/`..` segments — the strict-mode safety net for `DeletePreManifestOrphans=false`.

🤖 Generated with Claude Code
Closes #391