feat: affinity model for locality-preserving HAMT keys (RFC 0002) #61

Merged
rmanibus merged 2 commits into main from feat/affinity-model on Mar 7, 2026
Conversation

@rmanibus rmanibus (Contributor) commented Mar 7, 2026

Summary

Implement RFC 0002 by introducing affinity-aware HAMT routing keys so files sharing a parent directory share top trie levels, reducing incremental metadata churn.

What Changes

  • Core/engine integration for affinity key usage in backup scan/upload paths.
  • HAMT updates for affinity-aware routing behavior.
  • Benchmark coverage and helper script for measuring locality effects.
  • RFC and benchmark documentation updates.

Key Idea

AffinityKey(parentID, fileID) = SHA256(parentID)[:4] + SHA256(fileID)[4:]

This keeps sibling files close in the trie so incremental updates in flat directories reduce rewrites from roughly O(N*depth) toward O(depth).

Files of Interest

  • internal/hamt/hamt.go
  • internal/hamt/affinity_bench_test.go
  • internal/engine/backup_scan.go
  • internal/engine/backup.go
  • scripts/benchmark/affinity.sh
  • docs/affinity-benchmark.md
  • rfcs/0002-affinity-model.md

Tracking

… 0002)

Introduces AffinityKey(parentID, fileID) = SHA256(parentID)[:4] + SHA256(fileID)[4:]
so files sharing the same parent directory share the top 3 routing levels of the
trie. This collapses incremental-backup metadata rewrites from O(N·depth) to O(depth)
for a flat directory of N changed files.

- hamt: new Insert/Lookup/Delete signatures accept parentID alongside fileID;
  LeafEntry gains a PathKey field so buildNode can split leaves without re-deriving
  routing keys; LookupByFileID added as an O(N) fallback for path resolution
- engine: BackupManager.parentIndex tracks fileID→parentID during scan so
  lookupMetaByFileID can resolve AffinityKey lookups; falls back to LookupByFileID
  for entries not seen in the current scan (incremental backup case)
- core: Snapshot.HAMTVersion=2 tags new snapshots; LeafEntry.PathKey stored for
  correct leaf-split routing of affinity-keyed entries
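
The parentIndex fallback described in the engine bullet can be sketched as follows. This is an illustration only: memTree is a hypothetical in-memory stand-in for the HAMT, the manager type is a stripped-down placeholder for BackupManager, and the LookupByFileID signature is assumed rather than taken from the code.

```go
package main

import "fmt"

// memTree is a hypothetical stand-in for the HAMT. It only mimics the
// two lookup paths; it does no hashing or trie traversal.
type memTree struct {
	entries map[[2]string]string // (parentID, fileID) -> ref
}

// Lookup resolves an entry directly from (parentID, fileID), as an
// affinity-keyed lookup would.
func (t *memTree) Lookup(root, parentID, fileID string) (string, error) {
	return t.entries[[2]string{parentID, fileID}], nil
}

// LookupByFileID scans every entry: the O(N) fallback.
// Its signature here is an assumption.
func (t *memTree) LookupByFileID(root, fileID string) (string, error) {
	for k, ref := range t.entries {
		if k[1] == fileID {
			return ref, nil
		}
	}
	return "", nil
}

// manager is a placeholder for BackupManager.
type manager struct {
	tree        *memTree
	parentIndex map[string]string // fileID -> parentID, filled during scan
}

// lookupMetaByFileID prefers the direct lookup when the parent is known
// from the current scan and falls back to the full walk otherwise.
func (m *manager) lookupMetaByFileID(root, fileID string) (string, error) {
	if parentID, ok := m.parentIndex[fileID]; ok {
		return m.tree.Lookup(root, parentID, fileID)
	}
	return m.tree.LookupByFileID(root, fileID)
}

func main() {
	m := &manager{
		tree: &memTree{entries: map[[2]string]string{
			{"dir-1", "file-a"}: "ref-a",
			{"dir-2", "file-b"}: "ref-b",
		}},
		parentIndex: map[string]string{"file-a": "dir-1"}, // file-b not seen in this scan
	}
	a, _ := m.lookupMetaByFileID("root", "file-a") // direct path
	b, _ := m.lookupMetaByFileID("root", "file-b") // fallback path
	fmt.Println(a, b)                              // ref-a ref-b
}
```

The trade-off is that entries missing from parentIndex pay a full walk, so the fallback is best kept for the incremental case the commit message names.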
@rmanibus rmanibus force-pushed the feat/affinity-model branch 2 times, most recently from a4c4dba to a739c74, March 7, 2026 12:03
@rmanibus rmanibus force-pushed the feat/affinity-model branch from a739c74 to 0ce24dc, March 7, 2026 12:03
@rmanibus rmanibus merged commit 04014ee into main Mar 7, 2026
6 checks passed
@rmanibus rmanibus deleted the feat/affinity-model branch March 7, 2026 12:09
@rmanibus rmanibus changed the title feat: implement affinity model for locality-preserving HAMT keys (RFC 0002) feat: affinity model for locality-preserving HAMT keys (RFC 0002) Mar 15, 2026
@rmanibus rmanibus added the enhancement New feature or request label Mar 15, 2026
@rmanibus rmanibus added this to the RFC 0002: Affinity model milestone Mar 15, 2026
@rmanibus rmanibus linked an issue Mar 15, 2026 that may be closed by this pull request
@rmanibus rmanibus requested a review from Copilot March 16, 2026 07:28
Copilot AI left a comment

Pull request overview

Implements RFC 0002 by switching HAMT routing from file-only hashing to affinity-aware keys so siblings share upper trie levels, reducing incremental metadata churn, plus adds benchmarks and documentation to validate locality effects.

Changes:

  • Introduces AffinityKey(parentID, fileID) routing and updates HAMT/engine call sites to pass parent context.
  • Adds end-to-end and unit/benchmark coverage (script + Go test/bench) to measure node-write reduction.
  • Updates snapshot/model metadata to record HAMT versioning and stores per-leaf PathKey for compatibility/operations.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| scripts/benchmark/affinity.sh | Adds an E2E benchmark script that compares clustered vs scattered incremental updates by counting node/* objects written. |
| rfcs/0002-affinity-model.md | Adds the RFC describing the affinity key scheme and intended versioning/migration behavior. |
| pkg/store/pack.go | Adds debug logging around pack buffering and catalog flush statistics. |
| internal/hamt/hamt.go | Implements affinity routing, extends Tree API (Insert/Lookup/Delete), and adds a walk-based lookup fallback. |
| internal/hamt/hamt_test.go | Updates HAMT unit tests for the new Tree API signature. |
| internal/hamt/affinity_bench_test.go | Adds a proof-style test and benchmarks comparing affinity vs legacy routing costs. |
| internal/engine/backup_scan.go | Wires affinity-aware lookups/inserts/deletes through scan/incremental paths and adds parent indexing for lookups. |
| internal/engine/backup_upload.go | Passes parent context into HAMT inserts during upload. |
| internal/engine/backup.go | Tracks parent index and stamps snapshots with HAMTVersion: 2. |
| internal/core/models.go | Extends LeafEntry with PathKey and Snapshot with HAMTVersion. |
| internal/engine/backup_test.go | Updates backup tests to perform affinity-aware lookups (parentID + fileID). |
| internal/engine/prune_test.go | Updates prune test HAMT inserts for new API. |
| internal/engine/diff_test.go | Updates diff test HAMT inserts for new API. |
| internal/engine/check_test.go | Updates check tests’ HAMT inserts for new API. |
| docs/affinity-benchmark.md | Adds documented benchmark results and interpretation/trade-offs. |
Comments suppressed due to low confidence (1)

internal/engine/backup_scan.go:152

  • detectChange now does Lookup(oldRoot, primaryParentID(meta), meta.FileID). For repositories created before RFC 0002, oldRoot is keyed by the legacy routing (hash(fileID) only), so this lookup will miss and treat previously-backed-up files as new/changed. This needs to branch on the previous snapshot’s HAMTVersion (default legacy when missing) and either use legacy routing (e.g., parentID=fileID) or migrate the old HAMT before scanning.
func (bm *BackupManager) detectChange(oldRoot string, meta *core.FileMeta) (changed bool, oldRef string, err error) {
	oldRef, err = bm.tree.Lookup(oldRoot, primaryParentID(meta), meta.FileID)
	if err != nil {
		return false, "", fmt.Errorf("hamt lookup: %w", err)
	}
	if oldRef == "" {
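
The version branch suggested in the comment above could be sketched roughly as below. The int type for HAMTVersion, the trimmed Snapshot struct, and the routingParent helper are all hypothetical. The sketch relies on the observation that if the legacy key was plain SHA256(fileID), then passing fileID as the parent makes AffinityKey(fileID, fileID) reduce to that same hash.

```go
package main

import "fmt"

// Snapshot mirrors only the field discussed here. A zero HAMTVersion
// stands for snapshots written before the field existed (assumption).
type Snapshot struct {
	HAMTVersion int
}

// routingParent is a hypothetical helper that picks the parentID to pass
// to Lookup against a previous snapshot's root. For legacy (pre-v2)
// snapshots it returns fileID itself, so prefix and suffix of the key
// both come from SHA256(fileID).
func routingParent(prev *Snapshot, parentID, fileID string) string {
	if prev == nil || prev.HAMTVersion < 2 {
		return fileID // legacy routing: key depends on fileID only
	}
	return parentID // affinity routing
}

func main() {
	legacy := &Snapshot{}              // field missing, treated as legacy
	v2 := &Snapshot{HAMTVersion: 2}    // affinity-keyed snapshot
	fmt.Println(routingParent(legacy, "dir-1", "file-a")) // file-a
	fmt.Println(routingParent(v2, "dir-1", "file-a"))     // dir-1
}
```

Migrating the old HAMT before scanning, the other option the comment mentions, would avoid the per-lookup branch at the cost of a one-time rewrite.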

Comment thread internal/engine/backup_scan.go
Comment thread internal/engine/backup.go
Comment thread pkg/store/pack.go
Comment thread rfcs/0002-affinity-model.md
Comment thread rfcs/0002-affinity-model.md
Comment thread rfcs/0002-affinity-model.md
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

RFC 0002: Epic / Tracking issue for affinity model

2 participants