
Add cross-cycle presence cache to reduce repair HeadObject calls#192

Merged
raymondjacobson merged 1 commit into OpenAudio:main from RolfAris:feat/cross-cycle-known-present on Apr 4, 2026

@RolfAris RolfAris commented Apr 1, 2026

Summary

  • Repair cycles call bucket.Attributes (HeadObject) for every locally-held CID. On S3-compatible backends, this is the dominant source of metadata API calls.
  • Adds an imcache LRU cache (knownPresent, 500K entries, no TTL) that remembers confirmed-present keys across cycles. Non-cleanup cycles check the cache and skip Attributes on hit.
  • Cleanup cycles (every 4th) call RemoveAll() and do full verification — they need ModTime for over-replication decisions and run blob integrity validation.
  • Cache is populated after all validation and cleanup checks pass, so corrupt or about-to-be-deleted blobs are never cached.
  • The cache is also populated on replicateToMyBucket writes, and invalidated on dropFromMyBucket and on cleanup validation deletes.
  • Scoped to the repair batch path only — haveInMyBucket and serving paths are unchanged.
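The cycle behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `presenceSet`, `headFunc`, `repairer`, and `runCycle` are hypothetical names, a plain map stands in for the bounded imcache LRU, and treating a cycle as a cleanup cycle when `cycle%4 == 0` is an assumption about how "every 4th" is counted.

```go
package main

import "fmt"

// presenceSet is a hypothetical stand-in for the knownPresent cache.
// The PR uses a bounded imcache LRU; a plain map keeps this sketch small.
type presenceSet map[string]struct{}

// headFunc stands in for bucket.Attributes: it reports whether key exists.
type headFunc func(key string) (bool, error)

type repairer struct {
	known     presenceSet
	head      headFunc
	cacheHits int // analogous to the repair_known_present counter
	headCalls int // number of simulated HeadObject calls
}

// cleanupEvery mirrors "every 4th cycle"; the modulo rule is an assumption.
const cleanupEvery = 4

func isCleanupCycle(cycle int) bool { return cycle%cleanupEvery == 0 }

// runCycle checks every locally held key once and returns the keys found missing.
func (r *repairer) runCycle(cycle int, keys []string) ([]string, error) {
	cleanup := isCleanupCycle(cycle)
	if cleanup {
		// Cleanup cycles are authoritative: drop everything and re-verify.
		r.known = presenceSet{}
	}
	var missing []string
	for _, k := range keys {
		if !cleanup {
			if _, ok := r.known[k]; ok {
				r.cacheHits++ // cache hit: skip the metadata call
				continue
			}
		}
		r.headCalls++
		present, err := r.head(k)
		if err != nil {
			return missing, fmt.Errorf("head %s: %w", k, err)
		}
		if !present {
			missing = append(missing, k)
			continue
		}
		// Cache only after the key has been verified as present.
		r.known[k] = struct{}{}
	}
	return missing, nil
}

func main() {
	store := map[string]bool{"cid-a": true, "cid-b": true}
	r := &repairer{
		known: presenceSet{},
		head:  func(k string) (bool, error) { return store[k], nil },
	}
	keys := []string{"cid-a", "cid-b", "cid-c"}
	for cycle := 1; cycle <= 4; cycle++ {
		missing, err := r.runCycle(cycle, keys)
		fmt.Printf("cycle=%d missing=%v headCalls=%d cacheHits=%d err=%v\n",
			cycle, missing, r.headCalls, r.cacheHits, err)
	}
}
```

In this sketch only keys confirmed present are cached, so a missing key is re-checked every cycle, while present keys cost one metadata call per cleanup interval instead of one per cycle.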

Design notes

  • Uses imcache with WithNoExpiration() and LRU eviction, consistent with the four existing caches on MediorumServer. No new dependencies.
  • Blobs deleted outside mediorum (manual storage console, backend-side loss) remain cached until the next cleanup cycle clears and re-verifies everything. This is an accepted tradeoff — cleanup runs every 4th cycle and is authoritative.
  • The knownPresent.Remove inside the cleanup validation delete block is currently unreachable because of a pre-existing bug in which err is checked instead of errVal (see #175, "Fix two bugs in repairCid: dead cleanup validation and wrong polarity"). It becomes live when #175 merges.
  • repair_known_present counter tracks cache hits per cycle. known_present_size is logged at cycle completion.
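To make the "LRU eviction, no expiration" semantics in the design notes concrete, here is a small stand-in built on the standard library. It is an assumption-laden sketch: `lruSet` and its methods are hypothetical, it is not safe for concurrent use, and it does not reproduce imcache's actual API. The point is only that entries leave the set by capacity eviction, explicit removal, or a full clear, never by age.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruSet is a hypothetical, single-goroutine stand-in for a bounded
// no-TTL LRU. The PR itself uses imcache, which is not reproduced here.
type lruSet struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> node in order
}

func newLRUSet(limit int) *lruSet {
	return &lruSet{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// Add records key as present, evicting the least recently used key when full.
func (s *lruSet) Add(key string) {
	if el, ok := s.items[key]; ok {
		s.order.MoveToFront(el)
		return
	}
	if s.limit > 0 && s.order.Len() >= s.limit {
		oldest := s.order.Back()
		s.order.Remove(oldest)
		delete(s.items, oldest.Value.(string))
	}
	s.items[key] = s.order.PushFront(key)
}

// Has reports whether key is cached and refreshes its recency on a hit.
func (s *lruSet) Has(key string) bool {
	el, ok := s.items[key]
	if ok {
		s.order.MoveToFront(el)
	}
	return ok
}

// Remove invalidates a single key, as a delete path would.
func (s *lruSet) Remove(key string) {
	if el, ok := s.items[key]; ok {
		s.order.Remove(el)
		delete(s.items, key)
	}
}

// RemoveAll clears the set, as a cleanup cycle would.
func (s *lruSet) RemoveAll() {
	s.order.Init()
	s.items = make(map[string]*list.Element)
}

// Len returns the number of cached keys (compare known_present_size).
func (s *lruSet) Len() int { return s.order.Len() }

func main() {
	s := newLRUSet(2)
	s.Add("cid-a")
	s.Add("cid-b")
	s.Has("cid-a") // refreshes cid-a, so cid-b is now least recently used
	s.Add("cid-c") // capacity reached: evicts cid-b
	fmt.Println(s.Has("cid-a"), s.Has("cid-b"), s.Has("cid-c"), s.Len())
	s.RemoveAll()
	fmt.Println(s.Len())
}
```

Because nothing expires by age, a key deleted behind the node's back stays in such a set until it is evicted for capacity or the set is cleared, which is the tradeoff the design notes accept.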

Repair cycles call bucket.Attributes (HeadObject) for every locally-held
CID to verify presence. On S3-compatible backends this is the dominant
source of metadata API calls.

Add an imcache LRU (500K entries, no TTL) that remembers confirmed-present
keys across cycles. Non-cleanup repair cycles check the cache first and
skip the Attributes call on hit. Cleanup cycles (every 4th) clear the
cache via RemoveAll and do full verification — they need ModTime for
over-replication decisions and run blob validation.

Cache is populated after all validation passes (so corrupt blobs are never
cached), on replicateToMyBucket writes, and invalidated on dropFromMyBucket
deletes and cleanup validation deletes.

@raymondjacobson raymondjacobson left a comment

Nice little PR here! The once-every-4-cycles cleanup makes sense to me.

We'll see how the 500K limit fares. It may be worth exposing this in a var, because archive nodes that volunteer to store the whole catalog would make use of a larger cache too.
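One way the suggestion above could look, assuming "var" means an environment variable: read an override at startup and fall back to the current 500K default. The variable name `KNOWN_PRESENT_CACHE_SIZE` and the function `knownPresentSize` are hypothetical and do not come from the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultKnownPresentSize matches the 500K entry limit described in the PR.
const defaultKnownPresentSize = 500_000

// knownPresentSize returns the cache entry limit. It reads a hypothetical
// environment variable and falls back to the default when the value is
// unset, not a number, or not positive. getenv is injected for testability.
func knownPresentSize(getenv func(string) string) int {
	raw := getenv("KNOWN_PRESENT_CACHE_SIZE") // hypothetical variable name
	if raw == "" {
		return defaultKnownPresentSize
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultKnownPresentSize
	}
	return n
}

func main() {
	fmt.Println("knownPresent limit:", knownPresentSize(os.Getenv))
}
```

An archive node storing the whole catalog could then raise the limit without a code change, while existing nodes keep the default.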

@raymondjacobson raymondjacobson merged commit cd74101 into OpenAudio:main Apr 4, 2026
2 of 6 checks passed
RolfAris added a commit to RolfAris/go-openaudio that referenced this pull request Apr 6, 2026
Move the listing-derived presence index from cleanup-only to all repair
cycles (uploads, previews, qm_cids). Always on — no flag needed.

The index replaces per-key HeadObject with a single ListObjects
pagination at cycle start. Staleness between listings is covered by
the existing knownPresent write-path cache (PR OpenAudio#192).

On build failure, falls back to per-key HeadObject (same as before).
2 participants