cache blessed base manifests and hot sets in memory #37

Merged: jaredLunde merged 1 commit into main from jared/cached-fork on Mar 24, 2026
@jaredLunde
Contributor

Base images (bases/*) are immutable after bless, so there is no reason to fetch them from S3 on every fork. Add bounded in-memory caches (64 entries) on ExportRouter for deserialized VolumeManifests and parsed hot set indices. A cache hit clones an Arc (~0ns) instead of making an S3 round-trip (~100ms).
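A minimal sketch of what such a bounded `Arc` cache could look like, using only the standard library. `BoundedCache`, `get_or_fetch`, the FIFO eviction policy, and the stand-in `VolumeManifest` struct are all assumptions made for illustration; the PR does not show the actual types or the eviction strategy.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Stand-in for the deserialized manifest; the real fields are not shown in this PR.
#[derive(Debug)]
pub struct VolumeManifest {
    pub base: String,
}

struct Inner<V> {
    map: HashMap<String, Arc<V>>,
    order: VecDeque<String>, // insertion order, oldest key at the front
}

/// Hypothetical bounded cache. FIFO eviction is an assumption, not the PR's policy.
pub struct BoundedCache<V> {
    capacity: usize,
    inner: Mutex<Inner<V>>,
}

impl<V> BoundedCache<V> {
    pub fn new(capacity: usize) -> Self {
        let inner = Inner { map: HashMap::new(), order: VecDeque::new() };
        Self { capacity, inner: Mutex::new(inner) }
    }

    /// A hit only clones the `Arc` (a reference-count bump), never the value.
    pub fn get(&self, key: &str) -> Option<Arc<V>> {
        self.inner.lock().unwrap().map.get(key).cloned()
    }

    /// Insert a value, evicting the oldest entry once the capacity is exceeded.
    pub fn insert(&self, key: &str, value: Arc<V>) {
        if self.capacity == 0 {
            return;
        }
        let mut guard = self.inner.lock().unwrap();
        let inner = &mut *guard;
        if inner.map.insert(key.to_string(), value).is_none() {
            inner.order.push_back(key.to_string());
            if inner.order.len() > self.capacity {
                if let Some(oldest) = inner.order.pop_front() {
                    inner.map.remove(&oldest);
                }
            }
        }
    }

    /// Number of free slots left before eviction would start.
    pub fn remaining(&self) -> usize {
        self.capacity.saturating_sub(self.inner.lock().unwrap().map.len())
    }

    /// Return the cached value, or run `fetch` (e.g. the S3 read) and cache its result.
    pub fn get_or_fetch<E, F>(&self, key: &str, fetch: F) -> Result<Arc<V>, E>
    where
        F: FnOnce() -> Result<V, E>,
    {
        if let Some(hit) = self.get(key) {
            return Ok(hit);
        }
        let value = Arc::new(fetch()?);
        self.insert(key, Arc::clone(&value));
        Ok(value)
    }
}
```

Under these assumptions, the router would hold something like `BoundedCache::<VolumeManifest>::new(64)` and call `get_or_fetch` on the fork path, so only the first fork from a given base pays for the S3 read.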

  • Lazy population: first fork from a base fills the cache, subsequent forks hit it
  • Pre-warm on startup: after discover_exports, scan bases/ in each unique S3 prefix and load manifests + hot sets concurrently (8-wide)
  • Only download up to remaining cache capacity during pre-warm
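The pre-warm behavior in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `prewarm` and its `load` callback are hypothetical names, and scoped OS threads in batches of `width` stand in for whatever async concurrency the real implementation uses. The two properties it shows are the cap at the remaining cache capacity and the bound of at most 8 loads in flight.

```rust
use std::thread;

/// Hypothetical pre-warm helper: loads at most `remaining_capacity` keys,
/// running at most `width` loads at the same time. `load` stands in for
/// "fetch manifest + hot set from S3" and returns `None` on failure.
pub fn prewarm<T, F>(
    keys: &[String],
    remaining_capacity: usize,
    width: usize,
    load: F,
) -> Vec<(String, T)>
where
    T: Send,
    F: Fn(&str) -> Option<T> + Sync,
{
    // Never download more bases than the cache still has room for.
    let wanted = &keys[..keys.len().min(remaining_capacity)];
    let load_ref = &load;
    let mut loaded = Vec::new();

    // Process in batches so no more than `width` loads run concurrently.
    for batch in wanted.chunks(width.max(1)) {
        let results: Vec<Option<T>> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|k| s.spawn(move || load_ref(k.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("loader thread panicked"))
                .collect()
        });
        // Keep successful loads only; a failed base is skipped, not fatal.
        for (key, result) in batch.iter().zip(results) {
            if let Some(value) = result {
                loaded.push((key.clone(), value));
            }
        }
    }
    loaded
}
```

Skipping failed loads fits the lazy-population path described above, since a base that fails to pre-warm can still be filled on its first fork. Whether the PR treats pre-warm failures this way is not stated.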

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde merged commit 3d41940 into main Mar 24, 2026
21 checks passed
@jaredLunde jaredLunde deleted the jared/cached-fork branch March 24, 2026 06:25
jaredLunde added a commit that referenced this pull request May 16, 2026
Pure mechanical move. Nine handoff-specific methods come off
`ExportRouter` and onto a new `HandoffCoordinator` in
`glidefs/src/handoff/coordinator.rs`:

- handoff_snapshot → HandoffCoordinator::snapshot
- freeze_all, unfreeze_all
- set_all_caches_freeze
- take_ublk_server, recover_handoff_devices, revive_after_failed_handoff
- is_per_io_daemon_supported
- get_handler_sync (the async ExportRouter::get_handler variant stays on the router for NBD)

The coordinator wraps `Arc<ExportRouter>` and reaches per-export
state through three new `pub(crate)` accessors on the router:
- `exports_map() -> &DashMap<String, ExportState>`
- `cache_dir_path() -> &Path`
- `ublk_server_mutex() -> &Mutex<UblkServer>` (cfg-gated)
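The shape of this split might look like the sketch below. It is a simplified illustration, not the code in `glidefs/src/handoff/coordinator.rs`: a `Mutex<HashMap<..>>` stands in for the `DashMap` so the example needs only the standard library, the `frozen` field on `ExportState` is a hypothetical placeholder for real per-export state, and the cfg-gated `ublk_server_mutex()` accessor is omitted.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// Placeholder per-export state; the real `ExportState` is not shown in the commit.
#[derive(Debug, Default)]
pub struct ExportState {
    pub frozen: bool, // hypothetical flag, for illustration only
}

pub struct ExportRouter {
    // The real router uses a DashMap; a Mutex<HashMap> keeps this sketch std-only.
    exports: Mutex<HashMap<String, ExportState>>,
    cache_dir: PathBuf,
}

impl ExportRouter {
    pub fn new(cache_dir: PathBuf, names: &[&str]) -> Self {
        let exports = names
            .iter()
            .map(|n| (n.to_string(), ExportState::default()))
            .collect();
        Self { exports: Mutex::new(exports), cache_dir }
    }

    // Narrow crate-private accessors: the coordinator's only way into router state.
    pub(crate) fn exports_map(&self) -> &Mutex<HashMap<String, ExportState>> {
        &self.exports
    }

    pub(crate) fn cache_dir_path(&self) -> &Path {
        &self.cache_dir
    }
}

/// Owns the handoff-specific operations and borrows state through the accessors.
pub struct HandoffCoordinator {
    router: Arc<ExportRouter>,
}

impl HandoffCoordinator {
    pub fn new(router: Arc<ExportRouter>) -> Self {
        Self { router }
    }

    /// Freeze every export; returns how many exports were touched.
    pub fn freeze_all(&self) -> usize {
        self.set_frozen(true)
    }

    pub fn unfreeze_all(&self) -> usize {
        self.set_frozen(false)
    }

    fn set_frozen(&self, frozen: bool) -> usize {
        let mut exports = self.router.exports_map().lock().unwrap();
        for state in exports.values_mut() {
            state.frozen = frozen;
        }
        exports.len()
    }
}
```

Keeping the accessors `pub(crate)` means the router's fields stay private while the handoff module can still reach what it needs, which is the boundary the commit message describes.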

The single `cache.inner.manifest_etag.lock()` reach in
`recover_handoff_devices` is replaced by a new `pub(crate)` method
`WriteCache::set_manifest_etag(Option<String>)`, so the coordinator
never reaches into `pub(super) inner`.
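As a rough illustration of that encapsulation, assuming the etag lives behind a `Mutex<Option<String>>` as the `.lock()` call suggests: the `Inner` layout, the constructor, and the `manifest_etag()` getter below are hypothetical and exist only to make the sketch self-contained.

```rust
use std::sync::Mutex;

/// Hypothetical inner state; the real struct holds more than the etag.
struct Inner {
    manifest_etag: Mutex<Option<String>>,
}

pub struct WriteCache {
    inner: Inner,
}

impl WriteCache {
    pub fn new() -> Self {
        Self { inner: Inner { manifest_etag: Mutex::new(None) } }
    }

    /// Callers set the etag through this method instead of locking `inner` themselves.
    pub(crate) fn set_manifest_etag(&self, etag: Option<String>) {
        *self.inner.manifest_etag.lock().unwrap() = etag;
    }

    /// Hypothetical getter, included only so the effect can be observed.
    pub(crate) fn manifest_etag(&self) -> Option<String> {
        self.inner.manifest_etag.lock().unwrap().clone()
    }
}
```

With a setter like this, the locking detail stays inside `WriteCache`, so changing how the etag is stored would not touch the coordinator.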

`PredecessorCutoverCtx` and `SuccessorTakeoverCtx` now carry
`Arc<HandoffCoordinator>` instead of `Arc<ExportRouter>`. CRH and
the trait's default `get_handler` impl follow.

`run_predecessor` and `run_successor` take `Arc<HandoffCoordinator>`.
`cli/server.rs` constructs the coordinator next to the router build
(both predecessor SIGHUP path and successor entry point).

`router.rs`, previously ~3232 lines, sheds its handoff cruft and returns to its
actual job: per-export I/O dispatch.

handoff_sequential_50_crh ✓ (this run; 575s, 50 clean handoffs,
fio do_verify clean, oracle scan zero corrupt blocks)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>