feat(paging): EmbeddingCache → PagedResourcePool (Phase 2 — fixes 0/64 hit rate) #933
Merged
Conversation
The PagedResourcePool primitive is the per-resource brain. The broker is
the cross-resource brain: one orchestrator that reads pressure from every
registered pool, decides which to relieve, and pulls the eviction lever.
Same broker is the future home of recipe-aware priority arbitration,
ML-policy-driven tiering, and LLM-mediated control for novel pressure
situations — those land in 7b/7c without changing the pool API.
What lands here (sketched below):
- PressureSource trait — name + pressure + evict_some + stats
- PressureBroker struct with register/unregister/relieve/snapshot/spawn_tick
- PressureTier (Normal/Warning/High/Critical at 0.6/0.8/0.95)
- Tier-driven relief: Normal/Warning observe; High evicts worst pool;
Critical evicts all over-budget pools
- Blanket impl: every PagedResourcePool<K, V> auto-satisfies
PressureSource — consumers register Arc<pool>, no adapter struct
- PagedResourcePool::evict_under_pressure (broker-callable eviction)
- PagedResourcePool::stats_blocking (sync stats for non-async trait)
- PagedResourcePool::config_name (stable name accessor)
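A minimal sketch of those shapes, assuming the names above; the exact signatures (async-ness, error handling, stats payloads) and the relief pass are illustrative, not the PR's real API:

```rust
// Illustrative only: trait/enum names come from this PR, everything
// else (method signatures, the shape of the relief pass) is an assumption.

/// Anything the broker can read pressure from and ask to shed load.
pub trait PressureSource: Send + Sync {
    fn name(&self) -> &str;
    /// 0.0 = empty, 1.0 = at budget, > 1.0 = over budget.
    fn pressure(&self) -> f32;
    /// Evict toward `target` pressure; returns bytes freed.
    fn evict_some(&self, target: f32) -> u64;
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PressureTier {
    Normal,   // < 0.60
    Warning,  // >= 0.60
    High,     // >= 0.80
    Critical, // >= 0.95
}

impl PressureTier {
    pub fn from_pressure(p: f32) -> Self {
        if p >= 0.95 {
            Self::Critical
        } else if p >= 0.80 {
            Self::High
        } else if p >= 0.60 {
            Self::Warning
        } else {
            Self::Normal
        }
    }
}

/// One relief pass, following the tier rules above: Normal/Warning
/// observe, High evicts only the worst source, Critical evicts every
/// source that is over budget.
pub fn relieve_once(sources: &[&dyn PressureSource], target: f32) {
    let Some(worst) = sources
        .iter()
        .max_by(|a, b| a.pressure().total_cmp(&b.pressure()))
    else {
        return;
    };
    match PressureTier::from_pressure(worst.pressure()) {
        PressureTier::Normal | PressureTier::Warning => {} // observe only
        PressureTier::High => {
            worst.evict_some(target);
        }
        PressureTier::Critical => {
            for s in sources.iter().filter(|s| s.pressure() > 1.0) {
                s.evict_some(target);
            }
        }
    }
}
```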
Tests (8/8 passing):
- tier_thresholds_match_gpu_pressure_watcher
- no_action_when_all_pools_normal
- high_pressure_evicts_only_worst_pool
- critical_pressure_evicts_all_over_budget_pools
- registration_dedups_by_name
- unregister_removes_pool
- real_paged_resource_pool_plugs_into_broker_via_blanket_impl
- snapshot_orders_pools_by_pressure_descending
The integration test proves the architectural point: build a real
PagedResourcePool<String, Vec<u8>>, fill it past 0.80 pressure,
register it via blanket impl, call broker.relieve(), watch pressure
drop. No adapter struct, no per-consumer wiring boilerplate.
Doc updated: RESOURCE-ARCHITECTURE.md Phase 7 status + 7a/7b/7c split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 0/64 hit rate fix. Internal Rust callers (DataModule.backfill_vectors,
ModuleBackedEmbeddingProvider, anywhere using generate_embedding{,s_batch})
were silently bypassing the embedding cache because it was wired only into
the IPC handler — handle_generate had its own ad-hoc HashMap<(String, u64),
CachedEmbedding> with djb2 keys + 5-minute TTL + 10k count cap.
This commit moves the cache one layer down: the pool lives behind
generate_embedding{,s_batch}, so every embedder — IPC and Rust — hits the
same cache uniformly. The IPC handler becomes a thin wrapper around the
batched function. The architecture-doc claim that the migration "fixes
the 0/64 hit rate" is now true.
What's in this commit (key and batch path sketched after the list):
- EmbeddingKey { model: String, content_hash: [u8; 32] } — structured
key, model-namespaced (different models map text → different vectors,
so they need distinct cache slots), SHA-256 content hash (fixed-size,
collision-resistant vs djb2)
- EMBEDDING_POOL: PagedResourcePool<EmbeddingKey, Vec<f32>> replaces
EMBEDDING_CACHE. 256 MB byte-driven budget (~170k entries at 384d FP32,
>> previous 10k count cap), size_weighted_lru eviction, eventually
overridden by recipe-declared budgets (Phase 9) + pressure broker (7b)
- generate_embedding / generate_embeddings_batch are pool-aware — the
actual user-visible behavior change. Batch path: per-text cache check,
one model.embed() for the miss subset, per-text insert. Single path:
pool.get → fall through to model.embed → pool.insert.
- handle_generate collapses to ~15 lines (was ~70) by delegating to
generate_embeddings_batch
- handle_cache_stats now exposes pressure + max_bytes + evictions in
addition to hits/misses/size — broker-aware cache visibility
- handle_cache_clear → pool.clear() (new pool method that drains entries
+ resets hit/miss/eviction counters)
- PagedResourcePool::clear() — admin-level reset, drops pinned too,
distinct from evict_under_pressure (which respects pins)
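A sketch of the key and the batch path, with a plain HashMap standing in for the pool and a caller-supplied closure standing in for model.embed(); the EmbeddingKey shape and the SHA-256 choice are from this commit, the rest is illustrative:

```rust
use std::collections::HashMap;

use sha2::{Digest, Sha256};

#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub struct EmbeddingKey {
    pub model: String,
    pub content_hash: [u8; 32],
}

impl EmbeddingKey {
    pub fn new(model: &str, text: &str) -> Self {
        Self {
            model: model.to_string(),
            // Fixed-size, collision-resistant content hash (vs the old djb2).
            content_hash: Sha256::digest(text.as_bytes()).into(),
        }
    }
}

/// Batch path: per-text cache check, one model call for the miss subset,
/// per-text insert on the way out. `embed_batch` is a hypothetical
/// stand-in for model.embed().
pub fn generate_embeddings_batch(
    pool: &mut HashMap<EmbeddingKey, Vec<f32>>,
    model: &str,
    texts: &[String],
    embed_batch: impl Fn(&[String]) -> Vec<Vec<f32>>,
) -> Vec<Vec<f32>> {
    let keys: Vec<EmbeddingKey> =
        texts.iter().map(|t| EmbeddingKey::new(model, t)).collect();

    // Collect the miss subset; hits come straight from the pool.
    let misses: Vec<usize> = keys
        .iter()
        .enumerate()
        .filter(|(_, k)| !pool.contains_key(k))
        .map(|(i, _)| i)
        .collect();

    // An all-cached batch returns without ever touching the model:
    // exactly the short-circuit the 0/64 regression test pins down.
    if !misses.is_empty() {
        let miss_texts: Vec<String> =
            misses.iter().map(|&i| texts[i].clone()).collect();
        for (&i, vec) in misses.iter().zip(embed_batch(&miss_texts)) {
            pool.insert(keys[i].clone(), vec);
        }
    }

    keys.iter().map(|k| pool[k].clone()).collect()
}
```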
Tests (5/5 passing, 50/50 paging tests still passing):
- generate_embeddings_batch_hits_pool_before_loading_model — proves
cache hit short-circuits BEFORE fastembed model load. Pre-populates
pool with vector for "nonexistent-model-name"; if the cache path
were broken, model load would fail. This is the regression test for
the 0/64 bug.
- single_embedding_hits_pool_before_loading_model — same proof for
single-text generate_embedding
- embedding_key_is_model_namespaced — same text + different model =
distinct cache slots (correctness invariant)
- pool_clear_resets_stats_and_drops_entries — pool.clear() drains
entries AND resets hit/miss/eviction counters
- batch_with_partial_hits_records_correct_hit_count — hit-count
accounting on the batch path; the all-cached case (no misses) returns
without calling model.embed()
Tests serialize via TEST_LOCK on the global pool because EMBEDDING_POOL
is a process-global singleton (matches IPC reality) and parallel test
execution would race the shared hit/miss counters.
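The serialization pattern, sketched; TEST_LOCK and EMBEDDING_POOL are this PR's names, the test body is assumed:

```rust
use std::sync::Mutex;

// Tests over the process-global pool hold this lock for their full
// duration so parallel test threads can't interleave on the shared
// hit/miss counters.
static TEST_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn exercises_the_global_pool() {
    // Recover the lock even if a previous test panicked while holding it.
    let _guard = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    // ...pre-populate EMBEDDING_POOL, run the path under test, assert
    // on hit/miss counts...
}
```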
Branch base: feature/pressure-broker (PR #932), needs stats_blocking +
PressureSource. When #932 lands, this rebases cleanly onto main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request on Apr 18, 2026
Phase 3a of the LoRAAdapterPool migration: ship the eviction lever the
PressureBroker needs to drive genome adapter eviction without an
activate_skill call, plus the value-aware EvictionPriority<V> the deeper
Phase 3b migration will need to express priority-weighted scoring as a
pool callback. Bridge step — keeps GenomePagingEngine's 21 existing
tests intact while making it broker-driveable.
What's in this commit (scoring hook and loop sketched after the list):
- PagedResourcePool: EvictionPriority generalized to EvictionPriority<V>
so consumers can read domain-specific metadata from the value during
eviction scoring (an adapter's priority field, an MoE expert's routing
weight, a memory-recall entry's salience). lru_priority and
size_weighted_lru stay value-blind via the V param being unused. This
is the lever Phase 3b's full pool migration of GenomePagingEngine
will use to express the existing `age_seconds / (priority * 10)` formula
as a pool callback.
- GenomePagingEngine::evict_under_pressure(target_pressure: f32) -> u64
Drives existing select_eviction_victim in a loop until memory pressure
drops to target, returning bytes freed. Same victim-selection formula
and critical-adapter protection (priority > 0.9) as activate_skill's
implicit eviction — no behavioral divergence: the same code path picks
the victim either way, this commit just lets the broker drive it.
- cognition/genome-evict-under-pressure IPC handler — exposes the lever
so it's testable manually + ready for the broker singleton to call
once Phase 3b wires per-persona PressureSource wrappers into the
DashMap<persona_id, PersonaCognition>.
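Sketches of both levers, under stated assumptions: only the quoted formula and the priority > 0.9 protection rule come from this commit; the candidate struct and loop plumbing are illustrative stand-ins for the real pool/engine code:

```rust
/// Per-candidate metadata the pool hands to a scoring callback.
pub struct EvictionCandidate<'a, V> {
    pub age_seconds: f64,
    pub value: &'a V,
}

/// Higher score = better eviction victim. Value-blind policies such as
/// lru_priority simply ignore the V they are handed.
pub type EvictionPriority<V> = fn(&EvictionCandidate<'_, V>) -> f64;

pub struct Adapter {
    pub priority: f64, // 0.0..=1.0; > 0.9 means critical, never evicted
    pub size_bytes: u64,
}

/// Phase 3b's planned callback: the engine's existing formula expressed
/// against the value's own priority field.
fn adapter_score(c: &EvictionCandidate<'_, Adapter>) -> f64 {
    c.age_seconds / (c.value.priority * 10.0)
}

/// Assumed shape of the Phase 3a loop: evict the best-scoring
/// non-critical victim until pressure reaches `target`; report bytes freed.
pub fn evict_under_pressure(
    adapters: &mut Vec<(f64, Adapter)>, // (age_seconds, adapter)
    budget_bytes: u64,
    target: f32,
) -> u64 {
    let mut freed = 0u64;
    loop {
        let used: u64 = adapters.iter().map(|(_, a)| a.size_bytes).sum();
        if budget_bytes == 0 || used as f32 / budget_bytes as f32 <= target {
            return freed;
        }
        let victim = adapters
            .iter()
            .enumerate()
            .filter(|(_, (_, a))| a.priority <= 0.9) // protect critical adapters
            .max_by(|(_, (age_a, a)), (_, (age_b, b))| {
                let sa = adapter_score(&EvictionCandidate { age_seconds: *age_a, value: a });
                let sb = adapter_score(&EvictionCandidate { age_seconds: *age_b, value: b });
                sa.total_cmp(&sb)
            })
            .map(|(i, _)| i);
        match victim {
            Some(i) => freed += adapters.remove(i).1.size_bytes,
            // Everything left is critical: terminate honestly, freeing nothing more.
            None => return freed,
        }
    }
}
```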
Tests (4 new + 21 existing, all passing — 38 total in genome_paging
suite, 54 in paging, 5 in modules::embedding = 97 across the stack):
- test_evict_under_pressure_no_op_when_below_target — already-healthy
pool returns 0 without touching anything
- test_evict_under_pressure_drops_lru_until_target — three normal
adapters at 90% pressure → drop oldest two to reach ≤ 50% target
- test_evict_under_pressure_never_drops_critical_adapters — all-critical
pool yields zero bytes freed; loop terminates honestly without
panicking or dropping a priority>0.9 adapter
- test_evict_under_pressure_releases_gpu_guard_for_each_victim —
verifies guard cleanup + available-map repatriation, even with no
gpu_manager set (clean fallback)
Phase 3b (separate PR — needs broker singleton wiring; wrapper sketched below):
- PressureSource wrapper struct over Arc<Mutex<GenomePagingEngine>>
- Internal pool migration (active HashMap → PagedResourcePool with
EvictionPriority<V> reading adapter.priority)
- GpuAllocationGuard moves into the pool's value type (Drop releases
GPU memory on eviction without engine intervention)
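A hypothetical shape for that wrapper, assuming the PressureSource trait from PR #932 (re-declared here so the sketch stands alone) and stubbing the engine:

```rust
use std::sync::{Arc, Mutex};

// Re-declared so the sketch stands alone; matches the assumed trait
// shape from the broker sketch earlier in this thread.
pub trait PressureSource: Send + Sync {
    fn name(&self) -> &str;
    fn pressure(&self) -> f32;
    fn evict_some(&self, target: f32) -> u64;
}

// Stub engine: real fields elided, just enough surface for the wrapper.
pub struct GenomePagingEngine;
impl GenomePagingEngine {
    fn pressure(&self) -> f32 {
        0.0 // stand-in for used_bytes / budget_bytes
    }
    fn evict_under_pressure(&mut self, _target: f32) -> u64 {
        0 // stand-in for the Phase 3a loop
    }
}

/// Per-persona wrapper: forwards broker reads and evictions to the
/// engine behind its existing Mutex.
pub struct GenomePressureSource {
    name: String, // e.g. "genome:{persona_id}"; the naming scheme is a guess
    engine: Arc<Mutex<GenomePagingEngine>>,
}

impl PressureSource for GenomePressureSource {
    fn name(&self) -> &str {
        &self.name
    }
    fn pressure(&self) -> f32 {
        // Report full pressure if the lock is poisoned rather than hide it.
        self.engine.lock().map(|e| e.pressure()).unwrap_or(1.0)
    }
    fn evict_some(&self, target: f32) -> u64 {
        self.engine
            .lock()
            .map(|mut e| e.evict_under_pressure(target))
            .unwrap_or(0)
    }
}
```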
Branch base: feature/embedding-cache-pool (PR #933) → which is based on
feature/pressure-broker (PR #932). When both upstream PRs land, this
rebases cleanly onto main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request on Apr 19, 2026
joelteply added a commit that referenced this pull request on Apr 19, 2026
Summary
The 0/64 hit rate fix the architecture doc promised. Internal Rust callers (`DataModule.backfill_vectors`, `ModuleBackedEmbeddingProvider`, anywhere using `generate_embedding{,s_batch}`) were silently bypassing the embedding cache because it lived inside the IPC handler — `handle_generate` had its own ad-hoc `HashMap<(String, u64), CachedEmbedding>` with djb2 keys + 5-minute TTL + 10k count cap.
This PR moves the cache one layer down: the pool sits behind `generate_embedding{,s_batch}`, so every embedder — IPC handler and Rust caller — hits the same cache uniformly. The IPC handler becomes a thin wrapper around the batched function. The architecture-doc claim is now true.
What's in this commit
Tests (5/5 passing — `cargo test --lib modules::embedding::tests`)
Tests serialize via `TEST_LOCK` on the global pool — `EMBEDDING_POOL` is a process-global singleton (matches IPC reality) and parallel test execution would race shared hit/miss counters. The pure key-shape test doesn't need the lock.
50/50 paging tests still pass — no regressions.
Branch base
This PR is based on `feature/pressure-broker` (PR #932), because Phase 2 needs `stats_blocking` (sync stats for the cache_stats handler called from sync IPC code). When #932 lands, this rebases cleanly onto main.
Test plan
🤖 Generated with Claude Code