feat(genome): demand-aligned-recall PR-3f — MustIncludeCandidateSource #1382
Merged
Resolves CapabilityQuery.must_include hard pins as candidates per
GENOME-FOUNDRY-SENTINEL Part 7: "Hard pins — recall MUST include
these in the RankedPool even if their score is low. Used for
persona-private LoRA layers and sticky engrams."
Plays through the composite seam shipped in PR-3e: when wired AFTER a
resident source such as WorkingSetCandidateSource with ByArtifactId
dedup, must-include items that ARE resident get the resident
source's Hot residency + factor data; must-include items that are NOT
resident get this source's NotResident placeholder (still ranked,
just at a lower combined score).
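The ordering rule above is a first-wins merge keyed on artifact id. Below is a minimal sketch of that rule under stated assumptions. `Candidate`, `Residency` and `merge_first_wins` are hypothetical simplified stand-ins, not the crate's CandidateArtifact / CompositeCandidateSource API. Artifact ids are modeled as plain u64.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for ResidencyHint (only the two states needed here).
#[derive(Debug, Clone, PartialEq)]
pub enum Residency {
    Hot,
    NotResident,
}

/// Hypothetical stand-in for CandidateArtifact.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub artifact_id: u64,
    pub residency: Residency,
}

/// Merge per-source candidate lists in wiring order, keeping the FIRST
/// candidate seen for each artifact id (ByArtifactId-style dedup).
pub fn merge_first_wins(sources: Vec<Vec<Candidate>>) -> Vec<Candidate> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for batch in sources {
        for c in batch {
            // `insert` returns false if an earlier source already claimed
            // this id, so later duplicates are dropped.
            if seen.insert(c.artifact_id) {
                merged.push(c);
            }
        }
    }
    merged
}
```

Wiring order is the whole mechanism. With the working-set batch first, a pinned artifact that is resident keeps Hot, and an unseen pin keeps its NotResident placeholder. Swapping the two batches would let the placeholder shadow the resident entry.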
What lands
- MustIncludeCandidateSource: zero-state unit struct (no Arc state
  needed; the source is pure-function over the query)
- CandidateSource::fetch impl that:
  - reads query.must_include (Vec<ArtifactRef>)
  - maps each variant (LoRALayer / MoEExpert / Engram) to a
    CandidateArtifact with the appropriate PageKind
  - marks every must-include candidate as
    ResidencyHint::NotResident { acquirable_from: SentinelRefinement }
  - uses NEUTRAL_FACTOR_STUB (0.5) for the three non-tier factors,
    same convention as WorkingSetCandidateSource (PR-3d)
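The fetch behavior listed above can be sketched roughly as follows. This is a hedged approximation, not the PR's code. The following type shapes are assumptions:

- string ids inside ArtifactRef
- an `AcquireFrom` enum holding SentinelRefinement
- a `[f32; 3]` factor array
- a synchronous `fetch` returning a plain Vec

The real trait may be async or fallible.

```rust
/// Neutral value for the three non-tier factors (0.5, per the PR text).
pub const NEUTRAL_FACTOR_STUB: f32 = 0.5;

// Hypothetical, simplified versions of the crate's types.
#[derive(Debug, Clone, PartialEq)]
pub enum ArtifactRef {
    LoRALayer(String),
    MoEExpert(String),
    Engram(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageKind {
    LoRALayer,
    MoEExpert,
    Engram,
}

/// Hypothetical enum name for where a missing page can be acquired from.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AcquireFrom {
    SentinelRefinement,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResidencyHint {
    Hot,
    NotResident { acquirable_from: AcquireFrom },
}

#[derive(Debug, Clone, PartialEq)]
pub struct CandidateArtifact {
    pub id: String,
    pub page_kind: PageKind,
    pub residency: ResidencyHint,
    /// The three non-tier factors, all stubbed to neutral.
    pub factors: [f32; 3],
}

pub struct CapabilityQuery {
    pub must_include: Vec<ArtifactRef>,
}

pub trait CandidateSource {
    fn fetch(&self, query: &CapabilityQuery) -> Vec<CandidateArtifact>;
}

/// Zero-state: everything it returns is derived from the query.
pub struct MustIncludeCandidateSource;

impl CandidateSource for MustIncludeCandidateSource {
    fn fetch(&self, query: &CapabilityQuery) -> Vec<CandidateArtifact> {
        query
            .must_include
            .iter()
            .map(|r| {
                // Variant -> PageKind mapping.
                let (id, page_kind) = match r {
                    ArtifactRef::LoRALayer(id) => (id, PageKind::LoRALayer),
                    ArtifactRef::MoEExpert(id) => (id, PageKind::MoEExpert),
                    ArtifactRef::Engram(id) => (id, PageKind::Engram),
                };
                CandidateArtifact {
                    id: id.clone(),
                    page_kind,
                    // This source never knows residency; a resident source
                    // wired earlier supplies Hot through dedup.
                    residency: ResidencyHint::NotResident {
                        acquirable_from: AcquireFrom::SentinelRefinement,
                    },
                    factors: [NEUTRAL_FACTOR_STUB; 3],
                }
            })
            .collect()
    }
}
```

Because the struct holds no state and the trait method takes `&self`, the source can sit behind `Arc<dyn CandidateSource>` next to stateful sources. That is presumably what source_is_object_safe_for_dyn_dispatch guards.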
Recommended composite wiring
```rust
let composite = CompositeCandidateSource::with_default_dedup(vec![
    Arc::new(WorkingSetCandidateSource::new(mgr)), // Hot first
    Arc::new(MustIncludeCandidateSource::new()),   // Pins
    // future: catalog walker, federation source
]);
```
Spec contract met: every hard-pinned artifact surfaces in the
RankedPool. If it's resident, it gets the full residency-aware score;
if not, it still appears (at a lower combined score) so composition can
see "this was pinned but isn't here yet — schedule the foundry."
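On the consumer side, the contract reduces to two checks over the ranked pool:

- every pin appears;
- pins still marked NotResident form the foundry's to-do list.

A minimal sketch follows. `RankedEntry`, `Residency` and both helper names are hypothetical, since the real RankedPool shape is not shown in this PR.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Residency {
    Hot,
    NotResident,
}

#[derive(Debug, Clone)]
pub struct RankedEntry {
    pub artifact_id: String,
    pub residency: Residency,
    /// Combined score; a non-resident pin may rank low but is never dropped.
    pub combined_score: f32,
}

/// Spec check: every hard-pinned id appears somewhere in the pool.
pub fn every_pin_surfaces(pool: &[RankedEntry], pins: &[String]) -> bool {
    pins.iter().all(|p| pool.iter().any(|e| &e.artifact_id == p))
}

/// Pinned ids that surfaced but are not resident yet, i.e. the ones a
/// composer would hand to the foundry to schedule.
pub fn pins_to_schedule(pool: &[RankedEntry], pins: &[String]) -> Vec<String> {
    pool.iter()
        .filter(|e| e.residency == Residency::NotResident && pins.contains(&e.artifact_id))
        .map(|e| e.artifact_id.clone())
        .collect()
}
```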
Tests
6 new tests:
- empty_must_include_returns_empty_candidates (no-error empty
  contract)
- variant_mapping_preserves_page_kind (LoRALayer/MoEExpert/Engram
  variants → PageKind mapping)
- must_include_marks_candidates_as_not_resident
- factors_use_neutral_stubs_consistent_with_working_set_source
- source_is_object_safe_for_dyn_dispatch
- composite_with_dedup_resident_wins_must_include_for_pinned_hot_artifact:
  the architectural payoff; resident pin keeps Hot,
  non-resident pin gets NotResident, both appear in merged Vec
6/6 pass. No regressions across the other 2873 lib tests.
Stack
- #1346 / #1353 / #1355 / #1358 / #1362 — my working-set-manager
- #1366 — DAR PR-1: pure types
- #1367 + #1370 — DAR PR-2: trait + composite types
- #1371 — DAR PR-3a: scoring function + per-factor curves
- #1372 — DAR PR-3b: LocalDemandAlignedRecall ranking engine
- #1374 — DAR PR-3c: trait impl + CandidateSource seam
- #1378 — DAR PR-3d: WorkingSetCandidateSource (working-set source)
- #1380 — DAR PR-3e: CompositeCandidateSource (extensibility seam)
- THIS PR — DAR PR-3f: MustIncludeCandidateSource (hard-pin source)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request on May 18, 2026:
…ALOG §II) (#1387)

PR-1 of inference-llm. Pure typed event surface for the local-LLM generation module. The module itself (composition → tokenizer → llama.cpp invoke → token stream) lands in PR-2/PR-3; PR-1 ships the wire so producers + consumers can build against it today. Unblocked by my just-shipped Lane H + recall + working-set stacks.

What lands
- InferenceRequestId: typed Uuid newtype; all four events carry the same field name (requestId on the wire) for correlation
- CompositionPlan: opaque ArtifactId reference; the composer module fills in the full shape later
- SamplingParams { temperature, top_p, top_k, repeat_penalty } with llama.cpp-baseline defaults (0.8 / 0.95 / 40 / 1.1)
- GenerationBudget { max_tokens, max_duration_ms }: both honored
- FinishReason enum: Stop / MaxTokens / MaxDuration / StopSequence { matched } / Error { reason }, typed per Joel's "never-swallow"
- InferenceRequest: [InferenceRequest] subscription event
- InferenceComplete: emission with completion + finish + timing
- FirstTokenEmitted: emission for TTFT observability (microsecond precision; sub-ms achievable on warm models)
- ResidencyFault: emission when inference would need a not-resident page; sentinel learns + upgrades tier policy

Tests
13 behavioral tests + 9 ts-rs export_bindings = 22 total. 22/22 pass. No regressions across the other 2883 lib tests.

Clippy baseline bump 154→156, drift from recent canary merges. Fixed two doc-list warnings in this file (reworded "* 1000" math to avoid being parsed as a markdown list item).

Stack
- Lane H end-to-end (codex's #1331→#1373)
- Working-set-manager + DAR end-to-end (mine, #1346→#1382)
- THIS PR: inference-llm PR-1, typed event surface
- NEXT: PR-2, InferenceLlmModule ServiceModule impl wired to the artifact dispatch
- THEN: PR-3, tokenizer + llama.cpp invoke + token stream

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
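The sampling and budget types described in that commit could look roughly like the sketch below. It covers only the field names, the defaults and the "both honored" rule that the commit lists. The field types (f32, u32, u64) are assumptions. So are the hypothetical `check_budget` helper and its choice to check the token limit before the time limit.

```rust
/// llama.cpp-baseline sampling defaults as listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SamplingParams {
    pub temperature: f32,
    pub top_p: f32,
    pub top_k: u32,
    pub repeat_penalty: f32,
}

impl Default for SamplingParams {
    fn default() -> Self {
        Self { temperature: 0.8, top_p: 0.95, top_k: 40, repeat_penalty: 1.1 }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GenerationBudget {
    pub max_tokens: u32,
    pub max_duration_ms: u64,
}

/// Typed finish reasons; errors carry a reason instead of being swallowed.
#[derive(Debug, Clone, PartialEq)]
pub enum FinishReason {
    Stop,
    MaxTokens,
    MaxDuration,
    StopSequence { matched: String },
    Error { reason: String },
}

/// Hypothetical helper: check both budget limits after each token.
/// Returns None while generation may continue.
pub fn check_budget(budget: &GenerationBudget, tokens: u32, elapsed_ms: u64) -> Option<FinishReason> {
    if tokens >= budget.max_tokens {
        Some(FinishReason::MaxTokens)
    } else if elapsed_ms >= budget.max_duration_ms {
        Some(FinishReason::MaxDuration)
    } else {
        None
    }
}
```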
Summary
PR-3f of demand-aligned-recall. Resolves CapabilityQuery::must_include hard pins as candidates per GENOME-FOUNDRY-SENTINEL Part 7: "Hard pins — recall MUST include these in the RankedPool even if their score is low."

Plays through the composite seam (PR-3e #1380): when wired AFTER WorkingSetCandidateSource with ByArtifactId dedup, must-include items that ARE resident get the working set's Hot residency + factor data; items NOT resident get this source's NotResident placeholder (still ranked, just at a lower combined score).

What lands
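- MustIncludeCandidateSource: zero-state unit struct (no Arc state needed; the source is pure-function over the query).
- CandidateSource::fetch impl that:
  - reads query.must_include (Vec<ArtifactRef>)
  - maps each variant to a CandidateArtifact with the appropriate PageKind
  - marks every candidate as ResidencyHint::NotResident { acquirable_from: SentinelRefinement }
  - uses NEUTRAL_FACTOR_STUB (0.5) for the three non-tier factors, same convention as PR-3d

Recommended composite wiring: WorkingSetCandidateSource first (Hot), then MustIncludeCandidateSource (pins), with ByArtifactId dedup.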
Spec contract met: every hard-pinned artifact surfaces. If resident, it gets the full residency-aware score; if not, it still appears at a lower combined score so composition can see "pinned but not here — schedule the foundry."
Test plan
cargo test --lib --features metal,accelerate genome::recall_source_must_include: 6/6 pass
- empty_must_include_returns_empty_candidates
- variant_mapping_preserves_page_kind
- must_include_marks_candidates_as_not_resident
- factors_use_neutral_stubs_consistent_with_working_set_source
- source_is_object_safe_for_dyn_dispatch
- composite_with_dedup_resident_wins_must_include_for_pinned_hot_artifact: the architectural payoff; resident pin keeps Hot, non-resident pin gets NotResident, both appear in merged Vec

Stack
- LocalDemandAlignedRecall ranking engine
- WorkingSetCandidateSource
- CompositeCandidateSource
- MustIncludeCandidateSource (hard-pin source)

🤖 Generated with Claude Code