
feat(rag): Phase 1.5 — ChatRAGBuilder consumer ordering (#918)#922

Merged
joelteply merged 3 commits into main from
feature/phase-1-5-consumer-ordering
Apr 18, 2026

Conversation

@joelteply
Contributor

Builds on PR #920 (Phase 1 — composer-side stable-first ordering). This is the consumer side that completes the prefix-reuse story end-to-end.

What changes

ChatRAGBuilder.buildContext section 2.4 reorders three injections:

  1. Tool definitions moved from end → start (after identity). They're INVARIANT — belong in the byte-stable prefix region.
  2. Generic source loop unchanged — it already iterates the Map in insertion order, which equals the tier-sorted order from Phase 1 (extractFromComposition inserts in result.sections order, and that array was tier-sorted by RAGComposer).
  3. HumanPresenceTracker injection moved from start → end. Presence is volatile (changes when users switch rooms) — must live in suffix.

Final assembly order:

identity (INVARIANT)
→ tool definitions (INVARIANT)
→ loop in tier order (remaining INVARIANT → SEMI_STABLE → VOLATILE)
→ human presence (VOLATILE)
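A minimal sketch of that assembly order (helper names, types, and the join separator are assumptions for illustration, not the actual ChatRAGBuilder code):

```typescript
// Hypothetical sketch of the reordered section-2.4 assembly in buildContext().
// Only the ordering mirrors the PR; names and joining are illustrative.
type Tier = 'invariant' | 'semi_stable' | 'volatile';
interface Section { name: string; tier: Tier; text: string }

function assemblePrompt(
  identity: string,               // INVARIANT
  toolDefs: string,               // INVARIANT: moved from end to start
  tierSortedSections: Section[],  // already INVARIANT → SEMI_STABLE → VOLATILE
  humanPresence: string,          // VOLATILE: moved from start to end
): string {
  const parts = [identity, toolDefs];                      // byte-stable prefix region
  for (const s of tierSortedSections) parts.push(s.text);  // insertion order = tier order
  parts.push(humanPresence);                               // volatile suffix
  return parts.join('\n\n');
}
```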

Why

PR #920 made the section list byte-deterministic. But the consumer (this builder) was injecting volatile content (human presence) BEFORE the tier-sorted loop and INVARIANT content (tool defs) AFTER. Result: the assembled prompt string still had non-stable bytes in the prefix region — Phase 1 alone wasn't enough for actual prefix-reuse.

This commit makes the assembled string byte-identical-prefix across requests for the same persona+recipe. Combined with future Phase 2 (per-persona DMR slot pinning), llama-server/DMR's prefix-KV-cache reuse fires for real and the ~70× prompt-eval speedup actualizes.
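Mechanically, a byte-identical prefix matters because the server only has to re-evaluate tokens past the longest shared prefix of consecutive prompts. A toy illustration (not llama-server's actual matching logic, which works on tokens, not characters):

```typescript
// Toy model of prefix-KV-cache matching: everything up to the first
// differing byte of two consecutive prompts is reusable.
function commonPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const turn1 = 'IDENTITY\nTOOLS\nDOCS\n<volatile: msg 1>';
const turn2 = 'IDENTITY\nTOOLS\nDOCS\n<volatile: msg 2>';
// With stable-first ordering, the shared prefix covers everything
// except the volatile suffix:
const reusable = commonPrefixLength(turn1, turn2);
```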

Verification

Sequencing

  1. PR #921 (fix(coord): InferenceCoordinator queues instead of denying — fixes #919) lands — queue fix for #919 (Personas go silent after first response wave — Rust full_evaluate gate or InferenceCoordinator slot leak) — unblocks runtime testing
  2. PR #920 (feat(rag): Phase 1 — stable-first ordering for prefix-reuse, #918) lands (Phase 1)
  3. This PR lands (Phase 1.5) — completes the consumer side
  4. #917 (memento's ModelMetadata refactor: declarative struct, no Option<>, adapter queries its own source) lands → unblocks Phase 4
  5. Phase 2 (slot pinning) → and the ~70× actually shows up live

🤖 Generated with Claude Code

joelteply and others added 3 commits April 17, 2026 18:05
Adds PromptTier enum (INVARIANT / SEMI_STABLE / VOLATILE) and makes
every RAGSource declare its tier. RAGComposer sorts collected sections
deterministically by (tier, sourceName) before returning.
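A sketch of that deterministic sort (tier rank values and the Section shape are assumptions, not the repo's actual types):

```typescript
// Sketch of the (tier, sourceName) sort described above. Byte-level
// string comparison is used within a tier so the order is fully
// deterministic regardless of locale.
const TIER_RANK: Record<string, number> = { invariant: 0, semi_stable: 1, volatile: 2 };

interface Section { sourceName: string; tier: string }

function sortSections<T extends Section>(sections: T[]): T[] {
  return [...sections].sort((a, b) =>
    (TIER_RANK[a.tier] - TIER_RANK[b.tier]) ||
    (a.sourceName < b.sourceName ? -1 : a.sourceName > b.sourceName ? 1 : 0)
  );
}
```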

Why: today the composer's parallel section assembly produces a different
byte order on every chat call. llama-server / DMR's prefix-KV-cache
reuse never fires, so each turn reprocesses the full 14k-token prompt
from scratch (~35s prompt eval at 400 tok/s). With deterministic
ordering AND stable bytes within each tier, the unchanging INVARIANT
prefix gets reused — only the VOLATILE suffix needs evaluation.
Expected: ~70× faster prompt eval per turn for repeat-context turns.

Architecture (per docs/architecture/MULTIMODAL-WORKER-AND-PREFIX-REUSE.md):
- INVARIANT: persona identity, tool definitions, recipe rules, docs
  (PersonaIdentity, ToolDefinitions, CodeTool, Documentation,
   ToolMethodology, ProjectContext)
- SEMI_STABLE: history, memories, participants, governance — append-only
  (ConversationHistory, LiveRoomAwareness, Governance, OpenProposals,
   SentinelAwareness, GlobalAwareness, SocialMediaRAG, SemanticMemory)
- VOLATILE: latest message, audio chunks, current activity, UI state
  (ActivityContext, CodebaseSearch, MediaArtifact, VoiceConversation,
   WidgetContext)

Implementation note: tier is a class-level declaration on each RAGSource
(required field, no Option<>). Sources return Omit<RAGSection, 'tier'>
from load() and fromBatchResult(); RAGComposer injects the source's
declared tier when wrapping the section. Single-source-of-truth
classification per source — no per-return-statement repetition.
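The single-source-of-truth pattern described above can be sketched like this (simplified synchronous signatures; the real load() is presumably async and takes a context argument):

```typescript
// Sketch of the Omit<RAGSection, 'tier'> pattern: sources never set tier;
// the composer stamps each source's class-level declaration onto its sections.
interface RAGSection { sourceName: string; tier: string; content: string }

interface RAGSource {
  readonly name: string;
  readonly tier: string;                // required class-level declaration, no Option<>
  load(): Omit<RAGSection, 'tier'>;     // return type can't disagree with the declaration
}

function wrapSection(source: RAGSource): RAGSection {
  // Composer is the single tier authority.
  return { ...source.load(), tier: source.tier };
}
```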

Phases 2 (slot pinning) and 3 (composition cache) build on this.
Phase 4 (multimodal content parts) depends on #917 ModelMetadata.

tsc clean. Branch: feature/prefix-reuse-and-multimodal off main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…boot

CodebaseIndexer ran 64-chunk batches back-to-back with NO yield between
batches. Each batch ~1.5s + ~80MB RSS growth. With 5000+ chunks in
src/, that's 78+ batches × 1.5s = 2+ minutes of total event-loop
saturation immediately after every boot. Local personas couldn't
respond, voice couldn't connect, anything that needed the bus was
blocked until indexing finished.

Two changes:
- Batch size 64→16 (smaller per-batch RSS hit, ~4× more chances
  for other IO to interleave between IPC roundtrips)
- 50ms pause between batches via setTimeout (yields the event loop
  so chat/voice/personas can process while indexing runs)
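The two changes together can be sketched as follows (embedBatch stands in for the real per-batch IPC call; the constants mirror the description above):

```typescript
// Throttled indexing loop: 16-chunk batches with a 50ms setTimeout pause
// so the event loop can serve chat/voice/personas between IPC roundtrips.
const BATCH_SIZE = 16;  // was 64
const PAUSE_MS = 50;

async function indexChunks(
  chunks: string[],
  embedBatch: (batch: string[]) => Promise<void>,  // stand-in for the IPC call
): Promise<void> {
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    await embedBatch(chunks.slice(i, i + BATCH_SIZE));
    // Yield between batches; invisible at human timescales.
    await new Promise((resolve) => setTimeout(resolve, PAUSE_MS));
  }
}
```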

The throughput cost is small (16 vs 64 chunks per IPC) and the
inter-batch pause is invisible at human timescales. The chat-arrival
latency win is huge — system is responsive within seconds of boot
instead of minutes.

The deeper fix is querying GpuPressureWatcher / ResourcePressureWatcher
before each batch and backing off when pressure is high — same
principle Joel called out for InferenceCoordinator slot capacity.
That's a follow-up; this is the floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ability (#918)

Phase 1 (already shipped in PR #920) sorted RAGComposer's section list
by (tier, sourceName). This commit makes ChatRAGBuilder respect that
order when assembling the final prompt string, so the byte-prefix
actually IS stable end-to-end.

Three reorderings in section 2.4 of buildContext():

1. Tool definitions injection moved from end to start (after identity).
   Tool defs are INVARIANT — they belong in the byte-stable prefix
   region, not after VOLATILE content.

2. The generic source loop already iterates Map in insertion order,
   which equals tier-sorted order from extractFromComposition (which
   inserts in result.sections order, which Phase 1 sorted). So the
   loop now produces INVARIANT → SEMI_STABLE → VOLATILE content
   automatically — no per-section sorting needed.

3. HumanPresenceTracker injection moved from before-the-loop to
   after-the-loop. Presence is volatile (changes when users switch
   rooms) and must live in the suffix, never in the byte-stable prefix.

Final assembly order:
  identity (INVARIANT, from PersonaIdentitySource)
  → tool definitions (INVARIANT)
  → loop in tier order (INVARIANT remaining → SEMI_STABLE → VOLATILE)
  → human presence (VOLATILE)
  → conversation history (already separate, lives in messages array)

Net effect for prefix-reuse: with the same persona+recipe, the
INVARIANT region of the prompt is byte-identical across thousands
of turns. llama-server / DMR's prefix-KV-cache match fires on the
INVARIANT prefix; only the VOLATILE suffix gets reprocessed.
Combined with future per-persona slot pinning (Phase 2), this is
the ~70× prompt-eval speedup the design doc promised.

tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 00:02
Contributor

Copilot AI left a comment


Pull request overview

This PR advances the prompt prefix-reuse work for Chat RAG by introducing tier-aware section ordering and then reordering consumer-side prompt assembly to keep volatile bytes in the suffix (to enable KV-cache prefix reuse).

Changes:

  • Add PromptTier and thread tier metadata through RAGSource/RAGSection, injecting tier in RAGComposer and sorting sections deterministically by (tier, sourceName).
  • Update all RAG sources to declare a tier and to return Omit<RAGSection, 'tier'> so the composer is the single tier authority.
  • Reorder ChatRAGBuilder injections (tool definitions earlier; human presence later) and throttle codebase indexing embedding batches to reduce startup event-loop starvation.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Per-file summary:
src/system/rag/builders/ChatRAGBuilder.ts Reorders prompt injections to improve stable-prefix behavior (tool defs earlier; presence later).
src/system/rag/services/CodebaseIndexer.ts Reduces embedding batch size and adds inter-batch pause to yield the event loop.
src/system/rag/shared/RAGComposer.ts Injects tiers into sections and sorts sections deterministically by tier + name.
src/system/rag/shared/RAGSource.ts Adds tier to RAGSource/RAGSection, changes load/fromBatchResult return types, and re-exports PromptTier.
src/system/rag/shared/RAGTypes.ts Introduces PromptTier and documents tier ordering contract.
src/system/rag/sources/ActivityContextSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/CodeToolSource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/CodebaseSearchSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/ConversationHistorySource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/DocumentationSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/GlobalAwarenessSource.ts Declares tier and updates load()/fromBatchResult()/helpers to omit tier.
src/system/rag/sources/GovernanceSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/LiveRoomAwarenessSource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/MediaArtifactSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/OpenProposalsSource.ts Declares tier and updates EMPTY_SECTION/load() to omit tier.
src/system/rag/sources/PersonaIdentitySource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/ProjectContextSource.ts Declares tier and updates cached section types + load()/helpers to omit tier.
src/system/rag/sources/SemanticMemorySource.ts Declares tier and updates load()/fromBatchResult()/helpers to omit tier.
src/system/rag/sources/SentinelAwarenessSource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/SocialMediaRAGSource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/ToolDefinitionsSource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/ToolMethodologySource.ts Declares tier and updates load() return type to omit tier.
src/system/rag/sources/VoiceConversationSource.ts Declares tier and updates load()/helpers to omit tier.
src/system/rag/sources/WidgetContextSource.ts Declares tier and updates load()/helpers to omit tier.


Comment on lines 21 to +25
import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
import { PromptTier } from './RAGTypes';

// Re-export so source files only need one import
export { PromptTier } from './RAGTypes';

Copilot AI Apr 18, 2026


The file imports and re-exports PromptTier as a runtime symbol (import { PromptTier } ... + export { PromptTier } ...), but PromptTier is a const enum (erased at emit). In ESM this can cause a hard runtime failure when the re-exported binding doesn’t exist. Safer options: (a) make PromptTier a normal enum, or (b) remove the runtime re-export and have consumers import type { PromptTier } / use string literals.

Suggested change:
- import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
- import { PromptTier } from './RAGTypes';
- // Re-export so source files only need one import
- export { PromptTier } from './RAGTypes';
+ import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy, PromptTier } from './RAGTypes';
+ // Re-export so source files only need one import
+ export type { PromptTier } from './RAGTypes';

Comment on lines 37 to 41
export class ToolDefinitionsSource implements RAGSource {
readonly name = 'tool-definitions';
readonly tier = PromptTier.INVARIANT;
readonly priority = 45;
readonly defaultBudgetPercent = 10;

Copilot AI Apr 18, 2026


ToolDefinitionsSource is marked PromptTier.INVARIANT, but its output explicitly depends on context.options.currentMessage (contextual group selection + group hints). That makes the section change turn-to-turn, which defeats the stable-prefix goal and violates the tier contract. Either make tool definitions truly invariant (no currentMessage-dependent selection) or split into an invariant “tool catalog/specs” source and a volatile “tool hints for this turn” source (and mark this one SEMI_STABLE/VOLATILE accordingly).
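One possible shape of the split this comment suggests (all names, the Ctx shape, and the renderers below are hypothetical, not the repo's actual API):

```typescript
// Hypothetical split: an INVARIANT tool catalog that never reads the
// current message, plus a separate VOLATILE source for per-turn hints.
interface Ctx { options: { currentMessage: string } }

// Stub renderers for illustration only.
function renderAllToolSpecs(): string { return 'TOOL SPECS'; }
function renderHintsFor(msg: string): string { return `hints for: ${msg}`; }

class ToolCatalogSource {
  readonly name = 'tool-catalog';
  readonly tier = 'invariant';
  load(_ctx: Ctx) {
    // Full tool specs only: byte-identical every turn.
    return { sourceName: this.name, content: renderAllToolSpecs() };
  }
}

class ToolHintsSource {
  readonly name = 'tool-hints';
  readonly tier = 'volatile';
  load(ctx: Ctx) {
    // Message-dependent group selection lives in the suffix.
    return { sourceName: this.name, content: renderHintsFor(ctx.options.currentMessage) };
  }
}
```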

Comment on lines +33 to +34
* Within each tier, sources are sorted by name (alphabetical) so the byte
* order is fully deterministic.

Copilot AI Apr 18, 2026


The PromptTier doc says “Within each tier, sources are sorted by name (alphabetical)”, but ChatRAGBuilder now explicitly hoists tool-definitions ahead of other INVARIANT sources. Either update this doc to reflect the actual consumer-side ordering rules, or move all ordering decisions into the composer so the documented contract matches the emitted prompt order.

Suggested change:
- * Within each tier, sources are sorted by name (alphabetical) so the byte
- * order is fully deterministic.
+ * Within a tier, source order must be deterministic. Consumers may apply
+ * tier-specific ordering rules before any fallback alphabetical ordering; for
+ * example, `tool-definitions` is hoisted ahead of other INVARIANT sources.

Comment on lines +38 to +51
export const enum PromptTier {
/** Persona system prompt, recipe rules, role identity, tool definitions.
* Changes ~weekly when persona/recipe is edited. Identical bytes across
* thousands of turns for the same persona+recipe. */
INVARIANT = 'invariant',
/** Conversation history, active genome adapters, participants, governance
* state. Grows monotonically — new content APPENDS to the existing
* prefix, doesn't rewrite earlier bytes. */
SEMI_STABLE = 'semi_stable',
/** Latest user message, audio chunks, current timestamp, last-second
* pressure observations. Changes every request. The only region the
* server actually has to reprocess token-by-token. */
VOLATILE = 'volatile',
}

Copilot AI Apr 18, 2026


PromptTier is declared as a const enum, but it’s also imported/re-exported as a value (export { PromptTier } ...). Since const enums are erased in JS output, this can break ESM/bundler consumers with “module does not provide an export named 'PromptTier'”. Consider switching PromptTier to a normal export enum (or as const object + union type), or make the re-export/imports type-only and stop re-exporting it as a runtime value.
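A sketch of the as-const-object alternative this comment mentions: a real runtime value plus a derived union type, which survives emit and is safe to re-export from other modules:

```typescript
// Alternative to `const enum`: an `as const` object plus a derived union
// type. Unlike a const enum (erased at emit), this exists at runtime, so
// `export { PromptTier }` from another module won't break ESM consumers.
export const PromptTier = {
  INVARIANT: 'invariant',
  SEMI_STABLE: 'semi_stable',
  VOLATILE: 'volatile',
} as const;

export type PromptTier = (typeof PromptTier)[keyof typeof PromptTier];

// Usage: values read exactly like the enum members did.
const t: PromptTier = PromptTier.INVARIANT; // the string 'invariant'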

@joelteply joelteply merged commit 3dfc3a8 into main Apr 18, 2026
8 checks passed
@joelteply joelteply deleted the feature/phase-1-5-consumer-ordering branch April 18, 2026 00:22
joelteply added a commit that referenced this pull request Apr 18, 2026
…onsumer-ordering"

This reverts commit 3dfc3a8, reversing
changes made to a6419b8.
joelteply added a commit that referenced this pull request Apr 18, 2026
Revert: Phase 1.5 ChatRAGBuilder consumer ordering (#922) — bisecting silence regression
@joelteply
Contributor Author

Reverted via #926 — bisecting a silence regression observed on clean main after tonight's chain merge (#921, #923, #920, #922). #922 was the most recent merge, so it was reverted first. If a retest after the revert shows main responds, this PR's logic introduced the regression and the consumer-side reordering needs rework before re-merging. Original commits are still in git history.
