
feat: [ENG-2518] batch abstract generation across queued files #580

Merged
danhdoan merged 4 commits into proj/curation-enhancement from feat/ENG-2518 on May 1, 2026

Conversation

@RyanNg1403
Collaborator

Summary

  • Problem: The abstract queue makes 2N LLM calls per curate (one L0 .abstract.md + one L1 .overview.md per file). On a 30-file pack, that's 60 independent calls each paying its own per-call fixed-overhead tax (system prompt + framing).
  • Why it matters: Token-usage hypothesis H3 (over-engineered, high-confidence). Validated A/B savings on the 3-fixture corpus:
    • OpenAI 5-pack: 10 → 2 calls (−80%), 8,240 → 6,976 tokens (−15%)
    • ByteRover 5-pack: 10 → 2 calls (−80%), 10,770 → 7,755 tokens (−28%)
    • ByteRover 10-pack: 10 → 2 calls (−80%), 11,027 → 8,508 tokens (−23%)
    • Free-tier ByteRover users: 30-file folder pack drops from ~60 quota requests to ~12 (under the 50/day cap).
  • What changed: New generateFileAbstractsBatch(items, generator) in abstract-generator.ts fires two parallel streamToText calls (1 batched L0 + 1 batched L1) with XML-tagged input/output. AbstractGenerationQueue.processNext drains up to BATCH_SIZE_CAP=5 items per cycle. enqueue only triggers scheduleNext when pending >= cap || drainRequested; drain() (called from curate-executor via drainBackgroundWork at curate-end) sets the flag for partial-batch flush.
  • What did NOT change (scope boundary): Existing fail-open semantics (empty content → empty file, no crash); onAfterWrite enqueue trigger; per-file 20K char content cap (matches non-batched path so per-file view is identical regardless of mode).

Type of change

  • New feature

Scope (select all touched areas)

  • Agent / Tools

Linked issues

  • Closes ENG-2518
  • Related: H3 hypothesis in research/token-optimization/token-usage-reduction/research/06-hypotheses.md; A/B validated by notes/token-usage-reduction/h3-batch-abstracts/RESULTS.md

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: N/A
  • Why this was not caught earlier: N/A

Test plan

  • Coverage added:
    • Unit test
    • Manual verification (real-LLM E2E)
  • Test file(s):
    • test/unit/agent/map/abstract-generator-batch.test.ts (new, 5 tests covering correct response parsing, input order preservation, missing paths → empty strings, malformed-response tolerance, exactly 2 LLM calls per batch)
    • test/unit/agent/map/abstract-queue.test.ts (added "buffers below cap", "processes up to BATCH_SIZE_CAP in single cycle", "splits oversized backlogs", and adapted retry-backoff test for batched semantics)
  • Key scenarios covered:
    • Buffer-until-cap-or-drain: enqueue 3 items → 0 LLM calls until drain
    • Cap fires single batch: enqueue 5 → exactly 2 stream calls (1×L0 + 1×L1)
    • Oversized split: enqueue 7 → batches of 5+2, exactly 4 stream calls
    • Per-batch failure → per-item retry with existing 3× exponential backoff
    • Tolerant XML parser: missing paths → empty strings, malformed → all empty (no crash)
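The fail-open behaviour in the last scenario can be sketched as below; mapResultsByPath is a hypothetical helper name, not code from this PR.

```typescript
// Every queued path gets an entry; paths missing from the model's (possibly
// malformed) response degrade to empty strings instead of throwing.
function mapResultsByPath(
  inputPaths: string[],
  parsed: Map<string, string>,
): Map<string, string> {
  const out = new Map<string, string>()
  for (const p of inputPaths) out.set(p, parsed.get(p) ?? '')
  return out
}
```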

User-visible changes

None directly. Side effect: faster abstract generation on multi-file curates; ByteRover free-tier users no longer hit 50/day quota on 30-file packs.

Evidence

  • Full unit suite: 6967 passing, 16 pending, 0 failing
  • Real-LLM E2E on ByteRover paid-tier (gemini-3-flash-preview) — 4 fixtures covering all behavior:
    • 5-pack: 5 enqueues → process:start batchSize=5 → 2 LLM calls (vs 10 non-batched, −80%)
    • 6-distinct (engineered to defeat curate-agent consolidation): 6 enqueues → batches of 5+1 → 4 LLM calls (vs 12, −67%) — exercises the multi-batch split path
    • 10-pack: real LLM error on first batch → all 5 items entered retry pool → retry batch succeeded — exercises per-batch failure + retry path
    • 7-pack: 4 buffered items below cap → drain at curate-end fired the partial batch — exercises drain-flush path
  • Quality spot-check: every L0 ≤ 80 tokens (one-line summary), every L1 has 3-7 markdown bullets + structure section, no XML tag leakage, no file conflation
  • Branch on remote: https://github.com/campfirein/byterover-cli/tree/feat/ENG-2518
  • Validated by: notes/token-usage-reduction/h3-batch-abstracts/RESULTS.md (3-fixture A/B on OpenAI + ByteRover)

Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).

- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
  XML-tagged input/output, tolerant regex parser, missing paths get
  empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
  triggers scheduleNext when pending >= cap or drainRequested. drain()
  sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
  attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.

Real-LLM verification on ByteRover paid-tier:
  5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
  6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
  10-pack: per-batch failure + retry path exercised end-to-end
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @RyanNg1403's task in 3m 53s


PR Review — feat: [ENG-2518] batch abstract generation across queued files

Overall score: 4 / 5 — Solid implementation with good test coverage. One medium-severity robustness issue and a few minor concerns below.


Summary

The batching design is sound: two parallel LLM calls (L0 + L1) cover an entire batch, XML-tagged output is mapped back by path, and the queue's buffer-until-cap / drain-flush semantics are cleanly layered on top of the existing retry infrastructure. The A/B savings numbers in the PR are believable. Test coverage is comprehensive and the generator stub in abstract-queue.test.ts that sniffs system-prompt text to produce the right XML shape is a nice design.


Issues

suggestion — Unescaped file content breaks XML prompt structure (medium severity)

File: src/agent/infra/map/abstract-generator.ts, lines 92–96 and 111–115

const filesXml = items.map((it) => `<file path="${escapeXmlAttr(it.contextPath)}">
<document>
${it.content}               // ← no escaping
</document>
</file>`)

it.contextPath is correctly escaped with escapeXmlAttr, but it.content (a user-controlled knowledge document) is injected verbatim. Any file that contains </document> or </file> in its body — perfectly normal for docs that describe XML, HTML, or JSX — will break the prompt's XML envelope. The model receives a structurally invalid message; it may silently conflate files or produce output that the parseBatchedTags regex can't parse, resulting in empty abstracts for all files in the batch.

The fix is to escape (or CDATA-wrap) the content:

// Option A — CDATA (model reads clean text, no decode artefacts)
`<document><![CDATA[${it.content.replaceAll(']]>', ']]]]><![CDATA[>')}]]></document>`

// Option B — XML escape (simpler, but model sees &lt; / &gt; in prompts)
`<document>${escapeXmlContent(it.content)}</document>`
// where escapeXmlContent replaces & → &amp;  < → &lt;  > → &gt;

CDATA is preferable for LLM prompts because it preserves the raw text the model needs to summarise.


suggestion — this.generator passed after async break without local narrowing

File: src/agent/infra/map/abstract-queue.ts, line 197-200

private async processNext(): Promise<void> {
  if (!this.generator || ...) return     // narrows here …
  // …
  await this.onBeforeProcess?.()         // … but class-property narrowing
                                         //   doesn't survive await in strict TS
  const results = await generateFileAbstractsBatch(
    ...,
    this.generator,                      // type is still IContentGenerator | undefined
  )
}

TypeScript does not guarantee class-property narrowing survives an await boundary because another async operation could have mutated the property. Capture the generator in a const right after the guard:

const generator = this.generator
if (!generator || this.processing || this.pending.length === 0) {
  this.resolveDrainersIfIdle()
  return
}
// then use `generator` everywhere below

This also avoids the ! exception in CLAUDE.md and makes the dataflow explicit.


nitpick — escapeXmlAttr name is misleading for its potential reuse

File: src/agent/infra/map/abstract-generator.ts, line 87

The function is named escapeXmlAttr but its character set (&, ", <, >) also covers XML element content. If content-escaping is added (see issue #1 above), calling this function for both purposes will be confusing. Consider renaming to escapeXml and adding an overload comment, or extracting escapeXmlContent separately (drop " → only &, <, >).


nitpick — Retry semantics silently changed for multi-file batches

File: src/agent/infra/map/abstract-queue.ts, lines 220–246

The batch-level failure path is correct, but it introduces a subtle behavioural change: maxAttempts now means "maximum batch attempts for this item", not "maximum individual-call attempts". A transient error on attempt 1 causes all N items in the batch to consume one retry token, even if the failure was unrelated to their content. With maxAttempts=3, a batch of 5 that always fails exhausts all retries in 3 batch cycles, the same as before per item, but if only 1 item is "hard" and 4 are "soft", all 5 are failed together.

No code change needed — the tradeoff is reasonable and is already documented in the PR — but a brief inline comment in the catch block referencing this tradeoff would help future reviewers avoid re-litigating it.


nitpick — Test stub call ordering relies on undocumented Promise.all start order

File: test/unit/agent/map/abstract-generator-batch.test.ts, lines 13–24

let callIndex = 0
generateContentStream: sandbox.stub().callsFake(async function *() {
  const text = responsesByCall[callIndex++] ?? ''
  // …
})

This correctly assumes the L0 stream is started before the L1 stream (i.e., Promise.all([L0, L1]) starts them in left-to-right order). JavaScript's single-threaded event loop does guarantee this for now, but it's an implicit assumption. A small comment explaining the ordering would make the test less surprising, e.g.:

// Promise.all starts L0 first (left element), then L1; callIndex is incremented
// synchronously before the first yield, so index 0 → L0 response, 1 → L1 response.

praise — Parser tolerance is exactly right

File: src/agent/infra/map/abstract-generator.ts, lines 141–159

The two-level regex approach (outer fileRe over <file> blocks, inner innerRe per block) handles out-of-order output, extra whitespace, missing paths, and malformed responses cleanly — and the tests prove all four scenarios. The fail-open guarantee (missing path → empty string → existing write logic handles it) is preserved end-to-end. Well done.


praise — Buffer-below-cap + drain flush interaction is well-designed

File: src/agent/infra/map/abstract-queue.ts, lines 88–108 and 280–298

The drainRequested flag is a clean solution to the "partial batch at curate-end" problem. Setting it in drain(), checking it in scheduleNext(), and resetting it only in resolveDrainersIfIdle() (not inline in processNext()) means there's no window where a below-cap batch gets permanently stranded.


Checklist

  • Gather context and read changed files
  • Review abstract-generator.ts changes
  • Review abstract-queue.ts changes
  • Review test files
  • Post comprehensive review


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8fa123258


Comment thread src/agent/infra/map/abstract-generator.ts
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
  (codex P1): when a curate touches the same file twice before drain,
  both queue items had the same path; the prompt emitted two `<file>`
  blocks the model could answer in either order, and the tag parser's
  Map collapsed them non-deterministically. Now collapses by path
  keeping the LATEST content (the FIFO queue's last-seen item) so
  abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs that
  describe XML/HTML/JSX (perfectly normal for technical knowledge) can
  contain `</document>` or `</file>` markers that would terminate the
  envelope and conflate files. Wrap in CDATA with the standard
  `]]]]><![CDATA[>` in-section escape for any nested terminators.
- Capture this.generator in a local const at the top of processNext
  so type narrowing survives the await boundary, and document the
  batched-retry semantics inline (batch failure consumes one attempt
  per item, not one per call).

Tests:
  - abstract-generator-batch: duplicate-path dedup keeps latest content
  - abstract-generator-batch: CDATA wrap survives `</document>` markers
  - abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
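The duplicate-path collapse in the first bullet above can be sketched as follows; the types and helper name are hypothetical.

```typescript
interface QueueItem {
  contextPath: string
  content: string
}

// A Map keyed on path keeps only the LAST-seen content, so abstracts
// reflect the FIFO queue's final (on-disk) state, not an intermediate one.
function dedupByPath(items: QueueItem[]): QueueItem[] {
  const byPath = new Map<string, QueueItem>()
  for (const it of items) byPath.set(it.contextPath, it) // later entries win
  return [...byPath.values()]
}
```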
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @RyanNg1403's task in 4m 11s


PR Review — feat: [ENG-2518] batch abstract generation across queued files

Overall score: 5 / 5 — The previous review's concerns are all resolved. This is a clean, well-tested implementation.


Summary

This commit addresses all issues flagged in the prior review:

  • CDATA wrapping (wrapCdata) is implemented correctly, including the in-CDATA ]]> escape trick, and is used in both prompt builders. The medium-severity XML-injection concern is fully resolved.
  • Generator local narrowing (const {generator} = this) is the idiomatic TypeScript fix; narrowing now survives all await boundaries in processNext.
  • Duplicate contextPath dedup in generateFileAbstractsBatch (Map keyed on path, keeping last content) addresses the Codex review's concern about non-deterministic results for files queued multiple times in one batch.
  • Retry semantics comment is present at line 188–193 in abstract-queue.ts, documenting the batch-level tradeoff.

The drainRequested flag + resolveDrainersIfIdle interaction is a clean state-machine design that prevents both stranded partial batches and premature drain resolution.


Issues

nitpick — escapeXmlAttr needs a clarifying doc comment

File: src/agent/infra/map/abstract-generator.ts, line 87

The function is now correctly scoped to attribute escaping (CDATA handles element content), but a future reader could be confused about why " is escaped. A single-line comment stating its scope would close the loop on the previous review's naming concern.


suggestion — parseBatchedTags lazy match truncates on </file> in model output

File: src/agent/infra/map/abstract-generator.ts, line 153

The model's output is not CDATA-wrapped. If an overview summary mentions </file> in prose (e.g., summarising a doc about Ant build files or Maven), the lazy ([\s\S]*?) in fileRe terminates early, innerRe finds no match, and the result degrades to an empty string. Fail-open semantics handle it, but this case is invisible without a comment. Adding a brief note pointing to the known limitation prevents future reviewers from assuming the parser is fully robust.


nitpick — Duplicate contextPaths produce redundant concurrent writes

File: src/agent/infra/map/abstract-generator.ts, line 266

When items contains two entries with the same contextPath, the dedup produces one LLM call but the return array still has two entries with identical content. The caller in processNext then issues two concurrent writeFile calls to the same .abstract.md and .overview.md paths inside Promise.all. Since content is identical the final disk state is correct, but concurrent writes to the same path are technically a race. Deduplicating the write targets in processNext (or filtering the result array before writing) would be cleaner.


nitpick — failedThisCycle array used only for .length

File: src/agent/infra/map/abstract-queue.ts, line 229

The array is built but only referenced for .length in a single console.debug call. A plain let exhaustedCount = 0 is simpler.


nitpick — CDATA test only verifies L1 prompt, not L0

File: test/unit/agent/map/abstract-generator-batch.test.ts, line 167

capturedPrompt is overwritten on every call, so after Promise.all the assertion checks L1's prompt. Both builders use wrapCdata so the test passes — but a future refactor that removes CDATA from only the L0 builder would not be caught. Using systemPrompt sniffing to capture both prompts separately (as the dedup test already does) would make L0 coverage explicit.


nitpick — Batching tests grouped under status file describe block

File: test/unit/agent/map/abstract-queue.test.ts, line 234

The three new batching tests (buffers below cap, processes up to BATCH_SIZE_CAP, splits oversized backlogs) live inside the status file describe block but don't test status file I/O. Moving them to a batching behaviour sibling block would improve navigability.


Checklist

  • Gather context and understand the PR
  • Read changed files
  • Review abstract-generator.ts
  • Review abstract-queue.ts
  • Review test files
  • Post comprehensive review

@RyanNg1403
Collaborator Author

Pushed c17a0fa addressing the Claude review-agent feedback:

Fixed:

  • Medium: Unescaped file content breaks XML envelope — Content now wrapped in <![CDATA[...]]> with the standard ]]]]><![CDATA[> in-section escape for any nested terminators. New tests verify the wrap survives literal </document> markers AND nested ]]> sequences.
  • Suggestion: this.generator not narrowed across await — Captured in a local {generator} = this destructure at the top of processNext. No ! needed; CLAUDE.md compliant.
  • Nitpick: retry semantics comment — Added inline comment in processNext documenting that maxAttempts counts BATCH attempts (one batch failure → one retry token per item).

Plus the codex P1 (duplicate contextPath collapsing in result map), addressed in the inline thread.

Skipped (with reason):

  • Nitpick: escapeXmlAttr rename — kept the original name; the function is still attribute-only since CDATA now handles content. Renaming would churn imports without correctness benefit.
  • Nitpick: test ordering comment on Promise.all — JS single-threaded ordering is sufficiently obvious; adding a note there felt like cargo-cult.

@danhdoan previously approved these changes Apr 29, 2026
- Re-anchor parseBatchedTags on `<file path>` openers instead of
  `</file>` closers (real correctness): the model output is plain text,
  not CDATA-wrapped, so an overview that mentions `</file>` literally
  in prose (perfectly normal for docs about XML, JSX, or build systems)
  would prematurely terminate the outer match and orphan the inner
  tag. The new strategy collects all opener positions, then slices
  each opener's body up to the next opener (or end-of-string) — a
  literal `</file>` in prose has no special meaning to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
  the wrap independently on each builder. The previous test only
  validated the last-call's prompt, so a future refactor that forgot
  wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
  to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
  "splits oversized backlogs" tests out of the `status file` describe
  block into a sibling `batching behaviour` block — these verify
  scheduling, not status I/O, and the misgrouping made the suite
  harder to navigate.
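The opener-anchored strategy in the first bullet can be sketched as below; the names and exact regexes are illustrative assumptions, not this PR's code.

```typescript
// Collect every <file path="..."> opener, then slice each body up to the
// next opener (or end-of-string). A literal </file> inside prose has no
// special meaning to the parser.
function parseBatchedFiles(output: string): Map<string, string> {
  const openerRe = /<file path="([^"]*)">/g
  const openers: { path: string; start: number; bodyStart: number }[] = []
  let m: RegExpExecArray | null
  while ((m = openerRe.exec(output)) !== null) {
    openers.push({ path: m[1] ?? '', start: m.index, bodyStart: m.index + m[0].length })
  }
  const result = new Map<string, string>()
  openers.forEach((o, i) => {
    const next = openers[i + 1]
    const end = next ? next.start : output.length
    // Drop only a trailing closer; a mid-body </file> is kept as plain text.
    const body = output.slice(o.bodyStart, end).replace(/<\/file>\s*$/, '')
    result.set(o.path, body.trim())
  })
  return result
}
```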
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @RyanNg1403's task in 2m 29s


PR Review — feat: [ENG-2518] batch abstract generation across queued files

Overall score: 4.5 / 5 — The third iteration is in excellent shape. The suggestion-severity parseBatchedTags correctness issue is now genuinely resolved with a better algorithm (opener-anchored vs. </file>-terminated), and the two praise items from the second-pass review remain fully validated. Three minor nitpicks from the previous cycle are still open.


What changed in this iteration

The latest commit (887264f) addressed:

  • parseBatchedTags rewrite — switched from lazy </file> close-matching to an opener-anchored strategy. Collects all <file path="..."> opener positions first, then slices each body up to the next opener. A literal </file> in model prose now has no special meaning to the parser (confirmed by the new test at lines 209–233).
  • CDATA test captures both L0 and L1 prompts independentlycapturedPrompts.abstract + capturedPrompts.overview are now asserted separately, so a future regression in either builder would be caught.
  • New test: nested CDATA terminator escape — pins the ]]> → ]]]]><![CDATA[> in-section escape

2. failedThisCycle array used only for .length (src/agent/infra/map/abstract-queue.ts, line 229)

The array accumulates items only to read .length in the console.debug line. A plain counter is leaner:

let exhaustedCount = 0
// …
} else {
  this.failed++
  exhaustedCount++
  queueLog()
}
// …
console.debug(`…retrying=${retryThisCycle.length}, exhausted=${exhaustedCount}`)


3. Concurrent writes for duplicate contextPath in processNext (src/agent/infra/map/abstract-queue.ts, lines 213–219)

generateFileAbstractsBatch deduplicates at the LLM level but returns one result per original input item (for caller alignment). If two queue items share a contextPath, the flatMap issues two concurrent writeFile calls to the same .abstract.md and .overview.md. The final disk state is correct (identical content), but concurrent writes to the same path are a race on some filesystems. A one-liner dedup before writing closes it:

const seen = new Set<string>()
const dedupedResults = results.filter((r) => !seen.has(r.contextPath) && seen.add(r.contextPath))
await Promise.all(dedupedResults.flatMap((r) => { /* … */ }))

The this.processed += batch.length accounting stays correct since it counts queue items, not write targets.



praise — parseBatchedTags + new tests make the parser contract explicit

File: test/unit/agent/map/abstract-generator-batch.test.ts, lines 209–259

Two new test cases — parser is robust to a literal </file> appearing inside the model overview prose and escapes nested CDATA terminators in content — directly pin the two invariants that were previously only documented in comments. This is exactly the right balance: the comment explains why, the test enforces what. Future refactors cannot silently break either invariant.


Checklist

  • Gather context and read changed files
  • Review abstract-generator.ts
  • Review abstract-queue.ts
  • Review test files
  • Post comprehensive review

@github-actions
Contributor

github-actions Bot commented May 1, 2026

Claude finished @danhdoan's task in 2m 29s


PR Review — feat: [ENG-2518] batch abstract generation across queued files

Overall score: 4.5 / 5 — Three carry-over nitpicks from the previous cycle remain open; all medium and suggestion-severity concerns are fully resolved. This is ready to merge once the nitpicks are addressed (or consciously deferred).


What changed since the last review

parseBatchedTags rewritten to opener-anchored strategy — the lazy </file>-terminated approach was replaced with a two-pass algorithm that collects <file path="..."> opener positions first, then slices between consecutive openers. A literal </file> in model prose now has no special meaning to the parser. Two new tests (parser is robust to a literal </file> and escapes nested CDATA terminators) pin both invariants.

CDATA test captures both L0 and L1 prompts independentlycapturedPrompts.abstract and capturedPrompts.overview are now asserted separately; a future regression in either builder will be caught.

Batching tests moved to batching behaviour describe block — the three new tests no longer live under status file.


Remaining open issues (all nitpicks)

nitpick — escapeXmlAttr missing scope comment

File: src/agent/infra/map/abstract-generator.ts, line 87

The function escapes &, ", <, > — the double-quote escape is only needed for XML attribute values, not element content. Since wrapCdata now handles element content, escapeXmlAttr is used exclusively for path="..." attribute values. Its name is accurate, but a one-line comment would confirm its scope to future readers and prevent accidental reuse as a content escaper:

/** Escapes characters unsafe in XML attribute values (double-quote variant). Element content is handled by wrapCdata. */
function escapeXmlAttr(value: string): string {



nitpick — failedThisCycle array used only for .length

File: src/agent/infra/map/abstract-queue.ts, line 229

const failedThisCycle: QueueItem[] = []
// …
failedThisCycle.push(item)
// …
console.debug(`…exhausted=${failedThisCycle.length}`)

The array accumulates items solely to report .length in the one console.debug call. A plain counter is leaner and signals intent more clearly:

let exhaustedCount = 0
// …
} else {
  this.failed++
  exhaustedCount++
  queueLog(`process:failed path=${item.contextPath} failed=${this.failed}`)
}
// …
console.debug(`[AbstractQueue] batch attempt failed (${msg}); retrying=${retryThisCycle.length}, exhausted=${exhaustedCount}`)



nitpick — Concurrent writes for duplicate contextPath in processNext

File: src/agent/infra/map/abstract-queue.ts, lines 213–219

generateFileAbstractsBatch returns one result per original input item (for caller alignment), so if two queue items share a contextPath, results.flatMap issues two concurrent writeFile calls to the same .abstract.md and .overview.md paths. The final disk state is correct (identical content), but concurrent writes to the same path are a race on some filesystems. A one-liner dedup before the write loop closes it without affecting the processed accounting (which counts queue items, not write targets):

const seen = new Set<string>()
const writeTargets = results.filter((r) => !seen.has(r.contextPath) && !!seen.add(r.contextPath))
await Promise.all(writeTargets.flatMap((r) => {
  const abstractPath = r.contextPath.replace(/\.md$/, '.abstract.md')
  const overviewPath = r.contextPath.replace(/\.md$/, '.overview.md')
  return [
    writeFile(abstractPath, r.abstractContent, 'utf8'),
    writeFile(overviewPath, r.overviewContent, 'utf8'),
  ]
}))
this.processed += batch.length  // still counts all queue items



Praise

praise — Opener-anchored parseBatchedTags is correct and well-tested

File: src/agent/infra/map/abstract-generator.ts, lines 155–181

The two-pass approach (collect openers → slice between consecutive openers → inner regex per slice) is the right fix. The comment block explains the invariant clearly, and the two new tests (literal </file> in prose and nested ]]> escape) prove both parser properties explicitly rather than relying on the comment alone. This is the kind of test coverage that prevents silent regressions in a subtle algorithm.


praise — const {generator} = this local narrowing is idiomatic

File: src/agent/infra/map/abstract-queue.ts, lines 175–182

The destructuring-before-guard pattern correctly survives all await boundaries in processNext and avoids the ! non-null assertion that CLAUDE.md flags. The accompanying comment (lines 175–177) explains why the capture is needed — a future reader won't need to reason about TS narrowing rules to understand it.


Checklist

  • Gather context and read changed files
  • Review abstract-generator.ts
  • Review abstract-queue.ts
  • Review test files
  • Post comprehensive review

@danhdoan merged commit 3fbb17d into proj/curation-enhancement on May 1, 2026
6 checks passed
@danhdoan deleted the feat/ENG-2518 branch on May 1, 2026 at 07:50
@danhdoan mentioned this pull request on May 4, 2026
28 tasks
bao-byterover pushed a commit that referenced this pull request May 4, 2026
* feat: [ENG-2518] batch abstract generation across queued files

Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).

- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
  XML-tagged input/output, tolerant regex parser, missing paths get
  empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
  triggers scheduleNext when pending >= cap or drainRequested. drain()
  sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
  attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.

Real-LLM verification on ByteRover paid-tier:
  5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
  6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
  10-pack: per-batch failure + retry path exercised end-to-end

* feat: [ENG-2485] defer summary cascade to dream

Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.

- DreamState schema gains staleSummaryPaths[]; new service methods
  enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
  preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
  enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
  filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
  and runs propagateStaleness once per unique path. On propagation
  failure the catch re-enqueues the drained snapshot so atomic drain
  doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
  the ENG-2100 parentTaskId threading so summary regenerations still
  share one billing session with the parent dream task.

Validation:
  - validate.mjs: 15/15 BM25 invariants hold under stale _index.md
  - Real-LLM E2E: 8 curates / 2 dreams on byterover paid-tier
    - dedup, multi-curate accumulation, drain, re-enqueue on failure
  - Full unit suite: 6981 passing

* refactor: [ENG-2485] address review-agent feedback on PR #579

- Bypass dream activity gate when the stale-summary queue has work
  (codex P1): low-activity projects with 1-2 curates would otherwise
  strand queued paths indefinitely. dream-trigger.ts now treats a
  populated queue as enough reason to drain.
- Dedup within-batch input in enqueueStaleSummaryPaths: a caller passing
  a non-unique array no longer produces duplicate queue entries.
- Split the curate-executor post-processing try block: snapshot diff,
  enqueue, and manifest rebuild now have independent fail-open scopes
  so a transient queue write error cannot skip the (no-LLM) manifest
  scan that keeps newly-curated leaves immediately discoverable.
- Restore the runOperations JSDoc that the cherry-pick orphaned in
  front of runStaleSummaryPropagation.
- Tighten StaleSummaryEntrySchema to z.string().min(1) so empty paths
  cannot persist into dream-state.json.

Tests:
  - dream-trigger: bypass-when-queue-populated + still-block-when-empty
  - dream-state-service: within-batch duplicate dedup

* refactor: [ENG-2518] address review-agent feedback on PR #580

- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
  (codex P1): when a curate touches the same file twice before drain,
  both queue items had the same path; the prompt emitted two `<file>`
  blocks the model could answer in either order, and the tag parser's
  Map collapsed them non-deterministically. Now collapses by path
  keeping the LATEST content (the FIFO queue's last-seen item) so
  abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs that
  describe XML/HTML/JSX (perfectly normal for technical knowledge) can
  contain `</document>` or `</file>` markers that would terminate the
  envelope and conflate files. Wrap in CDATA with the standard
  `]]]]><![CDATA[>` in-section escape for any nested terminators.
- Capture this.generator in a local const at the top of processNext
  so type narrowing survives the await boundary, and document the
  batched-retry semantics inline (batch failure consumes one attempt
  per item, not one per call).
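The CDATA wrap with the in-section escape can be sketched in a few lines (hypothetical helper name; behavior as described above):

```typescript
// Wrap file content in CDATA for the batched prompt envelope. A literal
// "]]>" inside the content would otherwise terminate the section early,
// so it is split with the standard "]]]]><![CDATA[>" escape: the first
// section ends after "]]", a new section reopens, and ">" follows.
function wrapCdata(content: string): string {
  return `<![CDATA[${content.split("]]>").join("]]]]><![CDATA[>")}]]>`;
}
```

With this wrap, a document describing XML can safely contain `</file>` or `</document>` without conflating files in the envelope.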

Tests:
  - abstract-generator-batch: duplicate-path dedup keeps latest content
  - abstract-generator-batch: CDATA wrap survives `</document>` markers
  - abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid

* refactor: [ENG-2485] address second-pass review feedback on PR #579

- Reword the Step 1 snapshot-failure catch in dream-executor: the old
  comment claimed "atomic drain already removed entries", but drain
  doesn't run in this code path. Clarify that skipping drain here is
  safer than draining and then failing: the atomic-drain RMW clears
  entries synchronously, so a post-drain failure would lose the work.

- Add a checkEligibility test for the stale-summary queue bypass so
  both tryStartDream and checkEligibility pin the invariant. They
  delegate to the same checkGates1to3 path; without this, a future
  refactor of the shared path could regress only one call site.

* refactor: [ENG-2518] address second-pass review feedback on PR #580

- Re-anchor parseBatchedTags on `<file path>` openers instead of
  `</file>` closers (real correctness): the model output is plain text,
  not CDATA-wrapped, so an overview that mentions `</file>` literally
  in prose (perfectly normal for docs about XML, JSX, or build systems)
  would prematurely terminate the outer match and orphan the inner
  tag. The new strategy collects all opener positions, then slices
  each opener's body up to the next opener (or end-of-string) — a
  literal `</file>` in prose has no special meaning to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
  the wrap independently on each builder. The previous test only
  validated the last-call's prompt, so a future refactor that forgot
  wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
  to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
  "splits oversized backlogs" tests out of the `status file` describe
  block into a sibling `batching behaviour` block — these verify
  scheduling, not status I/O, and the misgrouping made the suite
  harder to navigate.
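The opener-anchored strategy can be sketched as follows (simplified, hypothetical function name; the real parseBatchedTags also validates paths against the request set):

```typescript
// Parse model output anchored on <file path="..."> OPENERS, not closers.
// Each body is sliced from one opener to the next (or end-of-string),
// so a literal "</file>" inside prose has no special meaning; only a
// trailing closer the model emitted is stripped.
function parseByOpeners(output: string): Map<string, string> {
  const result = new Map<string, string>();
  const opener = /<file path="([^"]+)">/g;
  const hits: { path: string; bodyStart: number; openStart: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = opener.exec(output)) !== null) {
    hits.push({
      path: m[1],
      bodyStart: m.index + m[0].length,
      openStart: m.index,
    });
  }
  for (let i = 0; i < hits.length; i++) {
    const end = i + 1 < hits.length ? hits[i + 1].openStart : output.length;
    const body = output
      .slice(hits[i].bodyStart, end)
      .replace(/<\/file>\s*$/, "") // drop a trailing closer, if any
      .trim();
    result.set(hits[i].path, body);
  }
  return result;
}
```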

* feat: [ENG-2519] enable prefix caching for providers

Ports the caching change from experiment/03-enable-caching to
proj/curation-enhancement. Two coupled behavior changes (must ship
together) that engage Anthropic prompt caching cleanly and make
OpenAI/Google auto-caches see a longer stable prefix:

- Attach providerOptions.anthropic.cacheControl: ephemeral to the
  system message and the last tool definition. The system message
  now enters as a SystemModelMessage in the messages array (not via
  AI SDK's top-level `system` parameter, which strips providerOptions).
- Relocate <dateTime> out of the system prompt into the first user
  message. DateTimeContributor was rebuilding the timestamp every
  iteration, and Anthropic prefix caching does token-level matching —
  one byte of churn ~1,800 tokens into the prompt invalidated every
  cached token past it. Moving dateTime to a single iteration-0
  user-message prefix
  keeps the system prefix byte-stable across iterations.
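The system-message change can be sketched as a plain-object transform (a minimal sketch: the local `Message` type stands in for the AI SDK message shape, and the per-message `providerOptions` placement follows the convention described above):

```typescript
// Prepend the system prompt as an in-array message carrying the
// Anthropic cache breakpoint, instead of the top-level `system`
// parameter (which strips providerOptions).
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
  providerOptions?: Record<string, unknown>;
};

function prependCachedSystemMessage(
  messages: Message[],
  systemPrompt?: string,
): Message[] {
  if (!systemPrompt) return messages; // nothing to prepend
  return [
    {
      role: "system",
      content: systemPrompt,
      // Mark the stable prefix as an ephemeral cache breakpoint.
      providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
    },
    ...messages, // non-mutating: the caller's array is untouched
  ];
}
```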

Verified across a 60-curate fixture-driven A/B and a 30-curate
progression run on proj/curation-enhancement (see
notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md):

- Anthropic claude-haiku-4-5 12-fixture: -21.5% normalized cost
- Anthropic claude-haiku-4-5 30-curate progression: -29.7%
- OpenAI gpt-4.1: -8.0% (auto-cache hits a longer stable prefix)
- Google gemini-3-flash-preview: ~flat (model-specific noise)

Tests:
- New unit tests for prependCachedSystemMessage edge cases (empty/
  undefined systemPrompt, non-mutation invariant) and toAiSdkTools'
  last-tool cache_control attachment (1, 3 tools).
- Updated internal-llm-service test for completeTask: assertion now
  matches the iter-0 user-message dateTime prefix.

Out of scope (deferred):
- Cache-token field extraction in response.usage (cacheReadTokens,
  cacheCreationTokens). Without telemetry to consume them they would
  be dead code.
- Telemetry / UsageLogger plumbing — separate ticket.
- Anthropic daemon-stability mitigations (TPM retry-loop hangs,
  daemon-readiness races on large fixtures) — operational, separate.

Builds: clean. Typecheck: clean. Lint: clean on touched files.
Tests: 7004 passing, 16 pending, 0 failing.

* refactor: [ENG-2519] address review-agent feedback on dateTime cache hazards

- Drop the unused `datetime` template variable from FileContributor's
  renderTemplateVariables. There is no current YAML consumer; leaving it
  in place was a landmine — any future `{{datetime}}` reference would
  silently re-introduce per-iteration cache poisoning.
- Inject the dateTime block into the rolling-checkpoint continuation
  prompt. clearHistory() wipes the iter-0 user message that originally
  carried it, and the iter-0 guard upstream prevents re-injection, so
  long curate/query runs that hit the checkpoint were losing time
  context for every subsequent iteration.
- Extract `buildDateTimePrefix(now?)` as a small shared helper used by
  both call sites; cover with unit tests.
- Expand the cache_control "last tool" test comment to clarify that the
  contract is insertion-order, not alphabetical.
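The shared helper is small enough to sketch in full (the output format here is hypothetical; the point is one injection site with an injectable clock, shared by the iter-0 user message and the checkpoint continuation prompt):

```typescript
// Shared dateTime prefix builder. Accepting an optional `now` keeps the
// helper deterministic under test and pins both call sites to one format.
function buildDateTimePrefix(now: Date = new Date()): string {
  return `<dateTime>${now.toISOString()}</dateTime>`;
}
```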

* feat: [ENG-2530] pre-pipeline recon to skip first agent iteration

Run tools.curation.recon synchronously in CurateExecutor before
spawning the agent session, inject the result both as a sandbox
variable (__recon_result_<taskIdSafe>) and as inline fields in the
agent's first prompt. Agent's first iteration then proceeds directly
to extraction instead of paying a full LLM call to invoke recon.

Recon is pure JS — no LLM, no decision — but was previously surfaced
as an agent-tool, costing a full ~13K-token iteration on every curate
just to invoke a deterministic helper.
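The dual injection (sandbox variable plus inline prompt fields) can be sketched as follows — everything here is a hypothetical simplification except the `__recon_result_<taskIdSafe>` key, which is the variable named above:

```typescript
// Inject a pre-computed recon result both as a sandbox variable and as
// inline fields in the agent's first prompt, so iteration 1 can skip
// the deterministic recon tool call.
type ReconResult = { charCount: number; lineCount: number; messageCount: number };

function injectReconResult(
  sandbox: Record<string, unknown>,
  taskIdSafe: string,
  recon: ReconResult,
): string {
  // 1) Sandbox variable the agent's tools can read back by key.
  sandbox[`__recon_result_${taskIdSafe}`] = recon;
  // 2) Inline fields in the first prompt.
  return [
    `Recon already ran: ${recon.messageCount} messages,`,
    `${recon.charCount} chars, ${recon.lineCount} lines.`,
    "Proceed directly to extraction; do not call recon.",
  ].join(" ");
}
```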

Measurements (12-fixture A/B vs proj/curation-enhancement + ENG-2519
caching baseline, full results in notes/token-usage-reduction/
eng-2530-prepipeline-recon/REPORT.md):

  OpenAI gpt-4.1                  -16.3% calls  -20.3% normalized cost
  Google gemini-3-flash-preview   -1.9%  calls  -17.8% normalized cost
  Anthropic claude-haiku-4-5      -2.0%  calls  -0.9%  normalized cost (flat)

ByteRover gemini-3-flash-preview 15-curate progression: -6.5% normalized
cost / -3.4% $ cost.

Quality risk is bounded: recon itself is unchanged; only the caller
moves. Per-fixture variance follows the team's exp 04 pattern (some
fixtures regress as the agent compensates for missing recon signal in
later iterations); aggregate nets positive on OpenAI and Google.
Anthropic claude-haiku-4-5 lands flat — different model-class behavior
from the team's claude-sonnet-4-5 measurement; not a regression.

* test: [ENG-2530] cover recon pre-pipeline wiring on curate-executor

Pin the central correctness invariant of the change: recon's result is
injected as `__recon_result_<taskIdSafe>` on the task session, and the
agent prompt instructs the loop to skip the now-deterministic call.
Without this test a refactor could silently remove the synchronous
`recon()` call or the prompt instruction, defeating the whole change.

* refactor: [ENG-2530] address review-agent feedback on PR #593

- curate-executor: add comment near reconHelper(...) noting that the
  `metadata` argument is currently a no-op — `recon` recomputes counts
  from `effectiveContext` directly. Prevents future readers from
  assuming changes to `metadata` would alter `reconResult`. Upstream
  signature fix is out of scope for this PR.
- test: add stubs for FileContextTreeSnapshotService.getCurrentState
  and DreamStateService.incrementCurationCount so the unit test stays
  in-memory (mirrors the ENG-2485 test pattern); keeps to CLAUDE.md's
  "unit tests must run completely in memory" rule.
- test: assert meta.messageCount alongside charCount and lineCount
  (the prompt surfaces all three).

* merge: sync main (3.10.1) into curation-enhancement

Resolves three conflicts created by the overlap of ENG-2485 (defer summary
cascade to dream, on this branch) and ENG-2522 (detach post-curate Phase 4,
shipped on main, PR #582). The two changes touch the same lines but address
different costs — LLM cost vs user-visible latency — and compose cleanly.

curate-executor.ts
- Adopt main's {finalize, response} architecture so the daemon fires
  task:completed before Phase 4 runs.
- Inside finalize, keep cascade-defer: snapshot diff →
  enqueueStaleSummaryPaths + buildManifest. Drop main's
  propagateSummariesUnderLock — the dream lock existed to coordinate
  inline _index.md writes, which cascade-defer eliminates from curate.
- Preserve all exp 03 (cache_control) and exp 04 (recon pre-pipelining)
  changes in Phases 1-3.
- Extract propagateAndRebuild private helper alongside main's
  incrementDreamCounter.

dream-executor.ts
- Keep this branch's runStaleSummaryPropagation extracted seam (main did
  not change dream-executor; nothing lost).

curate-executor.test.ts
- Rewrite runAgentBody/finalize split assertions to expect cascade-defer
  (enqueueStaleSummaryPaths + buildManifest) instead of propagateStaleness.
- Drop the dream-lock coordination describe block — the lock is no longer
  taken in curate's Phase 4 (folder-pack still uses it).

folder-pack-executor still calls propagateSummariesUnderLock; the helper
file stays.

Verified: typecheck, lint (0 errors on resolved files), 7284 tests pass.

* refactor: address review-agent feedback on PR #601

Four small reviewer-flagged issues with focused fixes:

- abstract-queue.ts: drop non-absolute contextPath at enqueue with a
  trace breadcrumb. Background batch writes derive .abstract.md /
  .overview.md via raw writeFile and errors are catch-suppressed, so a
  relative path would silently land under process.cwd().
- tool-manager.ts: document the insertion-order contract for
  CURATE_TOOL_NAMES and QUERY_TOOL_NAMES at the registration site.
  toAiSdkTools attaches Anthropic cacheControl: ephemeral to the LAST
  tool, so reordering this list silently shifts the cache breakpoint.
- dream-state-service.ts: skip the disk write in update() when the
  updater returns the same state reference. drainStaleSummaryPaths and
  enqueueStaleSummaryPaths already follow the no-op-by-identity
  convention; this makes the contract observable at the disk level so
  empty drains don't tmpfile + rename for nothing. New test pins it.
- curate-executor.ts: correct the catch-block comment under the
  cascade-defer enqueue. The previous wording implied diffStates would
  re-capture the lost paths automatically; in fact paths uniquely
  changed by the failing curate are dropped until the same files are
  modified again or dream's own snapshot diff touches them.

---------

Co-authored-by: Nguyễn Thuận Phát <nguyenthuanphatvl@gmail.com>
Co-authored-by: Nguyễn Thuận Phát <143022215+RyanNg1403@users.noreply.github.com>