feat: [ENG-2518] batch abstract generation across queued files #580
danhdoan merged 4 commits into proj/curation-enhancement
Conversation
Collapse the abstract queue's per-file LLM calls (2N today: one L0 .abstract.md + one L1 .overview.md per file) into batched calls of up to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per cycle (1 batched L0 + 1 batched L1).
- generateFileAbstractsBatch(items, generator) in abstract-generator.ts: XML-tagged input/output, tolerant regex parser, missing paths get empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only triggers scheduleNext when pending >= cap or drainRequested. drain() sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.

Real-LLM verification on ByteRover paid-tier:
- 5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, −80%)
- 6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, −67%)
- 10-pack: per-batch failure + retry path exercised end-to-end
Claude finished @RyanNg1403's task in 3m 53s — PR Review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8fa123258
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch (codex P1): when a curate touches the same file twice before drain, both queue items had the same path; the prompt emitted two `<file>` blocks the model could answer in either order, and the tag parser's Map collapsed them non-deterministically. Now collapses by path, keeping the LATEST content (the FIFO queue's last-seen item) so abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs that describe XML/HTML/JSX (perfectly normal for technical knowledge) can contain `</document>` or `</file>` markers that would terminate the envelope and conflate files. Wrap in CDATA with the standard `]]]]><![CDATA[>` in-section escape for any nested terminators.
- Capture this.generator in a local const at the top of processNext so type narrowing survives the await boundary, and document the batched-retry semantics inline (batch failure consumes one attempt per item, not one per call).

Tests:
- abstract-generator-batch: duplicate-path dedup keeps latest content
- abstract-generator-batch: CDATA wrap survives `</document>` markers
- abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
Claude finished @RyanNg1403's task in 4m 11s — PR Review
Pushed c17a0fa addressing the Claude review-agent feedback, plus the codex P1 (duplicate contextPath dedup).
- Re-anchor parseBatchedTags on `<file path>` openers instead of `</file>` closers (real correctness): the model output is plain text, not CDATA-wrapped, so an overview that mentions `</file>` literally in prose (perfectly normal for docs about XML, JSX, or build systems) would prematurely terminate the outer match and orphan the inner tag. The new strategy collects all opener positions, then slices each opener's body up to the next opener (or end-of-string) — a literal `</file>` in prose has no special meaning to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert the wrap independently on each builder. The previous test only validated the last call's prompt, so a future refactor that forgot wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose to pin the new robustness invariant.
- Move the "buffers below cap", "processes up to BATCH_SIZE_CAP", and "splits oversized backlogs" tests out of the `status file` describe block into a sibling `batching behaviour` block — these verify scheduling, not status I/O, and the misgrouping made the suite harder to navigate.
Claude finished @RyanNg1403's task in 2m 29s — PR Review

Claude finished @danhdoan's task in 2m 29s — PR Review
* feat: [ENG-2518] batch abstract generation across queued files
Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).
- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
XML-tagged input/output, tolerant regex parser, missing paths get
empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
triggers scheduleNext when pending >= cap or drainRequested. drain()
sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.
Real-LLM verification on ByteRover paid-tier:
5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
10-pack: per-batch failure + retry path exercised end-to-end
* feat: [ENG-2485] defer summary cascade to dream
Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.
- DreamState schema gains staleSummaryPaths[]; new service methods
enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
and runs propagateStaleness once per unique path. On propagation
failure the catch re-enqueues the drained snapshot so atomic drain
doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
the ENG-2100 parentTaskId threading so summary regenerations still
share one billing session with the parent dream task.
Validation:
- validate.mjs: 15/15 BM25 invariants hold under stale _index.md
- Real-LLM E2E: 8 curates / 2 dreams on ByteRover paid-tier
- dedup, multi-curate accumulation, drain, re-enqueue on failure
- Full unit suite: 6981 passing
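The enqueue/drain semantics above can be sketched as pure functions. This is an in-memory stand-in (the real service persists to .brv/dream-state.json via an RMW update); the entry shape mirrors the staleSummaryPaths[] schema, everything else is illustrative.

```typescript
interface StaleSummaryEntry {
  path: string;
  enqueuedAt: number;
}

interface DreamStateSlice {
  staleSummaryPaths: StaleSummaryEntry[];
}

function enqueueStaleSummaryPaths(
  state: DreamStateSlice,
  paths: string[],
  now: number,
): DreamStateSlice {
  const byPath = new Map<string, StaleSummaryEntry>();
  for (const e of state.staleSummaryPaths) byPath.set(e.path, e);
  for (const path of new Set(paths)) {
    // Path-level dedup: an already-queued path keeps its ORIGINAL enqueuedAt.
    if (!byPath.has(path)) byPath.set(path, { path, enqueuedAt: now });
  }
  return { staleSummaryPaths: [...byPath.values()] };
}

// Atomic snapshot-and-clear: the caller gets the drained entries and the
// queue is emptied in the same step, so no entry can be drained twice.
function drainStaleSummaryPaths(
  state: DreamStateSlice,
): { drained: StaleSummaryEntry[]; state: DreamStateSlice } {
  return { drained: state.staleSummaryPaths, state: { staleSummaryPaths: [] } };
}
```

On propagation failure, re-enqueueing the drained snapshot (as the dream-executor catch does) simply feeds `drained` back through `enqueueStaleSummaryPaths`, so atomic drain loses no work.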
* refactor: [ENG-2485] address review-agent feedback on PR #579
- Bypass dream activity gate when the stale-summary queue has work
(codex P1): low-activity projects with 1-2 curates would otherwise
strand queued paths indefinitely. dream-trigger.ts now treats a
populated queue as enough reason to drain.
- Dedup within-batch input in enqueueStaleSummaryPaths: a caller passing
a non-unique array no longer produces duplicate queue entries.
- Split the curate-executor post-processing try block: snapshot diff,
enqueue, and manifest rebuild now have independent fail-open scopes
so a transient queue write error cannot skip the (no-LLM) manifest
scan that keeps newly-curated leaves immediately discoverable.
- Restore the runOperations JSDoc that the cherry-pick orphaned in
front of runStaleSummaryPropagation.
- Tighten StaleSummaryEntrySchema to z.string().min(1) so empty paths
cannot persist into dream-state.json.
Tests:
- dream-trigger: bypass-when-queue-populated + still-block-when-empty
- dream-state-service: within-batch duplicate dedup
* refactor: [ENG-2518] address review-agent feedback on PR #580
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
(codex P1): when a curate touches the same file twice before drain,
both queue items had the same path; the prompt emitted two `<file>`
blocks the model could answer in either order, and the tag parser's
Map collapsed them non-deterministically. Now collapses by path
keeping the LATEST content (the FIFO queue's last-seen item) so
abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs that
describe XML/HTML/JSX (perfectly normal for technical knowledge) can
contain `</document>` or `</file>` markers that would terminate the
envelope and conflate files. Wrap in CDATA with the standard
`]]]]><![CDATA[>` in-section escape for any nested terminators.
- Capture this.generator in a local const at the top of processNext
so type narrowing survives the await boundary, and document the
batched-retry semantics inline (batch failure consumes one attempt
per item, not one per call).
Tests:
- abstract-generator-batch: duplicate-path dedup keeps latest content
- abstract-generator-batch: CDATA wrap survives `</document>` markers
- abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
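The CDATA wrap and latest-wins dedup can be sketched as two small helpers. The escape string is the standard XML in-section escape quoted above; the item shape is an assumption for illustration.

```typescript
// Wrap prompt bodies in CDATA; split any literal "]]>" across two CDATA
// sections so it cannot terminate the envelope early.
function wrapCdata(content: string): string {
  const escaped = content.split("]]>").join("]]]]><![CDATA[>");
  return `<![CDATA[${escaped}]]>`;
}

// Dedup-by-path keeping the LATEST content: later items overwrite earlier
// ones, mirroring the FIFO queue's last-seen item winning.
function dedupByPath(
  items: { contextPath: string; content: string }[],
): { contextPath: string; content: string }[] {
  const byPath = new Map<string, { contextPath: string; content: string }>();
  for (const item of items) byPath.set(item.contextPath, item);
  return [...byPath.values()];
}
```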
* refactor: [ENG-2485] address second-pass review feedback on PR #579
- Reword the Step 1 snapshot-failure catch in dream-executor: the old
comment claimed "atomic drain already removed entries", but drain
doesn't run in this code path. Clarify that skipping drain here is
safer than draining-and-failing because the atomic-drain RMW clears
entries synchronously.
- Add a checkEligibility test for the stale-summary queue bypass so
both tryStartDream and checkEligibility pin the invariant. They
delegate to the same checkGates1to3 path; without this, a future
refactor of the shared path could regress only one call site.
* refactor: [ENG-2518] address second-pass review feedback on PR #580
- Re-anchor parseBatchedTags on `<file path>` openers instead of
`</file>` closers (real correctness): the model output is plain text,
not CDATA-wrapped, so an overview that mentions `</file>` literally
in prose (perfectly normal for docs about XML, JSX, or build systems)
would prematurely terminate the outer match and orphan the inner
tag. The new strategy collects all opener positions, then slices
each opener's body up to the next opener (or end-of-string) — a
literal `</file>` in prose has no special meaning to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
the wrap independently on each builder. The previous test only
validated the last-call's prompt, so a future refactor that forgot
wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
"splits oversized backlogs" tests out of the `status file` describe
block into a sibling `batching behaviour` block — these verify
scheduling, not status I/O, and the misgrouping made the suite
harder to navigate.
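The opener-anchored strategy can be sketched like this. The tag attribute syntax and trailing-closer handling are assumptions about the model-output format; the slicing logic is the invariant the parser test pins.

```typescript
// Parse batched model output by collecting every <file path="..."> opener,
// then slicing each body up to the next opener (or end-of-string). A literal
// "</file>" inside the prose has no special meaning to the parser.
function parseBatchedTags(output: string): Map<string, string> {
  const openerRe = /<file path="([^"]+)">/g;
  const openers: { path: string; openStart: number; bodyStart: number }[] = [];
  for (let m = openerRe.exec(output); m; m = openerRe.exec(output)) {
    openers.push({ path: m[1], openStart: m.index, bodyStart: m.index + m[0].length });
  }
  const result = new Map<string, string>();
  openers.forEach((o, i) => {
    const end = i + 1 < openers.length ? openers[i + 1].openStart : output.length;
    let body = output.slice(o.bodyStart, end);
    // Drop only a TRAILING closer; mid-prose closers are left intact.
    body = body.replace(/<\/file>\s*$/, "").trim();
    result.set(o.path, body);
  });
  return result;
}
```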
* feat: [ENG-2519] enable prefix caching for providers
Ports the caching change from experiment/03-enable-caching to
proj/curation-enhancement. Two coupled behavior changes (must ship
together) that engage Anthropic prompt caching cleanly and make
OpenAI/Google auto-caches see a longer stable prefix:
- Attach providerOptions.anthropic.cacheControl: ephemeral to the
system message and the last tool definition. The system message
now enters as a SystemModelMessage in the messages array (not via
AI SDK's top-level `system` parameter, which strips providerOptions).
- Relocate <dateTime> out of the system prompt into the first user
message. DateTimeContributor was rebuilding the timestamp every
iteration, and Anthropic prefix caching does token-level matching —
one byte of churn ~1,800 tokens in poisoned every cached token past
it. Moving dateTime to a single iteration-0 user-message prefix
keeps the system prefix byte-stable across iterations.
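The system-message relocation can be sketched as a pure message-building step. The providerOptions shape follows the AI SDK's Anthropic convention; treat the exact field names here as assumptions, not the PR's verbatim code.

```typescript
type ModelMessage = {
  role: "system" | "user";
  content: string;
  providerOptions?: { anthropic?: { cacheControl?: { type: "ephemeral" } } };
};

function prependCachedSystemMessage(
  messages: ModelMessage[],
  systemPrompt?: string,
): ModelMessage[] {
  if (!systemPrompt) return messages; // empty/undefined: pass through untouched
  return [
    {
      role: "system",
      content: systemPrompt,
      // Enter via the messages array (not the top-level `system` parameter,
      // which strips providerOptions) so the cache breakpoint survives.
      providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
    },
    ...messages, // original array is not mutated
  ];
}
```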
Verified across a 60-curate fixture-driven A/B and a 30-curate
progression run on proj/curation-enhancement (see
notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md):
- Anthropic claude-haiku-4-5 12-fixture: -21.5% normalized cost
- Anthropic claude-haiku-4-5 30-curate progression: -29.7%
- OpenAI gpt-4.1: -8.0% (auto-cache hits a longer stable prefix)
- Google gemini-3-flash-preview: ~flat (model-specific noise)
Tests:
- New unit tests for prependCachedSystemMessage edge cases (empty/
undefined systemPrompt, non-mutation invariant) and toAiSdkTools'
last-tool cache_control attachment (1, 3 tools).
- Updated internal-llm-service test for completeTask: assertion now
matches the iter-0 user-message dateTime prefix.
Out of scope (deferred):
- Cache-token field extraction in response.usage (cacheReadTokens,
cacheCreationTokens). Without telemetry to consume them they would
be dead code.
- Telemetry / UsageLogger plumbing — separate ticket.
- Anthropic daemon-stability mitigations (TPM retry-loop hangs,
daemon-readiness races on large fixtures) — operational, separate.
Builds: clean. Typecheck: clean. Lint: clean on touched files.
Tests: 7004 passing, 16 pending, 0 failing.
* refactor: [ENG-2519] address review-agent feedback on dateTime cache hazards
- Drop the unused `datetime` template variable from FileContributor's
renderTemplateVariables. There is no current YAML consumer; leaving it
in place was a landmine — any future `{{datetime}}` reference would
silently re-introduce per-iteration cache poisoning.
- Inject the dateTime block into the rolling-checkpoint continuation
prompt. clearHistory() wipes the iter-0 user message that originally
carried it, and the iter-0 guard upstream prevents re-injection, so
long curate/query runs that hit the checkpoint were losing time
context for every subsequent iteration.
- Extract `buildDateTimePrefix(now?)` as a small shared helper used by
both call sites; cover with unit tests.
- Expand the cache_control "last tool" test comment to clarify that the
contract is insertion-order, not alphabetical.
* feat: [ENG-2530] pre-pipeline recon to skip first agent iteration
Run tools.curation.recon synchronously in CurateExecutor before
spawning the agent session, inject the result both as a sandbox
variable (__recon_result_<taskIdSafe>) and as inline fields in the
agent's first prompt. Agent's first iteration then proceeds directly
to extraction instead of paying a full LLM call to invoke recon.
Recon is pure JS — no LLM, no decision — but was previously surfaced
as an agent-tool, costing a full ~13K-token iteration on every curate
just to invoke a deterministic helper.
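The wiring can be sketched as follows. Only the `__recon_result_<taskIdSafe>` naming scheme comes from the PR; the sanitization rule, result fields, and prompt wording are illustrative assumptions.

```typescript
// Derive the sandbox variable name from the task id, replacing characters
// that are not identifier-safe (sanitization rule assumed for illustration).
function reconVariableName(taskId: string): string {
  const taskIdSafe = taskId.replace(/[^a-zA-Z0-9_]/g, "_");
  return `__recon_result_${taskIdSafe}`;
}

interface ReconResult {
  charCount: number;
  lineCount: number;
  messageCount: number;
}

// Inline the already-computed recon fields into the agent's first prompt so
// iteration 0 proceeds straight to extraction instead of invoking the tool.
function buildFirstPrompt(
  basePrompt: string,
  recon: ReconResult,
  varName: string,
): string {
  return [
    basePrompt,
    `Recon has already run; its result is available as ${varName}:`,
    `charCount=${recon.charCount} lineCount=${recon.lineCount} messageCount=${recon.messageCount}`,
    `Do not invoke the recon tool.`,
  ].join("\n");
}
```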
Measurements (12-fixture A/B vs proj/curation-enhancement + ENG-2519
caching baseline, full results in notes/token-usage-reduction/
eng-2530-prepipeline-recon/REPORT.md):
OpenAI gpt-4.1 -16.3% calls -20.3% normalized cost
Google gemini-3-flash-preview -1.9% calls -17.8% normalized cost
Anthropic claude-haiku-4-5 -2.0% calls -0.9% normalized cost (flat)
ByteRover gemini-3-flash-preview 15-curate progression: -6.5% normalized
cost / -3.4% $ cost.
Quality risk is bounded: recon itself is unchanged; only the caller
moves. Per-fixture variance follows the team's exp 04 pattern (some
fixtures regress as the agent compensates for missing recon signal in
later iterations); aggregate nets positive on OpenAI and Google.
Anthropic claude-haiku-4-5 lands flat — different model-class behavior
from the team's claude-sonnet-4-5 measurement; not a regression.
* test: [ENG-2530] cover recon pre-pipeline wiring on curate-executor
Pin the central correctness invariant of the change: recon's result is
injected as `__recon_result_<taskIdSafe>` on the task session, and the
agent prompt instructs the loop to skip the now-deterministic call.
Without this test a refactor could silently remove the synchronous
`recon()` call or the prompt instruction, defeating the whole change.
* refactor: [ENG-2530] address review-agent feedback on PR #593
- curate-executor: add comment near reconHelper(...) noting that the
`metadata` argument is currently a no-op — `recon` recomputes counts
from `effectiveContext` directly. Prevents future readers from
assuming changes to `metadata` would alter `reconResult`. Upstream
signature fix is out of scope for this PR.
- test: add stubs for FileContextTreeSnapshotService.getCurrentState
and DreamStateService.incrementCurationCount so the unit test stays
in-memory (mirrors the ENG-2485 test pattern); keeps to CLAUDE.md's
"unit tests must run completely in memory" rule.
- test: assert meta.messageCount alongside charCount and lineCount
(the prompt surfaces all three).
* merge: sync main (3.10.1) into curation-enhancement
Resolves three conflicts created by the overlap of ENG-2485 (defer summary
cascade to dream, on this branch) and ENG-2522 (detach post-curate Phase 4,
shipped on main, PR #582). The two changes touch the same lines but address
different costs — LLM cost vs user-visible latency — and compose cleanly.
curate-executor.ts
- Adopt main's {finalize, response} architecture so the daemon fires
task:completed before Phase 4 runs.
- Inside finalize, keep cascade-defer: snapshot diff →
enqueueStaleSummaryPaths + buildManifest. Drop main's
propagateSummariesUnderLock — the dream lock existed to coordinate
inline _index.md writes, which cascade-defer eliminates from curate.
- Preserve all exp 03 (cache_control) and exp 04 (recon pre-pipelining)
changes in Phases 1-3.
- Extract propagateAndRebuild private helper alongside main's
incrementDreamCounter.
dream-executor.ts
- Keep this branch's runStaleSummaryPropagation extracted seam (main did
not change dream-executor; nothing lost).
curate-executor.test.ts
- Rewrite runAgentBody/finalize split assertions to expect cascade-defer
(enqueueStaleSummaryPaths + buildManifest) instead of propagateStaleness.
- Drop the dream-lock coordination describe block — the lock is no longer
taken in curate's Phase 4 (folder-pack still uses it).
folder-pack-executor still calls propagateSummariesUnderLock; the helper
file stays.
Verified: typecheck, lint (0 errors on resolved files), 7284 tests pass.
* refactor: address review-agent feedback on PR #601
Four small reviewer-flagged issues with focused fixes:
- abstract-queue.ts: drop non-absolute contextPath at enqueue with a
trace breadcrumb. Background batch writes derive .abstract.md /
.overview.md via raw writeFile and errors are catch-suppressed, so a
relative path would silently land under process.cwd().
- tool-manager.ts: document the insertion-order contract for
CURATE_TOOL_NAMES and QUERY_TOOL_NAMES at the registration site.
toAiSdkTools attaches Anthropic cacheControl: ephemeral to the LAST
tool, so reordering this list silently shifts the cache breakpoint.
- dream-state-service.ts: skip the disk write in update() when the
updater returns the same state reference. drainStaleSummaryPaths and
enqueueStaleSummaryPaths already follow the no-op-by-identity
convention; this makes the contract observable at the disk level so
empty drains don't tmpfile + rename for nothing. New test pins it.
- curate-executor.ts: correct the catch-block comment under the
cascade-defer enqueue. The previous wording implied diffStates would
re-capture the lost paths automatically; in fact paths uniquely
changed by the failing curate are dropped until the same files are
modified again or dream's own snapshot diff touches them.
---------
Co-authored-by: Nguyễn Thuận Phát <nguyenthuanphatvl@gmail.com>
Co-authored-by: Nguyễn Thuận Phát <143022215+RyanNg1403@users.noreply.github.com>
Summary
Collapse the abstract queue's per-file LLM calls (2N today: one L0 `.abstract.md` + one L1 `.overview.md` per file). On a 30-file pack, that's 60 independent calls, each paying its own per-call fixed-overhead tax (system prompt + framing). `generateFileAbstractsBatch(items, generator)` in `abstract-generator.ts` fires two parallel `streamToText` calls (1 batched L0 + 1 batched L1) with XML-tagged input/output. `AbstractGenerationQueue.processNext` drains up to `BATCH_SIZE_CAP=5` items per cycle. `enqueue` only triggers `scheduleNext` when `pending >= cap || drainRequested`; `drain()` (called from `curate-executor` via `drainBackgroundWork` at curate-end) sets the flag for partial-batch flush. The `onAfterWrite` enqueue trigger and per-file 20K-char content cap match the non-batched path, so the per-file view is identical regardless of mode.

Type of change
Scope (select all touched areas)
Linked issues
Hypothesis: research/token-optimization/token-usage-reduction/research/06-hypotheses.md; A/B validated by notes/token-usage-reduction/h3-batch-abstracts/RESULTS.md

Root cause (bug fixes only, otherwise write N/A)
N/A

Test plan
- test/unit/agent/map/abstract-generator-batch.test.ts (new, 5 tests covering correct response parsing, input order preservation, missing paths → empty strings, malformed-response tolerance, exactly 2 LLM calls per batch)
- test/unit/agent/map/abstract-queue.test.ts (added "buffers below cap", "processes up to BATCH_SIZE_CAP in single cycle", "splits oversized backlogs", and adapted retry-backoff test for batched semantics)

User-visible changes
None directly. Side effect: faster abstract generation on multi-file curates; ByteRover free-tier users no longer hit 50/day quota on 30-file packs.
Evidence
- process:start batchSize=5 → 2 LLM calls (vs 10 non-batched, −80%)
- notes/token-usage-reduction/h3-batch-abstracts/RESULTS.md (3-fixture A/B on OpenAI + ByteRover)