fix(engine): subgraph handler bypassed BudgetGuard + dropped child Usage #187

Merged
clintecker merged 3 commits into main from fix/subgraph-budget-propagation
Apr 24, 2026
Collaborator

@clintecker clintecker commented Apr 23, 2026

Closes #183.

Before

Pre-fix, operator-configured --max-tokens / --max-cost ceilings were silently non-binding for any node placed inside a subgraph. Two independent escape hatches, either alone enough to bust the budget:

  1. The child pipeline.Engine was constructed without WithBudgetGuard, so its between-node checks were no-ops. The child ran to completion no matter how much it burned.
  2. `SubgraphHandler.Execute` returned an `Outcome` with no usage rollup. The parent trace's `AggregateUsage` missed all child spend, so the parent's own guard never saw the breach either.

Adversarial review on PR #182 demonstrated this can be chained with other gaps in the ACP estimator (reasoning + tool-call payloads uncounted) to reach a synthetic 100–1000× real/declared cost ratio in a single run. Fixing the subgraph-level bypass is independent of and more impactful than any per-backend accuracy work.

Change

Schema

  • `Outcome.ChildUsage *UsageSummary` — populated by handlers that launch a child run.
  • `TraceEntry.ChildUsage *UsageSummary` with `json:"child_usage,omitempty"` — backwards-compatible JSON extension for `status.json` and `activity.jsonl`.
  • `Trace.AggregateUsage` folds `TraceEntry.ChildUsage` into both running totals and per-provider buckets, preserving provider attribution. Extracted helpers `foldStatsIntoSummary` / `foldChildUsageIntoSummary`.
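
The rollup described in the schema bullets can be sketched as follows. This is a minimal illustration, not the repository's code: the `UsageSummary` fields, the `Provider`/`Tokens` fields on `TraceEntry`, and the free-function form of `AggregateUsage` are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// UsageSummary is a simplified stand-in; the field names are assumptions.
type UsageSummary struct {
	TotalTokens    int
	ProviderTotals map[string]int // tokens per provider
}

// TraceEntry carries a node's own stats plus any usage reported by a child run.
type TraceEntry struct {
	Provider   string
	Tokens     int
	ChildUsage *UsageSummary `json:"child_usage,omitempty"`
}

// foldChildUsageIntoSummary adds a child's totals and per-provider buckets to dst.
func foldChildUsageIntoSummary(dst, child *UsageSummary) {
	if child == nil {
		return
	}
	dst.TotalTokens += child.TotalTokens
	for p, n := range child.ProviderTotals {
		dst.ProviderTotals[p] += n
	}
}

// AggregateUsage folds each entry's own stats first, then its ChildUsage.
func AggregateUsage(entries []TraceEntry) *UsageSummary {
	sum := &UsageSummary{ProviderTotals: map[string]int{}}
	for _, e := range entries {
		if e.Tokens > 0 {
			sum.TotalTokens += e.Tokens
			sum.ProviderTotals[e.Provider] += e.Tokens
		}
		foldChildUsageIntoSummary(sum, e.ChildUsage)
	}
	return sum
}

func main() {
	entries := []TraceEntry{
		{Provider: "anthropic", Tokens: 50},
		{ChildUsage: &UsageSummary{TotalTokens: 60, ProviderTotals: map[string]int{"openai": 60}}},
	}
	sum := AggregateUsage(entries)
	fmt.Println(sum.TotalTokens, sum.ProviderTotals["anthropic"], sum.ProviderTotals["openai"]) // 110 50 60
}
```

Because the child's per-provider buckets are merged key by key, child spend keeps its provider attribution instead of landing in a catch-all bucket.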

Engine

  • `ChildRunContext` + `ChildRunContextFromContext` — a ctx.Value channel for exposing the current `BudgetGuard` + baseline usage to handlers that launch child engines.
  • `WithBaselineUsage(*UsageSummary)` — pre-loads a parent's consumed usage so the child's `checkBudgetAfterEmit` evaluates `baseline + trace.AggregateUsage` against limits.
  • Handler dispatch stamps `outcome.ChildUsage` onto `traceEntry`.
  • `combinedUsageForBudget` + `cloneUsageSummary` — deep-clone the baseline per check so folds don't mutate the shared snapshot.

Subgraph handler

  • Reads `ChildRunContextFromContext`; passes `WithBudgetGuard` + `WithBaselineUsage` into the child engine.
  • Returns `Outcome.ChildUsage = result.Usage` regardless of child outcome.
  • Child-side `OutcomeBudgetExceeded` is mapped to parent `OutcomeSuccess` (not `OutcomeFail`): the strict-failure-edges rule would otherwise halt the parent before `checkBudgetAfterEmit` could fold child usage into the aggregate. With the success mapping, the parent's own guard fires on the next between-node check with the correct `OutcomeBudgetExceeded` status and `BudgetLimitsHit` populated.
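
The outcome mapping can be illustrated with a small sketch. The `Status` type, its string values, and the `mapChildResult` helper are hypothetical; the real handler may structure this differently.

```go
package main

import "fmt"

// Status values are assumptions; only the constant names come from the PR.
type Status string

const (
	OutcomeSuccess        Status = "success"
	OutcomeFail           Status = "fail"
	OutcomeBudgetExceeded Status = "budget_exceeded"
)

type UsageSummary struct{ TotalTokens int }

// EngineResult is what the child engine returns to the subgraph handler.
type EngineResult struct {
	Status Status
	Usage  *UsageSummary
}

// Outcome is what the handler returns to the parent engine.
type Outcome struct {
	Status     Status
	ChildUsage *UsageSummary
}

// mapChildResult (hypothetical helper) always forwards child usage and
// maps a child-side budget halt to success so the parent keeps running
// long enough for its own between-node check to see the usage.
func mapChildResult(r EngineResult) Outcome {
	out := Outcome{Status: r.Status, ChildUsage: r.Usage}
	if r.Status == OutcomeBudgetExceeded {
		out.Status = OutcomeSuccess
	}
	return out
}

func main() {
	o := mapChildResult(EngineResult{Status: OutcomeBudgetExceeded, Usage: &UsageSummary{TotalTokens: 60}})
	fmt.Println(o.Status, o.ChildUsage.TotalTokens) // success 60
}
```

Other child statuses pass through unchanged in this sketch, so a genuine child failure still follows the parent's failure edges.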

Tests (4 new regressions)

  • `TestSubgraph_BudgetBypass_Fix_UsageRollup` — child stats land in parent `ProviderTotals`, not `"unknown"`
  • `TestSubgraph_BudgetBypass_Fix_ParentGuardHaltsAfterOverspend` — subgraph overspends; parent's between-node check halts before the following node runs
  • `TestSubgraph_BudgetBypass_Fix_ChildGuardHaltsMidSubgraph` — parent pre-consumes 50, subgraph consumes 60 (combined 110 > 100); child halts mid-subgraph on baseline + partial trace
  • `TestSubgraph_BudgetBypass_Fix_NestedSubgraph` — two-level nesting, usage rolls all the way up to the grandparent
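
The parent-halts scenario can be modelled with a toy run loop. This is only a simulation of the between-node check under assumed numbers (a ceiling of 100 and a subgraph that reports 120); the `node` type and `run` function are hypothetical and do not mirror the engine's API.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a pipeline node.
type node struct {
	name        string
	tokens      int // the node's own spend
	childTokens int // spend reported by a child run, if any
}

// run executes nodes in order and checks the budget after each one.
// It returns the nodes that ran and whether the budget halted the run.
func run(nodes []node, maxTokens int) (ran []string, halted bool) {
	total := 0
	for _, n := range nodes {
		ran = append(ran, n.name)
		total += n.tokens + n.childTokens
		if total > maxTokens {
			return ran, true
		}
	}
	return ran, false
}

func main() {
	nodes := []node{{"start", 10, 0}, {"subgraph", 0, 120}, {"after", 10, 0}}
	ran, halted := run(nodes, 100)
	fmt.Println(ran, halted) // [start subgraph] true
}
```

With child spend folded into the running total, the node after the subgraph never runs. Dropping `childTokens` from the sum reproduces the pre-fix behavior, where all three nodes run and the ceiling is never hit.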

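Not in this fix

  • `manager_loop` handler has the same shape and likely needs the same treatment (filed/noted separately).
  • Mid-stream enforcement inside a single `Prompt()` call; the guard still fires only between nodes.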

Test plan

  • `go build ./...`
  • `go test ./... -short` — 17 packages pass
  • `make fmt-check`
  • Race detector clean on `pipeline/` and `pipeline/handlers/`
  • All four new regression tests exercise the enforcement paths

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Fixed nested execution bypass where parent budget checks missed child spend; parent checks now consider child usage so cost limits are enforced.
  • New Features

    • Child-run usage is propagated and folded into trace and result rollups; budget evaluations use parent baseline + child trace and provider attribution is preserved.
    • Budget-exceeded outcomes from child runs are reported in a way that allows parent guards to respond appropriately.
  • Tests

    • Added regression tests for nested/multi-stage enforcement, usage rollups, and budget-exceeded diagnostics.

Closes #183.

Pre-fix, operator-configured --max-tokens / --max-cost ceilings were
silently non-binding for any node placed inside a subgraph. Two
independent escape hatches, either alone enough to bust the budget:

  1. The child pipeline.Engine was constructed without WithBudgetGuard,
     so its between-node checks were no-ops. The child ran to completion
     no matter how much it burned.
  2. SubgraphHandler.Execute returned an Outcome with no usage rollup.
     The parent trace's AggregateUsage missed all child spend, so the
     parent's own guard never saw the breach either.

Adversarial review on PR #182 demonstrated this can be chained with
other gaps in the ACP estimator (reasoning + tool-call payloads
uncounted) to reach a synthetic 100-1000x real/declared cost ratio in
a single run. Fixing the subgraph-level bypass is independent of and
more impactful than any per-backend accuracy work.

Change:

  Schema (pipeline/handler.go, pipeline/trace.go):
    - Outcome.ChildUsage *UsageSummary — populated by handlers that
      launch a child run.
    - TraceEntry.ChildUsage *UsageSummary with `json:"child_usage,omitempty"`
      — backwards-compatible JSON extension for status.json and
      activity.jsonl.
    - Trace.AggregateUsage folds TraceEntry.ChildUsage into both the
      running totals and per-provider buckets, preserving provider
      attribution. Extracted helpers foldStatsIntoSummary /
      foldChildUsageIntoSummary.

  Engine (pipeline/engine.go, pipeline/engine_run.go):
    - ChildRunContext + ChildRunContextFromContext: a ctx.Value
      channel for exposing the current BudgetGuard + baseline usage
      to handlers that launch child engines. The engine now stashes
      these on ctx before every handler.Execute.
    - WithBaselineUsage(*UsageSummary) EngineOption: pre-loads a
      parent's consumed usage so the child's checkBudgetAfterEmit
      evaluates baseline + trace.AggregateUsage against limits.
    - combinedUsageForBudget + cloneUsageSummary: deep-clone the
      baseline per check so folds don't mutate the shared snapshot.
    - handler dispatch stamps outcome.ChildUsage onto traceEntry.

  Subgraph (pipeline/subgraph.go):
    - Reads ChildRunContextFromContext; passes WithBudgetGuard and
      WithBaselineUsage into the child engine.
    - Returns Outcome.ChildUsage = result.Usage regardless of child
      status.
    - Child-side OutcomeBudgetExceeded is mapped to parent
      OutcomeSuccess (not OutcomeFail): the strict-failure-edges rule
      would otherwise halt the parent before checkBudgetAfterEmit
      folded the child usage into the aggregate. With the success
      mapping, the parent's own guard fires on the next between-node
      check with the correct OutcomeBudgetExceeded status and
      BudgetLimitsHit populated.

  Tests (pipeline/subgraph_test.go, 4 new):
    - Fix_UsageRollup — child stats land in parent ProviderTotals,
      not "unknown"
    - Fix_ParentGuardHaltsAfterOverspend — subgraph overspends;
      parent's between-node check halts before the following node
      runs
    - Fix_ChildGuardHaltsMidSubgraph — parent pre-consumes 50,
      subgraph consumes 60 (combined 110 > 100); child halts
      mid-subgraph on baseline + partial trace
    - Fix_NestedSubgraph — two-level nesting, usage rolls all the
      way up to the grandparent

Not in this fix (filed/noted separately):
  - manager_loop handler has the same shape and likely needs the
    same treatment.
  - Mid-stream enforcement inside a single Prompt() call — guard
    still fires only between nodes.

Verification: go build, go test ./... -short (17 packages),
make fmt-check; the four regression tests cover the three enforcement
paths and the nested case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 16709236-32f0-445e-abba-4310acfa2f60

📥 Commits

Reviewing files that changed from the base of the PR and between 2bbc24b and ac8092f.

📒 Files selected for processing (2)
  • pipeline/engine_run.go
  • pipeline/subgraph_test.go

Walkthrough

Propagates parent budget guard and a baseline usage snapshot into child/subgraph engines, records child-reported usage on trace entries, and changes aggregation and budget checks so parent guards consider parent + child spend during enforcement and rollups.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Engine: Child-run context & options | pipeline/engine.go | Adds exported ChildRunContext, ChildRunContextFromContext, and WithBaselineUsage engine option; stores baselineUsage on Engine to preload parent-consumed usage into child runs. |
| Execution & budget checks | pipeline/engine_run.go | Budget evaluations and emissions use a combined snapshot (baseline + local aggregate); executeNode injects ChildRunContext (budget guard + baseline) into node context when present and records outcome.ChildUsage onto TraceEntry. |
| Handler outcome extension | pipeline/handler.go | Adds ChildUsage *UsageSummary to Outcome to surface aggregated child/subgraph usage to parents. |
| Subgraph execution changes | pipeline/subgraph.go | Child engine built with parent BudgetGuard and baseline (via ChildRunContext); child OutcomeBudgetExceeded maps to OutcomeSuccess (so parent can observe child usage); parent outcome includes ChildUsage from child result. |
| Trace & aggregation | pipeline/trace.go | Adds TraceEntry.ChildUsage and refactors Trace.AggregateUsage to fold per-entry SessionStats then ChildUsage so child spend contributes to parent UsageSummary and ProviderTotals. |
| Tests: regression coverage | pipeline/subgraph_test.go | Adds five regression tests validating provider rollup, parent halting after child overspend, mid-subgraph enforcement against baseline+child usage, nested two-level aggregation/enforcement, and budget-exceeded diagnostics reporting combined snapshot. |
| Changelog | CHANGELOG.md | Adds Unreleased entry describing child usage propagation, budget guard propagation, baseline-aware enforcement, outcome mapping, and regression tests. |

Sequence Diagram(s)

sequenceDiagram
    participant Parent as Parent Engine
    participant Ctx as Context
    participant Child as Child Engine
    participant Registry as Subgraph Handler
    participant Budget as BudgetGuard

    Parent->>Ctx: Snapshot BudgetGuard & baseline usage
    Parent->>Parent: Build ChildRunContext (guard + baseline)
    Parent->>Ctx: Put ChildRunContext into context
    Parent->>Registry: Invoke subgraph handler (execute node)
    Registry->>Ctx: Retrieve ChildRunContext
    Registry->>Child: Create child Engine with WithBudgetGuard & WithBaselineUsage
    Child->>Budget: Check combined (baseline + local trace) usage
    Budget-->>Child: Approve or deny
    Child->>Child: Run child pipeline
    Child-->>Registry: Return EngineResult (with aggregated usage)
    Registry-->>Parent: Return Outcome with ChildUsage included
    Parent->>Parent: Record ChildUsage in trace entry and aggregate into totals

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through code with pockets wide,

carried baselines, guards, and counts inside.
Child usage rolled up, no spend shall hide,
subgraphs checked now, across each tide.
Hoppity hop — budgets and traces aligned!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main fix: subgraph handler now respects BudgetGuard and propagates child Usage to parent, matching the core problem and solution. |
| Linked Issues check | ✅ Passed | All key objectives from issue #183 are addressed: BudgetGuard propagation into child engines [183], child Usage propagation via Outcome.ChildUsage and TraceEntry.ChildUsage [183], Trace.AggregateUsage updated to fold child spend [183], nested subgraph support [183], and comprehensive regression tests covering mid-subgraph halting, parent halting, and nested aggregation [183]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the subgraph budget bypass: schema updates (Outcome.ChildUsage, TraceEntry.ChildUsage), engine baseline/context support, subgraph handler propagation logic, aggregation helpers, and targeted regression tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67%, which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-13: Move this bullet (the long "Subgraph nodes no longer bypass
`--max-tokens` / `--max-cost` budgets" entry) into the existing "Unreleased"
section's later "### Fixed" block so there is only one "### Fixed" heading under
Unreleased; delete the earlier duplicate "### Fixed" header and its
empty/duplicate block, keeping the bullet content intact under the canonical
"Unreleased" -> "### Fixed" grouping to conform to Keep a Changelog structure.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de811b74-7e80-42f4-b600-f4e9a4c25918

📥 Commits

Reviewing files that changed from the base of the PR and between 79e9db4 and f40e1aa.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • pipeline/engine.go
  • pipeline/engine_run.go
  • pipeline/handler.go
  • pipeline/subgraph.go
  • pipeline/subgraph_test.go
  • pipeline/trace.go

Comment thread CHANGELOG.md Outdated
Contributor

Copilot AI left a comment

Pull request overview

Fixes a budget-enforcement and accounting gap where subgraph-executed nodes could bypass --max-tokens / --max-cost limits and have their usage omitted from the parent run’s aggregated usage.

Changes:

  • Add child-run usage propagation (Outcome.ChildUsage → TraceEntry.ChildUsage) and fold it into Trace.AggregateUsage.
  • Propagate budget enforcement into child engines via a context-carried ChildRunContext and a new WithBaselineUsage option to enforce on parent+child combined usage.
  • Add regression tests covering rollup, parent halt after subgraph overspend, mid-subgraph child halt using baseline usage, and nested subgraphs; document the fix in the changelog.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| pipeline/trace.go | Extends trace schema with ChildUsage and folds child usage into aggregated totals/provider buckets. |
| pipeline/handler.go | Adds Outcome.ChildUsage for handlers that launch child runs to return aggregated usage upstream. |
| pipeline/engine.go | Introduces ChildRunContext plumbing and WithBaselineUsage to support combined budget checks in child engines. |
| pipeline/engine_run.go | Uses combined (baseline+local) usage for budget checks; stamps Outcome.ChildUsage onto trace entries; stashes child-run context before handler dispatch. |
| pipeline/subgraph.go | Propagates parent budget guard + baseline to child engine and returns child usage via Outcome.ChildUsage; maps child budget halt to success to allow parent guard to fire. |
| pipeline/subgraph_test.go | Adds four regression tests to pin the fixed behavior across rollup/enforcement/nesting scenarios. |
| CHANGELOG.md | Documents the bug, impact, and the applied fix at a high level. |

Comment thread pipeline/engine.go Outdated
// check sees parent spend + child trace combined, preventing the "subgraph
// sandbox" escape where an operator's --max-tokens / --max-cost ceiling
// would otherwise be silently non-binding for nodes nested in a subgraph.
// Nil baselines are no-ops; zero-token baselines are treated as no-ops.
Copilot AI Apr 24, 2026

The WithBaselineUsage doc says “zero-token baselines are treated as no-ops”, but the option currently assigns baseline unconditionally (only nil is a no-op). Either update the comment to match behavior, or ignore empty baselines in WithBaselineUsage (e.g., when all totals are 0) to keep the API contract accurate.

Suggested change
- // Nil baselines are no-ops; zero-token baselines are treated as no-ops.
+ // Nil baselines are no-ops.

Comment thread pipeline/engine_run.go Outdated
Comment on lines +438 to +441
childRunCtx := context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{
    BudgetGuard: e.budgetGuard,
    Baseline:    e.combinedUsageForBudget(s),
})
Copilot AI Apr 24, 2026

ChildRunContext always stashes a Baseline computed from combinedUsageForBudget(s), even when e.budgetGuard is nil. That baseline then gets propagated into subgraph engines (via WithBaselineUsage) and forces extra clone/fold work on every between-node budget check despite budgets being disabled. Consider only setting/passing Baseline when a BudgetGuard is non-nil (or gating WithBaselineUsage on BudgetGuard) so child runs don’t pay the baseline-merging cost unless budget enforcement is active.

Suggested change
- childRunCtx := context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{
-     BudgetGuard: e.budgetGuard,
-     Baseline:    e.combinedUsageForBudget(s),
- })
+ childRunCtx := ctx
+ if e.budgetGuard != nil {
+     childRunCtx = context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{
+         BudgetGuard: e.budgetGuard,
+         Baseline:    e.combinedUsageForBudget(s),
+     })
+ }

Three bot findings, all accepted:

1. CodeRabbit (CHANGELOG.md): Unreleased had two consecutive `### Fixed`
   sub-sections — the subgraph bullet sat in its own block above the
   Added/Changed groups while the rest of the fixes from this cycle
   were already under a later `### Fixed`. Folded the subgraph bullet
   into the canonical Fixed block at the end of Unreleased.

2. Copilot (pipeline/engine.go:130): WithBaselineUsage godoc claimed
   "zero-token baselines are treated as no-ops" but the implementation
   assigned unconditionally — only nil was a no-op. Corrected the doc
   to match actual behavior.

3. Copilot (pipeline/engine_run.go:441): the engine was allocating
   ChildRunContext and computing combinedUsageForBudget on every
   handler dispatch even when e.budgetGuard was nil — clone/fold work
   for no benefit on unbudgeted runs. Gated the ctx.Value stash on
   e.budgetGuard != nil. Subgraph handler already tolerates a nil
   ChildRunContextFromContext (no-ops the baseline wiring), so no
   downstream change was needed.

Verification: go build, go test ./... -short (17 packages;
TestManagerLoopHandler_CtxCancellation is a pre-existing flake that
passes on retry), make fmt-check, all four subgraph-budget
regressions green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Comment thread pipeline/engine_run.go
Comment on lines 122 to 128
  func (e *Engine) checkBudgetAfterEmit(s *runState) *loopResult {
-     breach := e.budgetGuard.Check(s.trace.AggregateUsage(), s.trace.StartTime)
+     breach := e.budgetGuard.Check(e.combinedUsageForBudget(s), s.trace.StartTime)
      if breach.Kind == BudgetOK {
          return nil
      }
      lr := e.haltForBudget(s, breach)
      return &lr
Copilot AI Apr 24, 2026

BudgetGuard now checks combinedUsageForBudget, but haltForBudget (and emitCostUpdate) still use s.trace.AggregateUsage() when building the EventBudgetExceeded cost snapshot and EngineResult.Usage. When baselineUsage is set, a child engine can halt with a breach even though the emitted/returned usage snapshot is below the configured ceiling, which is confusing for diagnostics/logs. Consider computing the usage snapshot once in checkBudgetAfterEmit and passing it through so the emitted budget-exceeded snapshot reflects the same combined usage that triggered the breach (or otherwise include both baseline+local in the budget event/result).

…the guard

Addresses Copilot review finding on PR #187.

Prior to this commit, BudgetGuard.Check saw the combined parent-baseline
+ child-trace snapshot (the whole point of the #183 fix), but the
subsequent EventBudgetExceeded and EventCostUpdated events built their
CostSnapshot from s.trace.AggregateUsage() — child-local only. A child
engine halting mid-subgraph on 50 (baseline) + 60 (trace) = 110 > 100
would emit "budget exceeded" alongside a snapshot showing 60, which
looks like the emitter was wrong about whether to halt.

Fix: emitCostUpdate and haltForBudget's CostSnapshot both now call
e.combinedUsageForBudget(s), which is what BudgetGuard.Check already
uses. Events report the same value that triggered the halt. At the
top level (no baseline), combinedUsageForBudget returns the local
aggregate unchanged — no visible change for unnested runs.

EngineResult.Usage intentionally keeps the child-local aggregate via
s.trace.AggregateUsage(). The subgraph handler copies this onto
Outcome.ChildUsage and the parent trace's AggregateUsage folds it back
in; substituting the combined snapshot would double-count the parent's
own spend once the parent aggregates a second time. Call sites and
call semantics are documented inline.
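
The double-counting concern can be checked with the numbers used above (parent 50, child 60). The `parentAggregate` function is a hypothetical reduction of the parent's fold to plain integers.

```go
package main

import "fmt"

// parentAggregate mimics the parent folding a subgraph's reported usage
// into its own local spend.
func parentAggregate(parentLocal, childReported int) int {
	return parentLocal + childReported
}

func main() {
	const parentLocal, childLocal = 50, 60
	baseline := parentLocal // what the child engine was handed as its baseline

	// Child reports its local aggregate: the parent total is the true spend.
	correct := parentAggregate(parentLocal, childLocal)

	// Child reports baseline + local: the parent's own 50 is counted twice.
	doubleCounted := parentAggregate(parentLocal, baseline+childLocal)

	fmt.Println(correct, doubleCounted) // 110 160
}
```

So the combined snapshot is the right value for events emitted by the child, while the value handed back to the parent has to stay child-local.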

New regression test TestSubgraph_BudgetExceededEvent_ReportsCombinedSnapshot
captures EventBudgetExceeded emissions from a nested overspend and
asserts that at least one reports the combined total (≥110). This
would have failed under the pre-fix code since both the child's and
the parent's emissions would have shown their local sub-totals only.

Verification: go build, go test ./... -short (17 packages; existing
subgraph-budget regressions all still green), make fmt-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clintecker clintecker merged commit 8526d62 into main Apr 24, 2026
2 checks passed
clintecker added a commit that referenced this pull request Apr 24, 2026
…opagation

fix(engine): manager_loop bypassed BudgetGuard + dropped child Usage (sibling of #187)
Development

Successfully merging this pull request may close these issues.

bug(engine): subgraph handler bypasses BudgetGuard and drops child Usage from parent rollup
