fix(engine): subgraph handler bypassed BudgetGuard + dropped child Usage#187
fix(engine): subgraph handler bypassed BudgetGuard + dropped child Usage#187clintecker merged 3 commits intomainfrom
Conversation
Closes #183. Pre-fix, operator-configured --max-tokens / --max-cost ceilings were silently non-binding for any node placed inside a subgraph. Two independent escape hatches, either alone enough to bust the budget: 1. The child pipeline.Engine was constructed without WithBudgetGuard, so its between-node checks were no-ops. The child ran to completion no matter how much it burned. 2. SubgraphHandler.Execute returned an Outcome with no usage rollup. The parent trace's AggregateUsage missed all child spend, so the parent's own guard never saw the breach either. Adversarial review on PR #182 demonstrated this can be chained with other gaps in the ACP estimator (reasoning + tool-call payloads uncounted) to reach a synthetic 100-1000x real/declared cost ratio in a single run. Fixing the subgraph-level bypass is independent of and more impactful than any per-backend accuracy work. Change: Schema (pipeline/handler.go, pipeline/trace.go): - Outcome.ChildUsage *UsageSummary — populated by handlers that launch a child run. - TraceEntry.ChildUsage *UsageSummary with `json:"child_usage,omitempty"` — backwards-compatible JSON extension for status.json and activity.jsonl. - Trace.AggregateUsage folds TraceEntry.ChildUsage into both the running totals and per-provider buckets, preserving provider attribution. Extracted helpers foldStatsIntoSummary / foldChildUsageIntoSummary. Engine (pipeline/engine.go, pipeline/engine_run.go): - ChildRunContext + ChildRunContextFromContext: a ctx.Value channel for exposing the current BudgetGuard + baseline usage to handlers that launch child engines. The engine now stashes these on ctx before every handler.Execute. - WithBaselineUsage(*UsageSummary) EngineOption: pre-loads a parent's consumed usage so the child's checkBudgetAfterEmit evaluates baseline + trace.AggregateUsage against limits. - combinedUsageForBudget + cloneUsageSummary: deep-clone the baseline per check so folds don't mutate the shared snapshot. - handler dispatch stamps outcome.ChildUsage onto traceEntry. Subgraph (pipeline/subgraph.go): - Reads ChildRunContextFromContext; passes WithBudgetGuard and WithBaselineUsage into the child engine. - Returns Outcome.ChildUsage = result.Usage regardless of child status. - Child-side OutcomeBudgetExceeded is mapped to parent OutcomeSuccess (not OutcomeFail): the strict-failure-edges rule would otherwise halt the parent before checkBudgetAfterEmit folded the child usage into the aggregate. With the success mapping, the parent's own guard fires on the next between-node check with the correct OutcomeBudgetExceeded status and BudgetLimitsHit populated. Tests (pipeline/subgraph_test.go, 4 new): - Fix_UsageRollup — child stats land in parent ProviderTotals, not "unknown" - Fix_ParentGuardHaltsAfterOverspend — subgraph overspends; parent's between-node check halts before the following node runs - Fix_ChildGuardHaltsMidSubgraph — parent pre-consumes 50, subgraph consumes 60 (combined 110 > 100); child halts mid-subgraph on baseline + partial trace - Fix_NestedSubgraph — two-level nesting, usage rolls all the way up to the grandparent Not in this fix (filed/noted separately): - manager_loop handler has the same shape and likely needs the same treatment. - Mid-stream enforcement inside a single Prompt() call — guard still fires only between nodes. Verification: go build, go test ./... -short (17 packages), make fmt-check; the four regression tests cover the three enforcement paths and the nested case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughPropagates parent budget guard and a baseline usage snapshot into child/subgraph engines, records child-reported usage on trace entries, and changes aggregation and budget checks so parent guards consider parent + child spend during enforcement and rollups. Changes
Sequence Diagram(s)sequenceDiagram
participant Parent as Parent Engine
participant Ctx as Context
participant Child as Child Engine
participant Registry as Subgraph Handler
participant Budget as BudgetGuard
Parent->>Ctx: Snapshot BudgetGuard & baseline usage
Parent->>Parent: Build ChildRunContext (guard + baseline)
Parent->>Ctx: Put ChildRunContext into context
Parent->>Registry: Invoke subgraph handler (execute node)
Registry->>Ctx: Retrieve ChildRunContext
Registry->>Child: Create child Engine with WithBudgetGuard & WithBaselineUsage
Child->>Budget: Check combined (baseline + local trace) usage
Budget-->>Child: Approve or deny
Child->>Child: Run child pipeline
Child-->>Registry: Return EngineResult (with aggregated usage)
Registry-->>Parent: Return Outcome with ChildUsage included
Parent->>Parent: Record ChildUsage in trace entry and aggregate into totals
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 10-13: Move this bullet (the long "Subgraph nodes no longer bypass
`--max-tokens` / `--max-cost` budgets" entry) into the existing "Unreleased"
section's later "### Fixed" block so there is only one "### Fixed" heading under
Unreleased; delete the earlier duplicate "### Fixed" header and its
empty/duplicate block, keeping the bullet content intact under the canonical
"Unreleased" -> "### Fixed" grouping to conform to Keep a Changelog structure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: de811b74-7e80-42f4-b600-f4e9a4c25918
📒 Files selected for processing (7)
CHANGELOG.mdpipeline/engine.gopipeline/engine_run.gopipeline/handler.gopipeline/subgraph.gopipeline/subgraph_test.gopipeline/trace.go
There was a problem hiding this comment.
Pull request overview
Fixes a budget-enforcement and accounting gap where subgraph-executed nodes could bypass --max-tokens / --max-cost limits and have their usage omitted from the parent run’s aggregated usage.
Changes:
- Add child-run usage propagation (
Outcome.ChildUsage→TraceEntry.ChildUsage) and fold it intoTrace.AggregateUsage. - Propagate budget enforcement into child engines via a context-carried
ChildRunContextand a newWithBaselineUsageoption to enforce on parent+child combined usage. - Add regression tests covering rollup, parent halt after subgraph overspend, mid-subgraph child halt using baseline usage, and nested subgraphs; document the fix in the changelog.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pipeline/trace.go | Extends trace schema with ChildUsage and folds child usage into aggregated totals/provider buckets. |
| pipeline/handler.go | Adds Outcome.ChildUsage for handlers that launch child runs to return aggregated usage upstream. |
| pipeline/engine.go | Introduces ChildRunContext plumbing and WithBaselineUsage to support combined budget checks in child engines. |
| pipeline/engine_run.go | Uses combined (baseline+local) usage for budget checks; stamps Outcome.ChildUsage onto trace entries; stashes child-run context before handler dispatch. |
| pipeline/subgraph.go | Propagates parent budget guard + baseline to child engine and returns child usage via Outcome.ChildUsage; maps child budget halt to success to allow parent guard to fire. |
| pipeline/subgraph_test.go | Adds four regression tests to pin the fixed behavior across rollup/enforcement/nesting scenarios. |
| CHANGELOG.md | Documents the bug, impact, and the applied fix at a high level. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // check sees parent spend + child trace combined, preventing the "subgraph | ||
| // sandbox" escape where an operator's --max-tokens / --max-cost ceiling | ||
| // would otherwise be silently non-binding for nodes nested in a subgraph. | ||
| // Nil baselines are no-ops; zero-token baselines are treated as no-ops. |
There was a problem hiding this comment.
The WithBaselineUsage doc says “zero-token baselines are treated as no-ops”, but the option currently assigns baseline unconditionally (only nil is a no-op). Either update the comment to match behavior, or ignore empty baselines in WithBaselineUsage (e.g., when all totals are 0) to keep the API contract accurate.
| // Nil baselines are no-ops; zero-token baselines are treated as no-ops. | |
| // Nil baselines are no-ops. |
| childRunCtx := context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{ | ||
| BudgetGuard: e.budgetGuard, | ||
| Baseline: e.combinedUsageForBudget(s), | ||
| }) |
There was a problem hiding this comment.
ChildRunContext always stashes a Baseline computed from combinedUsageForBudget(s), even when e.budgetGuard is nil. That baseline then gets propagated into subgraph engines (via WithBaselineUsage) and forces extra clone/fold work on every between-node budget check despite budgets being disabled. Consider only setting/passing Baseline when a BudgetGuard is non-nil (or gating WithBaselineUsage on BudgetGuard) so child runs don’t pay the baseline-merging cost unless budget enforcement is active.
| childRunCtx := context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{ | |
| BudgetGuard: e.budgetGuard, | |
| Baseline: e.combinedUsageForBudget(s), | |
| }) | |
| childRunCtx := ctx | |
| if e.budgetGuard != nil { | |
| childRunCtx = context.WithValue(ctx, childRunContextKey{}, &ChildRunContext{ | |
| BudgetGuard: e.budgetGuard, | |
| Baseline: e.combinedUsageForBudget(s), | |
| }) | |
| } |
Three bot findings, all accepted: 1. CodeRabbit (CHANGELOG.md): Unreleased had two consecutive `### Fixed` sub-sections — the subgraph bullet sat in its own block above the Added/Changed groups while the rest of the fixes from this cycle were already under a later `### Fixed`. Folded the subgraph bullet into the canonical Fixed block at the end of Unreleased. 2. Copilot (pipeline/engine.go:130): WithBaselineUsage godoc claimed "zero-token baselines are treated as no-ops" but the implementation assigned unconditionally — only nil was a no-op. Corrected the doc to match actual behavior. 3. Copilot (pipeline/engine_run.go:441): the engine was allocating ChildRunContext and computing combinedUsageForBudget on every handler dispatch even when e.budgetGuard was nil — clone/fold work for no benefit on unbudgeted runs. Gated the ctx.Value stash on e.budgetGuard != nil. Subgraph handler already tolerates a nil ChildRunContextFromContext (no-ops the baseline wiring), so no downstream change was needed. Verification: go build, go test ./... -short (17 packages; TestManagerLoopHandler_CtxCancellation is a pre-existing flake that passes on retry), make fmt-check, all four subgraph-budget regressions green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (e *Engine) checkBudgetAfterEmit(s *runState) *loopResult { | ||
| breach := e.budgetGuard.Check(s.trace.AggregateUsage(), s.trace.StartTime) | ||
| breach := e.budgetGuard.Check(e.combinedUsageForBudget(s), s.trace.StartTime) | ||
| if breach.Kind == BudgetOK { | ||
| return nil | ||
| } | ||
| lr := e.haltForBudget(s, breach) | ||
| return &lr |
There was a problem hiding this comment.
BudgetGuard now checks combinedUsageForBudget, but haltForBudget (and emitCostUpdate) still use s.trace.AggregateUsage() when building the EventBudgetExceeded cost snapshot and EngineResult.Usage. When baselineUsage is set, a child engine can halt with a breach even though the emitted/returned usage snapshot is below the configured ceiling, which is confusing for diagnostics/logs. Consider computing the usage snapshot once in checkBudgetAfterEmit and passing it through so the emitted budget-exceeded snapshot reflects the same combined usage that triggered the breach (or otherwise include both baseline+local in the budget event/result).
…the guard Addresses Copilot review finding on PR #187. Prior to this commit, BudgetGuard.Check saw the combined parent-baseline + child-trace snapshot (the whole point of the #183 fix), but the subsequent EventBudgetExceeded and EventCostUpdated events built their CostSnapshot from s.trace.AggregateUsage() — child-local only. A child engine halting mid-subgraph on 50 (baseline) + 60 (trace) = 110 > 100 would emit "budget exceeded" alongside a snapshot showing 60, which looks like the emitter was wrong about whether to halt. Fix: emitCostUpdate and haltForBudget's CostSnapshot both now call e.combinedUsageForBudget(s), which is what BudgetGuard.Check already uses. Events report the same value that triggered the halt. At the top level (no baseline), combinedUsageForBudget returns the local aggregate unchanged — no visible change for unnested runs. EngineResult.Usage intentionally keeps the child-local aggregate via s.trace.AggregateUsage(). The subgraph handler copies this onto Outcome.ChildUsage and the parent trace's AggregateUsage folds it back in; substituting the combined snapshot would double-count the parent's own spend once the parent aggregates a second time. Call sites and call semantics are documented inline. New regression test TestSubgraph_BudgetExceededEvent_ReportsCombinedSnapshot captures EventBudgetExceeded emissions from a nested overspend and asserts that at least one reports the combined total (≥110). This would have failed under the pre-fix code since both the child's and the parent's emissions would have shown their local sub-totals only. Verification: go build, go test ./... -short (17 packages; existing subgraph-budget regressions all still green), make fmt-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…opagation fix(engine): manager_loop bypassed BudgetGuard + dropped child Usage (sibling of #187)
Closes #183.
Before
Pre-fix, operator-configured
--max-tokens/--max-costceilings were silently non-binding for any node placed inside a subgraph. Two independent escape hatches, either alone enough to bust the budget:pipeline.Enginewas constructed withoutWithBudgetGuard, so its between-node checks were no-ops. The child ran to completion no matter how much it burned.Adversarial review on PR #182 demonstrated this can be chained with other gaps in the ACP estimator (reasoning + tool-call payloads uncounted) to reach a synthetic 100–1000× real/declared cost ratio in a single run. Fixing the subgraph-level bypass is independent of and more impactful than any per-backend accuracy work.
Change
Schema
Engine
Subgraph handler
Tests (4 new regressions)
Not in this fix
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Tests