fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema (.output not .result)#544
Conversation
…ut not .result)

Follow-up to #529. That PR migrated `uipath-human-in-the-loop` to v1.0 (`output`, not `result`) for HITL value access, but left the maestro-flow plugin docs and 3 maestro-flow HITL test criteria on the stale `.result` pattern. As a result, agents that followed the canonical v1.0 schema failed maestro-flow tests; agents that followed the stale maestro-flow doc passed those tests but emitted non-v1.0 flows. The two doc surfaces contradicted each other.

Root causes from coder_eval run 2026-05-04_04-05-27:
- MST-9345 (skill-flow-hitl-quality-schema-design, FAIL 0.93): agent emitted `$vars.poReview1.output.decision` per v1.0 canonical; criterion enforced `.result` per the stale maestro-flow doc. Comparison across the 6 maestro-flow HITL tasks in the same run showed `.output` vs `.result` was effectively 50/50 stochastic depending on which doc the agent attended to.
- Cross-impact MST-9301 (skill-flow-hitl-quality-result-downstream, FAIL 0.80): same drift.

Both tests had stale criteria; this PR resolves both without any budget bumps.

Changes:
- impl.md: JSON `outputs` block now uses `var: "output"` and `source: "=result.Action"` for status, matching #529's quickform.md. "Output variables" section uses `.output.<field>`. Note added explaining the v1.0 vs pre-v1.0 distinction.
- planning.md: same; Output Variables and planner annotation align with v1.0.
- quality_01_schema_design.yaml: criterion `.result` → `.output`; placeholder disambiguates field-value access from outcome-status access (the latter drove a residual stochastic miss).
- quality_02_result_downstream.yaml: criterion + prompt example `.result` → `.output`.
- quality_03_boolean_decision.yaml: criterion + prompt examples `.result.approved` → `.output.approved`.
- smoke_03_multi_outcome_routing.yaml: criterion + prompt example `.result` → `.output`.
Verified with the e2e experiment (`SKILLS_REPO_PATH` pointing at this worktree, plugin loaded via `tests/experiments/e2e.yaml`):
- quality_01_schema_design: 5/5 SUCCESS post-disambiguation, all emitted `$vars.poReview1.output.decision` (turns 38–77).
- quality_02_result_downstream: 2/2 SUCCESS at 1.0.
- quality_03_boolean_decision: 2/2 SUCCESS at 1.0 (regression check: passed before, still passes after).
- smoke_03_multi_outcome_routing: 2/2 SUCCESS at 1.0.

Total: 11/11 SUCCESS post-fix on the affected surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
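For concreteness, a sketch of the v1.0 shape the commit describes. Only `var: "output"`, `source: "=result.Action"` for status, and the `$vars.<node>.output.<field>` access path come from this PR; the node name `poReview1`, the `decision` field, and the exact surrounding structure are illustrative assumptions, not taken from the actual impl.md:

```json
{
  "outputs": [
    { "var": "output", "source": "=result" },
    { "var": "status", "source": "=result.Action" }
  ]
}
```

Downstream nodes would then read field values as `$vars.poReview1.output.decision` (v1.0) rather than the stale pre-v1.0 `$vars.poReview1.result.decision`.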
Claude finished @tmatup's task in 2m 35s

Review: PR #544 — fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema
Summary

Replaces stale `.result` references with the v1.0 `.output` pattern across the maestro-flow docs and test criteria.

Change-by-Change Review
| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No frontmatter changes |
| E2E Tests | OK | All 4 test criteria correctly aligned to v1.0 .output pattern |
| Skill Body | N/A | SKILL.md not modified |
| References & Assets | OK | Both impl.md and planning.md now consistent with canonical hitl-node-quickform.md |
| Repo Hygiene | OK | Changes scoped to the affected surface; no secrets, no unrelated files |
Issues for Manual Review
- The `outputs` JSON in impl.md (lines 56–59) omits some fields present in the full canonical example (the hitl-node-quickform.md full node JSON at lines 77–133 doesn't include `outputs` at node level; it lives in the `outputDefinition`). The structure shown is the node-instance `outputs`, which is a correct simplified representation for the maestro-flow context. Just confirming this is intentional and matches runtime behavior.
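To make the reviewer's distinction concrete, a rough sketch of the two shapes under discussion. Only the split between node-instance `outputs` and the canonical doc's `outputDefinition` comes from the review above; every field value and nested key here is an illustrative assumption:

```jsonc
// Node-instance "outputs" (the simplified form shown in impl.md)
{
  "outputs": [
    { "var": "output", "source": "=result" },
    { "var": "status", "source": "=result.Action" }
  ]
}

// The canonical full node JSON in hitl-node-quickform.md instead nests the
// output shape under "outputDefinition" (contents assumed for illustration)
{
  "outputDefinition": {
    "decision": { "type": "string" }
  }
}
```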
Conclusion
Clean, well-scoped fix. The `.result` → `.output` migration is consistent across all 6 files, and the `outputs` JSON now matches the canonical v1.0 schema. The only nit is one leftover natural-language "HITL result" at quality_02:27. Low severity, but easy to clean up. Approve with optional nit.
…y_02 prompt

PR #544 bot review nit: quality_02_result_downstream.yaml line 27 still said "reads the reviewer's decision from the HITL result" while line 29 already says "the HITL output". Replace for full consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review Response

Thanks for the careful pass. Both items addressed below.

1. Nit: fixed in the follow-up commit (quality_02 prompt).
2. Manual review item: keeping the current implementation. The structure shown is the node-instance `outputs`, verified against a real passing v1.0 flow. Could optionally extend.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Follow-up to #529: that PR migrated `uipath-human-in-the-loop` to v1.0 (`output`, not `result`) for HITL value access, but left the maestro-flow plugin docs and 3 maestro-flow HITL test criteria on the stale `.result` pattern. The two doc surfaces contradicted each other, so agents flip-flopped between `.result` and `.output` depending on which doc dominated their context window.

This PR realigns the maestro-flow side to v1.0:
- `skills/uipath-maestro-flow/.../hitl/{impl,planning}.md`: outputs JSON, "Output Variables" section, and planner annotation now use `.output` (matching #529's hitl-node-quickform.md). Added v1.0 transition notes.
- Test criteria: `.result` → `.output`. The quality_01 placeholder also disambiguates field-value access vs outcome-status access (the latter drove a residual stochastic miss during verification).

Tickets resolved
- MST-9345 (quality_01_schema_design): primary subject of this PR.
- MST-9301 (quality_02_result_downstream, previously read as a "max_turns shortfall"): turned out to be the same stale-criterion drift, not a budget issue. Resolves without any budget bump.
- smoke_03_multi_outcome_routing (turn_timeout + core.control.end validator): the criterion-side stale `.result` is fixed here; the validator inconsistency remains for a separate follow-up.

Root cause (deep dive)
In coder_eval run 2026-05-04_04-05-27, the 6 maestro-flow HITL tasks split roughly 50/50 between agents emitting `.result` (matching the stale maestro-flow hitl/impl.md) and `.output` (matching the canonical v1.0 hitl-node-quickform.md). Same agent, same docs, same flow shape; different outcome depending on which doc the agent attended to.

| Test | Agent emitted | Criterion expected |
|---|---|---|
| skill-flow-hitl-quality-boolean-decision | `$vars.complianceReview1.result.approved` | |
| skill-flow-hitl-quality-schema-design | `$vars.poReview1.output.decision` | `.result` |
| skill-flow-hitl-quality-result-downstream | `$vars.expenseReview1.output.approved` | `.result` |

The criterion was the wrong artifact, not the agent.
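A hedged sketch of what the criterion-side fix looks like in one of the test YAML files. Only the `.result` → `.output` substitution and the `$vars.<node>.output.<field>` shape come from this PR; the `criteria` key name, the criterion wording, and the `poReview1`/`decision` identifiers are assumptions for illustration:

```yaml
# Before (stale, pre-v1.0): criterion enforced the old access path
# criteria:
#   - flow reads the reviewer's decision via $vars.poReview1.result.decision

# After (v1.0-aligned):
criteria:
  - flow reads the reviewer's decision via $vars.poReview1.output.decision
```

An agent emitting the canonical v1.0 path now passes; before the fix, the same emission failed against the stale criterion.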
Test plan

- `coder-eval run quality_01_schema_design.yaml --repeats 5 --experiment tests/experiments/e2e.yaml` → 5/5 SUCCESS, all emitted `$vars.poReview1.output.decision`.
- `coder-eval run quality_02_result_downstream.yaml --repeats 2` → 2/2 SUCCESS at 1.0 (was MST-9301, resolves with no budget bump).
- `coder-eval run quality_03_boolean_decision.yaml --repeats 2` → 2/2 SUCCESS (regression check: passed before, still passes after).
- `coder-eval run smoke_03_multi_outcome_routing.yaml --repeats 2` → 2/2 SUCCESS.

🤖 Generated with Claude Code