fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema (.output not .result) #544

Closed
tmatup wants to merge 2 commits into main from tmatup/fix-mst-9345-hitl-result-access-pattern

Conversation

Member

tmatup commented May 4, 2026

Summary

Follow-up to #529 — that PR migrated uipath-human-in-the-loop to v1.0 (output, not result) for HITL value access, but left the maestro-flow plugin docs and 3 maestro-flow HITL test criteria on the stale .result pattern. The two doc surfaces contradicted each other, so agents flip-flopped between .result and .output depending on which doc dominated their context window.

This PR realigns the maestro-flow side to v1.0:

  • Docs: skills/uipath-maestro-flow/.../hitl/{impl,planning}.md: outputs JSON, "Output Variables" section, and planner annotation now use .output (matching #529's hitl-node-quickform.md). Added v1.0 transition notes.
  • Tests: 4 maestro-flow HITL test YAMLs: criterion includes + prompt examples .result → .output. The quality_01 placeholder also disambiguates field-value access from outcome-status access (the latter drove a residual stochastic miss during verification).
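
As a rough illustration of the .result → .output rewrite described above, here is a minimal Python sketch. The helper name `migrate_hitl_access` and the regex are assumptions made for this example and are not part of the PR; the sample expressions are the ones quoted in this conversation.

```python
import re

# Matches "$vars.<node>.result" only when "result" is a whole path segment.
# The pattern is an assumption for illustration, not the repo's tooling.
STALE_ACCESS = re.compile(r"(\$vars\.\w+)\.result\b")


def migrate_hitl_access(expression: str) -> str:
    """Rewrite pre-v1.0 `$vars.<node>.result...` access to `.output...`."""
    return STALE_ACCESS.sub(r"\1.output", expression)


print(migrate_hitl_access("$vars.poReview1.result.decision"))
print(migrate_hitl_access("=result.Action"))  # no $vars prefix, left unchanged
```

The pattern is anchored on the `$vars.<node>` prefix so that a status source such as `=result.Action` is not touched.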

Tickets resolved

  • MST-9345 (HITL value-access drift in quality_01_schema_design) — primary subject of this PR.
  • MST-9301 (quality_02_result_downstream "max_turns shortfall") — turned out to be the same stale-criterion drift, not a budget issue. Resolves without any budget bump.
  • Partially helps MST-9346 (smoke_03_multi_outcome_routing turn_timeout + core.control.end validator). The criterion-side stale .result is fixed here; the validator inconsistency remains for a separate follow-up.

Root cause (deep dive)

In coder_eval run 2026-05-04_04-05-27, the 6 maestro-flow HITL tasks split roughly 50/50 between agents emitting .result (matching the stale maestro-flow hitl/impl.md) and .output (matching the canonical v1.0 hitl-node-quickform.md). Same agent, same docs, same flow shape — different outcome depending on which doc the agent attended to.

| task_id | HITL access pattern | Pre-fix result |
| --- | --- | --- |
| skill-flow-hitl-quality-boolean-decision | $vars.complianceReview1.result.approved | PASS 1.0 (stale criterion) |
| skill-flow-hitl-quality-schema-design | $vars.poReview1.output.decision | FAIL 0.93 (criterion expected .result) |
| skill-flow-hitl-quality-result-downstream | $vars.expenseReview1.output.approved | FAIL 0.80 (criterion expected .result) |

The criterion was the wrong artifact, not the agent.
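
To make the failure mode in the rows above concrete, here is a small Python sketch of a substring-style check. It assumes that a file_contains criterion passes only when every `includes` string appears in the file; the real coder_eval semantics may differ, and the flow excerpt is hypothetical.

```python
def file_contains(text: str, includes: list[str]) -> bool:
    """Pass only if every required substring appears in the text (assumed semantics)."""
    return all(required in text for required in includes)


# Hypothetical one-line excerpt of a flow that follows the v1.0 schema.
flow_text = 'condition: $vars.poReview1.output.decision == "approve"'

stale_criterion = ["$vars.poReview1.result"]   # pre-fix expectation
fixed_criterion = ["$vars.poReview1.output"]   # post-fix expectation

print(file_contains(flow_text, stale_criterion))  # False
print(file_contains(flow_text, fixed_criterion))  # True
```

Under these assumptions, a v1.0-conformant flow can never satisfy the stale criterion, which is consistent with the FAIL rows above.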

Test plan

  • coder-eval run quality_01_schema_design.yaml --repeats 5 --experiment tests/experiments/e2e.yaml → 5/5 SUCCESS, all emitted $vars.poReview1.output.decision.
  • coder-eval run quality_02_result_downstream.yaml --repeats 2 → 2/2 SUCCESS at 1.0 (was MST-9301, resolves with no budget bump).
  • coder-eval run quality_03_boolean_decision.yaml --repeats 2 → 2/2 SUCCESS (regression check — passed before, still passes after).
  • coder-eval run smoke_03_multi_outcome_routing.yaml --repeats 2 → 2/2 SUCCESS.
  • Total: 11/11 SUCCESS post-fix on the affected surface.
  • CI smoke-skills + smoke-rpa-skills checks pass on this branch.

🤖 Generated with Claude Code

…ut not .result)

Follow-up to #529. That PR migrated `uipath-human-in-the-loop` to v1.0
(`output`, not `result`) for HITL value access, but left the maestro-flow
plugin docs and 3 maestro-flow HITL test criteria on the stale `.result`
pattern. As a result, agents that followed the canonical v1.0 schema failed
maestro-flow tests; agents that followed the stale maestro-flow doc passed
those tests but emitted non-v1.0 flows. The two doc surfaces contradicted
each other.

Root-causes from coder_eval run 2026-05-04_04-05-27:

MST-9345 (skill-flow-hitl-quality-schema-design, FAIL 0.93): agent emitted
`$vars.poReview1.output.decision` per v1.0 canonical; criterion enforced
`.result` per stale maestro-flow doc. Comparison across the 6 maestro-flow
HITL tasks in the same run showed `.output` vs `.result` was effectively
50/50 stochastic depending on which doc the agent attended to.

Cross-impact MST-9301 (skill-flow-hitl-quality-result-downstream, FAIL 0.80):
same drift. Both tests had stale criteria; this PR resolves both without
any budget bumps.

Changes:
- impl.md: JSON `outputs` block now uses `var: "output"` and
  `source: "=result.Action"` for status, matching #529's quickform.md.
  "Output variables" section uses `.output.<field>`. Note added explaining
  the v1.0 vs pre-v1.0 distinction.
- planning.md: same — Output Variables and planner annotation align with
  v1.0.
- quality_01_schema_design.yaml: criterion `.result`→`.output`; placeholder
  disambiguates field-value access from outcome-status access (the latter
  drove a residual stochastic miss).
- quality_02_result_downstream.yaml: criterion + prompt example
  `.result`→`.output`.
- quality_03_boolean_decision.yaml: criterion + prompt examples
  `.result.approved`→`.output.approved`.
- smoke_03_multi_outcome_routing.yaml: criterion + prompt example
  `.result`→`.output`.

Verified with the e2e experiment (`SKILLS_REPO_PATH` pointing at this
worktree, plugin loaded via `tests/experiments/e2e.yaml`):

- quality_01_schema_design: 5/5 SUCCESS post-disambiguation, all emitted
  `$vars.poReview1.output.decision` (turns 38–77).
- quality_02_result_downstream: 2/2 SUCCESS at 1.0.
- quality_03_boolean_decision: 2/2 SUCCESS at 1.0 (regression check —
  passed before, still passes after).
- smoke_03_multi_outcome_routing: 2/2 SUCCESS at 1.0.

Total: 11/11 SUCCESS post-fix on the affected surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

github-actions Bot commented May 4, 2026

Claude finished @tmatup's task in 2m 35s


Review: PR #544 — fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema

  • Read review criteria and project conventions
  • Analyze full diff (9500990...5af4b9d)
  • Read existing PR conversation
  • Read each changed file in full for context
  • Apply skill-specific and content quality checks
  • Post review

Summary

Replaces stale .result access pattern with .output across 2 maestro-flow HITL reference docs and 4 test task YAMLs, aligning with the v1.0 schema established by PR #529 in uipath-human-in-the-loop. Also corrects the outputs JSON example in impl.md to match the canonical outputDefinition (fixing both var and source values).

Change-by-Change Review

1. skills/uipath-maestro-flow/references/author/references/plugins/hitl/impl.md

Severity: OK

Three changes:

  • Outputs JSON (lines 57–58): "var": "result" → "var": "output" and "source": "=status" → "source": "=result.Action". Both now match the canonical outputDefinition in hitl-node-quickform.md:194-195. Correct.
  • Output variables section (lines 67–70): .result → .output, status description updated to outcome action semantics ("Continue" / "End" instead of "completed"). Correct and consistent.
  • v1.0 transition note (line 72): New blockquote with cross-skill reference. Follows the established pattern (existing cross-skill refs at lines 12, 14 already have explicit self-containment disclaimers). Informative without duplicating content from the linked doc.

2. skills/uipath-maestro-flow/references/author/references/plugins/hitl/planning.md

Severity: OK

Three changes:

  • Output variables section (lines 44–46): .result → .output, matching impl.md. Correct.
  • v1.0 transition note (line 48): New blockquote paralleling the one in impl.md. Includes practical guidance for downstream consumers. Good addition.
  • Planning annotation (line 100): result, status → output, status in the node table example. Correct.

3. tests/tasks/uipath-maestro-flow/hitl/quality_01_schema_design.yaml

Severity: OK

  • result_variable placeholder hint updated to clarify "must reference an output field value, not the outcome status" (line 45). Good disambiguation — directly addresses the root cause described in the PR.
  • file_contains criterion (lines 92–96): .result → .output. Correct.

4. tests/tasks/uipath-maestro-flow/hitl/quality_02_result_downstream.yaml

Severity: Low

  • Description, initial prompt example, and all 3 criteria correctly updated .result → .output.

  • Residual inconsistency at line 27: initial prompt still reads "reads the reviewer's decision from the HITL result" — the word "result" here is used as natural language (not as an access pattern), but it's inconsistent with the rest of the file which now exclusively uses "output" terminology. This is the one place in the four test files where the agent might pick up a stale signal. Line 29 already says "reference the HITL output", so "result" at line 27 is vestigial.

    Suggested fix: Change "from the HITL result" → "from the HITL output" at line 27.

5. tests/tasks/uipath-maestro-flow/hitl/quality_03_boolean_decision.yaml

Severity: OK

All 7 occurrences updated consistently: description, initial prompt (3 places), and both file_contains criteria. Clean.

6. tests/tasks/uipath-maestro-flow/hitl/smoke_03_multi_outcome_routing.yaml

Severity: OK

All 6 occurrences updated consistently: initial prompt (3 places), example in report JSON, and file_contains criterion description + includes. Clean.

What's Missing

  • Line 27 of quality_02_result_downstream.yaml: One remaining natural-language "HITL result" in the initial prompt that should say "HITL output" for full consistency (Low — unlikely to affect agent behavior given line 29 already says "HITL output", but worth cleaning up).
  • Nothing else identified. CODEOWNERS doesn't need updating (no new skill paths). No secrets or personal paths in any changed file.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | No frontmatter changes |
| E2E Tests | OK | All 4 test criteria correctly aligned to v1.0 .output pattern |
| Skill Body | N/A | SKILL.md not modified |
| References & Assets | OK | Both impl.md and planning.md now consistent with canonical hitl-node-quickform.md |
| Repo Hygiene | OK | Changes scoped to the affected surface; no secrets, no unrelated files |

Issues for Manual Review

  • The outputs JSON in impl.md (lines 56–59) omits some fields present in the full canonical example (hitl-node-quickform.md full node JSON at lines 77–133 doesn't include outputs at node level — it's in the outputDefinition). The structure shown is the node instance outputs, which is a correct simplified representation for the maestro-flow context. Just confirming this is intentional and matches runtime behavior.

Conclusion

Clean, well-scoped fix. The .result → .output migration is consistent across all 6 files, and the outputs JSON now matches the canonical v1.0 schema. The only nit is one leftover natural-language "HITL result" at quality_02:27 (Low severity, but easy to clean up). Approve with optional nit.

…y_02 prompt

PR #544 bot review nit: quality_02_result_downstream.yaml line 27 still said
"reads the reviewer's decision from the HITL result" while line 29 already
says "the HITL output". Replace for full consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member Author

tmatup commented May 4, 2026

Review Response

Thanks for the careful pass. Both items addressed below.

1. quality_02_result_downstream.yaml:27 — vestigial "HITL result" natural-language

Fixed in a19643b2. Changed "reads the reviewer's decision from the HITL result" → "reads the reviewer's decision from the HITL output" for consistency with line 29 and the rest of the file. Confirmed via grep -nE 'HITL result|the result\b' across all 4 test YAMLs: clean.
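
For anyone who wants to repeat that check without grep, here is a rough Python equivalent. The regex is the one quoted above; the helper name `find_stale_wording` and the sample strings are hypothetical, and the sketch scans in-memory text rather than the actual YAML files.

```python
import re

# Same alternation as the grep -nE pattern quoted above.
STALE_WORDING = re.compile(r"HITL result|the result\b")


def find_stale_wording(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still contain stale wording."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE_WORDING.search(line)
    ]


# Hypothetical two-line prompt excerpts, before and after the fix.
before = "reads the reviewer's decision from the HITL result\nreference the HITL output"
after = "reads the reviewer's decision from the HITL output\nreference the HITL output"

print(find_stale_wording(before))
print(find_stale_wording(after))  # []
```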

2. Manual review item — impl.md:56–59 outputs JSON differs from canonical outputDefinition

Keeping current implementation. The structure shown is the node-instance outputs block (what goes into workflow.nodes[]), which is intentionally distinct from the type definition outputDefinition (what goes into workflow.definitions[]). They are different JSON locations with different field names by design.

Verified against a real passing v1.0 flow — runs/2026-05-04_04-05-27/.../skill-flow-hitl-quality-boolean-decision/00/artifacts/.../VendorApproval.flow shows the HITL node instance carries an outputs block alongside inputs, while the definition entry in the same file carries outputDefinition. Both coexist; the runtime resolves variables from the instance outputs.<port>.var when present. So the maestro-flow impl.md showing "outputs": { "output": { ..., "var": "output" } } on the instance is correct for this layer.

Could optionally extend impl.md later to also document the definition-side outputDefinition for completeness, but that's redundant with hitl-node-quickform.md (already cross-referenced) and out of scope for this PR.
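
To illustrate the two layers described above, here is a hedged Python sketch of how a resolver might prefer the instance-side `outputs.<port>.var`. The `id`, `type`, and `nodeType` keys, the shape of the definition entry, and the fallback to the definition are all assumptions for illustration; only `workflow.nodes[]`, `workflow.definitions[]`, `outputs`, `outputDefinition`, and `var` come from this thread.

```python
from typing import Optional


def resolve_output_var(workflow: dict, node_id: str, port: str) -> Optional[str]:
    """Return the variable name bound to a node's output port.

    Prefers the node-instance `outputs.<port>.var`. Falling back to the
    definition-side `outputDefinition` is an assumption of this sketch.
    """
    node = next(n for n in workflow["nodes"] if n["id"] == node_id)
    instance_port = node.get("outputs", {}).get(port, {})
    if "var" in instance_port:
        return instance_port["var"]
    for definition in workflow.get("definitions", []):
        if definition.get("nodeType") == node.get("type"):
            return definition.get("outputDefinition", {}).get(port, {}).get("var")
    return None


# Hypothetical, heavily trimmed workflow; real flows carry many more fields.
workflow = {
    "nodes": [
        {"id": "poReview1", "type": "hitl", "outputs": {"output": {"var": "output"}}},
    ],
    "definitions": [
        {"nodeType": "hitl", "outputDefinition": {"output": {"var": "output"}}},
    ],
}

var_name = resolve_output_var(workflow, "poReview1", "output")
print(f"$vars.poReview1.{var_name}.decision")
```

With the instance block present, the definition entry is never consulted in this sketch, which mirrors the "when present" behavior described above.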


🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com
