fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema (.output not .result)#544
Conversation
…ut not .result)

Follow-up to #529. That PR migrated `uipath-human-in-the-loop` to v1.0 (`output`, not `result`) for HITL value access, but left the maestro-flow plugin docs and 3 maestro-flow HITL test criteria on the stale `.result` pattern. As a result, agents that followed the canonical v1.0 schema failed maestro-flow tests; agents that followed the stale maestro-flow doc passed those tests but emitted non-v1.0 flows. The two doc surfaces contradicted each other.

Root causes from coder_eval run 2026-05-04_04-05-27:
- MST-9345 (skill-flow-hitl-quality-schema-design, FAIL 0.93): agent emitted `$vars.poReview1.output.decision` per v1.0 canonical; criterion enforced `.result` per the stale maestro-flow doc. Comparison across the 6 maestro-flow HITL tasks in the same run showed `.output` vs `.result` was effectively 50/50 stochastic depending on which doc the agent attended to.
- Cross-impact MST-9301 (skill-flow-hitl-quality-result-downstream, FAIL 0.80): same drift.

Both tests had stale criteria; this PR resolves both without any budget bumps.

Changes:
- impl.md: JSON `outputs` block now uses `var: "output"` and `source: "=result.Action"` for status, matching #529's quickform.md. "Output variables" section uses `.output.<field>`. Note added explaining the v1.0 vs pre-v1.0 distinction.
- planning.md: same; Output Variables and planner annotation align with v1.0.
- quality_01_schema_design.yaml: criterion `.result` → `.output`; placeholder disambiguates field-value access from outcome-status access (the latter drove a residual stochastic miss).
- quality_02_result_downstream.yaml: criterion + prompt example `.result` → `.output`.
- quality_03_boolean_decision.yaml: criterion + prompt examples `.result.approved` → `.output.approved`.
- smoke_03_multi_outcome_routing.yaml: criterion + prompt example `.result` → `.output`.
Verified with the e2e experiment (`SKILLS_REPO_PATH` pointing at this worktree, plugin loaded via `tests/experiments/e2e.yaml`):
- quality_01_schema_design: 5/5 SUCCESS post-disambiguation, all emitted `$vars.poReview1.output.decision` (turns 38–77).
- quality_02_result_downstream: 2/2 SUCCESS at 1.0.
- quality_03_boolean_decision: 2/2 SUCCESS at 1.0 (regression check: passed before, still passes after).
- smoke_03_multi_outcome_routing: 2/2 SUCCESS at 1.0.

Total: 11/11 SUCCESS post-fix on the affected surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
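For concreteness, a sketch of the v1.0 shape the commit describes. Only `var: "output"`, `source: "=result.Action"` for status, and the `$vars.<node>.output.<field>` access path come from this PR; the node name `poReview1`, the `decision` field, and the exact surrounding structure are illustrative assumptions, not taken from the actual impl.md:

```json
{
  "outputs": [
    { "var": "output", "source": "=result" },
    { "var": "status", "source": "=result.Action" }
  ]
}
```

Downstream nodes would then read field values as `$vars.poReview1.output.decision` (v1.0) rather than the stale pre-v1.0 `$vars.poReview1.result.decision`.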
Claude finished @tmatup's task in 2m 35s

Review: PR #544 — fix(hitl): align maestro-flow HITL docs + tests to v1.0 schema
Summary

Replaces stale `.result` references with the v1.0 `.output` pattern across the maestro-flow docs and test criteria.

Change-by-Change Review
| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No frontmatter changes |
| E2E Tests | OK | All 4 test criteria correctly aligned to v1.0 .output pattern |
| Skill Body | N/A | SKILL.md not modified |
| References & Assets | OK | Both impl.md and planning.md now consistent with canonical hitl-node-quickform.md |
| Repo Hygiene | OK | Changes scoped to the affected surface; no secrets, no unrelated files |
Issues for Manual Review
- The `outputs` JSON in impl.md (lines 56–59) omits some fields present in the full canonical example (the hitl-node-quickform.md full node JSON at lines 77–133 doesn't include `outputs` at node level; it lives in the `outputDefinition`). The structure shown is the node-instance `outputs`, which is a correct simplified representation for the maestro-flow context. Just confirming this is intentional and matches runtime behavior.
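To make the reviewer's distinction concrete, a rough sketch of the two shapes under discussion. Only the split between node-instance `outputs` and the canonical doc's `outputDefinition` comes from the review above; every field value and nested key here is an illustrative assumption:

```jsonc
// Node-instance "outputs" (the simplified form shown in impl.md)
{
  "outputs": [
    { "var": "output", "source": "=result" },
    { "var": "status", "source": "=result.Action" }
  ]
}

// The canonical full node JSON in hitl-node-quickform.md instead nests the
// output shape under "outputDefinition" (contents assumed for illustration)
{
  "outputDefinition": {
    "decision": { "type": "string" }
  }
}
```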
Conclusion
Clean, well-scoped fix. The `.result` → `.output` migration is consistent across all 6 files, and the `outputs` JSON now matches the canonical v1.0 schema. The only nit is one leftover natural-language "HITL result" at quality_02:27. Low severity, but easy to clean up. Approve with optional nit.
…y_02 prompt

PR #544 bot review nit: quality_02_result_downstream.yaml line 27 still said "reads the reviewer's decision from the HITL result" while line 29 already says "the HITL output". Replace for full consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review Response

Thanks for the careful pass. Both items addressed below.

1. Nit: fixed in the follow-up commit (quality_02 prompt).
2. Manual review item: keeping the current implementation. The structure shown is the node-instance `outputs`, verified against a real passing v1.0 flow. Could optionally extend.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Follow-up to #529: that PR migrated `uipath-human-in-the-loop` to v1.0 (`output`, not `result`) for HITL value access, but left the maestro-flow plugin docs and 3 maestro-flow HITL test criteria on the stale `.result` pattern. The two doc surfaces contradicted each other, so agents flip-flopped between `.result` and `.output` depending on which doc dominated their context window.

This PR realigns the maestro-flow side to v1.0:
- `skills/uipath-maestro-flow/.../hitl/{impl,planning}.md`: outputs JSON, "Output Variables" section, and planner annotation now use `.output` (matching #529's hitl-node-quickform.md). Added v1.0 transition notes.
- Test criteria: `.result` → `.output`. The quality_01 placeholder also disambiguates field-value access vs outcome-status access (the latter drove a residual stochastic miss during verification).

Tickets resolved
- MST-9345 (quality_01_schema_design): primary subject of this PR.
- MST-9301 (quality_02_result_downstream, previously read as a "max_turns shortfall"): turned out to be the same stale-criterion drift, not a budget issue. Resolves without any budget bump.
- smoke_03_multi_outcome_routing (turn_timeout + core.control.end validator): the criterion-side stale `.result` is fixed here; the validator inconsistency remains for a separate follow-up.

Root cause (deep dive)
In coder_eval run 2026-05-04_04-05-27, the 6 maestro-flow HITL tasks split roughly 50/50 between agents emitting `.result` (matching the stale maestro-flow hitl/impl.md) and `.output` (matching the canonical v1.0 hitl-node-quickform.md). Same agent, same docs, same flow shape; different outcome depending on which doc the agent attended to.

| Test | Agent emitted | Criterion expected |
|---|---|---|
| skill-flow-hitl-quality-boolean-decision | `$vars.complianceReview1.result.approved` | |
| skill-flow-hitl-quality-schema-design | `$vars.poReview1.output.decision` | `.result` |
| skill-flow-hitl-quality-result-downstream | `$vars.expenseReview1.output.approved` | `.result` |

The criterion was the wrong artifact, not the agent.
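A hedged sketch of what the criterion-side fix looks like in one of the test YAML files. Only the `.result` → `.output` substitution and the `$vars.<node>.output.<field>` shape come from this PR; the `criteria` key name, the criterion wording, and the `poReview1`/`decision` identifiers are assumptions for illustration:

```yaml
# Before (stale, pre-v1.0): criterion enforced the old access path
# criteria:
#   - flow reads the reviewer's decision via $vars.poReview1.result.decision

# After (v1.0-aligned):
criteria:
  - flow reads the reviewer's decision via $vars.poReview1.output.decision
```

An agent emitting the canonical v1.0 path now passes; before the fix, the same emission failed against the stale criterion.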
Test plan

- `coder-eval run quality_01_schema_design.yaml --repeats 5 --experiment tests/experiments/e2e.yaml` → 5/5 SUCCESS, all emitted `$vars.poReview1.output.decision`.
- `coder-eval run quality_02_result_downstream.yaml --repeats 2` → 2/2 SUCCESS at 1.0 (was MST-9301, resolves with no budget bump).
- `coder-eval run quality_03_boolean_decision.yaml --repeats 2` → 2/2 SUCCESS (regression check: passed before, still passes after).
- `coder-eval run smoke_03_multi_outcome_routing.yaml --repeats 2` → 2/2 SUCCESS.

🤖 Generated with Claude Code