skills(activation): close coverage gaps + tighten descriptions #673

Merged
gabrielavaduva merged 2 commits into main from bai/activation-coverage-gaps
May 11, 2026


@bai-uipath bai-uipath commented May 8, 2026

Contributor

Summary

Tightens skill activation by rewriting all 20 skill description fields with the "Always invoke for X" anchor pattern (unique file extensions and CLI tokens), merges the two stale governance skills (gov-access-policy + gov-aops-policy) into one curated uipath-governance test set, and adds activation test coverage for two previously untested skills (uipath-maestro-bpmn, uipath-llm-configuration-byo-connections).

Mean per-skill recall.yes moved from 0.483 → 0.683 (+0.20 absolute) and negative TN rate from 96% → 100%, with no precision cost (FP rate held at 0.14%).
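As a rough illustration of how figures like these could be derived, here is a minimal sketch. It assumes that "mean per-skill recall" is a macro-average over skills and that negative rows are those with no expected skill; the function name and the tuple layout are hypothetical, not taken from the eval harness.

```python
from collections import defaultdict

def activation_metrics(rows):
    """rows: iterable of (expected_skill or None, triggered_skill or None)."""
    hits = defaultdict(int)    # per-skill rows where the expected skill fired
    totals = defaultdict(int)  # per-skill positive rows
    neg_total = neg_clean = 0
    for expected, triggered in rows:
        if expected is None:   # negative row: no skill should fire
            neg_total += 1
            neg_clean += int(triggered is None)
        else:
            totals[expected] += 1
            hits[expected] += int(triggered == expected)
    recall = {skill: hits[skill] / totals[skill] for skill in totals}
    mean_recall = sum(recall.values()) / len(recall) if recall else 0.0
    tn_rate = neg_clean / neg_total if neg_total else 0.0
    return recall, mean_recall, tn_rate

# Illustrative rows only; these are not results from the PR.
rows = [
    ("uipath-rpa", "uipath-rpa"),
    ("uipath-rpa", None),
    ("uipath-agents", "uipath-agents"),
    (None, None),
    (None, "uipath-rpa"),
]
recall, mean_recall, tn_rate = activation_metrics(rows)
```

Averaging per skill rather than per row keeps a large test set for one skill from dominating the mean.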

bai-uipath and others added 2 commits May 8, 2026 14:20

tests(activation): close skill coverage gaps

- Add uipath-maestro-bpmn.jsonl (50) and uipath-llm-configuration-byo-connections.jsonl (45)
  for two skills that landed after the activation eval (#596) without coverage.
- Replace uipath-gov-access-policy.jsonl + uipath-gov-aops-policy.jsonl with a single
  uipath-governance.jsonl (50 curated rows) — the two skills were merged into a unified
  uipath-governance skill in #601, so the split criteria could never produce true positives.
- Update activation.yaml dataset.paths and success_criteria (now 20 skills, was 19) and
  refresh README counts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(skills): tighten activation descriptions

- Fix stale `uipath-gov-aops-policy` redirect (skill merged to uipath-governance in #601):
  llm-configuration-byo-connections.
- Rewrite uipath-diagnostics description: lead with brand, quote, narrow scope,
  add → redirects to maestro-flow / maestro-bpmn / rpa / platform (per repo rule).
- Drop forbidden `[PREVIEW]` prefix from uipath-review (rule: preview status
  belongs in body, not description); add Preview callout under H1.
- Add "Always invoke for X" anchors (gold-standard pattern from uipath-maestro-flow)
  to skills with deterministic file/CLI signals: rpa (.xaml/.cs), maestro-bpmn
  (.bpmn + 5 package files), coded-apps (app.config.json/action-schema.json),
  maestro-case (caseplan.json), agents (agent.json), solution-design
  (sdd.md/pdd.md), rpa-legacy (project.json targetFramework=Legacy).
- Add cross-redirects between sibling collisions:
  human-in-the-loop ↔ tasks (authoring vs operating);
  maestro-case adds .bpmn redirect; platform adds workflow-code redirects to
  rpa / maestro-flow / maestro-bpmn / agents.
- Tighten uipath-planner trigger ("Use only for ambiguous or multi-product…")
  and uipath-rpa-legacy trigger (drop "or missing" — too broad).
- Add → uipath-diagnostics redirect from uipath-feedback to disambiguate
  "something is wrong" between report-bug and investigate-error.

All 20 descriptions pass the 1024-char validator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 8, 2026

Contributor

⚠️ Smoke test coverage gap — the following changed skills have no tests under tests/tasks/:

  • uipath-coded-apps
  • uipath-feedback
  • uipath-llm-configuration-byo-connections
  • uipath-planner

Consider adding smoke tests before merging.

@bai-uipath bai-uipath changed the title from "tests(activation): close skill coverage gaps" to "skills(activation): close coverage gaps + tighten descriptions" May 8, 2026
@bai-uipath bai-uipath marked this pull request as ready for review May 8, 2026 22:58

github-actions Bot commented May 8, 2026

Contributor

Claude finished @bai-uipath's task in 3m 51s


PR Review: skills(activation) — close coverage gaps + tighten descriptions

  • Gather context (read review criteria, diff, PR comments)
  • Review SKILL.md description changes (15 files)
  • Review activation test changes (new/deleted/modified test files)
  • Review activation.yaml config changes
  • Apply skill-specific checks (links, cross-refs, secrets, CODEOWNERS)
  • Post final review

Summary

Rewrites 15 skill description fields to use the "Always invoke for X" anchor pattern (deterministic file/CLI signals), merges two stale governance activation test files into one uipath-governance.jsonl, adds activation test coverage for uipath-maestro-bpmn and uipath-llm-configuration-byo-connections, and moves the [PREVIEW] tag from uipath-review's description into the SKILL.md body as a blockquote callout.

Change-by-Change Review

1. uipath-maestro-bpmn/SKILL.md — "Always invoke" anchor on shared filenames

Severity: Medium

The description claims "Always invoke for .bpmn, project.uiproj, entry-points.json, operate.json, bindings_v2.json, or package-descriptor.json files." However, only .bpmn is unique to BPMN projects. The other five files (project.uiproj, entry-points.json, operate.json, bindings_v2.json, package-descriptor.json) are shared across Flow, Case, and Agent project types — confirmed by references in uipath-maestro-flow, uipath-maestro-case, uipath-agents, uipath-coded-apps, uipath-data-fabric, and uipath-review SKILL.md files.

Compare with uipath-maestro-flow which correctly anchors only on .flow (its unique signal). This PR tightens the wording from "Activates on" to the stronger "Always invoke for" while keeping the same ambiguous file list, which could increase false-positive activations.

Recommendation: Narrow the "Always invoke" anchor to .bpmn only (the unique signal), and keep the other shared filenames as contextual mentions without the "Always invoke" imperative. E.g.: "Always invoke for .bpmn files. UiPath Maestro BPMN / Process Orchestration (project.uiproj, entry-points.json, operate.json, bindings_v2.json, package-descriptor.json) — ..."
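The uniqueness concern above can be checked mechanically. The sketch below is one possible approach, assuming you already have each skill's "Always invoke" anchors and the set of filenames each SKILL.md mentions; both mappings, and the assignment of filenames to skills, are illustrative placeholders rather than data extracted from the repo.

```python
def shared_anchors(anchors, references):
    """Map (skill, filename) to the other skills that also mention that filename."""
    clashes = {}
    for skill, names in anchors.items():
        for name in names:
            others = sorted(
                other for other, refs in references.items()
                if other != skill and name in refs
            )
            if others:
                clashes[(skill, name)] = others
    return clashes

# Hypothetical inputs: which skill mentions which file is assumed for illustration.
anchors = {
    "uipath-maestro-bpmn": [".bpmn", "project.uiproj", "entry-points.json"],
    "uipath-maestro-flow": [".flow"],
}
references = {
    "uipath-maestro-bpmn": {".bpmn", "project.uiproj", "entry-points.json"},
    "uipath-maestro-flow": {".flow", "project.uiproj", "entry-points.json"},
    "uipath-maestro-case": {"caseplan.json", "project.uiproj"},
}
clashes = shared_anchors(anchors, references)
```

An anchor that returns no clashes (here `.bpmn` and `.flow`) is a candidate for the "Always invoke" imperative; anything else is better left as a contextual mention.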

2. SKILL.md description changes — 14 other skills

Severity: OK

All changes are well-formed:

  • uipath-agents: Adds agent.json anchor — unique to agent projects. Clean.
  • uipath-coded-apps: Adds app.config.json / action-schema.json anchor — unique. Clean.
  • uipath-diagnostics: Rewrites to lead with brand identity, adds 4 cross-redirects (→uipath-maestro-flow, →uipath-maestro-bpmn, →uipath-rpa, →uipath-platform). Clean.
  • uipath-feedback: Adds →uipath-diagnostics redirect to disambiguate "something is wrong". Clean.
  • uipath-human-in-the-loop: Adds →uipath-tasks redirect for authoring vs operating distinction. Clean.
  • uipath-llm-configuration-byo-connections: Fixes stale →uipath-gov-aops-policy redirect to →uipath-governance. Clean.
  • uipath-maestro-case: Adds caseplan.json anchor, adds →uipath-maestro-bpmn redirect. Clean.
  • uipath-planner: Tightens trigger to "ambiguous or multi-product" only, adds .bpmn. Clean.
  • uipath-platform: Expands redirects with explicit file-extension mapping. Clean.
  • uipath-review: Drops [PREVIEW] prefix (per repo rule), adds .bpmn to audit scope. Clean.
  • uipath-rpa: Adds .xaml / .cs anchor. Clean.
  • uipath-rpa-legacy: Adds project.json targetFramework: Legacy anchor. Clean.
  • uipath-solution-design: Adds sdd.md / pdd.md anchor, adds →uipath-maestro-case. Clean.
  • uipath-tasks: Adds →uipath-human-in-the-loop cross-redirect. Clean.
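A stale redirect such as the old →uipath-gov-aops-policy reference could be caught with a small check like the following. This is a sketch under assumptions: it treats any `→uipath-...` token in a description as a redirect and compares it against the set of known skill names. The description strings are invented for the example.

```python
import re

# A redirect is assumed to look like "→uipath-name" or "→ uipath-name".
REDIRECT = re.compile(r"→\s*(uipath-[a-z0-9-]+)")

def stale_redirects(descriptions, known_skills=None):
    """Return {skill: [redirect targets that are not known skills]}."""
    known = set(descriptions) if known_skills is None else set(known_skills)
    stale = {}
    for skill, text in descriptions.items():
        missing = sorted(set(REDIRECT.findall(text)) - known)
        if missing:
            stale[skill] = missing
    return stale

# Hypothetical description snippets, not the real SKILL.md text.
descriptions = {
    "uipath-llm-configuration-byo-connections":
        "Configure BYO LLM connections. For policies →uipath-gov-aops-policy.",
    "uipath-governance": "Governance policies.",
    "uipath-feedback": "Report a bug. To investigate errors → uipath-diagnostics.",
    "uipath-diagnostics": "Investigate errors.",
}
stale = stale_redirects(descriptions)
```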

3. uipath-review/SKILL.md — Preview callout moved to body

Severity: OK

[PREVIEW] removed from description and replaced with > **Preview** — skill is under active development; review surface and report format may change. in the markdown body at line 10. Follows the repo rule that preview status belongs in the body, not frontmatter description. Clean.

4. tests/tasks/activation/uipath-governance.jsonl (new, replaces 2 deleted files)

Severity: OK

50 curated rows merging the former uipath-gov-access-policy.jsonl (48 rows) and uipath-gov-aops-policy.jsonl (48 rows) into a single file matching the merged uipath-governance skill. All rows have unique sequential IDs (uipath-governance-001 through -050), correct expected_skill: "uipath-governance", valid JSON. Good coverage of both AOps policy and access policy domains.
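The row-level properties checked here (valid JSON, sequential IDs, matching expected_skill) lend themselves to a short script. The sketch below assumes each row has `id`, `prompt`, and `expected_skill` keys; `expected_skill` and `prompt` appear in this review, while the `id` key name is a guess.

```python
import json

def check_activation_rows(lines, skill):
    """Return a list of problems found in one skill's JSONL activation rows."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: invalid JSON")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {n}: not a JSON object")
            continue
        expected_id = f"{skill}-{n:03d}"  # sequential IDs also guarantee uniqueness
        if row.get("id") != expected_id:
            problems.append(f"line {n}: id should be {expected_id}")
        if row.get("expected_skill") != skill:
            problems.append(f"line {n}: expected_skill should be {skill}")
        if not row.get("prompt"):
            problems.append(f"line {n}: empty prompt")
    return problems

# Illustrative rows; the second one has a skipped ID and a stale skill name.
sample = [
    '{"id": "uipath-governance-001", "prompt": "Create an access policy", "expected_skill": "uipath-governance"}',
    '{"id": "uipath-governance-003", "prompt": "List AOps policies", "expected_skill": "uipath-gov-aops-policy"}',
]
problems = check_activation_rows(sample, "uipath-governance")
```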

5. tests/tasks/activation/uipath-llm-configuration-byo-connections.jsonl (new)

Severity: OK

45 rows covering CRUD operations, product config discovery, provider-specific wiring (OpenAI, Azure OpenAI, Bedrock, Anthropic, Vertex, Mistral), troubleshooting, multi-model mapping. IDs are unique (-001 through -045), expected_skill is correct. Good prompt diversity.

6. tests/tasks/activation/uipath-maestro-bpmn.jsonl (new)

Severity: OK

50 rows covering the full BPMN lifecycle (create, author, edit, validate, package, publish, operate, diagnose). Includes BPMN-specific constructs (gateways, boundary events, subprocesses, signal events, multi-instance markers). IDs unique, expected_skill correct.

7. tests/tasks/activation/activation.yaml

Severity: OK

  • dataset.paths: updated to 20 skill files + negative.jsonl (21 total). Old uipath-gov-access-policy.jsonl and uipath-gov-aops-policy.jsonl removed, replaced with uipath-governance.jsonl. New uipath-llm-configuration-byo-connections.jsonl and uipath-maestro-bpmn.jsonl added. Alphabetical ordering preserved.
  • success_criteria: 20 skill_triggered entries — one per skill. Two old governance criteria consolidated into one. Two new criteria added. Comment updated from "19" to "20".
  • All suite_thresholds: {recall.yes: 0.70} — consistent.
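The one-file-per-skill and one-criterion-per-skill invariants described above could be verified roughly as follows. This sketch works on already-parsed lists rather than reading activation.yaml, assumes the paths are bare filenames, and needs Python 3.9+ for `str.removesuffix`; the function and argument names are hypothetical.

```python
def check_activation_config(dataset_paths, criteria_skills, negative="negative.jsonl"):
    """Return a list of mismatches between dataset paths and skill criteria."""
    problems = []
    if negative not in dataset_paths:
        problems.append(f"missing {negative}")
    skill_files = [p for p in dataset_paths if p != negative]
    if skill_files != sorted(skill_files):
        problems.append("dataset.paths is not in alphabetical order")
    from_paths = {p.removesuffix(".jsonl") for p in skill_files}
    from_criteria = set(criteria_skills)
    if len(criteria_skills) != len(from_criteria):
        problems.append("duplicate skill in success_criteria")
    for skill in sorted(from_paths - from_criteria):
        problems.append(f"{skill}: dataset without criterion")
    for skill in sorted(from_criteria - from_paths):
        problems.append(f"{skill}: criterion without dataset")
    return problems

# Illustrative subset of the configuration, not the full 20-skill list.
paths = ["uipath-governance.jsonl", "uipath-maestro-bpmn.jsonl", "negative.jsonl"]
criteria = ["uipath-governance", "uipath-maestro-bpmn"]
config_problems = check_activation_config(paths, criteria)
```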

8. tests/tasks/activation/README.md

Severity: OK

Count references updated from 19 → 20. Removed hardcoded "19→20th" in instructions, replaced with generic "append a new" phrasing. Clean.

9. Deleted: uipath-gov-access-policy.jsonl + uipath-gov-aops-policy.jsonl

Severity: OK

Both old files correctly removed. The underlying skills were merged into uipath-governance in #601, so split test files could never produce true positives. Replacement uipath-governance.jsonl covers both domains.

What's Missing

  • Smoke/e2e tests for changed skills (pre-existing gap): The CI bot flagged that uipath-coded-apps, uipath-feedback, uipath-llm-configuration-byo-connections, and uipath-planner have no tests under tests/tasks/. These directories don't exist. This is a pre-existing gap — not introduced by this PR — and is expected for a description-only change. No action needed in this PR, but worth tracking as a follow-up.
  • 5 skills have changed descriptions but no activation test changes in this PR: uipath-diagnostics, uipath-human-in-the-loop, uipath-platform, uipath-solution-design, and uipath-tasks. All five already have existing activation test files, so this is not a concern.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | All 15 SKILL.md files have valid YAML, name matches folder, descriptions well under 1024 chars |
| E2E Tests | N/A | This PR adds activation-level (classification) tests, not behavior tests. Pre-existing smoke gaps are noted above |
| Skill Body | OK | Only uipath-review body changed (Preview callout added), which is correct |
| References & Assets | N/A | No reference or asset files changed |
| Repo Hygiene | OK | No secrets, no cross-skill references, CODEOWNERS already covers uipath-governance. Scoped changes |
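The frontmatter checks in the first row (name matches folder, description within 1024 characters, no [PREVIEW] prefix) can be approximated with a few lines. This is a simplified sketch: it only reads single-line `key: value` pairs instead of parsing full YAML, it assumes the limit is inclusive, and it is not the repo's actual validator.

```python
MAX_DESCRIPTION = 1024  # limit quoted in this PR; inclusive bound is an assumption

def check_frontmatter(folder, text):
    """Return problems found in the frontmatter of one SKILL.md file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter"]
    fields = {}
    problems = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    else:
        problems.append("unterminated frontmatter")
    description = fields.get("description", "")
    if fields.get("name") != folder:
        problems.append("name does not match folder")
    if not description:
        problems.append("missing description")
    if len(description) > MAX_DESCRIPTION:
        problems.append("description exceeds 1024 chars")
    if description.startswith("[PREVIEW]"):
        problems.append("description starts with [PREVIEW]")
    return problems

# Hypothetical SKILL.md content used only to exercise the checks.
before = '---\nname: uipath-review\ndescription: "[PREVIEW] Review UiPath projects."\n---\n# Review\n'
after = '---\nname: uipath-review\ndescription: "Review UiPath projects."\n---\n# Review\n'
```

Run against the two sample strings, the first reports the forbidden prefix and the second comes back clean.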

Issues for Manual Review

  • Activation false-positive risk from shared filenames in uipath-maestro-bpmn: The "Always invoke for project.uiproj, entry-points.json, operate.json, bindings_v2.json, package-descriptor.json" anchor claims 5 non-unique filenames. Whether this causes real false activations in practice depends on the skill matching implementation — worth monitoring in the activation benchmark results.
  • uipath-planner narrowing: The description was tightened from "non-trivial or ambiguous" to "ambiguous or multi-product … where the project type is unclear." Verify this doesn't under-activate for legitimate multi-step requests where the project type IS known but the user needs planning help.

Conclusion

Clean, well-structured PR. The governance test consolidation, two new activation test sets, and description tightening are all correct and internally consistent. One Medium concern: the uipath-maestro-bpmn description's "Always invoke" anchor claims 5 shared filenames alongside .bpmn — recommend narrowing the imperative to .bpmn only to avoid false activations, consistent with how uipath-maestro-flow anchors only on .flow. Everything else looks good to merge.


github-actions Bot commented May 8, 2026

Contributor

Claude finished @bai-uipath's task in 1m 28s


Coder-eval task lint (advisory)

1 task YAML changed; verdict counts: 0 Critical, 0 High, 0 Medium, 0 Low, 1 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

✅ Author confirms passing run: "Mean per-skill recall.yes moved from 0.483 → 0.683 (+0.20 absolute) and negative TN rate from 96% → 100%, with no precision cost (FP rate held at 0.14%)."

Per-task lint

tests/tasks/activation/activation.yaml — verdict: OK

This is a classification eval using skill_triggered criteria with dataset-driven prompts (${row.prompt}). The six rubric axes (self-report, over-specification, meaningful coverage, wrong-reason, near-duplicate, validate-only) are all clean:

  • No self-report files — criteria check actual skill invocation, not agent-written summaries
  • No prompt over-specification — initial_prompt is a template variable from JSONL rows
  • Meaningful coverage — skill_triggered with suite_thresholds: {recall.yes: 0.70} on each of 20 skills validates real activation behavior
  • Not gameable — skill_triggered measures system-level skill invocation; can't be faked by writing files
  • No siblings to duplicate against (only YAML in activation/)
  • Not a flow task — validate-only axis does not apply

Changes in this PR are well-scoped: merged two stale governance datasets into one, added two new skill datasets (byo-connections, maestro-bpmn), and updated criteria to match (19 → 20 skills).

Within-PR duplicates

Not applicable — only one task YAML was changed.

Conclusion

✅ All changed tasks pass the rubric. Evidence of passing run confirmed.


@uipreliga uipreliga left a comment

Collaborator

❤️

@gabrielavaduva gabrielavaduva merged commit ef1b409 into main May 11, 2026
12 of 13 checks passed
@gabrielavaduva gabrielavaduva deleted the bai/activation-coverage-gaps branch May 11, 2026 14:51
charlesliu9 pushed a commit that referenced this pull request May 19, 2026
* tests(activation): close skill coverage gaps


* docs(skills): tighten activation descriptions


---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

4 participants