Trim and re-trigger azure-policy-advisor SKILL.md — add DO NOT USE FOR + cut tokens #158

@suuus

Follow-up from PR #157 (eval suite landing) and a fresh waza analysis run on 2026-06-04.

Why now

PR #157 shipped the first eval suite for azure-policy-advisor. Three independent signals converged on the same finding: the SKILL.md is verbose and under-bounded on routing.

Signal 1 — Empirical (smoke trial in PR #157)

The negative-naming-question task originally failed with a trigger score of 0.57, just over the 0.50 threshold. The naming prompt mentioned resource, compliance, and governance, vocabulary that overlaps with this skill's description. PR #157 worked around this by rewriting the negative prompt as pure naming-string trivia, but the underlying ambiguity is in the SKILL.md, not the eval.

Signal 2 — LLM judge (waza quality)

| Dimension | Score |
| --- | --- |
| Completeness | 5 / 5 |
| Clarity | 4 / 5 |
| Trigger precision | 3 / 5 ← lowest score |
| Scope coverage | 4 / 5 |
| Anti-patterns | 4 / 5 |
| Overall | 4.0 / 5 |

Judge verdict on trigger precision:

The 'When to Use' section lists four reasonable triggers, but there is no 'DO NOT USE FOR' section — omitting exclusions makes it hard to distinguish this skill from the security-analyzer or RBAC skills, and risks mis-routing requests like 'check if my template is secure' that belong elsewhere.

Signal 3 — Token budget (waza check / waza tokens)

  • SKILL.md is 6,233 tokens vs the 500-token soft target / 1,300-token hard fallback in .waza.yaml.
  • waza tokens suggest identifies ~3,582 reclaimable tokens, dominated by:
    • Line 508: 108-line JSON output schema block (~1,568 tokens)
    • Line 303: 97-line Part 2 markdown report template (~1,392 tokens)
    • 71 decorative emojis (~142 tokens)
  • 637-line body exceeds the 500-line progressive-disclosure heuristic; 2 code blocks exceed 50 lines.
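
As a rough arithmetic check on the figures above, the following sketch assumes every cut suggested by waza tokens suggest is applied in full, which is an upper bound on the savings rather than a prediction:

```python
# Figures quoted from the waza output above.
current_tokens = 6233
reclaimable_tokens = 3582  # assumption: all suggested cuts are applied in full
soft_target = 500
hard_fallback = 1300

remaining = current_tokens - reclaimable_tokens
print(f"after suggested cuts:  {remaining} tokens")
print(f"over hard fallback by: {remaining - hard_fallback}")
print(f"over soft target by:   {remaining - soft_target}")
```

Under that assumption the file lands at 2,651 tokens, still 1,351 over the 1,300-token hard fallback, so the suggested cuts alone do not reach either configured target.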

Scope

Land via /skill-improve azure-policy-advisor (per umbrella #93 contributor loop).

Concrete edits

  1. Add a ## When NOT to Use (or DO NOT USE FOR:) section listing adjacent skills and what they own. Suggested phrasing:
    • For per-resource security configuration assessment → use azure-security-analyzer
    • For RBAC role recommendations → use azure-role-selector
    • For CAF naming abbreviations / name-string constraints → use azure-naming-research
    • For pricing / cost estimation → use azure-cost-estimator
    • For generating new ARM templates → use azure-template-generator
  2. Extract giant code blocks to references/ (waza's progressive-disclosure pattern):
    • references/policy-recommendations-schema.json ← the 108-line JSON schema currently at line 508
    • references/policy-assessment-template.md ← the 97-line Part 2 markdown template currently at line 303
    • SKILL.md references them with a short link + the smallest illustrative excerpt
  3. Clarify scope on existing resources: argument-hint says "ARM template JSON or resource types to assess", but the procedure focuses entirely on template parsing. Add a short note on whether/how the skill assesses live deployed resources (or split into a separate skill if not).
  4. Add discovery guidance for placeholders: {subscription-id} and {mg-name} appear in CLI snippets without instruction on how the agent should obtain them when the user does not provide them.
  5. Trim decorative emojis: waza counts 71 decorative emojis in the file (~142 tokens). Keep the severity tier icons (🔴🟠🟡🔵) and status markers (✅⚠️), which carry semantic weight; drop the rest.
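
Edits 2 and 4 can be spot-checked locally before re-running waza. The sketch below is not part of waza: the function names are hypothetical, the 50-line limit is taken from the heuristic quoted earlier, and the kebab-case placeholder pattern is an assumption based on {subscription-id} and {mg-name}.

```python
import re

# A fence line: optional indent, 3+ backticks or tildes, then an optional info string.
FENCE = re.compile(r"^\s*(`{3,}|~{3,})(.*)$")
# Curly-brace placeholders such as {subscription-id}; the kebab-case shape is an assumption.
PLACEHOLDER = re.compile(r"\{[a-z][a-z0-9-]*\}")


def find_long_code_blocks(markdown, max_lines=50):
    """Return (opening_fence_line, body_line_count) for fenced blocks over max_lines."""
    results = []
    open_marker = None  # fence marker of the block currently open, if any
    start = 0
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if match is None:
            continue
        marker, rest = match.group(1), match.group(2)
        if open_marker is None:
            open_marker, start = marker, number
        elif (marker[0] == open_marker[0]
              and len(marker) >= len(open_marker)
              and not rest.strip()):
            # A closing fence uses the same character, is at least as long,
            # and carries no info string.
            body_lines = number - start - 1
            if body_lines > max_lines:
                results.append((start, body_lines))
            open_marker = None
    return results


def find_placeholders(markdown):
    """Return the sorted, de-duplicated placeholders found in the text."""
    return sorted(set(PLACEHOLDER.findall(markdown)))


# Made-up sample text, not the real SKILL.md.
fence = "`" * 3  # built at runtime so this example contains no literal fence
sample = "\n".join(
    ["# Demo", fence + "json"]
    + ["{}"] * 60
    + [fence, "scope: /subscriptions/{subscription-id}", "group: {mg-name}"]
)
print(find_long_code_blocks(sample))  # [(2, 60)]
print(find_placeholders(sample))      # ['{mg-name}', '{subscription-id}']
```

Any block this reports is a candidate for references/, and any placeholder it reports needs either discovery guidance or an explicit instruction to ask the user.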

Acceptance

  • waza check .github/skills/azure-policy-advisor compliance score moves from Low → Medium or higher.
  • Token count under 3,000 as a realistic intermediate target, ideally under the 1,300 hard fallback (the full 500-token target is unrealistic for this skill's procedural depth).
  • waza quality trigger_precision dimension improves from 3/5 → 4/5 or 5/5.
  • Re-run the PR #157 eval suite: the original negative-naming-question wording (with 'resource' / 'compliance' / 'governance') should now score < 0.50 on the trigger heuristic.
  • PR description includes before/after waza scores.
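
The numeric criteria above could be checked mechanically once the before/after scores are collected. This is a hypothetical sketch: the Snapshot type and its field names are invented for illustration, "High" is an assumed third compliance level, the 3,000-token ceiling is a parameter rather than a fixed rule, and the "after" values are made up.

```python
from dataclasses import dataclass

# Assumed ordering; only "Low" and "Medium" are named in this issue.
COMPLIANCE_ORDER = ["Low", "Medium", "High"]


@dataclass
class Snapshot:
    """Hypothetical container for one set of waza results."""
    compliance: str
    tokens: int
    trigger_precision: int          # score out of 5
    negative_trigger_score: float   # original negative-naming-question wording


def unmet_criteria(after, token_ceiling=3000):
    """Return a description of each acceptance criterion the snapshot fails."""
    unmet = []
    if COMPLIANCE_ORDER.index(after.compliance) < COMPLIANCE_ORDER.index("Medium"):
        unmet.append("compliance score below Medium")
    if after.tokens >= token_ceiling:
        unmet.append(f"token count not under {token_ceiling}")
    if after.trigger_precision < 4:
        unmet.append("trigger_precision below 4/5")
    if after.negative_trigger_score >= 0.50:
        unmet.append("negative prompt still scores >= 0.50")
    return unmet


before = Snapshot("Low", 6233, 3, 0.57)    # values reported in this issue
after = Snapshot("Medium", 2651, 4, 0.41)  # made-up values for illustration
print(unmet_criteria(before))
print(unmet_criteria(after))
```

Passing a different token_ceiling (for example 1300) lets the same check be reused against the stricter target.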

Out of scope

Related
