Skip to content

Add validate-audit skill: parallel post-audit finding validation#528

Merged
Trecek merged 6 commits intointegrationfrom
feat-validate-audit-skill-post-audit-finding-validation-with/429
Mar 27, 2026
Merged

Add validate-audit skill: parallel post-audit finding validation#528
Trecek merged 6 commits intointegrationfrom
feat-validate-audit-skill-post-audit-finding-validation-with/429

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Mar 27, 2026

Summary

Implements the validate-audit skill — a parallel subagent audit validator that examines findings from audit-arch, audit-tests, and audit-cohesion against actual code, git history, and design intent. Dispatches 9–10 agents simultaneously (8–9 code validators + 1 history researcher) and produces a validated report with a validated: true marker plus a separate contested findings file. This PR covers both the test infrastructure (15 T-VAL contract tests) and the SKILL.md implementation that makes them all pass.

Individual Group Plans

Group 1: validate-audit Skill — Post-Audit Finding Validation with Parallel Subagents — PART A ONLY

Create the validate-audit skill that validates audit findings from audit-arch, audit-tests,
and audit-cohesion against actual code, git history, and design intent. The skill dispatches
9–10 parallel subagents in a single message — 8–9 code validation agents (one per thematic
finding batch, grouped by code area) and 1 history research agent. Verdicts are VALID,
VALID BUT EXCEPTION WARRANTED, or CONTESTED. Output: a validated report with validated: true
header marker (CONTESTED findings removed, exceptions documented, severity adjustments noted)
plus a separate contested findings file. Interactive sessions prompt for approval; headless
auto-writes.

Part A covers: Failing test infrastructure (update tests/workspace/test_skills.py + create
tests/skills/test_validate_audit_contracts.py). After Part A, all new tests fail as expected
because the skill directory does not yet exist.

Group 2: validate-audit Skill — Post-Audit Finding Validation with Parallel Subagents — PART B ONLY

Part B covers the primary deliverable: creating
src/autoskillit/skills_extended/validate-audit/SKILL.md. After this part, all 15 T-VAL tests
from Part A pass.

Part A (separate task) covered: updating tests/workspace/test_skills.py and creating
tests/skills/test_validate_audit_contracts.py. Do NOT re-do those changes here; assume they
are already in place and the 15 tests are failing, awaiting this SKILL.md.

Requirements

PARSE — Report Parsing

  • REQ-PARSE-001: The skill must accept a file path argument pointing to an audit report markdown file.
  • REQ-PARSE-002: The parser must extract finding IDs, severity levels, referenced code locations (file:line), and principle categories from the report.
  • REQ-PARSE-003: The parser must handle output formats from audit-arch, audit-tests, and audit-cohesion skills.

AGENT — Subagent Orchestration

  • REQ-AGENT-001: The skill must partition findings into 9–10 thematic batches grouped by principle, severity cluster, or code area.
  • REQ-AGENT-002: The skill must launch one code-validation subagent per batch, each receiving the full finding text and referenced locations for its batch.
  • REQ-AGENT-003: The skill must launch one dedicated git/PR history research subagent that cross-references all findings against commit messages, PR descriptions, and open issues.
  • REQ-AGENT-004: All subagents must be launched in parallel (single message with multiple Agent tool calls).
  • REQ-AGENT-005: Each code-validation subagent must read actual source code, run git log/git blame, check gh pr list --state=merged, and verify caller dependencies.
  • REQ-AGENT-006: Each subagent must return a per-finding verdict (VALID, VALID BUT EXCEPTION WARRANTED, or CONTESTED) with supporting evidence.

OUTPUT — Report Generation

  • REQ-OUTPUT-001: The skill must produce a validated actionable report that removes contested findings and expands exception-warranted findings with constraint blockquotes.
  • REQ-OUTPUT-002: The skill must produce a separate contested findings file containing original finding text and the reason each was contested.
  • REQ-OUTPUT-003: The validated report must include an updated executive summary with corrected counts, a validation status header, and severity downgrade annotations.
  • REQ-OUTPUT-005: The validated report must include a validated: true marker in the Validation Status section header. This marker signals to downstream tools and recipes that the report has been through the validation process. Unvalidated reports must NOT have this marker.
  • REQ-OUTPUT-004: Both output files must be written to the project's temp/ directory adjacent to the input report.

VERDICT — Classification Criteria

  • REQ-VERDICT-001: A finding is VALID when the referenced code confirms the violation and no git history evidence justifies the current state.
  • REQ-VERDICT-002: A finding is VALID BUT EXCEPTION WARRANTED when the violation is real but fixing it would conflict with documented constraints (stdlib-only, separate entry points, intentional design decisions).
  • REQ-VERDICT-003: A finding is CONTESTED when the finding is factually wrong, scope is overstated, the current code is the correct design, or fixing would be counterproductive.
  • REQ-VERDICT-004: Severity downgrades must be proposed when evidence shows the original severity overstates the actual impact.

POST — Post-Validation Actions

  • REQ-POST-001: After synthesis, the skill must update the original audit report in-place: remove contested findings, expand exception-warranted findings with constraint blockquotes, apply severity downgrades, and update executive summary counts.
  • REQ-POST-002: The skill must write contested findings to a separate file (e.g., contested_findings_<date>.md) adjacent to the original report, preserving original finding text alongside the reason each was contested.
  • REQ-POST-003: If the session is interactive (not headless), the skill must prompt the user: "Update the original audit report with validation results? [Y/n]" before modifying the report. If headless, update automatically.
  • REQ-POST-004: After report updates, if the session is interactive (not headless), the skill must prompt the user: "File the validated audit report as a GitHub issue via prepare-issue? [Y/n]". If the user confirms, invoke /prepare-issue with the validated report as the issue description, routing it as recipe:implementation.
  • REQ-POST-005: In headless sessions, skip both prompts and perform the report update automatically. Do NOT invoke prepare-issue in headless mode.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([START: audit_report_path])
    END([END: output files written])

    subgraph Sequential ["★ MAIN THREAD — Sequential"]
        direction TB
        DETECT{"★ Detect Format<br/>━━━━━━━━━━<br/>audit-arch?<br/>audit-tests?<br/>audit-cohesion?"}
        PARSE["★ Parse Findings<br/>━━━━━━━━━━<br/>Extract ID, severity,<br/>file:line, category"]
        BATCH["★ Thematic Batcher<br/>━━━━━━━━━━<br/>Group by code area<br/>8–9 batches + history slot"]
        FORK["★ Single-Message Fork<br/>━━━━━━━━━━<br/>ALL Task calls<br/>in one message"]
        BARRIER["★ Barrier / Collect<br/>━━━━━━━━━━<br/>Wait for all agents<br/>Merge verdicts + context"]
        SYNTH["★ Synthesize<br/>━━━━━━━━━━<br/>Tally N_valid, N_exception,<br/>N_contested; upgrade findings<br/>if open PR/issue found"]
        GATE{"★ Interactive<br/>or Headless?<br/>━━━━━━━━━━<br/>AUTOSKILLIT_HEADLESS"}
        PROMPT["★ Approval Prompt<br/>━━━━━━━━━━<br/>Show verdict table<br/>Write? [Y/n]<br/>Offer prepare-issue"]
        WRITE["★ Write Output Files<br/>━━━━━━━━━━<br/>validated_report_{source}_{ts}.md<br/>contested_findings_{source}_{ts}.md"]
    end

    subgraph Workers ["★ PARALLEL SUBAGENT POOL — Single Message"]
        direction LR
        CV1["★ Code Validator 1<br/>━━━━━━━━━━<br/>Batch 1: pipeline/<br/>Read code + git log<br/>VALID / EXCEPTION / CONTESTED"]
        CV2["★ Code Validator 2<br/>━━━━━━━━━━<br/>Batch 2: execution/<br/>Read code + git log"]
        CVN["★ Code Validators 3–9<br/>━━━━━━━━━━<br/>One per code area<br/>8–9 total agents"]
        HR["★ History Researcher<br/>━━━━━━━━━━<br/>ALL findings<br/>git log 90d + gh issue list<br/>Surface in-progress fixes"]
    end

    START --> DETECT
    DETECT -->|"arch/tests/cohesion"| PARSE
    PARSE --> BATCH
    BATCH --> FORK
    FORK --> CV1 & CV2 & CVN & HR
    CV1 & CV2 & CVN & HR --> BARRIER
    BARRIER --> SYNTH
    SYNTH --> GATE
    GATE -->|"headless"| WRITE
    GATE -->|"interactive"| PROMPT
    PROMPT -->|"Y / confirmed"| WRITE
    WRITE --> END

    class START,END terminal;
    class DETECT,GATE stateNode;
    class PARSE,BATCH phase;
    class FORK,BARRIER detector;
    class CV1,CV2,CVN,HR,SYNTH,PROMPT,WRITE newComponent;
Loading

Color Legend:

Color Category Description
Dark Blue Terminal Start and end states
Teal State/Decision Format detection and mode gate
Purple Phase Parse and batch stages
Red Synchronization Single-message fork + barrier collect
Green (★) New Components All validate-audit nodes — new skill

Operational Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph Invocation ["INVOCATION PATHS"]
        direction TB
        COOK["autoskillit cook<br/>━━━━━━━━━━<br/>Interactive cook session<br/>All 61 skills via --plugin-dir"]
        HEADLESS["run_skill<br/>━━━━━━━━━━<br/>Automated headless session<br/>--add-dir skills_extended/"]
        CMD["★ /autoskillit:validate-audit<br/>━━━━━━━━━━<br/>[audit_report_path]<br/>categories: [audit]"]
    end

    subgraph Config ["CONFIGURATION"]
        direction TB
        HEADLESS_FLAG["AUTOSKILLIT_HEADLESS<br/>━━━━━━━━━━<br/>0 = interactive (default)<br/>1 = headless auto-write"]
        AUTO_DETECT["★ Auto-detect input<br/>━━━━━━━━━━<br/>Most recent mtime from:<br/>temp/audit-arch/<br/>temp/audit-tests/<br/>temp/audit-cohesion/"]
    end

    subgraph Discovery ["SKILL DISCOVERY"]
        direction TB
        RESOLVER["SkillResolver.list_all()<br/>━━━━━━━━━━<br/>Scans skills/ + skills_extended/<br/>Reads YAML frontmatter"]
        SKILLSEXT["★ skills_extended/<br/>validate-audit/SKILL.md<br/>━━━━━━━━━━<br/>SkillSource.BUNDLED_EXTENDED<br/>categories: [audit]"]
        WRITERECIPE["● write-recipe/SKILL.md<br/>━━━━━━━━━━<br/>validate-audit added to<br/>canonical skills list"]
    end

    subgraph Upstream ["UPSTREAM AUDIT SKILLS"]
        direction LR
        AUDITARCH["audit-arch<br/>━━━━━━━━━━<br/>Produces arch audit reports"]
        AUDITTESTS["audit-tests<br/>━━━━━━━━━━<br/>Produces test audit reports"]
        AUDITCOH["audit-cohesion<br/>━━━━━━━━━━<br/>Produces cohesion audit reports"]
    end

    subgraph Output ["OUTPUT ARTIFACTS"]
        direction TB
        VALRPT["★ validated_report_{source}_{ts}.md<br/>━━━━━━━━━━<br/>.autoskillit/temp/validate-audit/<br/>First line: validated: true"]
        CONTESTED["★ contested_findings_{source}_{ts}.md<br/>━━━━━━━━━━<br/>.autoskillit/temp/validate-audit/<br/>Written only when N_contested > 0"]
        PREPISSUE["prepare-issue<br/>━━━━━━━━━━<br/>Optional: file contested findings<br/>as GitHub issue (interactive only)"]
    end

    COOK --> CMD
    HEADLESS --> CMD
    CMD --> AUTO_DETECT
    CMD --> HEADLESS_FLAG
    AUDITARCH & AUDITTESTS & AUDITCOH -.->|"produces input for"| AUTO_DETECT
    AUTO_DETECT --> CMD
    RESOLVER --> SKILLSEXT
    SKILLSEXT --> CMD
    CMD --> VALRPT
    CMD --> CONTESTED
    CONTESTED -.->|"offer interactive"| PREPISSUE

    class COOK,HEADLESS,CMD cli;
    class HEADLESS_FLAG,AUTO_DETECT phase;
    class RESOLVER stateNode;
    class SKILLSEXT,VALRPT,CONTESTED newComponent;
    class WRITERECIPE handler;
    class AUDITARCH,AUDITTESTS,AUDITCOH detector;
    class PREPISSUE output;
Loading

Color Legend:

Color Category Description
Dark Blue CLI Invocation paths and slash commands
Purple Config Configuration and auto-detection
Teal Discovery SkillResolver state
Green (★) New New skill files and output artifacts
Orange (●) Modified Updated write-recipe skill list
Red Upstream Audit skills that produce input
Dark Teal Output Post-validation downstream

Closes #429

Implementation Plan

Plan files:

  • /home/talon/projects/autoskillit-runs/impl-20260326-164058-424464/.autoskillit/temp/make-plan/validate_audit_skill_plan_2026-03-26_000100_part_a.md
  • /home/talon/projects/autoskillit-runs/impl-20260326-164058-424464/.autoskillit/temp/make-plan/validate_audit_skill_plan_2026-03-26_000100_part_b.md

🤖 Generated with Claude Code via AutoSkillit

Trecek and others added 3 commits March 26, 2026 19:07
- Update test_skills.py: add validate-audit to BUNDLED_SKILLS,
  AUDIT_SKILL_NAMES, and test_all_audit_skills_have_audit_category;
  bump skill count assertions to 58/60
- Create tests/skills/test_validate_audit_contracts.py with T-VAL-006
  through T-VAL-015 contract tests for the forthcoming validate-audit
  SKILL.md (all tests fail until Part B creates the skill directory)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create src/autoskillit/skills_extended/validate-audit/SKILL.md implementing
post-audit finding validation with 9–10 parallel subagents. Supports all three
audit formats (arch, tests, cohesion), produces validated_report and
contested_findings output files, and handles interactive/headless mode.

Update docs skill count from 60 to 61 throughout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested (10 blocking findings)

class TestValidateAuditContent:
# T-VAL-008
def test_validated_true_marker_required(self) -> None:
assert "validated: true" in _skill_text()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] tests: T-VAL-008: asserting 'validated: true' is a raw string-in-file check that will silently pass if the string appears anywhere (e.g. in a comment or example), providing no structural guarantee about SKILL.md frontmatter. Consider parsing the YAML frontmatter and asserting the field value.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. T-VAL-008 is a contract test on the SKILL.md text itself, not on any output YAML file. It verifies the skill spec contains the 'validated: true' requirement (SKILL.md line 49 and output template). Parsing YAML frontmatter of the skill description would be wrong — that frontmatter has no 'validated' key.

assert "validated: true" in _skill_text()

# T-VAL-009
def test_parallel_single_message_dispatch(self) -> None:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] tests: T-VAL-009: asserting 'single message' and 'parallel' appear anywhere in lowercased text is too broad. Both terms appear in many unrelated contexts. The test cannot distinguish the intended dispatch contract from incidental mentions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. 'single message' is a domain-specific phrase appearing only in the Step 3 SINGLE MESSAGE section header. These terms are tightly scoped contract markers for the parallel dispatch contract, matching the established pattern for SKILL.md contract tests in this codebase.


### Step 6 — Interactive vs Headless Approval

Detect headless mode: run `echo "${AUTOSKILLIT_HEADLESS:-0}"` via Bash. Output `1` means
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: Headless mode detection relies on AUTOSKILLIT_HEADLESS env var but no guard is documented for unexpected values. The skill should specify the safe default (interactive or headless) when the variable check produces neither '0' nor '1'.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The bash ${AUTOSKILLIT_HEADLESS:-0} expression IS the guard: it normalizes unset/empty to '0' (safe default). Any unexpected non-empty value (e.g. 'true', '2') does not equal '1' and naturally falls to interactive mode — the safe default. The behavior is safe and unambiguous for all possible inputs.

class TestValidateAuditContent:
# T-VAL-008
def test_validated_true_marker_required(self) -> None:
assert "validated: true" in _skill_text()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] tests: T-VAL-008: asserting 'validated: true' is a raw string-in-file check that will silently pass if the string appears anywhere (e.g. in a comment or example), providing no structural guarantee about SKILL.md frontmatter. Consider parsing the YAML frontmatter and asserting the field value.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. T-VAL-008 is a contract test on the SKILL.md text itself, not on any output YAML file. It verifies the skill spec contains the 'validated: true' requirement (SKILL.md line 49 and output template). Parsing YAML frontmatter of the skill description would be wrong — that frontmatter has no 'validated' key.

assert "validated: true" in _skill_text()

# T-VAL-009
def test_parallel_single_message_dispatch(self) -> None:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] tests: T-VAL-009: asserting 'single message' and 'parallel' appear anywhere in lowercased text is too broad. Both terms appear in many unrelated contexts. The test cannot distinguish the intended dispatch contract from incidental mentions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. 'single message' is a domain-specific phrase appearing only in the Step 3 SINGLE MESSAGE section header. These terms are tightly scoped contract markers for the parallel dispatch contract, matching the established pattern for SKILL.md contract tests in this codebase.


### Step 6 — Interactive vs Headless Approval

Detect headless mode: run `echo "${AUTOSKILLIT_HEADLESS:-0}"` via Bash. Output `1` means
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: Headless mode detection relies on AUTOSKILLIT_HEADLESS env var but no guard is documented for unexpected values. The skill should specify the safe default (interactive or headless) when the variable check produces neither '0' nor '1'.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. The bash ${AUTOSKILLIT_HEADLESS:-0} expression IS the guard: it normalizes unset/empty to '0' (safe default). Any unexpected non-empty value (e.g. 'true', '2') does not equal '1' and naturally falls to interactive mode — the safe default. The behavior is safe and unambiguous for all possible inputs.

Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit review found 10 blocking issues. See inline comments. (Note: GitHub prevents requesting changes on your own PR — submitting as comment.)

Trecek added 3 commits March 26, 2026 19:55
…3/T-VAL-015 assertions

- Add @functools.cache to _skill_text() to eliminate 7+ redundant file-system
  reads across the test class (reviewer finding: 7 redundant reads per run)
- T-VAL-013: replace broad 'headless'/'interactive' substring checks with the
  specific contract phrase 'Interactive vs Headless' from Step 6 section header
- T-VAL-015: replace weak 'history' check with specific phrase
  'history research agent' that unambiguously targets the named agent contract
- Add no-files fallback: when audit_report_path is omitted and no files exist
  under any of the three temp directories, print error and exit non-zero
- Add unrecognized format guard: when title/preamble matches none of the three
  detection patterns, print a specific error message and exit non-zero
- Fix logic contradiction in Step 2: merging clusters cannot increase the group
  count from <8 to ≥8; replace with 'assign each area its own batch' rule
CLAUDE.md specifies 42 MCP tools: 40 kitchen-tagged + 1 headless (test_check,
also kitchen-tagged) + 2 free-range (open_kitchen, close_kitchen).
@Trecek Trecek added this pull request to the merge queue Mar 27, 2026
Merged via the queue into integration with commit 5a9ef58 Mar 27, 2026
2 checks passed
@Trecek Trecek deleted the feat-validate-audit-skill-post-audit-finding-validation-with/429 branch March 27, 2026 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant