Implement post-run pipeline health analysis with developer config toggle #3006
Add diagnostics.post_run_analysis config toggle (default: false) that runs an analyze-pipeline-health skill after every pipeline completes. The skill spawns Haiku scanner subagents per step group to read session logs, investigate anomalies, and report findings with adversarial validation.
- Add DiagnosticsConfig dataclass with post_run_analysis field
- Wire DiagnosticsConfig into AutomationConfig and config loading
- Inject post_run_diagnostics and kitchen_id into ingredient_overrides at both open_kitchen and load_recipe call sites
- Add pipeline-health-scanner agent definition
- Add analyze-pipeline-health skill to tier3 and skills_extended
- Add run_diagnostic* steps to implementation, remediation, merge-prs recipes with skip_when_false routing
- Update documentation counts (27→28 dataclasses, 135→136 skills)
- Add comprehensive tests for new config, ingredient defaults, injection, agent/skill existence, and recipe step validation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix SyntaxError: unterminated string literals in test_tools_kitchen_envelope.py and test_tools_recipe.py (single-quoted multi-line strings → escaped \n)
- Fix wrong import: autoskillit.recipe.yaml_loader → autoskillit.recipe.io in test_diagnostic_steps.py
- Add model: "" field to all run_diagnostic* steps in implementation.yaml, remediation.yaml, merge-prs.yaml
- Add analyze-pipeline-health to skill_contracts.yaml with read_only: true
- Add anti-fabrication guard language to analyze-pipeline-health/SKILL.md NEVER block
- Add analyze-pipeline-health to write-recipe SKILL.md bundled skills list
- Update test_bundled_recipes_general.py: accept run_diagnostic routing from register_clone_success
- Update test_check_ci_already_passed_routing.py: accept run_diagnostic_no_ci as intermediary
- Update test_implementation.py and test_remediation_recipe.py: accept run_diagnostic_unconfirmed routing
- Update test_mcp_overrides.py: check user overrides are included (not exact match with auto-injected keys)
- Update test_schema_version_convention.py: update tools_kitchen.py allowlist 608→613
- Add analyze-pipeline-health to KNOWN_EXCEPTIONS in test_callable_skill_parity.py
- Regenerate contract cards and JSON compiled recipes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…redient_overrides
open_kitchen() calls ctx.kitchen_id = resolve_kitchen_id(), which overwrites the mock's pre-set value. Patch resolve_kitchen_id to return the expected test value so the assert validates that the injected kitchen_id matches what open_kitchen assigned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek
left a comment
AutoSkillit PR Review — Verdict: changes_requested
  on_success: run_diagnostic_no_ci
  on_failure: run_diagnostic_no_ci

run_diagnostic_no_ci:
[warning] cohesion: Four near-identical diagnostic step blocks (run_diagnostic, run_diagnostic_error, run_diagnostic_no_ci, run_diagnostic_unconfirmed) are copy-pasted verbatim across implementation.yaml, remediation.yaml, and merge-prs.yaml (12 copies total). The only differences are on_success/on_failure targets. Consider whether a sub-recipe or shared step convention was intentional or if this was an oversight.
Valid observation — flagged for design decision. The four run_diagnostic* step blocks are near-identical by design to handle different pipeline exit paths (success, error, no-ci, unconfirmed). Refactoring to YAML anchors/aliases would reduce repetition but reduce readability. Deferred to a separate cleanup PR.
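For the follow-up cleanup, one possible shape for a shared step convention is to generate the four blocks from a single template and vary only the routing target. This is a minimal sketch, not the recipe loader's actual schema: the field layout, the `diagnostic_step` helper, the choice to route both outcomes to the same target, and the targets `done` and `escalate` are all hypothetical. Only the four step names, `model: ""`, and the `skip_when_false` gating come from this PR.

```python
from typing import Any

# Step name -> routing target. The step names come from the review comment;
# the targets ("done", "escalate") are hypothetical placeholders.
EXIT_PATHS: dict[str, str] = {
    "run_diagnostic": "done",
    "run_diagnostic_error": "escalate",
    "run_diagnostic_no_ci": "done",
    "run_diagnostic_unconfirmed": "escalate",
}


def diagnostic_step(target: str) -> dict[str, Any]:
    """Build one diagnostic step; only the routing target varies.

    The key names below are an assumed layout, not the real recipe schema.
    """
    return {
        "skill": "analyze-pipeline-health",
        "model": "",
        "skip_when_false": "post_run_diagnostics",
        "on_success": target,
        "on_failure": target,
    }


def build_diagnostic_steps() -> dict[str, dict[str, Any]]:
    """Expand the shared template into one step per pipeline exit path."""
    return {name: diagnostic_step(target) for name, target in EXIT_PATHS.items()}
```

A generator like this keeps the four exit paths explicit in one table while removing the copy-pasted bodies, at the cost of the steps no longer being readable directly in each recipe file.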
Trecek
left a comment
AutoSkillit review: 1 critical finding and 11 warnings detected. See inline comments — changes required before merge.
Verdict: changes_requested (posted as COMMENT — own-PR guard active)
Critical (1):
- src/autoskillit/config/settings.py:483 [defense]: bool("false") coercion bug. The string "false" evaluates to True, potentially enabling diagnostics when they are disabled in config
Warnings (11):
- tools_kitchen.py:402 [arch]: Duplicated auto-override injection logic across two IL-3 server tools
- tools_kitchen.py:403 [defense]: kitchen_id injected without None-guard/str() coercion
- tools_recipe.py:201 [arch]: Same auto-override duplication as tools_kitchen.py
- tools_recipe.py:202 [defense]: Same kitchen_id None-guard missing
- analyze-pipeline-health/SKILL.md:45 [arch]: Hardcoded Linux-only log path bypasses platform-aware resolution
- test_config.py:292 [tests]: test_diagnostics_config_defaults duplicated verbatim (L50 and L292)
- test_config.py:345 [tests]: test_load_diagnostics_config duplicated verbatim (L175 and L345)
- test_helpers.py:76 [tests]: Assertion weakened from == 'auto' to is not None
- _config_dataclasses.py:292 [cohesion]: Config field 'post_run_analysis' asymmetric with ingredient name 'post_run_diagnostics'
- implementation.yaml:941 [cohesion]: 12 near-identical diagnostic step copies across 3 recipes; may warrant a sub-recipe
- test_tools_kitchen_envelope.py:257 [tests]: resolve_kitchen_id patched but not asserted called
- test_doc_counts.py:312 [slop]: Stale comment arithmetic (3+132=135, not 136)
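To illustrate the critical finding: in Python any non-empty string is truthy, so `bool("false")` is `True`. The sketch below shows one way a strict coercion could look. `coerce_bool` and `load_diagnostics` are hypothetical names, the accepted string spellings are an assumption, and the `DiagnosticsConfig` shown here is a stand-in with only the `post_run_analysis` field this PR describes, not the actual code in settings.py.

```python
from dataclasses import dataclass
from typing import Any

# Accepted spellings are an assumption for this sketch.
_TRUE_STRINGS = {"true", "1", "yes", "on"}
_FALSE_STRINGS = {"false", "0", "no", "off", ""}


def coerce_bool(value: Any) -> bool:
    """Strictly coerce a raw config value to bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in _TRUE_STRINGS:
            return True
        if lowered in _FALSE_STRINGS:
            return False
        raise ValueError(f"not a boolean string: {value!r}")
    raise TypeError(f"cannot coerce {type(value).__name__} to bool")


@dataclass(frozen=True)
class DiagnosticsConfig:
    """Stand-in for the PR's dataclass; default mirrors 'default: false'."""
    post_run_analysis: bool = False


def load_diagnostics(raw: dict[str, Any]) -> DiagnosticsConfig:
    """Build the config from a raw mapping without relying on bool()."""
    return DiagnosticsConfig(
        post_run_analysis=coerce_bool(raw.get("post_run_analysis", False))
    )
```

Raising on unrecognized strings, rather than silently defaulting, means a typo in the config surfaces at load time instead of quietly toggling diagnostics.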
Trecek
left a comment
Observations accumulated from 1 local review round:
Summary
Add a diagnostics.post_run_analysis config toggle (default: false) that runs an analyze-pipeline-health skill after every pipeline completes. The skill spawns Haiku scanner subagents per step group to read session logs, investigate anomalies, and report findings. Scanners use Sonnet adversarial subagents to validate significant findings before reporting. A hidden post_run_diagnostics ingredient gates the recipe step via skip_when_false, and kitchen_id is injected into recipe ingredient_overrides for session log scoping.
Architecture Impact
This feature introduces a post-run diagnostic step to the pipeline execution flow. The diagnostics are gated behind a developer config toggle and use a hierarchical agent scanning approach (Haiku scanners with Sonnet adversarial validators) to investigate session logs per step group.
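The gating described here depends on two keys reaching the recipe's ingredient_overrides. A minimal sketch of that injection as one shared helper follows, assuming string-valued overrides and that user-supplied keys take precedence over auto-injected ones. `build_ingredient_overrides` is a hypothetical name; the actual logic lives inline in tools_kitchen.py and tools_recipe.py.

```python
from typing import Mapping, Optional


def build_ingredient_overrides(
    user_overrides: Optional[Mapping[str, str]],
    *,
    post_run_analysis: bool,
    kitchen_id: Optional[str],
) -> dict[str, str]:
    """Merge auto-injected keys with user overrides (hypothetical helper)."""
    # Encoding the toggle as "true"/"false" strings is an assumption.
    merged: dict[str, str] = {
        "post_run_diagnostics": "true" if post_run_analysis else "false",
    }
    # None-guard: inject kitchen_id only when it is resolved, coerced to str.
    if kitchen_id is not None:
        merged["kitchen_id"] = str(kitchen_id)
    # Assumed precedence: user-supplied overrides win over injected defaults.
    merged.update(user_overrides or {})
    return merged
```

Calling one helper from both the open_kitchen and load_recipe call sites would keep the two injection paths from drifting apart.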
Implementation Plan
Plan file: /home/talon/projects/autoskillit-runs/impl-20260525-164706-654844/.autoskillit/temp/make-plan/post_run_diagnostic_analysis_plan_2026-05-25_170500.md
Changed Files
New (★):
★ src/autoskillit/agents/pipeline-health-scanner.md
★ src/autoskillit/skills_extended/analyze-pipeline-health/SKILL.md
★ tests/recipe/test_diagnostic_steps.py
Modified (●):
● AGENTS.md
● README.md
● docs/README.md
● docs/developer/end-turn-hazards.md
● docs/execution/architecture.md
● docs/skills/catalog.md
● docs/skills/visibility.md
● src/autoskillit/agents/CLAUDE.md
● src/autoskillit/config/CLAUDE.md
● src/autoskillit/config/__init__.py
● src/autoskillit/config/_config_dataclasses.py
● src/autoskillit/config/defaults.yaml
● src/autoskillit/config/ingredient_defaults.py
● src/autoskillit/config/settings.py
● src/autoskillit/recipe/skill_contracts.yaml
● src/autoskillit/recipes/implementation.json
● src/autoskillit/recipes/implementation.yaml
● src/autoskillit/recipes/merge-prs.json
● src/autoskillit/recipes/merge-prs.yaml
● src/autoskillit/recipes/remediation.json
● src/autoskillit/recipes/remediation.yaml
● src/autoskillit/server/tools/tools_kitchen.py
● src/autoskillit/server/tools/tools_recipe.py
● src/autoskillit/skills_extended/write-recipe/SKILL.md
● tests/config/test_config.py
● tests/config/test_helpers.py
● tests/contracts/test_callable_skill_parity.py
● tests/docs/test_claude_md_structure.py
● tests/docs/test_doc_counts.py
● tests/infra/test_schema_version_convention.py
● tests/recipe/CLAUDE.md
● tests/recipe/test_bundled_recipes_general.py
● tests/recipe/test_check_ci_already_passed_routing.py
● tests/recipe/test_implementation.py
● tests/recipe/test_remediation_recipe.py
● tests/server/test_mcp_overrides.py
● tests/server/test_tools_agents.py
● tests/server/test_tools_kitchen_envelope.py
● tests/server/test_tools_recipe.py
Closes #2968
🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary
* Step used a non-Anthropic provider; caching behavior may differ.
Token Efficiency
Model Usage Breakdown