Implement post-run pipeline health analysis with developer config toggle by Trecek · Pull Request #3006 · TalonT-Org/AutoSkillit

Trecek · 2026-05-26T01:20:44Z

Summary

Add a diagnostics.post_run_analysis config toggle (default: false) that runs an analyze-pipeline-health skill after every pipeline completes. The skill spawns Haiku scanner subagents per step group to read session logs, investigate anomalies, and report findings. Scanners use Sonnet adversarial subagents to validate significant findings before reporting. A hidden post_run_diagnostics ingredient gates the recipe step via skip_when_false, and kitchen_id is injected into recipe ingredient_overrides for session log scoping.
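The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DiagnosticsConfig` name and its `post_run_analysis` field come from the commit message, but the step dictionary shape, the `should_run_step` helper, and the string encoding of ingredient values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticsConfig:
    """Developer-facing diagnostics settings; analysis is off by default."""

    post_run_analysis: bool = False


def should_run_step(step: dict, ingredients: dict) -> bool:
    """Hypothetical gate: skip the step when its skip_when_false ingredient is falsy.

    Assumes the step names the gating ingredient under "skip_when_false" and
    that ingredient values arrive as the strings "true" or "false".
    """
    gate = step.get("skip_when_false")
    if gate is None:
        return True  # ungated steps always run
    return str(ingredients.get(gate, "false")).strip().lower() == "true"


# Illustrative step definition; the real recipe schema may differ.
diagnostic_step = {"name": "run_diagnostic", "skip_when_false": "post_run_diagnostics"}

config = DiagnosticsConfig()
ingredients = {"post_run_diagnostics": "true" if config.post_run_analysis else "false"}
print(should_run_step(diagnostic_step, ingredients))
```

With the default config the diagnostic step is skipped, so existing pipelines behave as before unless a developer opts in.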

Architecture Impact

This feature introduces a post-run diagnostic step to the pipeline execution flow. The diagnostics are gated behind a developer config toggle and use a hierarchical agent scanning approach (Haiku scanners with Sonnet adversarial validators) to investigate session logs per step group.

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260525-164706-654844/.autoskillit/temp/make-plan/post_run_diagnostic_analysis_plan_2026-05-25_170500.md

Changed Files

New (★):

★ src/autoskillit/agents/pipeline-health-scanner.md
★ src/autoskillit/skills_extended/analyze-pipeline-health/SKILL.md
★ tests/recipe/test_diagnostic_steps.py

Modified (●):

● AGENTS.md
● README.md
● docs/README.md
● docs/developer/end-turn-hazards.md
● docs/execution/architecture.md
● docs/skills/catalog.md
● docs/skills/visibility.md
● src/autoskillit/agents/CLAUDE.md
● src/autoskillit/config/CLAUDE.md
● src/autoskillit/config/__init__.py
● src/autoskillit/config/_config_dataclasses.py
● src/autoskillit/config/defaults.yaml
● src/autoskillit/config/ingredient_defaults.py
● src/autoskillit/config/settings.py
● src/autoskillit/recipe/skill_contracts.yaml
● src/autoskillit/recipes/implementation.json
● src/autoskillit/recipes/implementation.yaml
● src/autoskillit/recipes/merge-prs.json
● src/autoskillit/recipes/merge-prs.yaml
● src/autoskillit/recipes/remediation.json
● src/autoskillit/recipes/remediation.yaml
● src/autoskillit/server/tools/tools_kitchen.py
● src/autoskillit/server/tools/tools_recipe.py
● src/autoskillit/skills_extended/write-recipe/SKILL.md
● tests/config/test_config.py
● tests/config/test_helpers.py
● tests/contracts/test_callable_skill_parity.py
● tests/docs/test_claude_md_structure.py
● tests/docs/test_doc_counts.py
● tests/infra/test_schema_version_convention.py
● tests/recipe/CLAUDE.md
● tests/recipe/test_bundled_recipes_general.py
● tests/recipe/test_check_ci_already_passed_routing.py
● tests/recipe/test_implementation.py
● tests/recipe/test_remediation_recipe.py
● tests/server/test_mcp_overrides.py
● tests/server/test_tools_agents.py
● tests/server/test_tools_kitchen_envelope.py
● tests/server/test_tools_recipe.py

Closes #2968

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	Model	count	uncached	output	cache_read	peak_ctx	turns	cache_write	time
plan	claude-sonnet-4-6	1	95	43.7k	3.8M	159.4k	310	151.7k	25m 3s
verify	claude-sonnet-4-6	1	140	19.5k	874.2k	66.5k	94	56.3k	7m 8s
implement*	MiniMax-M2.7-highspeed	1	10.9M	44.1k	5.5M	30.7k	401	15.9k	18m 38s
fix	claude-sonnet-4-6	1	758	37.8k	9.9M	155.4k	225	145.4k	27m 32s
prepare_pr*	MiniMax-M2.7-highspeed	1	117.7k	4.8k	242.6k	30.7k	22	15.4k	1m 34s
compose_pr*	MiniMax-M2.7-highspeed	1	40.7k	2.1k	181.2k	30.7k	14	15.1k	47s
review_pr	claude-sonnet-4-6	3	7.6k	74.6k	3.1M	95.2k	225	224.7k	24m 12s
resolve_review	claude-sonnet-4-6	3	3.5k	173.3k	30.9M	160.6k	735	381.5k	1h 28m
dispatch:608677e5-e1ef-4d42-8ae2-02c67a031fa7	claude-sonnet-4-6	1	556	135	4.8M	82.3k	69	166.4k	3h 0m
Total			11.1M	400.1k	59.3M	160.6k		1.2M	6h 13m

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

Step	LoC Changed	cache_read/LoC	cache_write/LoC	output/LoC
plan	0	—	—	—
verify	0	—	—	—
implement	779	7044.4	20.4	56.6
fix	152	65344.1	956.5	248.8
prepare_pr	0	—	—	—
compose_pr	0	—	—	—
review_pr	0	—	—	—
resolve_review	0	—	—	—
dispatch:608677e5-e1ef-4d42-8ae2-02c67a031fa7	1009	4709.5	164.9	0.1
Total	1940	30591.3	604.3	206.2
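The per-LoC columns appear to be each step's token count divided by its lines changed, with a dash where no lines changed. A minimal sketch of that arithmetic, assuming that reading; `per_loc` is a hypothetical helper, not a function from the repository. The table was presumably computed from exact token counts, so rounded inputs such as 44.1k reproduce it only approximately.

```python
from typing import Optional


def per_loc(tokens: float, loc_changed: int) -> Optional[float]:
    """Return tokens per changed line, or None when no lines changed."""
    if loc_changed <= 0:
        return None  # rendered as a dash in the table
    return tokens / loc_changed


# implement step: about 44.1k output tokens over 779 changed lines
ratio = per_loc(44_100, 779)
print(None if ratio is None else round(ratio, 1))

# plan step: no lines changed, so no ratio is reported
print(per_loc(43_700, 0))
```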

Model Usage Breakdown

Model	steps	uncached	output	cache_read	cache_write	time
claude-sonnet-4-6	6	12.6k	349.1k	53.4M	1.1M	5h 52m
MiniMax-M2.7-highspeed	3	11.1M	50.9k	5.9M	46.5k	20m 59s

Add diagnostics.post_run_analysis config toggle (default: false) that runs an analyze-pipeline-health skill after every pipeline completes. The skill spawns Haiku scanner subagents per step group to read session logs, investigate anomalies, and report findings with adversarial validation.

- Add DiagnosticsConfig dataclass with post_run_analysis field
- Wire DiagnosticsConfig into AutomationConfig and config loading
- Inject post_run_diagnostics and kitchen_id into ingredient_overrides at both open_kitchen and load_recipe call sites
- Add pipeline-health-scanner agent definition
- Add analyze-pipeline-health skill to tier3 and skills_extended
- Add run_diagnostic* steps to implementation, remediation, merge-prs recipes with skip_when_false routing
- Update documentation counts (27→28 dataclasses, 135→136 skills)
- Add comprehensive tests for new config, ingredient defaults, injection, agent/skill existence, and recipe step validation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Fix SyntaxError: unterminated string literals in test_tools_kitchen_envelope.py and test_tools_recipe.py (single-quoted multi-line strings → escaped \n)
- Fix wrong import: autoskillit.recipe.yaml_loader → autoskillit.recipe.io in test_diagnostic_steps.py
- Add model: "" field to all run_diagnostic* steps in implementation.yaml, remediation.yaml, merge-prs.yaml
- Add analyze-pipeline-health to skill_contracts.yaml with read_only: true
- Add anti-fabrication guard language to analyze-pipeline-health/SKILL.md NEVER block
- Add analyze-pipeline-health to write-recipe SKILL.md bundled skills list
- Update test_bundled_recipes_general.py: accept run_diagnostic routing from register_clone_success
- Update test_check_ci_already_passed_routing.py: accept run_diagnostic_no_ci as intermediary
- Update test_implementation.py and test_remediation_recipe.py: accept run_diagnostic_unconfirmed routing
- Update test_mcp_overrides.py: check user overrides are included (not exact match with auto-injected keys)
- Update test_schema_version_convention.py: update tools_kitchen.py allowlist 608→613
- Add analyze-pipeline-health to KNOWN_EXCEPTIONS in test_callable_skill_parity.py
- Regenerate contract cards and JSON compiled recipes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…redient_overrides

open_kitchen() calls ctx.kitchen_id = resolve_kitchen_id(), which overwrites the mock's pre-set value. Patch resolve_kitchen_id to return the expected test value so the assert validates that the injected kitchen_id matches what open_kitchen assigned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Trecek

AutoSkillit PR Review — Verdict: changes_requested

Trecek · 2026-05-26T02:31:53Z

+    on_success: run_diagnostic_no_ci
+    on_failure: run_diagnostic_no_ci
+
+  run_diagnostic_no_ci:


[warning] cohesion: Four near-identical diagnostic step blocks (run_diagnostic, run_diagnostic_error, run_diagnostic_no_ci, run_diagnostic_unconfirmed) are copy-pasted verbatim across implementation.yaml, remediation.yaml, and merge-prs.yaml (12 copies total). The only differences are on_success/on_failure targets. Consider whether a sub-recipe or shared step convention was intentional or if this was an oversight.

Valid observation, flagged for a design decision. The four run_diagnostic* step blocks are near-identical by design so that each pipeline exit path (success, error, no-ci, unconfirmed) gets its own diagnostic step. Refactoring to YAML anchors/aliases would reduce repetition at the cost of readability. Deferred to a separate cleanup PR.

Trecek

AutoSkillit review: 1 critical finding and 11 warnings detected. See inline comments — changes required before merge.

Verdict: changes_requested (posted as COMMENT — own-PR guard active)

Critical (1):

src/autoskillit/config/settings.py:483 [defense]: bool("false") coercion bug — string "false" evaluates to True, potentially enabling diagnostics when disabled in config
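The critical finding rests on a general Python behavior: `bool()` on any non-empty string returns `True`, so a config value that arrives as the string "false" would enable the toggle. One way to guard against this is an explicit coercion helper. The sketch below is illustrative only; `coerce_bool` and its accepted spellings are assumptions, not the fix applied in settings.py.

```python
def coerce_bool(value: object, *, default: bool = False) -> bool:
    """Convert a loosely typed config value to bool without truthiness surprises."""
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes", "on"}:
            return True
        if text in {"false", "0", "no", "off", ""}:
            return False
        raise ValueError(f"not a boolean string: {value!r}")
    if isinstance(value, int):
        return value != 0
    raise TypeError(f"cannot coerce {type(value).__name__} to bool")


# The pitfall: any non-empty string is truthy.
print(bool("false"))         # True
# The explicit helper reads the string's meaning instead.
print(coerce_bool("false"))  # False
```

Rejecting unknown strings with an error, rather than silently treating them as false, makes a typo in the config surface at load time.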

Warnings (11):

tools_kitchen.py:402 [arch]: Duplicated auto-override injection logic across two IL-3 server tools
tools_kitchen.py:403 [defense]: kitchen_id injected without None-guard/str() coercion
tools_recipe.py:201 [arch]: Same auto-override duplication as tools_kitchen.py
tools_recipe.py:202 [defense]: Same kitchen_id None-guard missing
analyze-pipeline-health/SKILL.md:45 [arch]: Hardcoded Linux-only log path bypasses platform-aware resolution
test_config.py:292 [tests]: test_diagnostics_config_defaults duplicated verbatim (L50 and L292)
test_config.py:345 [tests]: test_load_diagnostics_config duplicated verbatim (L175 and L345)
test_helpers.py:76 [tests]: Assertion weakened from == 'auto' to is not None
_config_dataclasses.py:292 [cohesion]: Config field 'post_run_analysis' asymmetric with ingredient name 'post_run_diagnostics'
implementation.yaml:941 [cohesion]: 12 near-identical diagnostic step copies across 3 recipes — may warrant a sub-recipe
test_tools_kitchen_envelope.py:257 [tests]: resolve_kitchen_id patched but not asserted called
test_doc_counts.py:312 [slop]: Stale comment arithmetic (3+132=135, not 136)
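Two of the warnings above (duplicated auto-override injection in tools_kitchen.py and tools_recipe.py, and the missing None-guard on kitchen_id) could in principle be addressed by a single shared helper. The sketch below shows one possible shape under stated assumptions: `inject_auto_overrides` is a hypothetical name, ingredient values are assumed to be strings, and auto-injected keys are assumed to take precedence over user-supplied ones. None of this is confirmed by the PR.

```python
from typing import Mapping, Optional


def inject_auto_overrides(
    user_overrides: Optional[Mapping[str, str]],
    *,
    post_run_analysis: bool,
    kitchen_id: Optional[object],
) -> dict:
    """Merge user overrides with auto-injected diagnostic keys (hypothetical helper)."""
    merged = dict(user_overrides or {})
    # Encode the toggle as an explicit string so no later bool() call can misread it.
    merged["post_run_diagnostics"] = "true" if post_run_analysis else "false"
    # None-guard: only inject kitchen_id when one exists, and always as a string.
    if kitchen_id is not None:
        merged["kitchen_id"] = str(kitchen_id)
    return merged


overrides = inject_auto_overrides(
    {"branch": "develop"}, post_run_analysis=True, kitchen_id=42
)
print(overrides)
```

Both call sites (open_kitchen and load_recipe) could then call the same function, which keeps the injection rules in one place.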

Trecek

Observations accumulated from 1 local review round:

Trecek and others added 3 commits May 25, 2026 17:36

Trecek commented May 26, 2026

Comment thread tests/config/test_config.py

ci: trigger

f3da4e8

Trecek added this pull request to the merge queue May 26, 2026

Merged via the queue into develop with commit 0440da0 May 26, 2026
3 checks passed

Trecek deleted the implement-post-run-diagnostic-analysis-step-with-developer-c/2968 branch May 26, 2026 04:38

Trecek merged 4 commits into develop from implement-post-run-diagnostic-analysis-step-with-developer-c/2968
