[auto-research 9.2] Audit-trail artifact in worktree #2174

Merged
Trecek merged 9 commits into develop from auto-research-9-2-audit-trail-artifact-in-worktree/856 on May 7, 2026

Trecek (Collaborator) commented May 7, 2026

Summary

Persist classification decisions and review verdicts as committed audit artifacts in the research bundle. This involves five coordinated changes:

  1. Extend create_worktree.sh to create a research/{slug}/audit/ directory and copy the evaluation dashboard and visualization plan trace into it.
  2. Extend plan-visualization/SKILL.md to write a visualization-plan-trace.md artifact capturing Tier-C routing decisions (primary_tradition, applied_union_rules, precedence_trace) and emit disambiguation_rule_applied and tier_c_lens as structured outputs.
  3. Extend generate-report/SKILL.md to add YAML frontmatter to the report template (the full audit-trail schema: 7 fields + nested audit_trail_path map) and add a "Design Review Summary" section referencing the committed dashboard.
  4. Update research.yaml to capture new outputs from plan_visualization and review_design, and thread them as flags to generate_report.
  5. Author two new docs: docs/research/silent-type-convention.md (shared convention for #835, "[auto-research 2.3] Review-design handling of all-silent types", and #846, "[auto-research 4.7] No-mandatory-figures path in vis-lens-methodology-norms") and docs/research/audit-trail-format.md (structure of research/{slug}/audit/).
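As a rough illustration of item 3, the sketch below renders a frontmatter block from flat fields plus a nested `audit_trail_path` map. It is not the template used by generate-report/SKILL.md: the function name `render_frontmatter`, the helper `_quote`, the field names (inferred from the flags listed in the commit message), the map keys, and all sample values are assumptions made for this example.

```python
def render_frontmatter(fields, audit_paths):
    """Render a YAML frontmatter block from flat fields plus a nested path map.

    Hypothetical helper: the real schema is defined by the report template.
    """
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            if not value:
                lines.append(f"{key}: []")
                continue
            # Emit sequences in YAML block style, one item per line.
            lines.append(f"{key}:")
            lines.extend(f"  - {_quote(item)}" for item in value)
        else:
            lines.append(f"{key}: {_quote(value)}")
    # The nested map is written last, indented by two spaces.
    lines.append("audit_trail_path:")
    for name, path in audit_paths.items():
        lines.append(f"  {name}: {_quote(path)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def _quote(value):
    # Double-quote every scalar so colons or dashes in values stay literal.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    print(render_frontmatter(
        {"experiment_type": "benchmark", "methodology_traditions": ["a", "b"]},
        {"design_review_dashboard": "research/demo/audit/design-review-dashboard.md"},
    ))
```

Quoting every scalar is a deliberately conservative choice here: it avoids YAML reinterpreting values such as timestamps or strings containing colons.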

Requirements

(Extracted from the "Scope" section of issue #856)

Persist the classification decisions and review verdicts as committed audit artifacts in the research bundle: (a) copy evaluation_dashboard.md from {{AUTOSKILLIT_TEMP}}/review-design/ to research/{slug}/audit/design-review-dashboard.md; (b) create research/{slug}/audit/visualization-plan-trace.md documenting Tier-C routing decisions; (c) add a YAML frontmatter metadata block at the top of report.md. Author a shared convention doc docs/research/silent-type-convention.md consumed by Work Items 2.3 and 4.7.
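The copy steps in (a) and (b) can be pictured with the following sketch. The PR implements them in create_worktree.sh; this Python version only mirrors the described behavior. The function name `copy_audit_artifacts`, its parameters, and the choice to skip a missing source silently are assumptions, while the file and directory names come from the requirement text above.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_audit_artifacts(
    temp_dir: Path,
    worktree: Path,
    slug: str,
    trace_src: Path | None = None,
) -> list[Path]:
    """Create research/{slug}/audit/ and copy the audit artifacts into it.

    Hypothetical helper; returns the destination paths that were written.
    """
    audit_dir = worktree / "research" / slug / "audit"
    audit_dir.mkdir(parents=True, exist_ok=True)

    copied: list[Path] = []

    # (a) evaluation dashboard from the review-design temp directory.
    dashboard = temp_dir / "review-design" / "evaluation_dashboard.md"
    if dashboard.is_file():
        dest = audit_dir / "design-review-dashboard.md"
        shutil.copy2(dashboard, dest)
        copied.append(dest)

    # (b) visualization plan trace, if plan-visualization produced one.
    if trace_src is not None and trace_src.is_file():
        dest = audit_dir / "visualization-plan-trace.md"
        shutil.copy2(trace_src, dest)
        copied.append(dest)

    return copied
```

Returning the list of written paths makes it easy for a caller to stage exactly those files for the commit.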

Acceptance criteria:

  • research/{slug}/audit/design-review-dashboard.md materializes in the worktree post-review (copied from TEMP, committed)
  • research/{slug}/audit/visualization-plan-trace.md materializes post-plan-visualization
  • report.md has valid YAML frontmatter matching the schema
  • docs/research/silent-type-convention.md authored and consumed by WI 2.3 and 4.7
  • docs/research/audit-trail-format.md documents the audit-artifact structure
  • Test: generate a report; parse YAML frontmatter with a YAML loader; verify all fields present and well-formed
  • Test: audit files exist in the worktree and are committed (in git log)
  • task test-check passes
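The two "Test:" criteria above might be approximated as follows. This is a standard-library sketch, not the PR's test code: `REQUIRED_FIELDS` is inferred from the flags named in the commit message and may not match the real schema, and `top_level_keys` is a simple line scan standing in for a real YAML loader, which the actual test is expected to use. The `run` parameter exists only so the git call can be stubbed.

```python
import re
import subprocess

# Hypothetical field list, inferred from the generate_report flags.
REQUIRED_FIELDS = (
    "experiment_type",
    "methodology_traditions",
    "design_review_verdict",
    "disambiguation_rule_applied",
    "tier_c_lens",
    "classification_timestamp",
    "audit_trail_path",
)


def split_frontmatter(text):
    """Return (frontmatter, body); frontmatter is None when absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None, text  # opening delimiter without a closing one


def top_level_keys(frontmatter):
    """Collect unindented `key:` names (stand-in for a YAML loader)."""
    keys = set()
    for line in frontmatter.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_fields(report_text, required=REQUIRED_FIELDS):
    """Return the sorted required fields that the frontmatter lacks."""
    frontmatter, _ = split_frontmatter(report_text)
    if frontmatter is None:
        return sorted(required)
    return sorted(set(required) - top_level_keys(frontmatter))


def is_committed(path, repo=".", run=subprocess.run):
    """True if `git log` lists at least one commit touching `path`."""
    result = run(
        ["git", "-C", str(repo), "log", "--oneline", "--", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and bool(result.stdout.strip())
```

A test would then assert that `missing_fields(report_text)` is empty and that `is_committed(...)` holds for each file under research/{slug}/audit/.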

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260507-075657-573707/.autoskillit/temp/make-plan/audit_trail_artifact_in_worktree_plan_2026-05-07_080000.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

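| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | | 1 | 9.3k | 18.7k | 895.0k | 81.6k | 161 | 115.2k | 14m 39s |
| verify | | 1 | 1.4k | 5.7k | 465.8k | 43.8k | 69 | 31.3k | 3m 2s |
| implement | | 1 | 306.9k | 3.4k | 358.5k | 25.6k | 32 | 38.9k | 2m 7s |
| prepare_pr* | MiniMax-M2.7-highspeed | 2 | 137.0k | 16.6k | 691.4k | 41.2k | 57 | 69.8k | 4m 33s |
| compose_pr* | MiniMax-M2.7-highspeed | 2 | 40.1k | 15.1k | 974.5k | 40.3k | 70 | 42.4k | 4m 57s |
| review_pr | claude-sonnet-4-6 | 1 | 50 | 17.0k | 875.8k | 164.2k | 134 | 93.8k | 7m 35s |
| resolve_review | claude-opus-4-6 | 1 | 55 | 15.4k | 1.0M | 72.2k | 63 | 59.7k | 9m 25s |
| Total | | | 494.9k | 91.8k | 5.3M | 164.2k | | 451.0k | 46m 21s |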

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

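| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | | | |
| verify | 0 | | | |
| implement | 0 | | | |
| prepare_pr | 0 | | | |
| compose_pr | 0 | | | |
| review_pr | 0 | | | |
| resolve_review | 48 | 21599.9 | 1243.7 | 320.0 |
| Total | 48 | 110371.1 | 9395.1 | 1913.1 |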

Model Usage Breakdown

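| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 1 | 79 | 20.0k | 610.4k | 51.6k | 8m 37s |
| claude-opus-4-6 | 2 | 325 | 54.3k | 10.6M | 473.2k | 57m 50s |
| MiniMax-M2.7-highspeed | 3 | 4.5M | 37.5k | 2.7M | 148.0k | 12m 28s |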

Trecek and others added 9 commits May 7, 2026 09:11
- plan-visualization: emit visualization-plan-trace.md and new structured
  tokens (disambiguation_rule_applied, tier_c_lens, methodology_tradition,
  visualization_plan_trace_path)
- review-design: emit classification_timestamp structured output
- research.yaml: capture new outputs, thread to create_worktree and
  generate_report with --experiment-type, --methodology-traditions,
  --design-review-verdict, --disambiguation-rule-applied, --tier-c-lens,
  --classification-timestamp flags
- create_worktree.sh: create audit/ dir and copy evaluation_dashboard
  and visualization_plan_trace to audit/
- generate-report: add YAML frontmatter with full audit-trail schema,
  add Design Review Summary section
- docs: add silent-type-convention.md and audit-trail-format.md
- contracts: regenerate research.yaml contract
- tests: add 5 test files for schema, artifacts, recipe contracts,
  and documentation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add pytestmark with layer("recipe") and small markers to all 5 new test files
- Declare classification_timestamp in review-design outputs contract
- Declare disambiguation_rule_applied, tier_c_lens, methodology_tradition,
  visualization_plan_trace_path in plan-visualization outputs contract
- Add silent-type-convention.md and audit-trail-format.md to docs/README.md
- Remove unused monkeypatch parameter from test fixture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lidation

The undeclared-capture-key rule reads from recipe/skill_contracts.yaml,
not recipes/contracts/research.yaml. Add classification_timestamp to
review-design and disambiguation_rule_applied, tier_c_lens,
methodology_tradition, visualization_plan_trace_path to plan-visualization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…vior

The implementation commits generated the contract card with skill_hashes
populated, but the test expects empty hashes (matching migration engine
behavior). Regenerated without skills_dir. Also added missing
write_behavior: always to make-groups in skill_contracts.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Syncs test_contracts.py with skill_contracts.yaml after adding
write_behavior: always to make-groups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests now perform actual copy operations and verify file existence
and content, instead of tautological mkdir+exists or always-true
not-exists assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
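
The distinction drawn in the last commit message, between tautological assertions and tests that exercise a real copy, can be illustrated with a short pytest-style sketch. These are not the tests from the PR: the function names and the sample file content are invented for this example, and `tmp_path` is pytest's built-in temporary-directory fixture.

```python
import shutil
from pathlib import Path


def test_tautological_directory_check(tmp_path: Path) -> None:
    # Anti-pattern: the test creates the directory itself and then asserts
    # that it exists, so it cannot fail regardless of the code under test.
    audit = tmp_path / "audit"
    audit.mkdir()
    assert audit.exists()


def test_copy_preserves_content(tmp_path: Path) -> None:
    # Meaningful variant: perform an actual copy and verify that the
    # destination exists and carries the source content.
    src = tmp_path / "evaluation_dashboard.md"
    src.write_text("verdict: GO\n", encoding="utf-8")

    audit = tmp_path / "research" / "demo" / "audit"
    audit.mkdir(parents=True)
    dest = audit / "design-review-dashboard.md"

    shutil.copy2(src, dest)

    assert dest.is_file()
    assert dest.read_text(encoding="utf-8") == "verdict: GO\n"
```

The second test fails if the copy step is removed or writes to the wrong path, which is the property the first test lacks.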
@Trecek Trecek added this pull request to the merge queue May 7, 2026
Merged via the queue into develop with commit 1fe4a93 May 7, 2026
2 checks passed
@Trecek Trecek deleted the auto-research-9-2-audit-trail-artifact-in-worktree/856 branch May 7, 2026 17:18
Trecek added a commit that referenced this pull request May 8, 2026
## Summary

Persist classification decisions and review verdicts as committed audit
artifacts in the research bundle. This involves five coordinated
changes:

1. **Extend `create_worktree.sh`** to create a `research/{slug}/audit/`
directory and copy the evaluation dashboard and visualization plan trace
into it.
2. **Extend `plan-visualization/SKILL.md`** to write a
`visualization-plan-trace.md` artifact capturing Tier-C routing
decisions (primary_tradition, applied_union_rules, precedence_trace) and
emit `disambiguation_rule_applied` and `tier_c_lens` as structured
outputs.
3. **Extend `generate-report/SKILL.md`** to add YAML frontmatter to the
report template (the full audit-trail schema: 7 fields + nested
`audit_trail_path` map) and add a "Design Review Summary" section
referencing the committed dashboard.
4. **Update `research.yaml`** to capture new outputs from
`plan_visualization` and `review_design`, and thread them as flags to
`generate_report`.
5. **Author two new docs**: `docs/research/silent-type-convention.md`
(shared convention for #835 and #846) and
`docs/research/audit-trail-format.md` (structure of
`research/{slug}/audit/`).

## Requirements

(Extracted from issue #856 ## Scope section)

Persist the classification decisions and review verdicts as committed
audit artifacts in the research bundle: (a) copy
`evaluation_dashboard.md` from `{{AUTOSKILLIT_TEMP}}/review-design/` to
`research/{slug}/audit/design-review-dashboard.md`; (b) create
`research/{slug}/audit/visualization-plan-trace.md` documenting Tier-C
routing decisions; (c) add a YAML frontmatter metadata block at the top
of `report.md`. Author a shared convention doc
`docs/research/silent-type-convention.md` consumed by Work Items 2.3 and
4.7.

Acceptance criteria:
- `research/{slug}/audit/design-review-dashboard.md` materializes in the
worktree post-review (copied from TEMP, committed)
- `research/{slug}/audit/visualization-plan-trace.md` materializes
post-plan-visualization
- `report.md` has valid YAML frontmatter matching the schema
- `docs/research/silent-type-convention.md` authored and consumed by WI
2.3 and 4.7
- `docs/research/audit-trail-format.md` documents the audit-artifact
structure
- Test: generate a report; parse YAML frontmatter with a YAML loader;
verify all fields present and well-formed
- Test: audit files exist in the worktree and are committed (in git log)
- `task test-check` passes

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260507-075657-573707/.autoskillit/temp/make-plan/audit_trail_artifact_in_worktree_plan_2026-05-07_080000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit
<!-- autoskillit:pipeline-signature
steps=prepare_pr,run_arch_lenses,compose_pr,annotate_pr_diff,review_pr
-->

## Token Usage Summary

| Step | Model | count | uncached | output | cache_read | peak_ctx | turns | cache_write | time |
|------|-------|-------|----------|--------|------------|----------|-------|-------------|------|
| plan | claude-sonnet-4-6 | 1 | 79 | 20.0k | 610.4k | 68.1k | 96 | 51.6k | 8m 37s |
| verify | claude-opus-4-6 | 1 | 55 | 15.8k | 2.2M | 83.1k | 173 | 131.3k | 9m 53s |
| implement* | MiniMax-M2.7-highspeed | 1 | 4.4M | 26.2k | 2.4M | 114.2k | 189 | 92.1k | 9m 27s |
| fix | claude-opus-4-6 | 5 | 270 | 38.5k | 8.4M | 95.8k | 291 | 341.9k | 47m 56s |
| prepare_pr* | MiniMax-M2.7-highspeed | 1 | 136.8k | 9.4k | 172.2k | 28.7k | 22 | 40.9k | 2m 16s |
| compose_pr* | MiniMax-M2.7-highspeed | 1 | 39.9k | 1.8k | 169.3k | 28.7k | 14 | 15.0k | 44s |
| **Total** | | | 4.5M | 111.8k | 13.9M | 114.2k | | 672.8k | 1h 18m |

\* *Step used a non-Anthropic provider; caching behavior may differ.*

## Token Efficiency

| Step | LoC Changed | cache_read/LoC | cache_write/LoC | output/LoC |
|------|-------------|----------------|-----------------|------------|
| plan | 0 | — | — | — |
| verify | 0 | — | — | — |
| implement | 0 | — | — | — |
| fix | 19147 | 436.4 | 17.9 | 2.0 |
| prepare_pr | 0 | — | — | — |
| compose_pr | 0 | — | — | — |
| **Total** | **19147** | 726.8 | 35.1 | 5.8 |

## Model Usage Breakdown

| Model | steps | uncached | output | cache_read | cache_write | time |
|-------|-------|----------|--------|------------|-------------|------|
| claude-sonnet-4-6 | 1 | 79 | 20.0k | 610.4k | 51.6k | 8m 37s |
| claude-opus-4-6 | 2 | 325 | 54.3k | 10.6M | 473.2k | 57m 50s |
| MiniMax-M2.7-highspeed | 3 | 4.5M | 37.5k | 2.7M | 148.0k | 12m 28s |

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>