feat(daily-regen): autonomous spec polish + cross-library similarity audit #5714

Merged
MarkusNeusinger merged 2 commits into main from feat/daily-regen-pre-flight on May 5, 2026
@MarkusNeusinger MarkusNeusinger commented May 5, 2026

Summary

Adds two autonomous pre-flight steps to daily-regen.yml that run before bulk-generate fans out. They bring two quality checks that today exist only in the local /regen and /update skills into the scheduled cloud workflow.

Everything runs through claude-code-action on Claude Max OAuth — no extra API costs — same pinned SHA as impl-generate / impl-review / impl-repair.

What it does

For each spec the pick job selects:

  1. Skip-gate (Bash, no LLM) — gh pr list --search "plots/<spec>/ in:files is:open". If any PR is open touching the spec, the polish step is skipped to avoid racing humans or stacking auto-polish PRs. The similarity audit still runs (it's read-only).

  2. Spec polish (claude-code-action, --model haiku) — audits the spec across the five update.md §2 dimensions (wording, missing sections, tag completeness, tag quality, tag accuracy). If anything needs work, opens an auto-polish/<spec>/<timestamp> branch + PR with label auto-polish. Never pushes to main directly. Never auto-merges. PR awaits human review. If nothing needs work, prints NOOP and stops. continue-on-error: true so a transient action failure does not block the main pipeline.

  3. Cross-library similarity audit (claude-code-action, --model haiku) — reads the 9 review.image_description blobs from plots/<spec>/metadata/python/*.yaml and clusters libraries that converged on the same data scenario / example domain / visual variant beyond what the spec dictated. Optionally drills into impl .py files for ambiguous clusters via the Read tool. Emits /tmp/change-requests.json keyed by library. Project-mandated constants (Okabe-Ito palette positions 1–7, plot size and aspect ratio, theme chrome) are explicitly excluded as cluster signals. continue-on-error: true; if the audit fails, the collect step falls back to empty change_requests.

  4. Dispatch bulk-generate with hints — passes change_requests JSON to bulk-generate, which jq-extracts the per-library hint and forwards it as the new change_request input to impl-generate. The hint is staged to /tmp/anyplot-change-request.txt, where the updated impl-generate-claude.md picks it up and treats it as a hard requirement (mirroring regen.md §2c verbatim — hard requirement, no sibling reads, preserve review.strengths, override "no changes for sake of changes").
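The hint hand-off in step 4 can be sketched as follows. This is a minimal Python sketch, not the workflow's actual jq/Bash code: the function names `extract_hint` and `stage_hint` are hypothetical, and it assumes `change_requests` is a JSON object mapping a library name to a hint string.

```python
import json
from pathlib import Path


def extract_hint(change_requests_json: str, library: str) -> str:
    """Return the hint for one library, or "" when none was requested."""
    # An empty input is treated like the documented default '{}'.
    requests = json.loads(change_requests_json or "{}")
    if not isinstance(requests, dict):
        return ""
    hint = requests.get(library, "")
    return hint if isinstance(hint, str) else ""


def stage_hint(hint: str, path: Path) -> bool:
    """Write a non-empty hint to the staging file; report whether it was staged."""
    if not hint.strip():
        return False  # nothing to stage, so the generator sees no change request
    path.write_text(hint, encoding="utf-8")
    return True
```

With the default `'{}'`, `extract_hint` returns an empty string for every library and `stage_hint` writes nothing, which matches the intent that runs without hints behave as before.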

Why

  • Spec drift: specs are currently written once at creation and never revisited. Tag vocab evolves, sections go missing, wording grows vague. Polish-on-cycle keeps them sharp without manual maintenance. At ~10 cycles/day across 300+ specs, each spec is touched roughly once a month, so drift risk is low.
  • Silent convergence: without a similarity check, 9 libs can independently land on the same scenario / domain / variant, producing nine copies of the same chart in different engines — exactly the opposite of the catalog's purpose. The hint-injection breaks the cluster cleanly (one library per cluster, alphabetically later).
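The cadence estimate in the first bullet can be checked with simple arithmetic. This is a minimal sketch that assumes each cycle picks exactly one spec (the PR does not state the per-cycle count) and uses the rounded figures 300 and 10; the function name is hypothetical.

```python
import math


def days_between_touches(spec_count: int, specs_per_day: int) -> int:
    """Days needed to visit every spec once at a fixed daily rate."""
    return math.ceil(spec_count / specs_per_day)


print(days_between_touches(300, 10))  # 30
```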

Model routing

  • The existing daily-regen model input (default haiku, choices haiku/sonnet/opus) is unchanged and still flows to bulk-generate → impl-generate / review / repair.
  • The two new pre-flight LLM steps hardcode --model haiku — they're narrow, cheap audits.

dry_run semantics

dry_run=true runs the read-only and decision-only steps so operators can preview what the cycle will do without committing anything:

  • Runs: pick, skip-gate, similarity audit (read-only), collect change_requests
  • Skipped: spec polish (would open a real PR — side effect), dispatch bulk-generate (would fan out 9 impl-generate jobs)
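The gating described above can be summarized as a small decision table. The sketch below is an illustration in Python rather than the workflow's own `if:` expressions; the function and key names are hypothetical, and it assumes the pick job selected at least one spec.

```python
def steps_to_run(dry_run: bool, open_pr_touches_spec: bool) -> dict[str, bool]:
    """Map each pre-flight step to whether it runs for one picked spec."""
    return {
        "pick": True,
        "skip_gate": True,
        # Polish has side effects (opens a PR), so both gates must allow it.
        "spec_polish": not dry_run and not open_pr_touches_spec,
        # The audit is read-only and runs regardless of either gate.
        "similarity_audit": True,
        "collect_change_requests": True,
        # Dispatch fans out impl-generate jobs, so it is skipped on dry runs.
        "dispatch_bulk_generate": not dry_run,
    }
```

Only the two side-effect steps depend on the flags; everything else runs unconditionally, which is what makes a dry run a useful preview.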

To preview spec polish in isolation, run a real (non-dry-run) cycle against a single spec: gh workflow run daily-regen.yml -f specification_id=<spec> -f model=haiku. Polish opens a PR; merge or close it manually.

Backwards compatibility

Both new inputs (change_request on impl-generate, change_requests on bulk-generate) default to empty ("" and '{}'). Existing manual triggers without these inputs behave byte-identically to today.
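One way to picture the `change_requests` default and its validation is the sketch below. The actual validation rules in bulk-generate are not shown in this PR description, so the checks here (JSON object, string values) are assumptions, and `parse_change_requests` is a hypothetical name.

```python
import json


def parse_change_requests(raw: str = "{}") -> dict[str, str]:
    """Parse the change_requests input, rejecting anything but {library: hint}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"change_requests is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("change_requests must be a JSON object keyed by library")
    for library, hint in data.items():
        if not isinstance(hint, str):
            raise ValueError(f"hint for {library!r} must be a string")
    return data
```

Called with no argument, the function returns an empty mapping, so no library receives a hint.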

Risks + rollback

  • Auto-polish PRs accumulate if humans never review them. Skip-gate prevents duplicates per spec; the spec just doesn't get polished further until reviewed. Acceptable: human stays in control.
  • Spec polish prompt has hard rules: no changes to id / issue / created, no semantic changes (data shape, plot type, requirements). Reviewable in commit diffs; revert if anything slips.
  • Rollback: revert daily-regen.yml first. Downstream change_request[s] inputs default to empty; remaining changes are no-ops without daily-regen wiring.

Test plan

  • CI parses all four workflow YAMLs cleanly
  • Manual gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku -f dry_run=true — confirms pick + preflight-dispatch (skip-gate, similarity, collect) run, polish + bulk-generate dispatch are skipped (dry_run)
  • Manual gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku (no dry_run) — confirms the full chain: polish either NOOPs or opens a PR, similarity emits change_requests, bulk-generate fires, an impl-generate run with non-empty hint shows ::notice::Change request staged: …
  • On the auto-polish PR (if produced): id / issue / created unchanged; only wording / sections / tags polished; updated bumped

🤖 Generated with Claude Code

…ity audit

Two new pre-flight steps run on every daily-regen cycle, before bulk-generate
fans out:

1. **Spec polish** (claude-code-action, Haiku) — audits the picked spec for
   wording, missing sections, and tag hygiene against `update.md` Phase 2
   dimensions. If any issue is found, opens an `auto-polish/<spec>/<ts>` branch +
   PR (label: `auto-polish`). No direct push to main, no auto-merge — awaits
   human review. Skipped if any open PR already touches `plots/<spec>/`.

2. **Cross-library similarity audit** (claude-code-action, Haiku) — reads the
   9 metadata `review.image_description` blobs and clusters libraries that
   converged on the same data scenario / domain / visual variant beyond what
   the spec dictated. Optionally drills into impl `.py` files for ambiguous
   clusters. Emits `/tmp/change-requests.json` keyed by library, mirroring
   `agentic/workflows/modules/regen/plan.py`'s schema. Project-mandated
   constants (Okabe-Ito palette, plot size / aspect ratio, theme chrome) are
   explicitly excluded as cluster signals.

The resulting per-library hint is threaded through:

    daily-regen → bulk-generate → impl-generate → /tmp/anyplot-change-request.txt
    → impl-generate-claude.md handles it as a hard requirement (mirroring
    `regen.md` §2c verbatim: hard requirement, no sibling reads, preserve
    review.strengths)

Pre-flight LLM steps hardcode `--model haiku`; the existing `inputs.model`
selector still flows unchanged to the heavy downstream impl-generate /
impl-review / impl-repair work. No extra API costs — everything runs through
`claude-code-action` on Claude Max OAuth, same SHA as the existing workflows.

Files:
- prompts/workflow-prompts/spec-polish-claude.md (new)
- prompts/workflow-prompts/impl-similarity-claude.md (new)
- prompts/workflow-prompts/impl-generate-claude.md (CHANGE_REQUEST section)
- .github/workflows/impl-generate.yml (change_request input + staging)
- .github/workflows/bulk-generate.yml (change_requests JSON input + per-lib jq fan-out)
- .github/workflows/daily-regen.yml (preflight-dispatch matrix job replaces dispatch)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 17:27

Copilot AI left a comment

Pull request overview

This PR adds two LLM-driven pre-flight checks to daily-regen so each scheduled regeneration can optionally polish the spec and inject cross-library divergence hints before fan-out into implementation generation.

Changes:

  • Added new Claude workflow prompts for autonomous spec polish and cross-library similarity auditing.
  • Threaded per-library change_request / change_requests inputs through daily-regen, bulk-generate, and impl-generate.
  • Updated the implementation-generation prompt so a staged divergence hint becomes a binding regeneration requirement.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `prompts/workflow-prompts/spec-polish-claude.md` | New prompt defining autonomous spec-polish behavior and PR creation flow. |
| `prompts/workflow-prompts/impl-similarity-claude.md` | New prompt defining the read-only cross-library similarity audit and JSON hint output. |
| `prompts/workflow-prompts/impl-generate-claude.md` | Adds instructions for consuming a staged divergence hint during regeneration. |
| `.github/workflows/impl-generate.yml` | Adds optional `change_request` input and stages it into `/tmp` for Claude. |
| `.github/workflows/daily-regen.yml` | Replaces direct dispatch with per-spec pre-flight polish/audit/collect/dispatch flow. |
| `.github/workflows/bulk-generate.yml` | Adds `change_requests` input, validates it, and forwards per-library hints to impl-generate. |

Comment thread .github/workflows/daily-regen.yml Outdated
# ============================================================================
preflight-dispatch:
needs: pick
if: ${{ needs.pick.outputs.count != '0' && !inputs.dry_run }}

- name: Spec polish (autonomous, opens PR — no auto-merge)
if: steps.gate.outputs.skip_polish == '0'
timeout-minutes: 15
Variables for this run:
- SPEC_ID: ${{ matrix.spec_id }}

- name: Cross-library similarity audit
…failures

Address Copilot PR review on #5714:

- Drop the job-level `!inputs.dry_run` gate so dry-runs can exercise skip-gate,
  similarity audit, and collect (the documented test path). Side-effect steps
  (polish, dispatch) are individually gated.
- Add `continue-on-error: true` to spec-polish: it's an optional quality pass,
  a transient claude-code-action failure must not block the main regen
  pipeline.
- Add `continue-on-error: true` to similarity audit: read-only, falls back to
  empty change_requests via the existing collect step's file-existence check.
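The fallback in the collect step can be sketched roughly as below. This is an illustrative Python sketch, not the workflow's shell code: `collect_change_requests` is a hypothetical name, and the extra guard against unparsable or non-object JSON is an assumption beyond the file-existence check the commit message mentions.

```python
import json
from pathlib import Path


def collect_change_requests(path: Path) -> str:
    """Return compact JSON from the audit output, or '{}' when it is unusable."""
    if not path.is_file():
        return "{}"  # audit failed or emitted nothing
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # ValueError covers both invalid JSON and undecodable bytes.
        return "{}"
    if not isinstance(data, dict):
        return "{}"
    return json.dumps(data, separators=(",", ":"))
```

Because every failure path collapses to `'{}'`, a failed audit leaves downstream dispatch with the same input as a run that found no clusters.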

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MarkusNeusinger MarkusNeusinger merged commit 92fc47c into main May 5, 2026
7 checks passed
@MarkusNeusinger MarkusNeusinger deleted the feat/daily-regen-pre-flight branch May 5, 2026 17:45
2 participants