feat(daily-regen): autonomous spec polish + cross-library similarity audit by MarkusNeusinger · Pull Request #5714 · MarkusNeusinger/anyplot

MarkusNeusinger · 2026-05-05T17:27:55Z

Summary

Adds two autonomous pre-flight steps to daily-regen.yml that run before bulk-generate fans out. They bring two quality checks that today exist only in the local /regen and /update skills into the scheduled cloud run.

Everything runs through claude-code-action on Claude Max OAuth, so there are no extra API costs, and it uses the same pinned SHA as impl-generate / impl-review / impl-repair.

What it does

For each spec the pick job selects:

1. Skip-gate (Bash, no LLM): runs gh pr list --search "plots/<spec>/ in:files is:open". If any open PR touches the spec, the polish step is skipped to avoid racing humans or stacking auto-polish PRs. The similarity audit still runs because it is read-only.
2. Spec polish (claude-code-action, --model haiku): audits the spec across the five update.md §2 dimensions (wording, missing sections, tag completeness, tag quality, tag accuracy). If anything needs work, it opens an auto-polish/<spec>/<timestamp> branch + PR with label auto-polish. It never pushes to main directly and never auto-merges; the PR awaits human review. If nothing needs work, it prints NOOP and stops. The step sets continue-on-error: true so a transient action failure does not block the main pipeline.
3. Cross-library similarity audit (claude-code-action, --model haiku): reads the 9 review.image_description blobs from plots/<spec>/metadata/python/*.yaml and clusters libraries that converged on the same data scenario / example domain / visual variant beyond what the spec dictated. It optionally drills into impl .py files for ambiguous clusters via the Read tool, then emits /tmp/change-requests.json keyed by library. Project-mandated constants (Okabe-Ito palette positions 1–7, plot size and aspect ratio, theme chrome) are explicitly excluded as cluster signals. The step sets continue-on-error: true; if the audit fails, the collect step falls back to empty change_requests.
4. Dispatch bulk-generate with hints: passes the change_requests JSON to bulk-generate, which jq-extracts the per-library hint and forwards it as the new change_request input to impl-generate. The hint is staged to /tmp/anyplot-change-request.txt, where the updated impl-generate-claude.md picks it up and treats it as a hard requirement (mirroring regen.md §2c verbatim: hard requirement, no sibling reads, preserve review.strengths, override "no changes for sake of changes").
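
The skip-gate decision in step 1 can be sketched in a few lines. This is only an illustration, not the workflow's actual Bash: the function names are hypothetical, the spec id `scatter-basic` is a made-up example, and the `'0'` / `'1'` convention is inferred from the `steps.gate.outputs.skip_polish == '0'` condition visible in the diff.

```python
from typing import List


def gate_search_query(spec_id: str) -> str:
    """Build the search string quoted in the skip-gate description."""
    return f"plots/{spec_id}/ in:files is:open"


def gh_command(spec_id: str) -> List[str]:
    """Assemble a gh invocation; --search and --json are standard gh pr list flags."""
    return ["gh", "pr", "list", "--search", gate_search_query(spec_id), "--json", "number"]


def skip_polish(open_pr_numbers: List[int]) -> str:
    """Return '1' (skip polish) if any open PR touches the spec, else '0'."""
    return "1" if open_pr_numbers else "0"


if __name__ == "__main__":
    # Hypothetical spec id, used only for illustration.
    print(" ".join(gh_command("scatter-basic")))
    print(skip_polish([]))       # no open PRs: polish may run
    print(skip_polish([5714]))   # an open PR exists: polish is skipped
```

Keeping the gate free of any LLM call means the decision is cheap and deterministic; only the polish step depends on its output, which matches the note that the similarity audit still runs.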

Why

- Spec drift: specs are currently written once at creation and never revisited. Tag vocab evolves, sections go missing, wording grows vague. Polish-on-cycle keeps them sharp without manual maintenance. At ~10 cycles/day across 300+ specs, each spec gets touched roughly once a month (300 / 10 = 30 days), so drift risk is low.
- Silent convergence: without a similarity check, 9 libs can independently land on the same scenario / domain / variant, producing nine copies of the same chart in different engines, exactly the opposite of the catalog's purpose. The hint injection breaks the cluster cleanly (one library per cluster, alphabetically later).
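
One possible reading of "one library per cluster, alphabetically later" is that, within each cluster of converged libraries, the alphabetically last member receives the change request. The sketch below assumes that reading; the function name and the library names `lib_a`, `lib_b`, `lib_c` are hypothetical, and the real selection lives in the audit prompt rather than in code like this.

```python
from typing import Dict, List


def hint_targets(clusters: List[List[str]]) -> Dict[str, List[str]]:
    """Map the chosen library in each cluster to the siblings it converged with.

    Assumption: the alphabetically last library in a cluster gets the hint.
    Singleton clusters are not clusters at all and produce no hint.
    """
    targets: Dict[str, List[str]] = {}
    for cluster in clusters:
        if len(cluster) < 2:
            continue
        ordered = sorted(cluster)
        targets[ordered[-1]] = ordered[:-1]
    return targets


if __name__ == "__main__":
    # Hypothetical library names, for illustration only.
    print(hint_targets([["lib_b", "lib_a"], ["lib_c"]]))
```

Under this assumption only one implementation per cluster is asked to diverge, so the other members keep their current output.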

Model routing

- The existing daily-regen model input (default haiku, choices haiku/sonnet/opus) is unchanged and still flows to bulk-generate → impl-generate / review / repair.
- The two new pre-flight LLM steps hardcode --model haiku because they are narrow, cheap audits.

dry_run semantics

dry_run=true runs the read-only and decision-only steps so operators can preview what the cycle will do without committing anything:

- Runs: pick, skip-gate, similarity audit (read-only), collect change_requests
- Skipped: spec polish (would open a real PR, a side effect), dispatch bulk-generate (would fan out 9 impl-generate jobs)
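
The gating described above can be summarized as a small truth table. The following is a minimal sketch, assuming the pick job selected at least one spec; the function and key names are hypothetical labels for the steps, not identifiers from the workflow files.

```python
from typing import Dict


def planned_steps(dry_run: bool, skip_polish: bool) -> Dict[str, bool]:
    """Return which pre-flight steps would run for one spec.

    skip_polish mirrors the skip-gate result (an open PR touches the spec).
    """
    return {
        "pick": True,
        "skip_gate": True,
        # Polish has a side effect (opens a PR), so it is gated twice.
        "spec_polish": (not dry_run) and (not skip_polish),
        # Read-only, so it runs even in a dry run or behind an open PR.
        "similarity_audit": True,
        "collect_change_requests": True,
        # Dispatch fans out real jobs, so a dry run never reaches it.
        "dispatch_bulk_generate": not dry_run,
    }


if __name__ == "__main__":
    for dry in (True, False):
        print(dry, planned_steps(dry_run=dry, skip_polish=False))
```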

To preview spec polish in isolation, run a real (non-dry-run) cycle against a single spec: gh workflow run daily-regen.yml -f specification_id=<spec> -f model=haiku. Polish opens a PR; merge or close it manually.

Backwards compatibility

Both new inputs (change_request on impl-generate, change_requests on bulk-generate) default to empty ("" and '{}'). Existing manual triggers without these inputs behave byte-identically to today.
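
To illustrate why the empty defaults are harmless, here is a hedged Python equivalent of the per-library extraction that bulk-generate performs with jq. It assumes the JSON maps a library name to a plain string hint; the exact schema is not shown in this PR description, and `hint_for` and `lib_a` are hypothetical names.

```python
import json


def hint_for(change_requests: str, library: str) -> str:
    """Return the hint for one library, or "" when none is present."""
    data = json.loads(change_requests or "{}")
    if not isinstance(data, dict):
        return ""
    hint = data.get(library, "")
    # Assumption: hints are plain strings; anything else is treated as absent.
    return hint if isinstance(hint, str) else ""


if __name__ == "__main__":
    print(repr(hint_for("{}", "lib_a")))  # default input: no hint
    print(repr(hint_for('{"lib_a": "pick a different example domain"}', "lib_a")))
```

With the default `'{}'` every library resolves to an empty hint, so impl-generate receives the same empty change_request it would get from a manual trigger.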

Risks + rollback

- Auto-polish PRs accumulate if humans never review them. Skip-gate prevents duplicates per spec; the spec just doesn't get polished further until reviewed. Acceptable: human stays in control.
- Spec polish prompt has hard rules: no changes to id / issue / created, no semantic changes (data shape, plot type, requirements). Reviewable in commit diffs; revert if anything slips.
- Rollback: revert daily-regen.yml first. Downstream change_request[s] inputs default to empty; remaining changes are no-ops without daily-regen wiring.

Test plan

- CI parses all four workflow YAMLs cleanly
- Manual gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku -f dry_run=true: confirms pick + preflight-dispatch (skip-gate, similarity, collect) run, and polish + bulk-generate dispatch are skipped (dry_run)
- Manual gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku (no dry_run): confirms the full chain: polish either NOOPs or opens a PR, similarity emits change_requests, bulk-generate fires, an impl-generate run with non-empty hint shows ::notice::Change request staged: …
- On the auto-polish PR (if produced): id / issue / created unchanged; only wording / sections / tags polished; updated bumped
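
The last check in the test plan could be automated along these lines. This is a sketch under stated assumptions: the spec's metadata is represented as a flat dictionary, and the function name and sample values are hypothetical; the PR itself only describes a manual review of the diff.

```python
from typing import Dict, List

# Fields the polish prompt is forbidden to change, per the hard rules above.
PROTECTED_FIELDS = ("id", "issue", "created")


def polish_violations(before: Dict[str, object], after: Dict[str, object]) -> List[str]:
    """List problems in an auto-polish change; an empty list means it looks fine."""
    problems = [f for f in PROTECTED_FIELDS if before.get(f) != after.get(f)]
    # The test plan expects the updated field to be bumped.
    if before.get("updated") == after.get("updated"):
        problems.append("updated (not bumped)")
    return problems


if __name__ == "__main__":
    # Made-up metadata values, for illustration only.
    old = {"id": "example-spec", "issue": 1, "created": "2025-01-01", "updated": "2025-01-01"}
    new = dict(old, updated="2026-05-05")
    print(polish_violations(old, new))
```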

🤖 Generated with Claude Code

…ity audit

Two new pre-flight steps run on every daily-regen cycle, before bulk-generate fans out:

1. **Spec polish** (claude-code-action, Haiku): audits the picked spec for wording, missing sections, and tag hygiene against `update.md` Phase 2 dimensions. If any issue found, opens an `auto-polish/<spec>/<ts>` branch + PR (label: `auto-polish`). No direct push to main, no auto-merge; awaits human review. Skipped if any open PR already touches `plots/<spec>/`.
2. **Cross-library similarity audit** (claude-code-action, Haiku): reads the 9 metadata `review.image_description` blobs and clusters libraries that converged on the same data scenario / domain / visual variant beyond what the spec dictated. Optionally drills into impl `.py` files for ambiguous clusters. Emits `/tmp/change-requests.json` keyed by library, mirroring `agentic/workflows/modules/regen/plan.py`'s schema. Project-mandated constants (Okabe-Ito palette, plot size / aspect ratio, theme chrome) are explicitly excluded as cluster signals.

The resulting per-library hint is threaded through:

daily-regen → bulk-generate → impl-generate → /tmp/anyplot-change-request.txt → impl-generate-claude.md handles it as a hard requirement (mirroring `regen.md` §2c verbatim: hard requirement, no sibling reads, preserve review.strengths)

Pre-flight LLM steps hardcode `--model haiku`; the existing `inputs.model` selector still flows unchanged to the heavy downstream impl-generate / impl-review / impl-repair work.

No extra API costs: everything runs through `claude-code-action` on Claude Max OAuth, same SHA as the existing workflows.

Files:
- prompts/workflow-prompts/spec-polish-claude.md (new)
- prompts/workflow-prompts/impl-similarity-claude.md (new)
- prompts/workflow-prompts/impl-generate-claude.md (CHANGE_REQUEST section)
- .github/workflows/impl-generate.yml (change_request input + staging)
- .github/workflows/bulk-generate.yml (change_requests JSON input + per-lib jq fan-out)
- .github/workflows/daily-regen.yml (preflight-dispatch matrix job replaces dispatch)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds two LLM-driven pre-flight checks to daily-regen so each scheduled regeneration can optionally polish the spec and inject cross-library divergence hints before fan-out into implementation generation.

Changes:

- Added new Claude workflow prompts for autonomous spec polish and cross-library similarity auditing.
- Threaded per-library change_request / change_requests inputs through daily-regen, bulk-generate, and impl-generate.
- Updated the implementation-generation prompt so a staged divergence hint becomes a binding regeneration requirement.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `prompts/workflow-prompts/spec-polish-claude.md` | New prompt defining autonomous spec-polish behavior and PR creation flow. |
| `prompts/workflow-prompts/impl-similarity-claude.md` | New prompt defining the read-only cross-library similarity audit and JSON hint output. |
| `prompts/workflow-prompts/impl-generate-claude.md` | Adds instructions for consuming a staged divergence hint during regeneration. |
| `.github/workflows/impl-generate.yml` | Adds optional `change_request` input and stages it into `/tmp` for Claude. |
| `.github/workflows/daily-regen.yml` | Replaces direct dispatch with per-spec pre-flight polish/audit/collect/dispatch flow. |
| `.github/workflows/bulk-generate.yml` | Adds `change_requests` input, validates it, and forwards per-library hints to `impl-generate`. |

+  # ============================================================================
+  preflight-dispatch:
    needs: pick
    if: ${{ needs.pick.outputs.count != '0' && !inputs.dry_run }}


+
+      - name: Spec polish (autonomous, opens PR — no auto-merge)
+        if: steps.gate.outputs.skip_polish == '0'
+        timeout-minutes: 15


+            Variables for this run:
+            - SPEC_ID: ${{ matrix.spec_id }}
+
+      - name: Cross-library similarity audit


…failures

Address Copilot PR review on #5714:

- Drop the job-level `!inputs.dry_run` gate so dry-runs can exercise skip-gate, similarity audit, and collect (the documented test path). Side-effect steps (polish, dispatch) are individually gated.
- Add `continue-on-error: true` to spec-polish: it's an optional quality pass, a transient claude-code-action failure must not block the main regen pipeline.
- Add `continue-on-error: true` to similarity audit: read-only, falls back to empty change_requests via the existing collect step's file-existence check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
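
The fallback mentioned in the last bullet can be pictured as follows. This is a rough sketch, not the collect step's actual script: the commit message only confirms a file-existence check, so the extra JSON validation here is an added assumption, and `collect_change_requests` is a hypothetical name.

```python
import json
from pathlib import Path


def collect_change_requests(path: str = "/tmp/change-requests.json") -> str:
    """Return the audit output as compact JSON, or '{}' if it is unusable."""
    file = Path(path)
    # Confirmed by the source: a missing file means the audit produced nothing.
    if not file.is_file():
        return "{}"
    # Assumption beyond the source: also reject unreadable or malformed output.
    try:
        parsed = json.loads(file.read_text(encoding="utf-8"))
    except (ValueError, OSError):
        return "{}"
    if not isinstance(parsed, dict):
        return "{}"
    return json.dumps(parsed, separators=(",", ":"))


if __name__ == "__main__":
    print(collect_change_requests())
```

Because the fallback value equals the default of the change_requests input, a failed audit degrades to the same behavior as a run without any hints.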

Copilot AI review requested due to automatic review settings May 5, 2026 17:27

Copilot started reviewing on behalf of MarkusNeusinger May 5, 2026 17:28

Copilot AI reviewed May 5, 2026


MarkusNeusinger merged commit 92fc47c into main May 5, 2026
7 checks passed

MarkusNeusinger deleted the feat/daily-regen-pre-flight branch May 5, 2026 17:45

MarkusNeusinger merged 2 commits into main from feat/daily-regen-pre-flight
