feat(daily-regen): autonomous spec polish + cross-library similarity audit #5714
Merged
MarkusNeusinger merged 2 commits into main on May 5, 2026
Two new pre-flight steps run on every daily-regen cycle, before bulk-generate
fans out:
1. **Spec polish** (claude-code-action, Haiku) — audits the picked spec for
wording, missing sections, and tag hygiene against `update.md` Phase 2
dimensions. If any issue is found, opens an `auto-polish/<spec>/<ts>` branch +
PR (label: `auto-polish`). No direct push to main, no auto-merge — awaits
human review. Skipped if any open PR already touches `plots/<spec>/`.
2. **Cross-library similarity audit** (claude-code-action, Haiku) — reads the
9 metadata `review.image_description` blobs and clusters libraries that
converged on the same data scenario / domain / visual variant beyond what
the spec dictated. Optionally drills into impl `.py` files for ambiguous
clusters. Emits `/tmp/change-requests.json` keyed by library, mirroring
`agentic/workflows/modules/regen/plan.py`'s schema. Project-mandated
constants (Okabe-Ito palette, plot size / aspect ratio, theme chrome) are
explicitly excluded as cluster signals.
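The audit is an LLM judgment over free-text descriptions, so no fixed algorithm sits behind it. As a rough mental model only, the Python sketch below assumes each library's `review.image_description` has already been reduced to a set of hypothetical signature tokens. It treats libraries with identical non-mandated tokens as a cluster and emits one hint per clustered library. `MANDATED`, the token values, the "alphabetically first library keeps its scenario" policy, and the hint wording are all illustrative assumptions; the real `/tmp/change-requests.json` shape follows `plan.py`, which is not reproduced here.

```python
import json
from collections import defaultdict

# Hypothetical tokens standing in for project-mandated constants (palette,
# aspect ratio, theme chrome); they never count as evidence of convergence.
MANDATED = {"okabe-ito", "aspect-16:9", "theme-chrome"}


def cluster_libraries(signatures: dict[str, set[str]]) -> list[list[str]]:
    """Group libraries whose non-mandated signature tokens are identical."""
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for lib, tokens in signatures.items():
        key = frozenset(tokens - MANDATED)
        if key:  # only mandated tokens left: nothing to compare on
            groups[key].append(lib)
    # A cluster needs at least two libraries sharing the same signature.
    return [sorted(libs) for libs in groups.values() if len(libs) > 1]


def change_requests(signatures: dict[str, set[str]]) -> dict[str, str]:
    """Map each library that should diverge to a hint, keyed by library.

    Assumed policy: the alphabetically first library in a cluster keeps
    its scenario; every other member is asked to diverge.
    """
    requests: dict[str, str] = {}
    for keeper, *others in cluster_libraries(signatures):
        for lib in others:
            requests[lib] = (
                f"Converged with {keeper} on the same scenario; "
                "choose a different data scenario or example domain."
            )
    return requests


if __name__ == "__main__":
    sigs = {
        "matplotlib": {"stock-prices", "finance", "okabe-ito"},
        "plotly": {"stock-prices", "finance", "okabe-ito"},
        "bokeh": {"weather", "climate", "okabe-ito"},
    }
    print(json.dumps(change_requests(sigs), indent=2))
```

Stripping the mandated tokens before grouping is what keeps two libraries from being flagged merely for sharing the palette or aspect ratio.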
The resulting per-library hint is threaded through:
daily-regen → bulk-generate → impl-generate → /tmp/anyplot-change-request.txt
→ impl-generate-claude.md handles it as a hard requirement (mirroring
`regen.md` §2c verbatim: hard requirement, no sibling reads, preserve
`review.strengths`)
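A minimal sketch of the last two hops of that chain, written in Python rather than the jq and shell the workflows actually use: pick one library's hint out of the `change_requests` JSON, then stage it only when it is non-empty. Treating `.get(library, "")` as equivalent to a jq filter such as `.[$lib] // ""` is an assumption, and `hint_for` / `stage_hint` are hypothetical names.

```python
import json
from pathlib import Path


def hint_for(change_requests_json: str, library: str) -> str:
    """Return one library's hint from the change_requests JSON, or "".

    Roughly what a jq filter like `.[$lib] // ""` does (assumed equivalent).
    """
    requests = json.loads(change_requests_json or "{}")
    if not isinstance(requests, dict):
        return ""
    hint = requests.get(library, "")
    return hint if isinstance(hint, str) else ""


def stage_hint(hint: str, path: Path) -> bool:
    """Stage a non-empty hint for the prompt; return True if one was written."""
    if not hint.strip():
        # No hint: remove any stale file so generation behaves as before.
        path.unlink(missing_ok=True)
        return False
    path.write_text(hint.strip() + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    payload = '{"plotly": "Use a weather scenario instead of stock prices."}'
    print(repr(hint_for(payload, "plotly")))  # the plotly hint
    print(repr(hint_for(payload, "bokeh")))   # empty: nothing would be staged
```

In the workflow the staging path would be `/tmp/anyplot-change-request.txt`. Writing nothing for an empty hint is what lets runs without a change request behave exactly as they did before this PR.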
Pre-flight LLM steps hardcode `--model haiku`; the existing `inputs.model`
selector still flows unchanged to the heavy downstream impl-generate /
impl-review / impl-repair work. No extra API costs — everything runs through
`claude-code-action` on Claude Max OAuth, same SHA as the existing workflows.
Files:
- prompts/workflow-prompts/spec-polish-claude.md (new)
- prompts/workflow-prompts/impl-similarity-claude.md (new)
- prompts/workflow-prompts/impl-generate-claude.md (CHANGE_REQUEST section)
- .github/workflows/impl-generate.yml (change_request input + staging)
- .github/workflows/bulk-generate.yml (change_requests JSON input + per-lib jq fan-out)
- .github/workflows/daily-regen.yml (preflight-dispatch matrix job replaces dispatch)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
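The skip rule in item 1 above ("Skipped if any open PR already touches `plots/<spec>/`") reduces to a path-prefix check. This sketch assumes the open PRs and their changed file paths were already fetched; the `files`/`path` field names, the example paths, and `"1"` as the "skip" value are assumptions. Only `skip_polish == '0'` meaning "run polish" comes from the workflow's step condition.

```python
def should_skip_polish(open_prs: list[dict], spec_id: str) -> bool:
    """True if any open PR changes a file under plots/<spec_id>/.

    Assumed input shape: [{"number": 12, "files": [{"path": "..."}]}, ...].
    """
    prefix = f"plots/{spec_id}/"
    return any(
        f.get("path", "").startswith(prefix)
        for pr in open_prs
        for f in pr.get("files", [])
    )


def skip_polish_output(open_prs: list[dict], spec_id: str) -> str:
    """Value for the gate's skip_polish output: "0" lets polish run.

    "1" for skipping is an assumption; the workflow only checks == '0'.
    """
    return "1" if should_skip_polish(open_prs, spec_id) else "0"


if __name__ == "__main__":
    # Hypothetical spec IDs and file path, for illustration only.
    prs = [{"number": 12, "files": [{"path": "plots/bar-basic-stacked/specification.md"}]}]
    print(skip_polish_output(prs, "bar-basic"))          # "0": different spec
    print(skip_polish_output(prs, "bar-basic-stacked"))  # "1": PR already open
```

The trailing slash in the prefix keeps an open PR against one spec directory from blocking polish of another spec whose ID merely shares a prefix.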
Pull request overview
This PR adds two LLM-driven pre-flight checks to daily-regen so each scheduled regeneration can optionally polish the spec and inject cross-library divergence hints before fan-out into implementation generation.
Changes:
- Added new Claude workflow prompts for autonomous spec polish and cross-library similarity auditing.
- Threaded per-library `change_request`/`change_requests` inputs through `daily-regen`, `bulk-generate`, and `impl-generate`.
- Updated the implementation-generation prompt so a staged divergence hint becomes a binding regeneration requirement.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| prompts/workflow-prompts/spec-polish-claude.md | New prompt defining autonomous spec-polish behavior and PR creation flow. |
| prompts/workflow-prompts/impl-similarity-claude.md | New prompt defining the read-only cross-library similarity audit and JSON hint output. |
| prompts/workflow-prompts/impl-generate-claude.md | Adds instructions for consuming a staged divergence hint during regeneration. |
| .github/workflows/impl-generate.yml | Adds optional `change_request` input and stages it into /tmp for Claude. |
| .github/workflows/daily-regen.yml | Replaces direct dispatch with per-spec pre-flight polish/audit/collect/dispatch flow. |
| .github/workflows/bulk-generate.yml | Adds `change_requests` input, validates it, and forwards per-library hints to impl-generate. |
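The review notes that bulk-generate validates the `change_requests` input, but the exact checks are not shown. A plausible sketch, assuming the payload must be a JSON object mapping library names to string hints and that an empty value means the `'{}'` default; `parse_change_requests` is a hypothetical name.

```python
import json


def parse_change_requests(raw: str) -> dict[str, str]:
    """Parse the change_requests input; an empty value means the '{}' default.

    Assumed contract: a JSON object mapping library name -> hint string.
    """
    data = json.loads(raw.strip() or "{}")  # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("change_requests must be a JSON object keyed by library")
    for lib, hint in data.items():
        if not isinstance(hint, str):
            raise ValueError(
                f"hint for {lib!r} must be a string, got {type(hint).__name__}"
            )
    return data


if __name__ == "__main__":
    print(parse_change_requests(""))
    print(parse_change_requests('{"plotly": "Pick another example domain."}'))
```

Failing fast on a malformed payload keeps a bad audit result from silently turning into per-library hints downstream.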
```yaml
# ============================================================================
preflight-dispatch:
  needs: pick
  if: ${{ needs.pick.outputs.count != '0' && !inputs.dry_run }}
```

```yaml
- name: Spec polish (autonomous, opens PR — no auto-merge)
  if: steps.gate.outputs.skip_polish == '0'
  timeout-minutes: 15
```

```text
Variables for this run:
- SPEC_ID: ${{ matrix.spec_id }}
```

```yaml
- name: Cross-library similarity audit
```
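The first excerpt gates the whole `preflight-dispatch` job on `!inputs.dry_run`, which the follow-up commit relaxes so dry runs still exercise the read-only steps. A simplified Python model of which steps run before and after that change; the step names are informal labels, not the workflow's step IDs.

```python
def planned_steps(*, picked: int, dry_run: bool, skip_polish: bool) -> list[str]:
    """Steps of preflight-dispatch that run after the follow-up commit."""
    if picked == 0:
        return []  # the job-level `count != '0'` gate is kept
    steps = ["skip-gate"]
    if not dry_run and not skip_polish:
        steps.append("spec-polish")  # side effect: may open a PR
    steps += ["similarity-audit", "collect"]  # read-only, run in dry runs too
    if not dry_run:
        steps.append("dispatch-bulk-generate")  # side effect: starts regeneration
    return steps


def planned_steps_before_fix(*, picked: int, dry_run: bool, skip_polish: bool) -> list[str]:
    """Original behavior: `!inputs.dry_run` on the job skipped everything."""
    if dry_run:
        return []
    return planned_steps(picked=picked, dry_run=False, skip_polish=skip_polish)


if __name__ == "__main__":
    print(planned_steps_before_fix(picked=1, dry_run=True, skip_polish=False))
    print(planned_steps(picked=1, dry_run=True, skip_polish=False))
```

Gating each side-effect step individually is what lets the documented dry-run test path preview the audit without opening PRs or dispatching bulk-generate.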
…failures

Address Copilot PR review on #5714:
- Drop the job-level `!inputs.dry_run` gate so dry-runs can exercise skip-gate, similarity audit, and collect (the documented test path). Side-effect steps (polish, dispatch) are individually gated.
- Add `continue-on-error: true` to spec-polish: it's an optional quality pass, so a transient claude-code-action failure must not block the main regen pipeline.
- Add `continue-on-error: true` to similarity audit: it's read-only, and a failure falls back to empty change_requests via the existing collect step's file-existence check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
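The fallback in the last bullet can be sketched as follows. The commit describes only a file-existence check; also rejecting unparseable or non-object JSON is an added assumption, and `collect_change_requests` is a hypothetical name. Compact separators keep the value on one line, which is convenient when writing it to `$GITHUB_OUTPUT` as a single `key=value` line.

```python
import json
from pathlib import Path


def collect_change_requests(path: Path) -> str:
    """Return compact change_requests JSON, or "{}" when the audit left nothing usable."""
    if not path.is_file():
        return "{}"  # audit failed or never wrote the file (continue-on-error)
    try:
        # ValueError covers both JSONDecodeError and UnicodeDecodeError.
        data = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:
        return "{}"  # assumption: treat malformed output like a missing file
    if not isinstance(data, dict):
        return "{}"
    # Compact separators keep the value on a single line.
    return json.dumps(data, separators=(",", ":"))


if __name__ == "__main__":
    print(collect_change_requests(Path("/tmp/change-requests.json")))
```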
Summary
Adds two autonomous pre-flight steps to `daily-regen.yml` that run before `bulk-generate` fans out, bringing two quality passes that today exist only in the local `/regen` and `/update` skills into the scheduled cloud cadence.

Everything runs through `claude-code-action` on Claude Max OAuth (no extra API costs), pinned to the same SHA as `impl-generate` / `impl-review` / `impl-repair`.

What it does
For each spec the `pick` job selects:

1. **Skip-gate** (Bash, no LLM): `gh pr list --search "plots/<spec>/ in:files is:open"`. If any open PR touches the spec, the polish step is skipped to avoid racing humans or stacking auto-polish PRs. The similarity audit still runs (it's read-only).
2. **Spec polish** (claude-code-action, `--model haiku`): audits the spec across the five `update.md` §2 dimensions (wording, missing sections, tag completeness, tag quality, tag accuracy). If anything needs work, opens an `auto-polish/<spec>/<timestamp>` branch + PR with label `auto-polish`. Never pushes to main directly and never auto-merges; the PR awaits human review. If nothing needs work, prints `NOOP` and stops. Runs with `continue-on-error: true` so a transient action failure does not block the main pipeline.
3. **Cross-library similarity audit** (claude-code-action, `--model haiku`): reads the 9 `review.image_description` blobs from `plots/<spec>/metadata/python/*.yaml` and clusters libraries that converged on the same data scenario / example domain / visual variant beyond what the spec dictated. Optionally drills into impl `.py` files for ambiguous clusters via the Read tool. Emits `/tmp/change-requests.json` keyed by library. Project-mandated constants (Okabe-Ito palette positions 1–7, plot size and aspect ratio, theme chrome) are explicitly excluded as cluster signals. Runs with `continue-on-error: true`; if the audit fails, the collect step falls back to empty change_requests.
4. **Dispatch bulk-generate with hints**: passes the `change_requests` JSON to bulk-generate, which jq-extracts the per-library hint and forwards it as the new `change_request` input to `impl-generate`. The hint is staged to `/tmp/anyplot-change-request.txt`, where the updated `impl-generate-claude.md` picks it up and treats it as a hard requirement (mirroring `regen.md` §2c verbatim: hard requirement, no sibling reads, preserve `review.strengths`, override "no changes for sake of changes").

Why
Model routing
- The `daily-regen` `model` input (default `haiku`, choices `haiku`/`sonnet`/`opus`) is unchanged and still flows to bulk-generate → impl-generate / review / repair.
- The pre-flight steps (spec polish, similarity audit) hardcode `--model haiku`: they're narrow, cheap audits.

dry_run semantics
`dry_run=true` runs the read-only and decision-only steps (skip-gate, similarity audit, collect) so operators can preview what the cycle will do without committing anything; spec polish and the bulk-generate dispatch are skipped.

To preview spec polish in isolation, run a real (non-dry-run) cycle against a single spec: `gh workflow run daily-regen.yml -f specification_id=<spec> -f model=haiku`. Polish opens a PR; merge or close it manually.

Backwards compatibility
Both new inputs (`change_request` on impl-generate, `change_requests` on bulk-generate) default to empty (`""` and `'{}'`). Existing manual triggers without these inputs behave byte-identically to today.

Risks + rollback
- … `id`/`issue`/`created`, no semantic changes (data shape, plot type, requirements). Reviewable in commit diffs; revert if anything slips.
- … `daily-regen.yml` first. Downstream `change_request[s]` inputs default to empty; remaining changes are no-ops without daily-regen wiring.

Test plan
- `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku -f dry_run=true`: confirms `pick` + `preflight-dispatch` (skip-gate, similarity, collect) run; polish + bulk-generate dispatch are skipped (dry_run)
- `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku` (no dry_run): confirms the full chain: polish either NOOPs or opens a PR, similarity emits change_requests, bulk-generate fires, and an impl-generate run with a non-empty hint shows `::notice::Change request staged: …`
- … `id`/`issue`/`created` unchanged; only wording / sections / tags polished; `updated` bumped

🤖 Generated with Claude Code