Detect recurring failures in scheduled cron jobs and reusable skills, draft improved prompts automatically via Claude, validate them statically and against historical runs, and ship them only with human approval.
Part of The Agent Crafting Table — standalone agent system components for Claude Code.
- No feedback from scoring to proposals — scoring-only systems label jobs as broken forever.
variant-gen.jscloses the loop by drafting concrete fixes the moment a failure mode recurs. - Blind LLM rewrites — handing Claude an old prompt and "please fix" produces variants that look plausible but re-introduce bugs from other jobs or drop the success marker. 5 static gates + dry-run catch these before humans see them.
- Cross-contamination between jobs — fixing job A by borrowing an approach from job B used to re-introduce a pattern that broke job C.
variant-test.jschecks every candidate against every other job'srequired_absentset. - Silent prompt drift — without a score on whether a change would have fixed the actual failures, reviewers LGTM any plausible diff. The dry-run forces the question.
- Dead-code skills — procedures in
memory/skills/with no usage signal silently decay.skill-log.jsmakes their health observable and feeds them into the same refinement pipeline.
src/
trace-extract.js # Converts raw cron logs → structured trace records
instruction-refine.js # Auto-patches low-risk failures (3+ consecutive); escalates to variant-gen
variant-gen.js # Reflexion pattern: one diagnosis + one targeted change per proposal
variant-test.js # Dry-runs candidates against historical traces, cross-contamination check
cron-eval.js # Static 5-gate pre-flight validator for prompt patches
evolution-review.js # Posts compact diff to Discord/stdout; applies approved variants
skill-log.js # Records skill invocation outcomes, flags recurring failures
skill-create.js # Creates/updates skill files, rebuilds INDEX.md
examples/
cron-eval.json # Example eval ruleset to copy to data/cron-eval.json
failure-classifiers.json # Example project-specific failure classifiers
variant-constraints.json # Example per-project constraints for variant generation
skill-example.md # Example skill file format
assets/
architecture.md # System design notes
This kit layers on top of cron-framework or an equivalent system that produces:
crons/jobs.json— job registry with{ id, name, message, ... }crons/logs/YYYY-MM-DD-<prefix>.log— per-run logsdata/cron-performance.json— scored run history keyed by job id
The skill-level loop (skill-log, skill-create) works independently of cron-framework.
- Node.js 18+
claudeCLI on$PATH— only required byvariant-gen.js- No database, no network services
All seven scripts are standalone — no npm install needed.
cp examples/cron-eval.json data/cron-eval.jsonEdit data/cron-eval.json to define your project's failure patterns, success markers, and per-mode fix requirements. This is the file the loop reads most often.
cp examples/failure-classifiers.json data/failure-classifiers.json
cp examples/variant-constraints.json data/variant-constraints.jsonfailure-classifiers.json— project-specific failure fingerprints (paths, error strings, HTTP codes)variant-constraints.json— constraints appended to the Claude meta-prompt during variant generation (e.g. "do not reference paths under /old/config")
Add this to your cron schedule (daily is typical):
node src/trace-extract.js --all --days=1
node src/instruction-refine.js --apply # auto-patch low-risk failures
node src/variant-gen.js --pending # draft single-change diffs for 3+ consecutive failures
node src/evolution-review.js # post to Discord or stdout for human approval
node src/skill-log.js --flag-recurring
node src/skill-create.js --refresh-indexWhen a variant is approved:
node src/evolution-review.js --apply-variant <candidate-id> <variant-id>In crons/jobs.json, add an autofixLevel field to each job:
"autofixLevel": "safe"—instruction-refine.jscan auto-apply mechanical patches (utility crons: backups, health checks, status posts)"autofixLevel": "review"(default if unset) — variants always go to human approval (dev agents, customer-facing jobs, anything that touches money)
Jobs with "critical": true are never auto-patched regardless of autofixLevel.
cron logs → trace-extract.js → execution-traces.json
↓
cron-performance.json
↓
instruction-refine.js
(auto-patch low-risk, 3+ consecutive)
↓ (medium/high-risk or review-level)
variant-gen.js → candidate-variants.json
(Reflexion: one diagnosis + one change)
↓
cron-eval.js (5 static gates)
variant-test.js (dry-run + cross-contamination)
↓
evolution-review.js → human reviews diff
(reply "apply <id>" / "skip <id>")
↓ (approved)
jobs.json updated
| Variable | Default | Description |
|---|---|---|
WORKSPACE_DIR |
cwd | Project root |
CRON_MODEL |
sonnet |
Claude model for variant-gen.js |
SUCCESS_MARKER |
HEARTBEAT_OK |
String that marks a successful run |
MAX_PER_RUN |
3 | Max job+mode pairs per --pending invocation |
DISCORD_POST |
— | Path to node <script> <channel> <message> helper |
REVIEW_CHANNEL |
— | Discord channel ID for variant reviews |
IMPROVEMENTS_FILE |
memory/references/cron-improvements.md |
Human-review log for instruction-refine.js |
PATCH_LOG_FILE |
data/patch-log.md |
Applied-patch log for instruction-refine.js |
MAX_PATCH_CHARS |
500 | Max chars a single auto-patch may add |
MAX_PATCHES_PER_RUN |
3 | Max auto-patches per --apply invocation |
SKILLS_DIR |
memory/skills |
Where skill markdown files live |
SKILL_LOG_MAX |
100 | Rolling invocation window per skill |
SKILL_LOG_FAIL_THRESHOLD |
3 | Consecutive failures before a skill is flagged |
# Refine a specific job+failure-mode
node src/variant-gen.js --job-id daily-status-report --mode timeout
# Dry-run a hand-written candidate against history
node src/variant-test.js --job-id daily-status-report \
--variant-message "$(cat proposed.txt)"
# Static eval only
node src/cron-eval.js --job-id daily-status-report --mode timeout \
--original-message "$(cat old.txt)" --patched-message "$(cat new.txt)"
# List pending reviews
node src/evolution-review.js --list
# Apply an approved variant
node src/evolution-review.js --apply-variant <candidate-id> <variant-id>data/execution-traces.json— structured, rolling-window traces per job (~KB per job)data/candidate-variants.json— every proposed variant with scores; status:pending_review → review_posted → applied | skipped; stale records pruned after 14 daysdata/skill-outcomes.json— per-skill invocation history and consecutive failure countsmemory/skills/INDEX.md— auto-generated table of contents for all skill files
- Failure patterns — patched message must not contain known-bad patterns
- Required absent — patterns that must never appear in this job's prompt
- Mode-specific fix content — for the target failure mode, patched message must contain the required fix
- Per-job success indicators — job-specific success markers must be preserved
- Default success indicators — fleet-wide success signal (e.g.
HEARTBEAT_OK) enforced automatically
variant-test.js dry-runs a candidate against the past 30 traces for that job:
- Coverage score (0.0–1.0) — what fraction of past failure modes would this variant have addressed?
- Cross-contamination check — does this variant contain any pattern in another job's
required_absent? - Verdict —
pass(score ≥ 0.7, no contamination) /warn/fail
- Humans approve every prompt change.
evolution-review.jsnever auto-applies. - Static analysis only. No variant is ever executed against production.
- English-language substring checks. Patterns are literal substrings. For regex, extend
classifyFailure()intrace-extract.js. - Variant generation costs LLM tokens. Each
--pendingrun makes up toMAX_PER_RUNClaude calls.
variant-gen.jsspawnsclaude --dangerously-skip-permissions. The model sees the full job message — don't point this at jobs whose prompts contain secrets.evolution-review.js --apply-variantoverwritescrons/jobs.jsonin-place. Commit before running so you cangit diffand revert.- Variant messages are not sandboxed — once applied, the next scheduled tick runs whatever the approved variant says. The human reviewer is the final gate.
skill-log.jsstores--note/--contextvalues in plain JSON. Avoid piping raw user input into these fields.