agents

Autoresearch loop for `emoji-witty-zh-tw`

This repo includes a minimal autoresearch-style runner. It iterates on the witty emoji skill over multiple loops, benchmarks each candidate, and keeps or discards each change based on reasoning-gap metrics.

The implementation now follows the skill-local layout:

.agents/skills/emoji-witty-zh-tw/SKILL.md
.agents/skills/emoji-witty-zh-tw/scripts/autoresearch.py
.agents/skills/emoji-witty-zh-tw/references/emoji-witty-zh-tw/*.md

What it mutates

The runner intentionally limits edits to:

.agents/skills/emoji-witty-zh-tw/SKILL.md
.agents/skills/emoji-witty-zh-tw/references/emoji-witty-zh-tw/test-cases.md
.agents/skills/emoji-witty-zh-tw/references/emoji-witty-zh-tw/target-metrics.md

What it benchmarks

Each loop runs a fixed generator / solver / judge flow:

1. Generate emoji designs for the selected targets
2. Ask a full-tier solver and a mini-tier solver to explain them
3. Judge both solver outputs against the generator rationale
4. Aggregate a reasoning-gap objective
5. Keep or discard the candidate
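
The aggregate and keep-or-discard steps can be sketched roughly as follows. This is an illustration only, not the runner's code. The names `CaseScore`, `reasoning_gap`, and `decide` are hypothetical, as are the 0-1 score scale and the definition of the gap as the mean difference between full-tier and mini-tier judge scores. Treating a larger gap as better is also an assumption. The scores below are made-up numbers.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseScore:
    """Judge scores for one benchmark case (hypothetical 0-1 scale)."""
    case: str
    full_score: float  # judge score for the full-tier solver's explanation
    mini_score: float  # judge score for the mini-tier solver's explanation


def reasoning_gap(scores: list[CaseScore]) -> float:
    """Mean of (full - mini) across cases. Assumed definition of the objective."""
    return mean(s.full_score - s.mini_score for s in scores)


def decide(baseline: float, candidate: float, min_delta: float = 0.0) -> str:
    """Keep the candidate only if it improves on the baseline objective."""
    return "keep" if candidate - baseline > min_delta else "discard"


# Made-up scores for the two case names used elsewhere in this README.
baseline = reasoning_gap([
    CaseScore("osaka", 0.8, 0.6),
    CaseScore("hongkong", 0.7, 0.6),
])
candidate = reasoning_gap([
    CaseScore("osaka", 0.9, 0.5),
    CaseScore("hongkong", 0.8, 0.5),
])
print(decide(baseline, candidate))  # keep
```

The `min_delta` threshold is one way to avoid keeping a mutation on noise alone. Whether the real runner uses such a margin is not stated here.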

Default backend matrix

The runner now defaults to:

generator: Codex (reasoning_effort=high)
judge: Codex (reasoning_effort=high)
solvers:
- Codex full (reasoning_effort=high)
- Copilot full (gpt-5.4)
- Gemini full (CLI default model unless overridden)
- Copilot mini (gpt-5.4-mini)

This keeps generation and judging anchored to Codex, while using a multi-provider solver matrix.

Run it

python3 .agents/skills/emoji-witty-zh-tw/scripts/autoresearch.py --iterations 10

Useful flags:

--baseline-only — run only the benchmark baseline, without mutation loops
--cases osaka,hongkong — choose the benchmark case pool
--output-dir .autoresearch-runs/emoji-witty-zh-tw — change where artifacts are written
--mutation-backend codex:full:mutation::high — choose the mutation backend
--generator-backend codex:full:generator::high — choose the generator backend
--judge-backend codex:full:judge::high — choose the judge backend
--solver-backend provider:tier:label[:model[:reasoning_effort]] — add or replace solver backends
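
The backend spec string `provider:tier:label[:model[:reasoning_effort]]` can be read as three required fields followed by two optional ones, where an empty field (as in `codex:full:mutation::high`) means "use the default". A minimal parsing sketch under that reading follows. `BackendSpec` and `parse_backend_spec` are hypothetical names, not the runner's API, and the sketch assumes model names contain no colons.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendSpec:
    provider: str
    tier: str
    label: str
    model: Optional[str] = None             # None means the provider default
    reasoning_effort: Optional[str] = None  # None means unspecified


def parse_backend_spec(spec: str) -> BackendSpec:
    """Parse 'provider:tier:label[:model[:reasoning_effort]]'."""
    parts = spec.split(":")
    if not 3 <= len(parts) <= 5:
        raise ValueError(f"expected 3 to 5 colon-separated fields, got {len(parts)}: {spec!r}")
    # Pad the optional fields so the unpacking below always sees five values.
    parts += [""] * (5 - len(parts))
    provider, tier, label, model, effort = parts
    if not (provider and tier and label):
        raise ValueError(f"provider, tier, and label are required: {spec!r}")
    # Empty optional fields are normalized to None.
    return BackendSpec(provider, tier, label, model or None, effort or None)


print(parse_backend_spec("codex:full:mutation::high"))
```

With this reading, `codex:full:mutation::high` yields a spec with no explicit model and a reasoning effort of `high`, which matches the default generator and judge rows in the backend matrix.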

Artifacts

Each run writes a timestamped directory under .autoresearch-runs/emoji-witty-zh-tw/ with:

config.json
baseline/
iteration-*/mutation.json
iteration-*/benchmark/
iteration-*/decision.json
run-summary.json

This is the repo's equivalent of an overnight autoresearch log: each iteration records the mutation hypothesis, benchmark result, and keep/discard decision.
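
To review a run after the fact, the per-iteration files can be collected with a short script. The sketch below assumes only the directory layout listed above. It makes no assumption about the keys inside `mutation.json` or `decision.json`, and `load_iterations` is a hypothetical helper, not part of the runner.

```python
import json
import re
from pathlib import Path


def _iteration_key(decision_path: Path):
    """Sort by the trailing number in 'iteration-N' so 10 comes after 2."""
    name = decision_path.parent.name
    match = re.search(r"(\d+)$", name)
    return (int(match.group(1)) if match else -1, name)


def load_iterations(run_dir: Path) -> list[dict]:
    """Return one record per iteration directory that has a decision.json."""
    records = []
    for decision_path in sorted(run_dir.glob("iteration-*/decision.json"), key=_iteration_key):
        iteration_dir = decision_path.parent
        mutation_path = iteration_dir / "mutation.json"
        records.append({
            "iteration": iteration_dir.name,
            # mutation.json is loaded if present; its schema is not assumed here.
            "mutation": json.loads(mutation_path.read_text(encoding="utf-8"))
            if mutation_path.exists() else None,
            "decision": json.loads(decision_path.read_text(encoding="utf-8")),
        })
    return records
```

Pass the timestamped run directory under `.autoresearch-runs/emoji-witty-zh-tw/` as `run_dir`. The function returns an empty list if the directory has no completed iterations.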

More runner details live in:

.agents/skills/emoji-witty-zh-tw/references/emoji-witty-zh-tw/autoresearch.md
