What this is
A direction-setting issue, not a roadmap checklist. Items here are research trajectories ClawLoop is actively exploring. Shape will change as we learn.
The question
Agent learning plateaus when the training distribution stops surprising the agent. Real deployments produce a firehose of traces — successes, failures, tool calls, reward signals — that describe the world the agent actually lives in. The open question is: can we use that trace data to synthesize environments, world models, and curricula that keep the agent learning past where a fixed benchmark tops out?
Concrete research threads
- Failure-driven env synthesis. Cluster real failures into an error taxonomy, generate targeted tasks that exercise those failure modes, and measure whether targeted training closes the gap faster than random sampling.
- Curriculum from traces. Order synthesized tasks by difficulty inferred from observed reward distributions, not hand-tuned schedules. Compare against fixed curricula and random sampling.
- World-model distillation. Learn approximate environment dynamics from traces so learners can train against simulations when live envs are expensive, slow, or irreversible. What's the fidelity floor before transfer breaks down?
- Coverage metrics. Synthesized envs are only useful if they explore regions the real distribution under-samples. Need a measurement story before we claim any benefit.
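To make the curriculum-from-traces thread concrete, here is a minimal sketch of difficulty ordering inferred from observed rewards. The trace schema (`task_id -> list of rewards in [0, 1]`) and the task names are hypothetical, and mean reward is just one stand-in for difficulty; nothing here reflects an existing ClawLoop implementation.

```python
from statistics import mean

def order_by_difficulty(traces: dict[str, list[float]]) -> list[str]:
    """Order synthesized tasks easiest-first.

    Difficulty is inferred from observed reward: tasks where the agent
    already scores well come early; low-reward tasks come later. This
    replaces a hand-tuned schedule with a data-derived one.
    """
    # 1 - mean reward: higher value = harder task (hypothetical proxy).
    difficulty = {task: 1.0 - mean(rewards) for task, rewards in traces.items()}
    return sorted(difficulty, key=difficulty.get)

# Hypothetical trace data, e.g. success-rate rewards per synthesized task.
traces = {
    "parse_json": [0.9, 1.0, 0.8],
    "multi_step_refund": [0.1, 0.0, 0.2],
    "lookup_order": [0.6, 0.5, 0.7],
}
```

A real version would need to handle non-stationarity (difficulty drifts as the learner improves) and low-sample tasks, which is exactly where this thread compares against fixed curricula and random sampling.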
Why this is separate from learner tuning
The companion issue (#54) is about tuning how learners learn. This one is about tuning what they learn from. Different objectives, different data diets, different literature — keeping them separate lets each be evaluated on its own terms. They eventually co-evolve: harder envs drive better learners, better learners surface new failure modes, new failures seed the next round of envs.
Prior art worth reading
- PAIRED and the broader minimax-regret / unsupervised environment design (UED) literature.
- Open-Ended Learning (POET, PLR, ACCEL).
- Synthetic data / self-play work in LLM-agent settings.
Related
Engage
Comment with papers, critiques, or pointers. If you want to collaborate, reach out.