# AutoRL 🚦

Agent Skill for governed RL reward design, curriculum design, and hyperparameter tuning.


AutoRL is a contract-first Agent Skill / Codex skill package for RL engineers who want agents to automate bounded reinforcement-learning task iteration in oh-my-codex (OMX) workflows. It helps agents reason about, constrain, and verify changes around an existing training stack, especially when iterating on:

- reward design;
- curriculum design;
- trainer and optimizer hyperparameters;
- other explicitly approved training-surface changes.

It is not a trainer or runtime. AutoRL is a reusable control and documentation layer for agent-assisted RL improvement: clarify the mission, bind the task contract, execute only through a known external runtime, collect evidence, verify the result, then decide what happens next.

AutoRL is intentionally skill-only and contract-first. This repository does not provide an embedded RL runtime, framework adapter, scheduler, queue, dispatcher, retry daemon, long-running monitor, experiment launcher, checkpoint store, or log backend. Instead, it provides contracts, templates, gates, knowledge packs, and workflow guardrails for agents working with your existing RL project.


## ✨ Why AutoRL?

Reinforcement-learning projects fail in ways that ordinary software workflows often miss:

- reward hacking can look like progress until behavior is inspected;
- "one more run" quickly becomes untraceable experiment history;
- hidden launch commands make results hard to reproduce;
- agents can drift from analysis into unauthorized training edits;
- long-running jobs can tempt tools into acting like implicit schedulers or monitors.

AutoRL addresses those failure modes with one rule:

> Every loop must be bounded by a task contract, grounded in a task-local cache, executed through an external runbook, and closed with artifact-backed evidence, verification, and a decision.


## 🌟 Highlights

| Highlight | What it gives you |
| --- | --- |
| 🔒 Contract-first gates | Plan and Execute proceed only after the task contract, metric spec, runbook, surface map, and cache state agree. |
| 🗂️ Task-local `.autorl/` cache | Task facts, runbooks, metrics, loop state, manifests, and decisions live under `.autorl/tasks/<task_slug>/`, not inside the reusable skill package. |
| 🧪 Evidence-first decisions | Each iteration should produce retained artifacts, a feedback packet, a verification verdict, and a continue / stop / rollback / narrow / escalate decision. |
| 🛡️ External-runtime boundary | AutoRL records commands, artifacts, stop rules, resource facts, and outcomes; real training still runs through your validated external runtime. |
| 🌲 Worktree isolation | Baseline-relative training-surface edits use fresh task-cache worktrees after explicit RL-loop entry authorization. |
| 🧠 Source-traceable RL knowledge | Domain packs for mobile-robot RL, manipulation RL, and WBC / legged RL help diagnose reward, curriculum, and tuning choices, with stated applicability limits. |
| 🚫 Anti-scheduler by design | Helpers may record bounded file-backed leases, but AutoRL must not become a scheduler, queue, monitor, dispatcher, or retry service. |

## 🚀 Quick Start

AutoRL is a skill and contract surface for an OMX-driven agent workflow. Use it from the RL repository you want to improve; do not treat this repository as a standalone trainer.

### 1. Install the required orchestration layer

AutoRL expects oh-my-codex (OMX) to be available. OMX provides the $autoresearch workflow that drives the validator-gated research loop around this skill.

```sh
omx setup
omx doctor
```

Then use the skill-workflow invocation form:

```sh
omx --madmax --xhigh
$autoresearch "Use AutoRL, xxx"   # supported workflow skill
```

### 2. Expose the AutoRL skill

Make this repository available to your Codex skills environment. The skill entrypoint is:

```
SKILL.md  ->  name: autorl
```
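Skill entrypoints of this kind conventionally open with YAML frontmatter. The sketch below shows that expected shape; only the `name: autorl` field is confirmed above, and the description wording is illustrative:

```markdown
---
name: autorl
description: Governed, contract-first iteration on RL rewards, curricula, and
  hyperparameters, executed only through a validated external runtime.
---
```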

### 3. Start from your target RL repository

Run from the repository that contains the RL task you want to improve. Use OMX $autoresearch as the outer loop and ask it to use AutoRL as the contract and RL-iteration skill surface.

Recommended intake:

```
$deep-interview --autoresearch
Mission: use the AutoRL skill to govern a bounded RL iteration.
Goal: improve rough-terrain locomotion success rate.
Allowed surfaces: reward terms and curriculum only.
Budget: at most 3 runs, no more than 2 hours each.
Runtime: use the project training command, but verify the runbook first.
Success: higher success rate without worse fall rate or unsafe behavior.
Validation mode: prompt-architect-artifact.
```

Then execute the validator-gated loop:

```
$autoresearch
Use the AutoRL skill to discover the RL stack, materialize the .autorl task cache, bind the task contract, and proceed only through the validated external runbook.
```

### 4. Let AutoRL form the task contract

Within the $autoresearch loop, AutoRL guides the workflow through:

  1. Discover the RL stack and relevant knowledge packs.
  2. Interview for mission, constraints, non-goals, metrics, budgets, approvals, and runtime details.
  3. Materialize concrete task-cache artifacts under .autorl/tasks/<task_slug>/.
  4. Contract the approved scope, allowed surfaces, runbook, metric spec, and stop rules (a sketch follows this list).
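To make step 4 concrete, here is a hypothetical excerpt of a bound task contract, reusing the rough-terrain example from the intake above. Every field name here is illustrative, not the package's actual template:

```markdown
# .autorl/tasks/rough-terrain/task-contract.md (hypothetical excerpt)
Mission: improve rough-terrain locomotion success rate.
Allowed surfaces: reward terms and curriculum only; all other files read-only.
Budget: at most 3 runs, no more than 2 hours each.
Runbook: runbook.md (validated against the project training command).
Stop rules: fall-rate regression, unsafe behavior, or budget exhausted.
Adjudication: Plan allowed; Execute blocked until explicit RL-loop entry.
```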

### 5. Enter the RL loop only after explicit authorization

Training execution or training-surface modification starts only after the user explicitly asks to enter the RL training loop and the task contract adjudicates Execute as allowed.

### 6. Close every loop with evidence

After an external run completes, AutoRL returns through:

```
Evidence -> Verify -> Decide -> Handoff
```

No hidden "next run." No silent scope expansion. No unreviewed training edits.
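As an illustration only, a closed loop might leave behind a compact decision record along these lines (the real templates live in the skill package and may differ):

```markdown
# .autorl/tasks/rough-terrain/loops/loop-001/decision.md (hypothetical)
Evidence: run-001 training logs, eval metrics, retained rollout videos.
Verification: success rate improved vs. baseline; fall rate not worse -> pass.
Decision: continue (one of: continue / stop / rollback / narrow / escalate).
Next: plan one further bounded reward change, re-gated through the contract.
```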


## 🧭 Workflow at a Glance

```mermaid
flowchart LR
    A[Discover] --> B[Interview]
    B --> C[Materialize .autorl task cache]
    C --> D[Contract]
    D --> E[Plan one bounded change]
    E --> F[Execute via external runtime]
    F --> G[Evidence]
    G --> H[Verify]
    H --> I[Decide]
    I --> J{Continue?}
    J -->|yes, after gate| E
    J -->|stop / rollback / escalate| K[Handoff]
```

Long-running training uses an external orchestration handoff. AutoRL records loop state, run params, resource facts, artifact expectations, stop/cancel details, and externally consumed completion/status facts. Waiting and next-operation timing belong to the external runtime, operator, or OMX $autoresearch workflow, not to AutoRL core.


## 🧱 The `.autorl/` Task Cache

AutoRL separates reusable package logic from task-specific operational facts.

A materialized task typically looks like this:

```
.autorl/
  tasks/
    <task_slug>/
      task-profile.md        # mission, constraints, non-goals
      surface-map.md         # allowed reward/curriculum/hyperparameter surfaces
      runbook.md             # concrete external runtime commands and artifacts
      metric-spec.md         # success, regression, and rollback metrics
      task-contract.md       # package-local Plan / Execute adjudication record
      state.json             # compact cache facts, schema, freshness, loop refs
      artifact-manifest.md   # retained logs, checkpoints, metrics, outputs
      worktrees/             # fresh isolated worktrees for authorized edits
      loops/                 # loop state, run params, evidence, decisions
```

This cache is the operational authority for one concrete task instance. It does not redefine the AutoRL skill package.
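The Markdown files above are the human-readable records, while state.json carries the compact machine-readable facts. A plausible shape is sketched below; the actual schema is owned by the skill package, so every field here is an assumption:

```json
{
  "schema": "autorl/state@1",
  "task_slug": "rough-terrain",
  "freshness": "2025-01-01T00:00:00Z",
  "facts": {
    "runtime": "external",
    "budget_runs_remaining": 2
  },
  "loops": ["loops/loop-001"]
}
```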


## 🧠 Knowledge Packs

AutoRL includes task-agnostic RL guidance that agents can cite during planning, evidence analysis, verification, and decision-making.

| Pack | Use it for |
| --- | --- |
| `modules/knowledge/mobile-robot-rl/` | Navigation, obstacle avoidance, social navigation, exploration, docking, and learned local planning. |
| `modules/knowledge/manipulation-rl/` | Dexterous hands, grasp / lift / place, insertion, assembly, tactile tasks, bimanual tasks, and deformable-object tasks. |
| `modules/knowledge/wbc-rl/` | Whole-body control, humanoids, quadrupeds, command tracking, terrain curricula, and sim-to-real robustness. |
| `modules/knowledge/stack-guidance/` | Framework discovery and stack-specific non-destructive validation patterns. |

Knowledge packs are advisory evidence, not governing authority. When they influence a plan or verdict, preserve their rule IDs, source IDs, applicability notes, exclusions, evidence levels, and uncertainty notes. They must not override the task contract, metric spec, runbook, or observed evidence.
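For a sense of what that traceability looks like in practice, a knowledge card might carry fields like the following (a hypothetical card, not one shipped in the packs):

```markdown
# WBC-RW-012: velocity-tracking reward shaping (hypothetical card)
Sources: S-041
Applicability: quadruped command tracking on flat and rough terrain.
Exclusions: not validated for bipedal stair climbing.
Evidence level: strong in simulation, weak on hardware.
Uncertainty: shaping gains interact with curriculum stage length.
```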


## 🧩 Repository Map

```
.
├── SKILL.md      # Codex skill entrypoint and behavioral contract
├── README.md     # project overview and quick start
├── decisions/    # ADR-style rationale
├── docs/         # architecture, examples, feature list, module docs
└── modules/
    ├── governance/  # authority model, lifecycle gates, contract adjudication
    ├── task/        # task-cache layout, materialization, state, worktrees
    ├── planning/    # intake, discovery, iteration plans, evidence, verification, decisions
    ├── execution/   # external invocation and resource boundaries, not a runtime
    └── knowledge/   # stack guidance, domain packs, glossary, deferred-runtime guardrails
```



## ✅ What AutoRL Does / Does Not Do

| AutoRL does | AutoRL does not |
| --- | --- |
| Clarifies mission, constraints, metrics, budgets, and allowed surfaces. | Guess hidden runtime commands or task facts. |
| Materializes task-local contracts and cache artifacts. | Store downstream task profiles inside the skill package. |
| Plans one bounded reward, curriculum, or hyperparameter change. | Silently expand into unrelated training-surface edits. |
| Requires explicit RL-loop entry authorization before execution. | Launch training just because a plan exists. |
| Invokes only validated external runbooks. | Implement an embedded trainer, adapter, scheduler, queue, or monitor. |
| Compresses artifacts into feedback and verification records. | Treat a reward curve as sufficient proof of success. |
| Decides whether to continue, stop, roll back, narrow scope, or escalate. | Auto-resume or keep working while an external job runs. |

πŸ›‘οΈ Design Promises

AutoRL should remain:

- **Task-agnostic**: reusable across downstream RL projects.
- **Contract-first**: phase transitions are explicit and auditable.
- **Evidence-backed**: claims cite artifacts, metrics, logs, or retained records.
- **External-runtime friendly**: built to work around your existing launcher, cluster, container, or job system.
- **Anti-bloat**: no package-owned runtime engine unless a future ADR explicitly changes the project direction.
- **Agent-safe**: bounded subagents may inspect, prepare, collect, and verify, but they do not independently authorize Plan or Execute.

## 🤝 Contributing

High-quality contributions should strengthen the contract surface without sneaking in runtime ownership.

Good first contribution areas include:

- clearer task-cache examples;
- additional non-destructive validation probes;
- better metric and evidence templates;
- source-traceable knowledge cards with applicability limits;
- documentation and diagrams that make gates easier to follow.

Please avoid adding runtime engines, framework adapters, launch schedulers, long-running monitors, artifact backends, or hidden task-specific assumptions. If the project needs to move beyond the current skill-only boundary, introduce that transition through an ADR first.
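If a boundary change does become necessary, a conventional ADR skeleton looks like the following; this is generic ADR practice, so match whatever style decisions/ already uses:

```markdown
# ADR-NNN: <title of the boundary change>
Status: proposed
Context: why the skill-only boundary is no longer sufficient.
Decision: what the package will now own, and what explicitly stays external.
Consequences: new responsibilities, risks, and the rollback path.
```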


## ⭐ Project Philosophy

AutoRL aims to make RL iteration feel less like gambling and more like engineering discipline:

- one task
- one contract
- one bounded change
- one external runbook
- one evidence bundle
- one verified decision

If you want AI agents to improve RL systems safely, reproducibly, and with auditability, AutoRL is designed for that goal.
