Skip to content

chenhangcuisg-code/MetaClaw

 
 

Repository files navigation

MetaClaw

Just talk to your agent — it learns and EVOLVES.

GitHub License MIT Fully Async No GPU Cluster Skill Evolution One-Click Deploy


🇨🇳 中文🇯🇵 日本語🇰🇷 한국어🇫🇷 Français🇩🇪 Deutsch🇪🇸 Español



Two commands. That's it.

metaclaw setup              # one-time config wizard
metaclaw start              # skills on, OpenClaw wired — ready to chat
metaclaw start --mode rl    # optional: + live RL training via Tinker
MetaClaw demo

🔥 News

  • [03/10/2026] v0.2 — One-click deployment via metaclaw CLI. Skills enabled by default, RL is now opt-in.
  • [03/09/2026] We release MetaClaw — Just talk to your agent and let it evolve automatically. NO GPU deployment required; just plug into the API.

🎥 Demo

metaclaw.mp4

📖 Overview

MetaClaw turns live conversations into continuous training data — automatically. Just talk to your agent as usual, and MetaClaw handles the learning loop behind the scenes.

It places your model behind an OpenAI-compatible proxy that intercepts interactions from OpenClaw, injects relevant skills at each step, and can optionally perform continuous fine-tuning through Tinker Cloud RL. Updated weights are hot-swapped seamlessly without interrupting the service.

There is no need to maintain a dedicated GPU cluster. MetaClaw works with any OpenAI-compatible LLM API out of the box, and optionally integrates Kimi-K2.5 (1T MoE) via Tinker for cloud-based LoRA training.

🤖 Key Features

One-click deployment

Configure once with metaclaw setup, then metaclaw start brings up the proxy, injects skills, and wires OpenClaw automatically. No manual shell scripts needed.

Two operating modes

Mode Default What it does
skills_only Proxy → your LLM API. Skills injected, auto-summarized after each session. No GPU/Tinker required.
rl off Proxy → Tinker cloud RL. Full training loop with PRM scoring and skill evolution from failures.

Skill injection

At every turn, MetaClaw retrieves the most relevant skill instructions and injects them into the agent's system prompt. Immediate behavior improvement without retraining.

Automatic skill summarization

After each conversation, the same LLM you're already using analyzes the session and distills new skills automatically. With RL enabled, a dedicated judge model extracts skills from failed episodes.

No GPU cluster required

In skills_only mode, only a network connection is needed. RL training is offloaded to Tinker cloud.

Two learning modes

MetaClaw supports both:

  • RL (GRPO) for learning from implicit feedback signals
  • On-Policy Distillation (OPD) for distilling a larger teacher model into the student on-policy

In OPD mode, the student generates responses as usual, and a teacher model provides per-token log-probabilities on those same responses. The teacher logprobs are passed to the loss function (e.g., cispo) so the student learns to match the teacher's distribution. The teacher must be served behind an OpenAI-compatible /v1/completions endpoint (e.g., vLLM, SGLang).

Asynchronous by design

Serving, reward modeling, and training are fully decoupled. The agent continues responding while scoring and optimization run in parallel.


🚀 Quick Start

1. Install

pip install -e .            # skills_only mode (lightweight)
pip install -e ".[rl]"      # + RL training support (torch, transformers, tinker)
pip install -e ".[evolve]"  # + skill evolution via OpenAI-compatible LLM

2. Configure

metaclaw setup

The interactive wizard will ask you to choose your LLM provider (Kimi, Qwen, or custom), enter your API key, and optionally enable RL training.

3. Start

metaclaw start

That's it. MetaClaw starts the proxy, automatically configures OpenClaw to use it, and restarts the gateway. Open OpenClaw and start chatting — skills are injected at every turn, and the session is automatically summarized into new skills when you're done.


🛠️ CLI Reference

metaclaw setup              # Interactive first-time configuration wizard
metaclaw start              # Start MetaClaw (proxy + optional RL)
metaclaw start --mode rl    # Force RL mode for this session
metaclaw stop               # Stop a running MetaClaw instance
metaclaw status             # Check proxy health and running mode
metaclaw config show        # View current configuration
metaclaw config KEY VALUE   # Set a config value

Common config keys:

metaclaw config rl.enabled true           # Enable RL training
metaclaw config rl.tinker_api_key sk-...  # Set Tinker key
metaclaw config skills.auto_evolve false  # Disable auto skill summarization
metaclaw config proxy.port 31000          # Change proxy port

⚙️ Configuration

Configuration lives in ~/.metaclaw/config.yaml, created by metaclaw setup.

mode: skills_only          # "skills_only" | "rl"

llm:
  provider: kimi            # kimi | qwen | openai | custom
  model_id: moonshotai/Kimi-K2.5
  api_base: https://api.moonshot.cn/v1
  api_key: sk-...

proxy:
  port: 30000

skills:
  enabled: true
  dir: ~/.metaclaw/skills   # your skill library
  retrieval_mode: template  # template | embedding
  top_k: 6
  task_specific_top_k: 10   # cap task-specific skills (default 10)
  auto_evolve: true         # auto-summarize skills after each session

rl:
  enabled: false            # set to true to enable RL training
  model: moonshotai/Kimi-K2.5
  tinker_api_key: ""
  prm_url: https://api.openai.com/v1
  prm_model: gpt-5.2
  prm_api_key: ""
  lora_rank: 32
  batch_size: 4
  resume_from_ckpt: ""      # optional checkpoint path to resume training
  evolver_api_base: ""      # leave empty to reuse llm.api_base
  evolver_api_key: ""
  evolver_model: gpt-5.2

opd:
  enabled: false            # set to true to enable OPD (teacher distillation)
  teacher_url: ""           # teacher model base URL (OpenAI-compatible /v1/completions)
  teacher_model: ""         # teacher model name (e.g., Qwen/Qwen3-32B)
  teacher_api_key: ""       # teacher model API key
  kl_penalty_coef: 1.0      # KL penalty coefficient for OPD

max_context_tokens: 20000   # prompt token cap before truncation

💪 Skills

Skills are short Markdown instructions injected into the agent's system prompt at each turn. They live in your skills directory (~/.metaclaw/skills/ by default), organized as individual SKILL.md files.

Skill auto-summarization runs after each conversation. The LLM you configured analyzes what happened and generates new skills automatically. No manual curation needed — the library grows with your usage.

To pre-load the built-in skill bank (40+ skills across coding, security, agentic tasks, etc.):

cp -r memory_data/skills/* ~/.metaclaw/skills/

🔬 Advanced: RL Mode

Enable RL training to continuously fine-tune the model from live conversations:

metaclaw config rl.enabled true
metaclaw config rl.tinker_api_key sk-...
metaclaw config rl.prm_url https://api.openai.com/v1
metaclaw config rl.prm_api_key sk-...
metaclaw start

In RL mode:

  • Each conversation turn is tokenized and submitted as a training sample
  • A judge LLM (PRM) scores responses asynchronously
  • Tinker cloud runs LoRA fine-tuning; updated weights are hot-swapped every batch_size samples
  • A dedicated evolver LLM extracts new skills from failed episodes

Programmatic rollout (no OpenClaw TUI needed): set openclaw_env_data_dir to a directory of JSONL task files:

{"task_id": "task_1", "instruction": "Register the webhook at https://example.com/hook"}

🔬 Advanced: OPD Mode

On-Policy Distillation (OPD) lets you distill a larger teacher model into the student while it trains on-policy. The student generates responses as usual; the teacher provides per-token log-probabilities on those same responses. A KL penalty steers the student toward the teacher's distribution.

metaclaw config opd.enabled true
metaclaw config opd.teacher_url http://localhost:8082/v1
metaclaw config opd.teacher_model Qwen/Qwen3-32B
metaclaw config opd.kl_penalty_coef 1.0
metaclaw start --mode rl

The teacher must be served behind an OpenAI-compatible /v1/completions endpoint (e.g., vLLM, SGLang). OPD can be combined with PRM scoring — both run asynchronously.

See examples/run_conversation_opd.py for a programmatic example and scripts/run_openclaw_tinker_opd.sh for a ready-made launch script.


📚 Citation

@misc{xia2026metaclaw,
  author       = {Xia, Peng and Chen, Jianwen and Yang, Xinyu and Tu, Haoqin and Han, Siwei and Qiu, Shi and Zheng, Zeyu and Xie, Cihang and Yao, Huaxiu},
  title        = {MetaClaw},
  year         = {2026},
  organization = {GitHub},
  url          = {https://github.com/aiming-lab/MetaClaw},
}

🙏 Acknowledgements

MetaClaw builds on top of the following open-source projects:


📄 License

This project is licensed under the MIT License.

About

Just talk to your agent — it learns and EVOLVES.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.3%
  • Shell 0.7%