Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
LLM agents can be augmented with a skill library — a persistent memory of reusable strategies. Using such a library requires three coupled capabilities: selecting a relevant skill, utilizing it during execution, and distilling new skills from experience. Prior methods optimize these in isolation with separate reward signals, causing conflicting evolution.
Skill1 trains a single policy (Qwen2.5-7B) via RL (GRPO) to co-evolve all three capabilities using only one task-outcome reward. Credit assignment is achieved by decomposing the reward into a low-frequency trend (credits selection) and high-frequency variation (credits distillation).
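One way to picture the trend/variation split is a simple smoothing filter over a sequence of task-outcome rewards. The sketch below is illustrative only and is not taken from the Skill1 codebase: the exponential moving average, the function name `decompose_rewards`, and the smoothing factor `alpha` are all assumptions chosen for clarity. The slow-moving average plays the role of the low-frequency trend, and the residual plays the role of the high-frequency variation.

```python
from typing import List, Tuple


def decompose_rewards(rewards: List[float], alpha: float = 0.3) -> Tuple[List[float], List[float]]:
    """Split a reward sequence into a smooth trend and a residual.

    Hypothetical helper: an exponential moving average (EMA) stands in for
    the low-frequency trend; reward minus EMA stands in for the
    high-frequency variation. The actual decomposition in Skill1 may differ.
    """
    trend: List[float] = []
    variation: List[float] = []
    ema = None
    for r in rewards:
        # The first reward initializes the average; later rewards are blended in.
        ema = r if ema is None else alpha * r + (1.0 - alpha) * ema
        trend.append(ema)
        variation.append(r - ema)
    return trend, variation


# Example: binary task outcomes observed over four episodes.
trend, variation = decompose_rewards([1.0, 0.0, 1.0, 1.0], alpha=0.5)
```

By construction the two components sum back to the original reward at every step, so no part of the task-outcome signal is discarded by the split.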
Task → [Selection] → [Utilization] → [Distillation] → Skill Library
            ↑                                               |
            └───────────────────────────────────────────────┘
- Skill Selection — Policy generates a query, retrieves candidates via a frozen encoder, and re-ranks them.
- Skill Utilization — Policy interacts with the environment conditioned on the selected skill.
- Skill Distillation — Policy reflects on the trajectory and writes a new reusable skill (strategy + scenario description) into the library.
All three stages are produced by the same policy and optimized by the same task-outcome signal — no auxiliary models, no hand-crafted rewards.
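The three stages can be sketched as a single episode loop. This is a minimal illustration under stated assumptions, not the repository's implementation: `Skill`, `embed`, `retrieve`, and `run_episode` are hypothetical names, the bag-of-words `embed` is a toy stand-in for the frozen encoder, and the four `policy_*` callables stand in for calls to the one shared policy.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    scenario: str  # description of when the skill applies
    strategy: str  # the reusable strategy itself


def embed(text: str) -> Counter:
    """Toy stand-in for a frozen encoder: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, library: List[Skill], k: int = 2) -> List[Skill]:
    """Return the k skills whose scenario text is most similar to the query."""
    q = embed(query)
    return sorted(library, key=lambda s: cosine(q, embed(s.scenario)), reverse=True)[:k]


def run_episode(
    task: str,
    library: List[Skill],
    policy_query: Callable[[str], str],
    policy_rerank: Callable[[str, List[Skill]], Skill],
    policy_act: Callable[[str, Optional[Skill]], Tuple[list, float]],
    policy_distill: Callable[[str, list, float], Skill],
) -> float:
    # Selection: the policy writes a query, the encoder retrieves, the policy re-ranks.
    candidates = retrieve(policy_query(task), library)
    skill = policy_rerank(task, candidates) if candidates else None
    # Utilization: act in the environment conditioned on the chosen skill.
    trajectory, reward = policy_act(task, skill)
    # Distillation: reflect on the trajectory and write a new skill to the library.
    library.append(policy_distill(task, trajectory, reward))
    return reward
```

In this sketch the task-outcome `reward` returned by `run_episode` is the only training signal, which mirrors the single-reward design described above.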
conda create -n skill1 python==3.12 -y
conda activate skill1
pip3 install vllm==0.11.0
pip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install -e .
ALFWorld:
conda env create -f agent-alfworld-env.yaml
WebShop:
conda env create -f agent-webshop-env.yaml
We download the ALFWorld and WebShop data from the original sources: alfworld/alfworld | princeton-nlp/WebShop
# ALFWorld
bash launch_scripts/alfworld/train_alfworld.sh
# WebShop
bash launch_scripts/webshop/train_webshop.sh
This code is built upon several open-source projects. We thank the authors and contributors of verl, verl-agent, and LaMer.
If you find our work useful, please consider citing our paper:
@article{shi2026skill1,
title={Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning},
author={Shi, Yaorui and Chen, Yuxin and Lu, Zhengxi and Miao, Yuchun and Liu, Shugui and Gu, Qi and Cai, Xunliang and Wang, Xiang and Zhang, An},
journal={arXiv preprint arXiv:2605.06130},
year={2026}
}