Skill1

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

TL;DR

LLM agents can be augmented with a skill library — a persistent memory of reusable strategies. Using such a library requires three coupled capabilities: selecting a relevant skill, utilizing it during execution, and distilling new skills from experience. Prior methods optimize these in isolation with separate reward signals, causing conflicting evolution.

Skill1 trains a single policy (Qwen2.5-7B) via RL (GRPO) to co-evolve all three capabilities using only one task-outcome reward. Credit assignment is achieved by decomposing the reward into a low-frequency trend (credits selection) and high-frequency variation (credits distillation).
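As a rough illustration of the trend/variation split, the sketch below smooths a sequence of task-outcome rewards with an exponential moving average and treats the residual as the high-frequency part. The EMA estimator, the `alpha` parameter, and the function name `decompose_rewards` are assumptions made for illustration; the README does not state how the decomposition is actually computed.

```python
from typing import List, Tuple


def decompose_rewards(rewards: List[float], alpha: float = 0.5) -> Tuple[List[float], List[float]]:
    """Split a reward sequence into a smoothed trend and a residual.

    Assumption: the low-frequency trend is modeled as an exponential
    moving average (EMA); the high-frequency variation is reward - trend.
    """
    trend: List[float] = []
    variation: List[float] = []
    ema = rewards[0] if rewards else 0.0  # seed the EMA with the first reward
    for r in rewards:
        ema = alpha * r + (1 - alpha) * ema
        trend.append(ema)           # slow-moving signal (would credit selection)
        variation.append(r - ema)   # fast-moving signal (would credit distillation)
    return trend, variation


trend, variation = decompose_rewards([1.0, 0.0, 1.0], alpha=0.5)
```

By construction the two parts sum back to the original reward at every step, so no reward signal is added or lost by the split.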

How It Works

Task → [Selection] → [Utilization] → [Distillation] → Skill Library
         ↑                                                   |
         └───────────────────────────────────────────────────┘

Skill Selection — Policy generates a query, retrieves candidates via a frozen encoder, and re-ranks them.
Skill Utilization — Policy interacts with the environment conditioned on the selected skill.
Skill Distillation — Policy reflects on the trajectory and writes a new reusable skill (strategy + scenario description) into the library.
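The retrieval and write-back steps of this loop can be sketched as follows. This is a toy under stated assumptions: `Skill`, `encode`, `cosine`, and `retrieve` are hypothetical names, the bag-of-words `encode` stands in for the frozen encoder, and the policy's query generation, re-ranking, environment interaction, and reflection are left out.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Skill:
    scenario: str  # description of when the skill applies
    strategy: str  # reusable strategy text


def encode(text: str) -> Counter:
    """Toy stand-in for the frozen encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, library: List[Skill], k: int = 2) -> List[Skill]:
    """Return the k skills whose scenario best matches the query."""
    q = encode(query)
    ranked = sorted(library, key=lambda s: cosine(q, encode(s.scenario)), reverse=True)
    return ranked[:k]


library = [
    Skill("heat an object with the microwave", "find object, open microwave, heat"),
    Skill("buy a product under a price limit", "search, filter by price, buy"),
]

# Selection: in Skill1 the query would be generated by the policy.
candidates = retrieve("heat the mug in the microwave", library, k=1)

# Distillation: in Skill1 the new skill would be written by the policy
# after reflecting on the trajectory; here it is hard-coded.
library.append(Skill("cool an object with the fridge", "find object, open fridge, cool"))
```

Storing the scenario description separately from the strategy lets retrieval match on when a skill applies rather than on what it does.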

All three stages are produced by the same policy and optimized by the same task-outcome signal — no auxiliary models, no hand-crafted rewards.
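For context on how a single outcome reward can drive the update, the sketch below shows the standard group-relative advantage used by GRPO: each rollout's reward is normalized against the other rollouts sampled for the same task. This is the generic GRPO formulation, not Skill1's exact credit-assignment code; the function name, the `eps` term, and the use of the population standard deviation are assumptions (implementations differ on these details).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of one task: two succeeded (1.0), two failed (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts receive a positive advantage and failed ones a negative advantage, while a group with identical rewards yields zero advantage for every rollout.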

Quick Start

1. Install Base Environment

conda create -n skill1 python==3.12 -y
conda activate skill1

pip3 install vllm==0.11.0
pip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install -e .

2. Install Task Environments

ALFWorld:

conda env create -f agent-alfworld-env.yaml

WebShop:

conda env create -f agent-webshop-env.yaml

3. Download Data

Download the ALFWorld and WebShop data from their original sources: alfworld/alfworld | princeton-nlp/WebShop

4. Run Training

# ALFWorld
bash launch_scripts/alfworld/train_alfworld.sh

# WebShop
bash launch_scripts/webshop/train_webshop.sh

Acknowledgments

This code is built upon several open-source projects. We thank the authors and contributors of: verl, verl-agent, and LaMer.

Citation

If you find our work useful, please consider citing our paper:

@article{shi2026skill1,
  title={Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning},
  author={Shi, Yaorui and Chen, Yuxin and Lu, Zhengxi and Miao, Yuchun and Liu, Shugui and Gu, Qi and Cai, Xunliang and Wang, Xiang and Zhang, An},
  journal={arXiv preprint arXiv:2605.06130},
  year={2026}
}
