AlphaLab-USTC/Skill1

Skill1

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning


TL;DR

LLM agents can be augmented with a skill library — a persistent memory of reusable strategies. Using such a library requires three coupled capabilities: selecting a relevant skill, utilizing it during execution, and distilling new skills from experience. Prior methods optimize these in isolation with separate reward signals, causing conflicting evolution.

Skill1 trains a single policy (Qwen2.5-7B) via RL (GRPO) to co-evolve all three capabilities using only one task-outcome reward. Credit assignment is achieved by decomposing the reward into a low-frequency trend (credits selection) and high-frequency variation (credits distillation).
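The decomposition idea can be illustrated with a minimal sketch. The README does not specify the filter, so this example assumes an exponential moving average (EMA) as the low-pass step; the function name `decompose_rewards` and the smoothing factor `alpha` are hypothetical and are not taken from the Skill1 code base.

```python
from typing import List, Tuple


def decompose_rewards(rewards: List[float], alpha: float = 0.3) -> Tuple[List[float], List[float]]:
    """Split a sequence of task-outcome rewards into a slow trend and a fast residual.

    Assumption: an EMA acts as the low-pass filter. The actual method may differ.
    """
    trend: List[float] = []
    residual: List[float] = []
    ema = rewards[0] if rewards else 0.0
    for r in rewards:
        # Low-frequency part: smoothed outcome (here used to illustrate credit for selection).
        ema = alpha * r + (1 - alpha) * ema
        trend.append(ema)
        # High-frequency part: deviation from the trend (illustrates credit for distillation).
        residual.append(r - ema)
    return trend, residual


# Toy outcome rewards observed for one skill across successive uses.
rewards = [0.0, 1.0, 1.0, 0.0, 1.0]
trend, residual = decompose_rewards(rewards)
```

By construction, the two components sum back to the original reward at every step, so no outcome signal is added or lost by the split.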


How It Works

Task → [Selection] → [Utilization] → [Distillation] → Skill Library
         ↑                                                   |
         └───────────────────────────────────────────────────┘
  1. Skill Selection — Policy generates a query, retrieves candidates via a frozen encoder, and re-ranks them.
  2. Skill Utilization — Policy interacts with the environment conditioned on the selected skill.
  3. Skill Distillation — Policy reflects on the trajectory and writes a new reusable skill (strategy + scenario description) into the library.

All three stages are produced by the same policy and optimized by the same task-outcome signal — no auxiliary models, no hand-crafted rewards.
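The three stages above can be sketched as a toy loop. This is an illustration under stated assumptions, not the repository's implementation: `Skill`, `embed`, `cosine`, and `retrieve` are hypothetical names, a bag-of-words counter stands in for the frozen encoder, and the policy's query generation, re-ranking, environment interaction, and reflection are replaced by placeholders.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Skill:
    scenario: str  # when the skill applies
    strategy: str  # what to do in that scenario


def embed(text: str) -> Counter:
    # Toy stand-in for a frozen encoder: bag-of-words token counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, library: List[Skill], k: int = 2) -> List[Skill]:
    # Rank skills by similarity between the query and each scenario description.
    q = embed(query)
    return sorted(library, key=lambda s: cosine(q, embed(s.scenario)), reverse=True)[:k]


library = [
    Skill("heat an object in the kitchen", "find the object, take it to the microwave, heat it"),
    Skill("buy a product under a price limit", "search with key attributes, filter by price, then buy"),
]

task = "heat the mug and put it on the table"

# 1. Selection: the task text is used directly as the query (policy-written query and re-ranking omitted).
selected = retrieve(task, library, k=1)[0]

# 2. Utilization: the selected strategy conditions the prompt (environment interaction omitted).
prompt = f"Task: {task}\nSkill: {selected.strategy}"

# 3. Distillation: a placeholder for the skill the policy would write after reflecting on the trajectory.
library.append(Skill("heat an object then place it somewhere",
                     "heat first, then navigate to the target and put it down"))
```

The appended skill becomes a retrieval candidate for later tasks, which is the feedback arrow in the diagram.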


Quick Start

1. Install Base Environment

conda create -n skill1 python==3.12 -y
conda activate skill1

pip3 install vllm==0.11.0
pip3 install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
pip install -e .

2. Install Task Environments

ALFWorld:

conda env create -f agent-alfworld-env.yaml

WebShop:

conda env create -f agent-webshop-env.yaml

3. Download Data

We download the ALFWorld and WebShop data from the original sources: alfworld/alfworld and princeton-nlp/WebShop.

4. Run Training

# ALFWorld
bash launch_scripts/alfworld/train_alfworld.sh

# WebShop
bash launch_scripts/webshop/train_webshop.sh

Acknowledgments

This code is built upon several open-source projects. We thank the authors and contributors of: verl, verl-agent, and LaMer.

Citation

If you find our work useful, please consider citing our paper:

@article{shi2026skill1,
  title={Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning},
  author={Shi, Yaorui and Chen, Yuxin and Lu, Zhengxi and Miao, Yuchun and Liu, Shugui and Gu, Qi and Cai, Xunliang and Wang, Xiang and Zhang, An},
  journal={arXiv preprint arXiv:2605.06130},
  year={2026}
}
