PeRL: Parameter-Efficient Reinforcement Learning

A minimal, modular, and lightning-fast framework for PEFT + RL.

AlphaXiv | ArXiv | Checkpoints | Wandb Log

Important

This repo is built on TRL, and we are actively developing versions based on verl and slime.

We thank the Tina project for its contribution: this repo was forked from Tina, and we reuse part of its hyperparameter settings.

We also utilize verifier functions, evaluation, and data from openr1 and JustRL.

News

[2025/12] We released our code, paper, and checkpoints!

About

PeRL is the first large-scale evaluation of over 12 PEFT methodologies using the DeepSeek-R1-Distill family on complex mathematical reasoning benchmarks.

Key Finding: Standard LoRA is suboptimal. Structural variants such as DoRA, AdaLoRA, and MiSS consistently outperform standard LoRA. Notably, DoRA (46.6% avg. accuracy) even surpasses full-parameter fine-tuning (44.9%) across multiple benchmarks.

The failure of SVD-based initialization. Strategies like PiSSA and MiLoRA experience significant performance degradation or total training collapse. This is due to a fundamental "spectral misalignment": these methods force updates on principal components, while RLVR intrinsically operates in the off-principal regime.
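To make the principal versus off-principal distinction concrete, the sketch below splits a weight matrix with an SVD and builds adapter factors from either the top-r singular triplets (PiSSA-style) or the bottom-r triplets (MiLoRA-style). It is a NumPy illustration under stated assumptions, not the code used in this repo: the helper name `svd_init`, the matrix shape, and the rank are all hypothetical.

```python
import numpy as np

def svd_init(W, r, which="principal"):
    """Split W into a frozen residual and low-rank adapter factors (B, A).

    which="principal": adapter gets the top-r singular triplets (PiSSA-style).
    which="minor":     adapter gets the bottom-r triplets (MiLoRA-style).
    Hypothetical helper, for illustration only.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)  # S is sorted descending
    idx = slice(0, r) if which == "principal" else slice(len(S) - r, len(S))
    sqrt_s = np.sqrt(S[idx])
    B = U[:, idx] * sqrt_s           # shape (out, r)
    A = sqrt_s[:, None] * Vt[idx]    # shape (r, in)
    residual = W - B @ A             # frozen part of the weight
    return residual, B, A

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))      # toy weight, assumed shape

res_p, B_p, A_p = svd_init(W, r=2, which="principal")
res_m, B_m, A_m = svd_init(W, r=2, which="minor")

# Both splits reproduce W exactly at initialization.
assert np.allclose(res_p + B_p @ A_p, W)
assert np.allclose(res_m + B_m @ A_m, W)

# The principal adapter starts with more of W's energy than the minor one.
assert np.linalg.norm(B_p @ A_p) > np.linalg.norm(B_m @ A_m)
```

In both cases the trainable factors are tied to a fixed slice of the spectrum, which is the constraint the paragraph above refers to.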

The Expressivity Floor. While RLVR can tolerate moderate parameter reduction, extreme compression (e.g., VeRA, IA³, or Rank-1 adapters) creates an information bottleneck. Reasoning tasks require a minimum threshold of trainable capacity to successfully reorient policy circuits.
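A rough count of trainable parameters shows how far apart these methods sit. The sketch below covers a single linear layer; the hidden size, the ranks, and the per-method formulas are simplifying assumptions for illustration (in particular, IA³ is modeled as one scaling vector over the layer's outputs), not figures taken from the paper.

```python
def trainable_params(d_in, d_out, method, r=32):
    """Rough trainable-parameter count for one d_out x d_in linear layer."""
    if method == "full":
        return d_in * d_out
    if method == "lora":   # B (d_out x r) and A (r x d_in)
        return r * (d_in + d_out)
    if method == "vera":   # scaling vectors of length r and d_out; random matrices stay frozen
        return r + d_out
    if method == "ia3":    # one learned scaling vector (simplified)
        return d_out
    raise ValueError(f"unknown method: {method}")

d = 4096  # assumed hidden size, for illustration only
settings = [
    ("full", {}),
    ("lora", {"r": 32}),
    ("lora", {"r": 1}),
    ("vera", {"r": 256}),
    ("ia3", {}),
]
for name, kwargs in settings:
    n = trainable_params(d, d, name, **kwargs)
    print(f"{name:5s} {kwargs!s:12s} {n:>12,d}  ({n / (d * d):.4%} of full)")
```

Under these assumptions a rank-1 adapter trains 8,192 parameters for this layer, against 262,144 at rank 32 and about 16.8 million for the full matrix.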

Supported Parameter-Efficient Methods

Method	Status	Description
LoRA	✅	Standard Low-Rank Adaptation
DoRA	✅	Weight-decomposed Low-Rank Adaptation
MiSS	✅	Mixture of Sub-Spaces (Efficient shard-sharing structure)
AdaLoRA	✅	Adaptive budget allocation for rank-adaptive matrices
LoRA+	✅	Differentiated learning rates for improved adaptation dynamics
rsLoRA	✅	Rank stabilization scaling factors
PiSSA	✅	Principal Singular values & Singular vectors Adaptation
MiLoRA	✅	Minor Singular components initialization
LoRA-FA	✅	Memory-efficient adaptation with frozen projection matrix A
VeRA	✅	Vector-based Random Matrix Adaptation
LN Tuning	✅	Parameter-efficient tuning on Layer Normalization layers
IA³	✅	Infused Adapter by Inhibiting and Amplifying Inner Activations
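Two of the entries above differ from standard LoRA only in a scalar: rsLoRA changes the scaling factor on the low-rank update, and LoRA+ changes the learning rate of one factor. The sketch below illustrates both ideas; the function names, `alpha`, the base learning rate, and the ratio are hypothetical values chosen for the example, not settings from this repo.

```python
import math

def lora_scale(alpha, r, rank_stabilized=False):
    """Multiplier applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rsLoRA uses alpha / sqrt(r), so the
    update shrinks more slowly as the rank grows.
    """
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r

def lora_plus_lrs(base_lr, ratio):
    """LoRA+ style learning rates: B is trained `ratio` times faster than A."""
    return {"A": base_lr, "B": base_lr * ratio}

alpha = 32  # assumed value, for illustration only
for r in (4, 16, 64):
    print(r, lora_scale(alpha, r), lora_scale(alpha, r, rank_stabilized=True))

print(lora_plus_lrs(base_lr=1e-5, ratio=16))  # ratio is an assumed example value
```

At rank 16 with `alpha = 32`, the standard scale is 2.0 while the rank-stabilized scale is 8.0.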

Environment Setup

uv pip install -r requirements.txt

uv pip install flash-attn --no-cache-dir --no-build-isolation
python -c "import flash_attn" # verify

Training

source [your virtual env]/bin/activate
bash scripts/openr1/dapo_full.sh # run full-parameter RL training
bash scripts/openr1/dapo_lora.sh # run LoRA RL training
