MiliLab/Seirenes


[Figure: Seirênes adversarial self-play illustration]

Seirênes

Adversarial Self-Play with Evolving Distractions for LLM Reasoning

Overview · Method · Results · Evaluation · Release Status


Overview

Seirênes is a self-play RL framework that turns contextual distraction into an internal training signal for stronger mathematical reasoning.

Instead of generating new tasks, Seirênes keeps the original problem and verifier fixed. A single shared policy plays two role-conditioned parts:

  • Adversary: writes plausible but misleading hints that expose the current Reasoner's blind spots.
  • Reasoner: solves the original problem while learning to ignore or correct those distractions.

This creates a compact internal arms race: as the Reasoner improves, the Adversary must discover sharper distractions; as the Adversary improves, the Reasoner receives harder robustness pressure without changing the downstream test-time interface.

[Figure: Seirênes training pipeline]

Method

For each training question q, Seirênes builds a paired rollout bundle:

  1. R1: Clean rollout
    The policy acts as the Reasoner and solves the original question. These rollouts estimate the current clean success rate.

  2. R2: Adversarial hint generation
    The same policy acts as the Adversary and generates natural, locally plausible hints intended to derail the reasoning path.

  3. R3: Hint-conditioned rollout
    The Reasoner answers the original question with the adversarial hint appended. The verifier still scores against the original ground truth.

The Adversary is rewarded by the clean-to-hinted performance drop, while the Reasoner is trained on both clean and hinted trajectories. The task, answer verifier, and inference format remain unchanged.
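The reward structure above can be sketched as follows; the function names and the exact aggregation are assumptions for illustration, with only the clean-to-hinted drop and the fixed verifier taken from the description:

```python
# Hedged sketch of the paired-rollout reward scheme (R1 clean, R3 hinted).
# Aggregation details are illustrative assumptions; only the core idea
# (Adversary reward = clean-to-hinted drop) comes from the method description.

def adversary_reward(clean_pass_rate: float, hinted_pass_rate: float) -> float:
    """Reward the Adversary by how much its hint degrades the Reasoner."""
    return clean_pass_rate - hinted_pass_rate

def reasoner_reward(correct: bool) -> float:
    """The fixed verifier scores the Reasoner against the original answer."""
    return 1.0 if correct else 0.0

def bundle_rewards(clean_correct: list[bool], hinted_correct: list[bool]) -> dict:
    """Aggregate one rollout bundle for a single training question q."""
    clean_rate = sum(clean_correct) / len(clean_correct)
    hinted_rate = sum(hinted_correct) / len(hinted_correct)
    return {
        "adversary": adversary_reward(clean_rate, hinted_rate),
        "reasoner_clean": [reasoner_reward(c) for c in clean_correct],
        "reasoner_hinted": [reasoner_reward(c) for c in hinted_correct],
    }
```

Note the zero-sum pressure on the hinted rollouts: an Adversary hint only earns reward when it actually lowers the pass rate the Reasoner achieves without it.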

Results

Across seven mathematical reasoning benchmarks and three backbone scales, Seirênes improves standalone reasoning performance over instruction-tuned models and competitive RL baselines.

Backbone                  Base AVG   Strong RL Baseline   Seirênes   Gain
Qwen2.5-7B-Instruct       13.8       18.7                 22.9       +9.1
Qwen3-4B-Instruct         47.8       53.7                 58.0       +10.2
Qwen3-30B-A3B-Instruct    56.7       60.1                 63.9       +7.2

Benchmarks include AIME 2024–2026, IMO-Bench, Minerva Math, OlympiadBench, and HMMT 2026. The same-budget comparisons indicate that the gains are not explained by simply allocating more rollout compute to standard RL.

Evaluation

The repository includes a self-contained math benchmark runner under math_verify/. It supports OpenAI-compatible endpoints, vLLM serving, resume-safe inference, and fail-fast grading.

Install the core runtime:

pip install openai httpx pandas pyarrow tqdm sympy pylatexenc transformers

Install vLLM if you want to launch a local OpenAI-compatible server:

pip install vllm

Start a local server:

cd math_verify
MODEL_PATH=/path/to/model TP=1 DP=1 PORT=8000 bash start_server.sh

Run the bundled benchmark suite:

cd math_verify
PORT=8000 DATASETS=all N=32 RUN_NAME=seirenes_eval bash run_eval.sh

Run against an existing endpoint:

cd math_verify
API_BASE=http://localhost:8000/v1 \
MODEL=/served/model/name \
DATASETS=aime24,aime25,aime26 \
N=32 \
RUN_NAME=seirenes_eval \
bash run_eval.sh

Outputs are written to:

  • math_verify/results/<run_name>/inference/*.jsonl
  • math_verify/results/<run_name>/graded/summary.json

See math_verify/README.md for more options, including external parquet files, slicing, resume mode, tokenizer-based length metrics, and hint-conditioned evaluation.
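As a rough sketch, the per-run inference files can be tallied like this; the directory layout follows the paths above, but everything else (e.g. that each line is a standalone JSON record) is an assumption, not verified against the repo:

```python
# Hedged sketch: tally records in the per-run inference JSONL outputs.
# The inference/*.jsonl layout matches the paths listed above; the
# one-JSON-record-per-line assumption is illustrative, not repo-verified.
import json
from pathlib import Path

def count_inference_records(run_dir: str) -> dict[str, int]:
    """Count JSON records per benchmark file under <run_dir>/inference/."""
    counts: dict[str, int] = {}
    for path in sorted(Path(run_dir).glob("inference/*.jsonl")):
        n = 0
        with path.open() as f:
            for line in f:
                if line.strip():
                    json.loads(line)  # validate that the record parses
                    n += 1
        counts[path.stem] = n
    return counts
```

A quick tally like this is a cheap sanity check before grading, e.g. to confirm a resumed run produced the expected N samples per benchmark.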

Repository Layout

.
├── img/                  # Project logo and method figure
├── math_verify/          # Main benchmark inference and grading toolkit
│   ├── my_bm/            # Bundled parquet benchmark files
│   ├── infer.py          # OpenAI-compatible batched inference
│   ├── grade.py          # Math grading and metric aggregation
│   └── run_eval.sh       # One-command inference + grading
├── LICENSE
└── README.md

Release Status

  • Main benchmark evaluation flow
  • Bundled math benchmark files
  • Training code
  • Model checkpoints: 7B, 4B
  • Paper link

Citation

BibTeX will be added when the paper is public.

License

This project is released under the Apache 2.0 License.

About

Seirênes: A single-policy internal arms race.
