Release v2.0.0 · areal-project/AReaL

🚀 Key Highlights

Micro-service Architecture

Includes training, inference, agent, and weight update services.

Hermes Agent Example

End-to-end agent built on the agent service and wired into the 2.0 training loop. Check out the Hermes RL training example for more details.

CLI Commands

A unified areal CLI to launch, manage, and operate each micro-service. See the per-service guides: training, inference, agent.

What's Changed

perf(infra): pipeline controller initialization with background threads by @garrett4wade in #1294
ci: add nightly workflow for long-running tests by @garrett4wade in #1312
feat: support pp for Sglang by @TaoZex in #1162
perf(archon): parallelize inference controller initialization by @garrett4wade in #1314
fix(archon): improve HTTP client reliability and reduce logging verbosity by @garrett4wade in #1315
refactor(experimental): remove redundant capacity grant from inference service by @garrett4wade in #1318
refactor(experimental): add unified group semantics to inference service session lifecycle by @garrett4wade in #1321
chore: migrate repo references from InclusionAI to areal-project by @garrett4wade in #1325
fix(proxy): refuse default admin API key on non-loopback bind by @sebastiondev in #1323
refactor: consolidate admin key validation into shared helper by @garrett4wade in #1328
fix: Add error detection function and test for ZeroDivisionError and other errors alike by @chenzhiyi021 in #1332
docs: Add OpenSSF Best Practices badge to README 📛 by @mingcheng in #1348
gov: add new maintainer by @sitabulaixizawaluduo in #1349
fix(utils): mask 2d sequence advantages by @haoyang9804 in #1346
feat(awex): add colocated CUDA IPC weight transfer by @garrett4wade in #1310
fix(infra): correct staleness capacity inflation after recovery by @daihaowz in #1345
fix(test): add missing colocate field to wu_controller connect tests by @sitabulaixizawaluduo in #1357
refactor(infra): backport rl infra cleanup by @sitabulaixizawaluduo in #1353
feat(megatron): implement async_save with AsyncCallsQueue by @dingzhiqiang in #1356
feat: controller v2 refactor by @sitabulaixizawaluduo in #1354
fix(fsdp): maintain fp32 master weights for AdamW (#1292) by @guozhihao-224 in #1369
chore: add 0516 community meeting materials and update agenda for the next biweekly sync by @sitabulaixizawaluduo in #1371
docs: fix some typos by @jeis4wpi in #1352
feat:enable v2 training pipeline with controller parity by @sitabulaixizawaluduo in #1363
fix(trainer): skip controller-side CUDA sync in single-controller mode by @Adiactive in #1377
fix[v2]: localize RTensor trajectories before reading on controller by @sitabulaixizawaluduo in #1387
chore: migrate community governance files to external repository 📋 by @mingcheng in #1386
feat(megatron): Qwen3.5 dense + MoE training/inference support via megatron-bridge by @Adiactive in #1384
feat(distillation): add on-policy distillation using RolloutEngine by @zahrayousefijamarani in #1376
docs(roadmap): add 2026 Q2 and H2 milestones by @sitabulaixizawaluduo in #1390
Fix LoRA model training by @lifeiteng in #1385
fix(v2/awex): unblock weight-update bring-up by @sitabulaixizawaluduo in #1401
fix(engine): default shard_ids=None on clear_batches across all engines by @Adiactive in #1402
docs: add community section with WeChat QR code to README 📢 by @mingcheng in #1409
Supporting features for IcePop and KPop by @guojiapub in #1405
fix: per-sample version tracking with loss_mask filter and multi-turn… by @pyq623 in #1408
docs: add IcePop/KPop feature introduction by @guojiapub in #1424
fix(vllm): forward frequency_penalty and stop in generation requests by @EazyReal in #1429
fix(awex): allow disabling batch_send_recv use_group via AWEX_WU_USE_GROUP by @sitabulaixizawaluduo in #1414
feat: disable megatron grad buffers CPU backup to save host memory by @HT-Yuan in #1393
fix(network): find_free_ports ignores out-of-range exclude_ports by @EazyReal in #1436
fix(reward): guard clevr_count_70k_reward_fn against scoring failures by @EazyReal in #1430
feat(megatron): make MTP head opt-in to support Qwen3.6 MoE RL by @Adiactive in #1403
feat(ppo): add CISPO loss surrogate (MiniMax-M1) by @EazyReal in #1412
fix(CI): fix vlm_grpo CI OOM bug by @sitabulaixizawaluduo in #1438
feat(cli): add experimental cli scaffold for service-style subcommands by @sitabulaixizawaluduo in #1440
refactor: move 5 experimental modules into areal/v2 for 2.0 release by @sitabulaixizawaluduo in #1448
feat: support v2 weight update disk mode for lora RL by @sitabulaixizawaluduo in #1450
feat(cli): add training service cli by @sitabulaixizawaluduo in #1446
feat(cli): add agent service cli by @sitabulaixizawaluduo in #1447
feat(cli): add inference service cli by @sitabulaixizawaluduo in #1434
feat(agent_service): add agent service with OpenClaw and Hermes examples by @IF007 in #1383
feat(ppo): add reuse_train_logp proximal logp method by @Le8r0nJames in #1453
fix(ppo): coerce ppo_n_minibatches to 1 for reuse_train_logp instead of raising by @Le8r0nJames in #1457
Chore/add areal 2 report paper by @sitabulaixizawaluduo in #1461
feat(launcher): add best-effort post-exit hook by @Le8r0nJames in #1459
feat(megatron): add CP-safe vocab stats and MoE config support by @Le8r0nJames in #1460
fix(ppo): handle variable-size trajectory groups in reward normalization by @Le8r0nJames in #1454
fix(openai): parse tool_call arguments from JSON string to dict before chat template by @Le8r0nJames in #1463
feat(swe): add SWE-bench RL training workflow example by @Le8r0nJames in #1462
fix: fix safe-to-test CI workflow by @sitabulaixizawaluduo in #1456
feat(openai): add proxy preprocessors and Qwen tool-call parsing by @Le8r0nJames in #1458
test(ppo): update singleton leave-one-out group expectation by @Le8r0nJames in #1464
chore: update AReaL 2.0 report paper by @sitabulaixizawaluduo in #1465
chore: bump v2.0.0 by @sitabulaixizawaluduo in #1466

New Contributors

@sebastiondev made their first contribution in #1323
@mingcheng made their first contribution in #1348
@haoyang9804 made their first contribution in #1346
@jeis4wpi made their first contribution in #1352
@zahrayousefijamarani made their first contribution in #1376
@lifeiteng made their first contribution in #1385
@guojiapub made their first contribution in #1405
@pyq623 made their first contribution in #1408
@EazyReal made their first contribution in #1429
@IF007 made their first contribution in #1383
@Le8r0nJames made their first contribution in #1453

Full Changelog: v1.0.4...v2.0.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v2.0.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

🚀 Key Highlights

Micro-service Architecture

Hermes Agent Example

CLI Commands

What's Changed

New Contributors

Contributors

Uh oh!