Release SkyRL: v0.2.0 · NovaSky-AI/SkyRL

Highlights

VLM Support: SkyRL now supports VLM training, both through the Tinker API and via the Python entrypoint. We've validated stable training on single- and multi-turn datasets, with both text-only environments and multi-modal environment outputs. Get started here: https://docs.skyrl.ai/docs/tutorials/vision_language_rl

New inference refactor centralizing on HTTP: We've implemented a new HTTP-based inference path for vLLM (in inference_servers). It standardizes all inference interactions over HTTP and integrates vllm-router as a high-performance router for generation requests. We also support prefill-decode disaggregation, which allows users to squeeze out more performance in multi-turn async RL use cases. The new inference codepath is now the default; to use the legacy codepath (inference_engines/), set _SKYRL_USE_NEW_INFERENCE=0. The inference_engines/ codepath will be removed in the next release.
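As a minimal sketch of opting back into the legacy codepath from a Python launcher: the variable name and value come from the note above, while the helper name `legacy_inference_env` is hypothetical and the training command itself is left to the reader. It assumes the variable is read from the process environment at startup.

```python
import os
from typing import Mapping, Optional


def legacy_inference_env(base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the legacy inference path selected.

    Hypothetical helper: only the variable name and the value "0" come from
    the release notes; everything else here is illustrative.
    """
    env = dict(os.environ if base_env is None else base_env)
    # "0" selects the legacy inference_engines/ codepath per the release notes.
    env["_SKYRL_USE_NEW_INFERENCE"] = "0"
    return env


if __name__ == "__main__":
    env = legacy_inference_env({"PATH": "/usr/bin"})
    print(env["_SKYRL_USE_NEW_INFERENCE"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting your own training command. Since the legacy codepath is slated for removal in the next release, this is best treated as a temporary fallback.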

vLLM Native weight syncing API integration: SkyRL's new inference servers implementation uses vLLM's native weight syncing APIs: https://docs.vllm.ai/en/latest/training/weight_transfer/

Step-wise training improvements: We've made a number of fixes to the step-wise training implementation: addressing correctness issues (#1492), adding support for fully async training (#1536), and adding prefix-aware merging to avoid redundant forward passes (#1532).
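To illustrate the idea behind prefix-aware merging, here is a simplified sketch. It is not SkyRL's implementation from #1532. It assumes each step is a (prompt, response) pair of token IDs and that a step can be folded into the previous sequence whenever its prompt begins with the previous prompt plus response. The names `Step`, `Merged`, and `merge_steps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    prompt_ids: List[int]
    response_ids: List[int]


@dataclass
class Merged:
    prompt_ids: List[int]
    response_ids: List[int]  # all tokens after the first prompt
    loss_mask: List[int]     # 1 = model-generated token, 0 = observation token


def merge_steps(steps: List[Step]) -> List[Merged]:
    """Fold consecutive steps into one sequence when they share a prefix."""
    merged: List[Merged] = []
    for step in steps:
        if merged:
            last = merged[-1]
            full = last.prompt_ids + last.response_ids
            # The step extends the previous sequence if its prompt starts
            # with everything seen so far (prompt + response).
            if step.prompt_ids[: len(full)] == full:
                extra = step.prompt_ids[len(full):]  # e.g. environment output
                last.response_ids += extra + step.response_ids
                last.loss_mask += [0] * len(extra) + [1] * len(step.response_ids)
                continue
        # No shared prefix: start a new sequence.
        merged.append(
            Merged(
                prompt_ids=list(step.prompt_ids),
                response_ids=list(step.response_ids),
                loss_mask=[1] * len(step.response_ids),
            )
        )
    return merged


if __name__ == "__main__":
    steps = [Step([1, 2], [3, 4]), Step([1, 2, 3, 4, 5], [6]), Step([9], [10])]
    for m in merge_steps(steps):
        print(m)
```

In this sketch the first two steps collapse into a single sequence, so the shared prefix needs only one forward pass. The third step has an unrelated prompt and stays separate.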

R3 support: SkyRL now supports R3 for stabilizing training with MoE models. Currently, this is limited to setups where each vLLM engine is contained within a single node, due to limitations in vLLM.
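R3 (Rollout Routing Replay, #1273) involves carrying expert router indices from rollout to training, and #1378 in the list below downcasts those indices to uint8/int16 to reduce space. The following sketch shows only the dtype selection, using the standard-library `array` module. The function name `pack_expert_indices` and the thresholds are illustrative assumptions, not SkyRL's code.

```python
from array import array
from typing import Iterable


def pack_expert_indices(indices: Iterable[int], num_experts: int) -> array:
    """Store expert indices in the narrowest integer type that fits.

    Hypothetical helper: expert indices are assumed to lie in [0, num_experts).
    """
    if num_experts <= 256:
        typecode = "B"  # unsigned 8-bit, holds 0..255
    elif num_experts <= 32768:
        typecode = "h"  # signed 16-bit, holds up to 32767
    else:
        raise ValueError("num_experts too large for int16 storage")
    return array(typecode, indices)


if __name__ == "__main__":
    packed = pack_expert_indices([0, 5, 127], num_experts=128)
    print(packed.typecode, packed.itemsize)
```

Compared with storing each index as a 64-bit integer, one byte per index is an 8x reduction. This matters because the indices are recorded per token, per layer, and per selected expert.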

Nemotron 3 and Qwen 3.5 support: SkyRL now supports Nemotron 3 and Qwen 3.5 models. Nemotron 3 is supported in the FSDP and Megatron backends, while Qwen 3.5 is supported in the FSDP, Megatron, and JAX backends.

What's Changed

[trivial] Fix comments numbering and make some code more concise in trainer.py by @CharlieFRuan in #1283
[train][1/N] Native Weight Sync API: NCCL by @hao-aaron in #1271
[ci] Fix CI from broken test imports from #1271 by @erictang000 in #1290
[fix] Fix cuda ipc weight sync after #1271 by @erictang000 in #1292
fix paths for instruction comments to match current location by @linde in #1294
[CI] Skip FlashRL integration test in CI and fix failing generation test for new inference codepath by @SumanthRH in #1301
Skip output_router_logits for granitemoehybrid models by @eltonjohnfanboy in #1295
WIP: Restore PR changes lost during skyrl-train deprecation by @tyler-griggs in #1310
[chore] Update skyrl and skyrl-gym versions after 0.1.0 release by @SumanthRH in #1312
[lint] Add isort to pre-commit by @SumanthRH in #1267
[examples][bug] fix silent eval max generate length not overriding by @erictang000 in #1317
[docs] Add explicit eval_sampling_params.max_generate_length by @SumanthRH in #1318
[vllm] enable mp distributed executor backend (no multi-node engines) by @erictang000 in #1300
[train][2/N] Native Weight Syncing APIs: IPC by @hao-aaron in #1291
[algorithm][generator] change overlong filtering to use stop reasons over checking eos token by @erictang000 in #1319
Add rollout_is policy loss by @SamuelGabriel in #1314
[AsyncRL] Use keep mode for pause and resume by @hao-aaron in #1179
[skyrl][inference] Fix port collision when ports are allocated. by @nithinvc in #1302
R3 PR: Rollout Routing Replay by @erictang000 in #1273
[megatron] enable bucketed weight sync for non-colocated nccl weight sync in megatron by @erictang000 in #1324
[fix] Fix placement group bundle ordering for inference engines by @SumanthRH in #1308
[train][fix] Fix concurrency limitations in the new inference codepath by @SumanthRH in #1320
[megatron][lora] Fix megatron lora weight syncing not initializing buckets correctly by @erictang000 in #1330
[train] Add worker_process_setup_hook to set mp start method to spawn by @SumanthRH in #1333
[CI] Fix test_inference_engines_generation after vllm 0.16.0 upgrade; Use the correct GSM8k path for test_generator_multi_turn_gsm8k_router_replay by @SumanthRH in #1339
[train] Make TrainingInputBatch to PAD only to left, hence response tensors be right-aligned by @CharlieFRuan in #1285
Revert "[train] Add worker_process_setup_hook to set mp start method to spawn" by @SumanthRH in #1344
[Docs] Add docs on agent integration and step-wise training by @CharlieFRuan in #1347
[Docs] Small update on docs by @CharlieFRuan in #1348
[train] Add validation for step-wise GeneratorOutput by @CharlieFRuan in #1281
[megatron] rebuild weight conversion tasks per sync to prevent stale PP-collective caches with bucketing by @erictang000 in #1345
[StepWise] Trivial fix to avg_response_length metric by @CharlieFRuan in #1351
[CI] Make MultiItemDataset a global variable after switch to spawn by @SumanthRH in #1346
[train] Add support for LoRA in the new inference codepath by @SumanthRH in #1329
[bug][algorithm] remove incorrect torch.no_grad() for kl in loss (use_kl_loss=True) by @erictang000 in #1353
[transformers] set return dict false for transformers v5 compatibility by @erictang000 in #1325
[skyrl][tx] Move ModelInput token extraction to backends by @nithinvc in #1352
[tx] Fuse the projection matrices for Qwen3 by @pcmoritz in #1341
[tx] Fuse the projection matrices for Qwen 3.5 by @pcmoritz in #1362
Add CodeScout project to README by @CharlieFRuan in #1364
[train] Patch vLLM v0.16.0 sleep mode to properly free model weights by @CharlieFRuan in #1365
[tx] Optimize the decode performance by @pcmoritz in #1363
[skyrl] Add ImageChunk and ImageAssetPointerChunk types by @nithinvc in #1361
[train] Enable support for the mp backend with the new inference codepath by @SumanthRH in #1355
Fix loss_fn_outputs right-aligned slicing in Tinker API path by @CharlieFRuan in #1367
[bug] Move server creation and server start in the same thread by @hao-aaron in #1375
[router replay] downcast expert router indices to uint8/int16 to reduce space by @erictang000 in #1378
[train] Fix double-serialization of TensorBatch in pickle by @erictang000 in #1379
Bump vLLM to 0.18 by @hao-aaron in #1374
[train] Use a shared semaphore for all generate requests with RemoteInferenceClient; Move tokenization to client by @SumanthRH in #1381
[async] Add search r1 fully async script by @CharlieFRuan in #1386
[train] Add vLLMRouter in new inference codepath by @SumanthRH in #1385
[trainer] refactor dispatch_from_staged to individually serialize DP chunks to avoid materializing whole batch on all workers by @erictang000 in #1376
[SkyRL] Introduce /render endpoint to the new http inference client by @nithinvc in #1373
[async] Add DAPO fully async script by @CharlieFRuan in #1390
[examples] update command paths to include train/ in examples by @erictang000 in #1395
[bug] fix sleep bugs by @hao-aaron in #1383
[train][2/N] Support for Megatron PP + CP for R3 by @devpatelio in #1335
[train] Multi-modal inputs support in FSDP2 by @nithinvc in #1331
[bug] Fix weight sync with DP > 1 in non-colocated setups by @SumanthRH in #1399
[train] Make vLLMRouter the default router in the new inference codepath by @SumanthRH in #1394
[bug] Fix KeyError for pixel_values in RefWorkerBase.forward by @SumanthRH in #1402
[megatron][checkpointing] fix checkpointing with optimizer cpu offload for dist_ckpt_optim_fully_reshardable=False by @erictang000 in #1403
Add MaxRL mean normalization over advantages by @tamoghnokandar in #1126
[ci] bump logprob diff gap from 7e-2 to 9e-2 for dense models by @erictang000 in #1408
[example] Add example scripts for Nemotron models by @SumanthRH in #1409
[skyrl] Add /sample endpoint to RemoteInferenceClient following Tinker API by @nithinvc in #1396
[bug] Fix fork + Ray deadlock in dataset filtering by using spawn by @SumanthRH in #1415
[CI] Add ep tests to CI by @hao-aaron in #1404
Fix preprocess_packed_seqs crash with short sequences under CP > 1 by @CharlieFRuan in #1407
[fix] Forward SKYRL_RAY_PG_TIMEOUT_IN_S to workers via runtime_env by @CharlieFRuan in #1406
[BREAKING][skyrl-train] Implement loss reduction via advantage normalization and fix token_mean reduction strategy by @justinvyu in #1296
[megatron] Fix loss aggregation for context parallelism (CP) in Megatron by @erictang000 in #1420
[megatron] Upgrade mbridge -> 0.3.1, megatron-core -> 0.16.1 by @erictang000 in #1412
[bug] Fix logprobs handling and error status codes for vllm-router compatibility by @SumanthRH in #1421
[megatron] Fix redundant downloading of shards across workers/nodes for megatron dist checkpointing by @erictang000 in #1414
fix: correct gsm8k example paths in Modal README by @tyfeng1997 in #1429
[test] Refactor tests to use single event loop; fix RemoteInferenceClient connector cleanup by @SumanthRH in #1387
[Bug][CI] Increase server timeout for MoE model initialization by @SumanthRH in #1454
[tinker] Implement model unloading for skyrl_train_backend by @pcmoritz in #1453
[dependencies] Upgrade transformers to >=5.0.0,<=5.3.0 by @erictang000 in #1426
[Fix][CI] Fix CI Failures: set return_dict=False for sample API tests, use proper sleep/wake up for tests with colocated engine by @SumanthRH in #1463
[dependencies] bump vllm to 0.19.0 by @erictang000 in #1462
Add OpenReward environment integration example by @tyfeng1997 in #1458
[ci] Add ability to trigger skyrl-train gpu CI (and megatron CI) by adding run_train_gpu_ci and run_train_megatron_gpu_ci labels by @erictang000 in #1465
[megatron] support qwen3.5 models for megatron, bump mbridge + megatron-core to latest by @erictang000 in #1425
fix: use gloo for CPU collectives in Megatron workers to fix multi-node checkpointing by @CharlieFRuan in #1466
[tinker] Forbid extra keys in EngineConfig by @pbokc in #1459
fix: set CUDA device before gloo+nccl process group init in Megatron workers by @CharlieFRuan in #1468
[megatron] Fix MoE weight syncing by grouping fused expert tasks into dedicated buckets, supporting qwen3.5 moe by @erictang000 in #1471
[Fix] Add semaphore throttling to RemoteInferenceClient.sample, RemoteInferenceClient.completion, RemoteInferenceClient.chat_completion by @SumanthRH in #1475
[SkyRL][train] Support prompt_logprobs in /sample in the new inference stack by @nithinvc in #1417
fix: escape bare < in generated MDX to fix Vercel doc builds by @tyler-griggs in #1478
[R3] Enable R3 with new inference by @hao-aaron in #1428
[skyrl] New inference client in the skyrl-train tinker backend by @nithinvc in #1452
[fix] update vllm.inputs.data import to vllm.inputs after vllm 0.19.0 upgrade by @SumanthRH in #1482
[skyrl] vLLM Renderer for rendering Multi-Modal ModelInputChunks for training backend by @nithinvc in #1464
[qoc] Remove thread for sleep() in _get_new_inference_client by @CharlieFRuan in #1488
[train] Save checkpoints at epoch boundaries by @CharlieFRuan in #1490
[train] Final ckpt for fully async by @CharlieFRuan in #1491
[CI] Fix moe model initialization timeout by @SumanthRH in #1495
[feat][train] Add prefill-decode disaggregation support and refactor vLLMRouter by @SumanthRH in #1467
[CI] Relax Megatron vs FSDP policy_loss tolerance in test_megatron_train by @SumanthRH in #1500
[train][multimodal][1/3] Add vision support to generate() in new inference stack by @nithinvc in #1494
[multimodal] add language_model_only flag for models like qwen3.5 by @erictang000 in #1487
[fix][train][step-wise] Broadcast step-wise advantage with each step's own response_mask by @CharlieFRuan in #1507
[fix][CI] Fix cleanup for test_weight_sync by @SumanthRH in #1510
[fix][train] Fix port collision for num_engines > 1 in non-PD case by @SumanthRH in #1511
[train][multimodal][3/3] Trainer changes to extract multi-modal outputs from GeneratorOutput by @nithinvc in #1498
[CI] Migrate non-Megatron GPU CI to run on new inference codepath by @SumanthRH in #1476
[feat] chunked ipc support for new inference by @hao-aaron in #1512
[CI] Fix Megatron Lora for new inference and migrate Megatron CI to new inference codepath by @SumanthRH in #1518
[CI] Migrate E2E CI to use new inference by @SumanthRH in #1521
[train] SFT 1/N: Add a native SFT trainer to SkyRL by @SumanthRH in #1503
[dep] Add causal-conv1d and mamba-ssm wheels, add mamba-ssm as extra by @CharlieFRuan in #1524
[qoc] Remove unused TrainingInputBatch.metadata["trajectory_ids"] by @CharlieFRuan in #1526
[qoc] Extract pad_batch() into a helper to training_batch.py by @CharlieFRuan in #1527
[bug][train] Fix max_seq_len calculation by @tamoghnokandar in #1303
[fix][train] Prompt-based mini-batching for step-wise training by @CharlieFRuan in #1529
[train][multimodal][3/3] Add multi-turn VLM generator by @nithinvc in #1486
[skyrl][tinker] Use VLLMRenderer in SkyRL train backend by @nithinvc in #1496
[qoc] Make concatenate_generator_outputs linear instead of O(K^2) by @CharlieFRuan in #1535
[CI] Fix timeout failures in E2E CI pipelines by @SumanthRH in #1533
[stepwise] Plumb through step-wise training for fully async by @CharlieFRuan in #1536
[train][step-wise] Add prefix-aware merging for step-wise training by @CharlieFRuan in #1532
Revert "[train][step-wise] Add prefix-aware merging for step-wise training" by @CharlieFRuan in #1537
[train][step-wise] Add prefix-aware merging for step-wise training by @CharlieFRuan in #1538
[harbor] Bump harbor to HEAD, make run_harbor_gen runnable by @CharlieFRuan in #1540
[trivial][test] Add overlong filtering test to step-wise prefix-aware merging by @CharlieFRuan in #1541
[docs][example] VLM Examples by @nithinvc in #1531
Make the new inference codepath the default by @SumanthRH in #1544
[chore] Move flash attn wheel specification to uv.sources by @SumanthRH in #1547
[bug][tinker] Fix vLLMRouter init in SkyRLTrainBackend by @SumanthRH in #1554
[train] Support packing for CUDA IPC transfer with new inference codepath by @SumanthRH in #1558

New Contributors

@linde made their first contribution in #1294
@eltonjohnfanboy made their first contribution in #1295
@SamuelGabriel made their first contribution in #1314
@nithinvc made their first contribution in #1302
@tyfeng1997 made their first contribution in #1429

Full Changelog: skyrl-v0.1.0...skyrl-v0.2.0
