v2.7.1: LLM RL Quantization & Bug Fixes

Latest

jaimesabalbermudez released this 23 Jun 12:43
3f2a321

Features

LLM RL quantization (#522): Adds bitsandbytes quantization to the LLM RL post-training stack plus the memory machinery to run longer-context RL on a single smaller GPU:

  • Trainer-side bnb quantization (none | int8 | nf4 QLoRA), resolved from a QUANTIZATION preset by create_population; vLLM mirrors the trainer's precision (bitsandbytes rollout when quantized, dense bf16
    otherwise).
  • Colocated vLLM rollout: vLLM and trainer each hold their own base and share the GPU via vLLM native sleep/wake; trainer base is CPU-offloaded during rollout and only LoRA adapters are synced per cycle.
    CUDA-safe trainer-first init.
  • Always-on, memory-bounded fused/chunked linear log-probs, plus optional padding-free sequence packing (FA2-varlen / flex-attention block-sparse).
  • Fused multi-adapter LoRA forward (actor+critic in one pass) with per-row routing.
  • Importance-sampling level (token / turn / trajectory) decoupled from advantage granularity across GRPO / GSPO / CISPO / PPO / REINFORCE, plus a vLLM sampling-mismatch (truncated-IS) correction.
  • CI: gpu/vllm-marked tests now run in a CUDA container; bitsandbytes pinned linux-only.
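
To illustrate the importance-sampling level bullet above, here is a minimal pure-Python sketch of how one ratio can be shared per token, per turn, or per trajectory, independently of how advantages are assigned. The function name `importance_ratios`, its parameters, and the length-normalized (geometric-mean) group ratio are assumptions made for this illustration, not the library's actual API. The optional `clip_max` cap shows the general shape of a truncated-IS correction, under the same caveat.

```python
import math
from typing import List, Optional, Sequence


def importance_ratios(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    turn_ids: Sequence[int],
    level: str = "token",
    clip_max: Optional[float] = None,
) -> List[float]:
    """Return one importance ratio per token (hypothetical helper).

    Tokens in the same group (token, turn, or whole trajectory) share
    a single ratio: exp(mean of per-token log-ratios in the group).
    """
    diffs = [n - o for n, o in zip(logp_new, logp_old)]

    # Choose the grouping key for each token according to the level.
    if level == "token":
        groups = list(range(len(diffs)))
    elif level == "turn":
        groups = list(turn_ids)
    elif level == "trajectory":
        groups = [0] * len(diffs)
    else:
        raise ValueError(f"unknown level: {level!r}")

    # Accumulate the sum and count of log-ratios per group.
    sums: dict = {}
    counts: dict = {}
    for g, d in zip(groups, diffs):
        sums[g] = sums.get(g, 0.0) + d
        counts[g] = counts.get(g, 0) + 1

    # Length-normalized ratio (an assumption of this sketch).
    ratios = [math.exp(sums[g] / counts[g]) for g in groups]

    # Truncated IS: cap each ratio from above when a cap is given.
    if clip_max is not None:
        ratios = [min(r, clip_max) for r in ratios]
    return ratios


# Example: four tokens split across two turns.
new = [-1.0, -2.0, -1.5, -0.5]
old = [-1.2, -1.8, -1.5, -1.0]
turns = [0, 0, 1, 1]
per_turn = importance_ratios(new, old, turns, level="turn")
```

Because the grouping is chosen separately from the advantage computation, the same per-token advantages can be combined with any of the three ratio levels.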

Docs (#523): list previously missing LLM algos (CISPO, GSPO, LLM PPO, LLM REINFORCE, SFT) in the README/API tables, fix the broken GRPO example and the GSPO heading typo, and expand the loss_type explanation.

Bugs

  • EvolvableCNN RNG propagation (#546): the rng setter now also seeds mut_kernel_size, so MutableKernelSizes shares the module's generator instead of an independent RNG, restoring reproducibility of
    kernel-size mutations.
  • PPO value-head save/load (#522): v_head is now restored on the LoRA-only load path and lr_actor is stored, so optimizer-metadata restore no longer crashes.
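
The RNG propagation fix above follows a common pattern: a parent's `rng` setter also re-seeds its mutation helper so both draw from one generator. The sketch below uses the standard-library `random.Random` as a stand-in generator; the class names `MutableKernelSizesSketch` and `EvolvableCNNSketch` and the candidate kernel sizes are hypothetical and do not reproduce the library's implementation.

```python
import random
from typing import List, Optional


class MutableKernelSizesSketch:
    """Hypothetical stand-in for a kernel-size mutation helper."""

    def __init__(self, sizes: List[int], rng: Optional[random.Random] = None):
        self.sizes = list(sizes)
        # Falls back to an independent generator if none is supplied.
        self.rng = rng if rng is not None else random.Random()

    def mutate(self) -> List[int]:
        # Pick one layer and assign it a new kernel size.
        idx = self.rng.randrange(len(self.sizes))
        self.sizes[idx] = self.rng.choice([3, 5, 7])
        return list(self.sizes)


class EvolvableCNNSketch:
    """Hypothetical parent module that owns the generator."""

    def __init__(self, sizes: List[int]):
        self._rng = random.Random()
        self.mut_kernel_size = MutableKernelSizesSketch(sizes, self._rng)

    @property
    def rng(self) -> random.Random:
        return self._rng

    @rng.setter
    def rng(self, value: random.Random) -> None:
        self._rng = value
        # Propagation step: without this line the helper keeps its old
        # generator and mutations are no longer reproducible.
        self.mut_kernel_size.rng = value


# Two networks given equally seeded generators mutate identically.
net_a = EvolvableCNNSketch([3, 3, 3])
net_b = EvolvableCNNSketch([3, 3, 3])
net_a.rng = random.Random(0)
net_b.rng = random.Random(0)
history_a = [net_a.mut_kernel_size.mutate() for _ in range(5)]
history_b = [net_b.mut_kernel_size.mutate() for _ in range(5)]
```

Sharing one generator object, rather than copying a seed, keeps the parent and the helper on a single random stream.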

Dependency upgrades

  • tensordict 0.12.2 → 0.13.0 (#515, #526)
  • redis 4.4.4 → 8.0.0 (#527)
  • pymunk 6.2.1 → 7.2.0 (#518)
  • termcolor 1.1.0 → 3.3.0 (#542)
  • pre-commit 3.8.0 → 4.6.0 (#543)
  • hydra-core 1.3.2 → 1.3.3 (#537)
  • omegaconf 2.3.0 → 2.3.1 (#536, #552)
  • tqdm 4.67.3 → 4.68.0 (#525)
  • dill 0.4.0 → 0.4.1 (#551)

What's Changed

Full Changelog: v2.7.0...v2.7.1