Releases: Infini-AI-Lab/astraflow
AstraFlow v0.1.1
AstraFlow v0.1.1 broadens the system from RLVR math/code training toward multi-agent and recursive-agent RL, adds a second training backend, and modernizes the toolchain. Recipes, docs, and Docker images are updated to match.
Major Updates
- Megatron-LM training backend. A second trainer backend alongside FSDP, with weight sync through an HF-space transfer buffer (supports tensor/pipeline/expert/virtual-pipeline parallelism) so the RaaS receive path is identical to FSDP. Includes a direct-DMA weight-offload path (~23× faster) and a streaming Megatron→HF per-tensor exporter.
- Recursive-agent workflows. New recursive_agent-style workflows where an agent recursively spawns sub-agents that share state under a team reward: TextCraft (+ recipe variants), Oolong, and DeepDive, each with a Qwen3-4B recipe.
- Spawn-sub-agents workflow for math RL: parallel sub-agent rollouts under a single trajectory.
- Offline training support. Offline math datasets and recipes for clusters without internet access at train time.
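The recursive-agent pattern described above can be sketched in a few lines. This is an illustrative toy, not AstraFlow's implementation: `TeamState`, `run_agent`, `team_reward`, and `rollout` are hypothetical names, the model call is replaced by a no-op `await`, and the reward is a placeholder. It shows an agent spawning sub-agents that write to shared state, a concurrency cap on simultaneous rollouts, and a single team-level reward for the whole tree.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TeamState:
    """State shared by every agent in one recursive rollout (hypothetical)."""
    notes: list = field(default_factory=list)


async def run_agent(task: str, depth: int, state: TeamState,
                    limiter: asyncio.Semaphore, max_depth: int = 2) -> None:
    # Hold the limiter only for this agent's own step, never while waiting
    # on children, so a deep tree cannot deadlock on the concurrency cap.
    async with limiter:
        await asyncio.sleep(0)  # stand-in for one model rollout
        state.notes.append(f"{'  ' * depth}{task}")
    if depth < max_depth:
        # Each agent spawns two sub-agents that share the same state.
        subtasks = [f"{task}.{i}" for i in range(2)]
        await asyncio.gather(
            *(run_agent(t, depth + 1, state, limiter, max_depth) for t in subtasks)
        )


def team_reward(state: TeamState) -> float:
    # Toy team reward: one scalar for the whole agent tree.
    return float(len(state.notes))


async def rollout(task: str, max_concurrent: int = 4) -> tuple[TeamState, float]:
    state = TeamState()
    limiter = asyncio.Semaphore(max_concurrent)
    await run_agent(task, 0, state, limiter)
    return state, team_reward(state)
```

Calling `asyncio.run(rollout("root"))` walks a binary tree of depth 2, so seven agents contribute to the shared state. Releasing the limiter before awaiting children is a deliberate choice: a parent that kept its slot while waiting could exhaust the cap and stall its own sub-agents.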
Other Notable Changes
- astraEnv tooling: a minimal LLM-as-judge library, a reward_mode selector, a CMU RAG search client, and an AI-rubric checklist grader.
- Workflows can opt out of the producer's default group-reward statistics; sub-agent reward routing fixes.
- Stability fixes: hardened Megatron weight offload, capped TextCraft recursive max_concurrent_rollouts, normalized apply_chat_template output to token ids for transformers 5, and GPU-arch-aware SGLang attention-backend / norm-path selection.
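The chat-template normalization mentioned above can be illustrated with a small helper. This is a sketch under assumptions, not the code that shipped: `to_token_ids` is a hypothetical name, and it assumes the tokenizer result may arrive as a plain list of ids, a dict-like object with an `input_ids` entry, or a batch of size one.

```python
from collections.abc import Mapping, Sequence


def to_token_ids(encoded) -> list[int]:
    """Coerce a chat-template result into a flat list of token ids."""
    # Dict-like results carry the ids under "input_ids".
    if isinstance(encoded, Mapping):
        encoded = encoded["input_ids"]
    # Tensors and arrays expose tolist(); plain lists pass through.
    if hasattr(encoded, "tolist"):
        encoded = encoded.tolist()
    # Unwrap a batch of size one: [[1, 2, 3]] -> [1, 2, 3].
    if len(encoded) == 1 and isinstance(encoded[0], Sequence):
        encoded = encoded[0]
    return [int(t) for t in encoded]
```

Routing every call site through one helper like this keeps downstream code independent of which return shape the installed tokenizer version produces.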
Environment and CI
- Toolchain bump: CUDA 13 image, SGLang 0.5.12, transformers 5 support (with kernels<0.13 pin and a relaxed RaaS health watchdog).
- Pre-built Megatron Docker image (astraflowai/astraflow:v0.1.1.megatron) documented; FSDP image remains astraflowai/astraflow:v0.1.1.
- New docs: TextCraft recursive-agent and offline-math recipe pages, a Megatron weight-sync architecture page, and CUDA 13 install steps.
What's Changed
- Core reorg by @haizhongzheng in #2
- Core reorg by @haizhongzheng in #3
- feat: support offline mode for math training datasets by @haizhongzheng in #4
- chore: bump sglang 0.5.5.post1 -> 0.5.12.post1 (FSDP path) by @WWWjiahui in #5
- Feat/megatron weight sync dev by @jsw-zorro in #9
- Spawn solution by @haizhongzheng in #12
- fix(megatron): build Transformer Engine from source for CUDA 13 by @jsw-zorro in #14
- AstraFlow v0.1.1: Megatron backend, offline math, recursive/spawn agents by @haizhongzheng in #16
- docs: fix textcraft recipe docs URL by @haizhongzheng in #17
New Contributors
- @WWWjiahui made their first contribution in #5
- @jsw-zorro made their first contribution in #9
Full Changelog: v0.1.0...v0.1.1
AstraFlow v0.1.0
First public release of AstraFlow, a dataflow-oriented RL system for (multi-)agentic LLMs.
- 📖 Paper: https://arxiv.org/abs/2605.15565
- 🌐 Site: https://Infini-AI-Lab.github.io/astraflow/
- 📚 Docs: https://Infini-AI-Lab.github.io/astraflow/docs/
Highlights
- Fully async multi-policy collaborative RL
- Elastic heterogeneous cross-region rollouts (RaaS)
- Substitutable rollout and trainer services
- Composable data algorithms (GRESO, dynamic sampling, buffer replay)
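As a rough illustration of the dynamic-sampling item above, the sketch below drops prompt groups whose rollouts all received the same reward, since such groups give no relative signal to a group-normalized advantage. It assumes the term is used in that common sense; `filter_informative_groups` and the data layout are hypothetical and not taken from AstraFlow.

```python
from statistics import pstdev


def filter_informative_groups(
    groups: dict[str, list[float]], eps: float = 1e-8
) -> dict[str, list[float]]:
    """Keep only prompts whose rollout rewards actually vary."""
    return {
        prompt: rewards
        for prompt, rewards in groups.items()
        # A group needs at least two rollouts and non-zero reward spread.
        if len(rewards) > 1 and pstdev(rewards) > eps
    }
```

For example, given `{"a": [1, 1, 1], "b": [0, 1, 0]}`, only prompt `"b"` is kept, because every rollout for `"a"` scored the same.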
Recipes
math/, math-multi-agent/, math-efficient-data/, code/, code-multi-agent/, search/, alfworld/, webshop/. See the docs for details.
Install
docker run --gpus all --net=host --shm-size=512g -it astraflowai/astraflow:v0.1.0
Or from source: see the installation guide. Requires Linux, Python 3.10–3.12, CUDA 12.9.
Status
Alpha — runs end-to-end, but 0.x APIs may evolve. Issues welcome: https://github.com/Infini-AI-Lab/astraflow/issues