AstraFlow v0.1.1 broadens the system from RLVR math/code training toward multi-agent and recursive-agent RL, adds a second training backend, and modernizes the toolchain. Recipes, docs, and Docker images are updated to match.
Major Updates
- Megatron-LM training backend. A second trainer backend alongside FSDP, with weight sync through an HF-space transfer buffer (supports tensor/pipeline/expert/virtual-pipeline parallelism) so the RaaS receive path is identical to FSDP. Includes a direct-DMA weight-offload path (~23× faster) and a streaming Megatron→HF per-tensor exporter.
- Recursive-agent workflows. New `recursive_agent`-style workflows where an agent recursively spawns sub-agents that share state under a team reward: TextCraft (+ recipe variants), Oolong, and DeepDive, each with a Qwen3-4B recipe.
- Spawn-sub-agents workflow for math RL: parallel sub-agent rollouts under a single trajectory.
- Offline training support. Offline math datasets and recipes for clusters without internet access at train time.
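
To illustrate the recursive-agent idea listed above, here is a minimal sketch of an agent that recursively spawns sub-agents, writes into state shared by the whole team, and receives one team reward. It is not AstraFlow's implementation: `Trajectory`, `run_agent`, `assign_team_reward`, the task dictionary layout, and the "fraction of goals completed" reward are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    # Hypothetical per-agent record; not an AstraFlow type.
    agent_id: str
    depth: int
    steps: list = field(default_factory=list)
    reward: float = 0.0


def run_agent(task, shared_state, trajectories, depth=0, max_depth=2):
    """Solve leaf tasks directly; otherwise spawn one sub-agent per subtask."""
    traj = Trajectory(agent_id=f"agent-{len(trajectories)}", depth=depth)
    trajectories.append(traj)
    subtasks = task.get("subtasks", [])
    if not subtasks or depth >= max_depth:
        # Leaf: record the result in the state shared by the whole team.
        shared_state[task["name"]] = True
        traj.steps.append(f"solve {task['name']}")
        return
    for sub in subtasks:
        traj.steps.append(f"spawn {sub['name']}")
        run_agent(sub, shared_state, trajectories, depth + 1, max_depth)


def assign_team_reward(trajectories, shared_state, goals):
    """Give every trajectory the same reward: the fraction of goals completed."""
    team_reward = sum(1 for g in goals if shared_state.get(g)) / len(goals)
    for traj in trajectories:
        traj.reward = team_reward
    return team_reward


# Assumed toy task in the spirit of a crafting environment.
task = {"name": "craft_table", "subtasks": [{"name": "get_wood"}, {"name": "make_planks"}]}
shared_state, trajectories = {}, []
run_agent(task, shared_state, trajectories)
team_reward = assign_team_reward(trajectories, shared_state, ["get_wood", "make_planks"])
```

In this sketch the parent and both sub-agents end up with the same reward, which is the property a team reward is meant to provide; how the actual workflows route rewards to sub-agents is defined by the recipes, not by this example.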
Other Notable Changes
- astraEnv tooling: a minimal LLM-as-judge library, a `reward_mode` selector, a CMU RAG search client, and an AI-rubric checklist grader.
- Workflows can opt out of the producer's default group-reward statistics; sub-agent reward routing fixes.
- Stability fixes: hardened Megatron weight offload, capped TextCraft recursive `max_concurrent_rollouts`, normalized `apply_chat_template` output to token ids for transformers 5, and GPU-arch-aware SGLang attention-backend / norm-path selection.
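
The `apply_chat_template` normalization mentioned above can be sketched as a small helper. This is an illustration under assumptions, not the code shipped in this release: it assumes the tokenized result may arrive either as a plain list of ids or as a mapping (such as a dict or `BatchEncoding`) holding them under `input_ids`, possibly with a batch dimension of one. The name `to_token_ids` is hypothetical.

```python
from collections.abc import Mapping


def to_token_ids(output):
    """Return a flat list of int token ids from a chat-template result."""
    # Mapping-like results carry the ids under the "input_ids" key.
    if isinstance(output, Mapping):
        output = output["input_ids"]
    # Tensors and arrays expose tolist(); plain lists do not.
    if hasattr(output, "tolist"):
        output = output.tolist()
    ids = list(output)
    # Unwrap a batch of size one: [[1, 2, 3]] -> [1, 2, 3].
    if len(ids) == 1 and isinstance(ids[0], (list, tuple)):
        ids = list(ids[0])
    return [int(t) for t in ids]
```

Callers would wrap the tokenizer call, for example `to_token_ids(tokenizer.apply_chat_template(messages, tokenize=True))`, so downstream code always sees a flat list regardless of the library version.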
Environment and CI
- Toolchain bump: CUDA 13 image, SGLang 0.5.12, transformers 5 support (with `kernels<0.13` pin and a relaxed RaaS health watchdog).
- Pre-built Megatron Docker image (`astraflowai/astraflow:v0.1.1.megatron`) documented; FSDP image remains `astraflowai/astraflow:v0.1.1`.
- New docs: TextCraft recursive-agent and offline-math recipe pages, a Megatron weight-sync architecture page, and CUDA 13 install steps.
What's Changed
- Core reorg by @haizhongzheng in #2
- Core reorg by @haizhongzheng in #3
- feat: support offline mode for math training datasets by @haizhongzheng in #4
- chore: bump sglang 0.5.5.post1 -> 0.5.12.post1 (FSDP path) by @WWWjiahui in #5
- Feat/megatron weight sync dev by @jsw-zorro in #9
- Spawn solution by @haizhongzheng in #12
- fix(megatron): build Transformer Engine from source for CUDA 13 by @jsw-zorro in #14
- AstraFlow v0.1.1: Megatron backend, offline math, recursive/spawn agents by @haizhongzheng in #16
- docs: fix textcraft recipe docs URL by @haizhongzheng in #17
New Contributors
- @WWWjiahui made their first contribution in #5
- @jsw-zorro made their first contribution in #9
Full Changelog: v0.1.0...v0.1.1