Dev#7
Merged
Upgrade the inference/runtime stack to the latest sglang and the dependency versions it requires, validated end-to-end on the FSDP backend (qwen3-1.7b math example, 2x L40).

Version pins (pyproject.toml, docs, Docker):
- sglang 0.5.5.post1 -> 0.5.12.post1
- torch 2.8.0 -> 2.11.0; torch_memory_saver 0.0.9 -> 0.0.9.post1
- transformers 4.57.1 -> 5.6.1 (sglang pins ==5.6.0, which has a flash-attention s_aux=None crash for non-sink models; 5.6.1 is the upstream patch release. Forced via [tool.uv] override-dependencies, which requires uv >= 0.10 -- documented in installation.md)
- peft -> >=0.18.0 (required by transformers 5.x)
- CUDA base image 12.9.1 -> 13.0.0

sglang 0.5.12 API compatibility:
- remove LoRAAbortReleasePatch (the abort-path lora_registry.release() it added is now fixed upstream; keeping it would double-release)
- remove enable_ep_moe from SGLangConfig (field dropped from ServerArgs)
- kernel package rename sgl_kernel -> sglang_kernel in the installation validator

transformers 5.x / sglang 0.5.12 runtime fixes (surfaced by the run):
- rlvr workflow: apply_chat_template now returns a BatchEncoding; pass return_dict=False to get the flat list[int] the rollout path expects
- fsdp apply_fsdp2: model._no_split_modules is a set in transformers 5.x; coerce to list before indexing
- raas free-port range capped at 55535 so sglang's derived gRPC port (port + 10000) stays <= 65535

Scope: FSDP backend only. Megatron / VL paths are intentionally not covered here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
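The free-port cap in the last runtime fix follows from simple arithmetic: 65535 - 10000 = 55535. The sketch below illustrates that constraint only; it is not the raas implementation. The function name `find_free_port` and the default lower bound of 20000 are hypothetical, and the sketch does not check that the derived port itself is free.

```python
import socket

MAX_TCP_PORT = 65535
GRPC_PORT_OFFSET = 10000  # offset named in the commit message (port + 10000)
MAX_BASE_PORT = MAX_TCP_PORT - GRPC_PORT_OFFSET  # 55535


def find_free_port(low: int = 20000, high: int = MAX_BASE_PORT) -> int:
    """Return a bindable TCP port in [low, high].

    `low` is an illustrative default. `high` may not exceed MAX_BASE_PORT,
    so that port + GRPC_PORT_OFFSET is always a valid TCP port number.
    """
    if high > MAX_BASE_PORT:
        raise ValueError(
            f"high={high} would push the derived port past {MAX_TCP_PORT}"
        )
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"no free port in [{low}, {high}]")
```

Any port returned by this sketch satisfies `port + GRPC_PORT_OFFSET <= 65535` by construction.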
chore: bump sglang 0.5.5.post1 -> 0.5.12.post1 (FSDP path)
sglang 0.5.12's /health round-trips through the scheduler, which stays saturated for ~30-40s during the initial unchunked prefill of ~2048 requests/engine. The old 3-strike / 30s watchdog (5s probe timeout) hard-exited a busy-but-alive engine before the first rollout batch completed, hanging the rollout pipeline at step 0.

Raise the /health probe timeout 5s -> 20s so a slow-but-alive endpoint isn't marked failed, and the failure budget 3 -> 5 strikes. A crashed engine refuses connections instantly, so real-death detection stays ~50s (worst case ~100s) while the prefill ramp is tolerated.

Verified: math and code qwen3-8b-m2po-delta recipes train through the ramp with zero watchdog strikes.
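A minimal sketch of the strike-budget idea described above, using the new values (20s probe timeout, 5 strikes). The class name `HealthWatchdog`, the injected `probe` callable, and the choice to reset the counter on a successful probe are assumptions for illustration; the actual watchdog code is not shown in this conversation. The worst case of ~100s matches 5 strikes x 20s timeout; the ~50s figure would be consistent with a 10s probe interval, which is not stated here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthWatchdog:
    # probe(timeout_s) returns True when /health answered within the timeout.
    probe: Callable[[float], bool]
    probe_timeout_s: float = 20.0  # raised from 5s
    max_strikes: int = 5           # raised from 3
    strikes: int = 0

    def tick(self) -> bool:
        """Run one probe; return True once the engine should be declared dead."""
        try:
            healthy = self.probe(self.probe_timeout_s)
        except Exception:
            # A refused connection or timeout counts as a failed probe.
            healthy = False
        if healthy:
            self.strikes = 0  # assumption: only consecutive failures count
        else:
            self.strikes += 1
        return self.strikes >= self.max_strikes
```

With this shape, a busy-but-alive engine that answers within 20s never accumulates strikes, while an engine that refuses connections fails each probe immediately and exhausts the budget after five ticks.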
…ution

Two from-scratch install blockers with the sglang 0.5.12 / torch 2.11 stack:
- sglang 0.5.12 depends on flash-attn-4>=4.0.0b9 (a pre-release pulled in as a dependency), so resolution fails unless pre-releases are allowed. Add prerelease = "allow" to [tool.uv] so `uv pip install -e ".[sglang]"` resolves on both the conda and Docker paths.
- flash-attn 2.8.3 builds from source; nvcc writes GBs of intermediates to $TMPDIR. When $TMPDIR is a small/NFS-quota'd home the build fails with "nvFatbin error: empty input" / "Disk quota exceeded" from truncated temps. Document setting CUDA_HOME and a roomy TMPDIR, switch the sglang step to the project-extra form, and clarify flash-attn (FA2, trainer) vs flash-attn-4 (pulled in by sglang).
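Because the flash-attn failure mode is a temp directory running out of space mid-build, a preflight check before starting the build can surface the problem early. The sketch below is hypothetical and not part of the documented install steps: the function name `tmpdir_has_room` and the 20 GB default threshold are illustrative guesses, since the commit message only says "GBs of intermediates".

```python
import os
import shutil
import tempfile
from typing import Optional


def tmpdir_has_room(min_free_gb: float = 20.0, path: Optional[str] = None) -> bool:
    """Return True if the build temp directory has at least min_free_gb free.

    Resolution order: explicit path, then $TMPDIR, then the platform default.
    The 20 GB default is an illustrative threshold, not a measured requirement.
    """
    target = path or os.environ.get("TMPDIR") or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(target).free
    return free_bytes >= min_free_gb * 1024**3
```

Note that `shutil.disk_usage` reports filesystem-level free space, so it may not reflect a per-user NFS quota; treat a passing check as necessary rather than sufficient.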
sglang requires an unbounded "kernels", so uv resolved the latest (0.15), but transformers 5.6.1 only supports kernels<0.13 — its hub_kernels module constructs LayerRepository() without a revision/version, which kernels 0.15 rejects, so `import sglang` crashes with "Either a revision or a version must be specified." Pin to the range transformers 5.6.1 expects (0.12.x). Verified on a from-scratch env: kernels resolves to 0.12.3 and the math recipe trains.
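The pin described above can also be checked at runtime before importing sglang. This is a sketch under the assumption that "0.12.x" is the full supported range for transformers 5.6.1, as the commit message states; the helper name `kernels_version_in_supported_range` is hypothetical.

```python
def kernels_version_in_supported_range(version: str) -> bool:
    """Return True for kernels 0.12.x, the range named in the commit message."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable version strings are treated as unsupported.
        return False
    return (major, minor) == (0, 12)
```

In an installed environment, the version string could be obtained with the standard library call `importlib.metadata.version("kernels")` and passed to this helper to fail fast with a clearer message than the "Either a revision or a version must be specified." error.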
merge dev to main.