v1.0.0

github-actions released this 05 Jun 23:10 (commit 3dbe87f)

What's Changed

  • Update README to include TensorRT LLM and vLLM in description by @nlevin-ui in #1
  • [MISC] Add License / headers, and a small check to prepare for release by @xinli-sw in #4
  • feat: enable runtime container detection for portable dynamo source builds by @qiching in #3
  • Sync ishandhanani/srt-slurm history into NVIDIA/srt-slurm by @csahithi in #14
  • Add trace-replay benchmark type by @alec-flowers in #16
  • fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue by @richardhuo-nv in #20
  • fix: add nvidia pypi as an extra index to be able to pip install the prerelease dynamo wheels by @richardhuo-nv in #22
  • fix: support cross-arch clusters (x86_64 login, aarch64 compute) by @alec-flowers in #17
  • feat: trace-replay benchmark with aiperf_args passthrough by @alec-flowers in #18
  • feat: add mocker backend for pipeline smoke tests by @alec-flowers in #25
  • feat: separate login-node and compute-node venvs by @alec-flowers in #29
  • feat: runtime fingerprinting, identity verification, and lockfile by @alec-flowers in #19
  • feat: configurable NATS max_payload for disagg serving by @alec-flowers in #31
  • Copy {job_id}.json into log directory for S3 upload by @KaunilD in #15
  • TRTLLM nsys profiling harness + Dynamo OTEL tracing automation by @karen-sy in #27
  • Add CODEOWNERS file by @xinli-sw in #37
  • Add CSV export for sa-bench rollup by @weireweire in #26
  • Sanitize srun output in node IP resolution by @weireweire in #38
  • feat: lockfile v2 — shareable recipe + lock section by @alec-flowers in #32
  • fix: Install maturin if not present by @trevor-m in #45
  • [codex] Add generic telemetry and custom benchmark support by @ishandhanani in #43
  • [codex] Port HF cache cleanup by @ishandhanani in #49
  • Add srt-slurm MCP spec server and preflight validation by @ishandhanani in #53
  • Push logs_url to status API eagerly and via final PUT by @ishandhanani in #54
  • [codex] narrow srtctl mcp to authoring and validation by @ishandhanani in #55
  • [codex] Keep MCP validation off host cluster config by @ishandhanani in #56
  • fix: emit aggregated resources and harden sa-bench rollup by @ishandhanani in #58
  • feat: use pre-generated custom dataset for benchmarking MTP with chat template by @richardhuo-nv in #64
  • docs: loud warnings on custom benchmark templating and nginx-off mode by @ishandhanani in #66
  • feat(sa-bench): add sglang DeepSeek-V4 tokenizer by @YAMY1234 in #73
  • feat: DeepSeek-V4-Pro perf recipes for GB300 / GB200 (1k/1k agg) by @elvischenv in #70
  • fix(orchestrate): robust container bootstrap (maturin/protoc/venv-race) by @ishandhanani in #81
  • fix(sa-bench): actionable error + warmup parity for use_chat_template by @YAMY1234 in #76
  • feat(schema): make gsm8k a first-class BenchmarkType by @ishandhanani in #82
  • [codex] add AIME benchmark by @ishandhanani in #83
  • feat(aime): rework around ns eval for reasoning-model parity by @ishandhanani in #87
  • Add scripts for wideEP; Note we can reach a PD balance with dep8, cc=2048 by @samuellees in #52
  • Revert "Add scripts for wideEP; Note we can reach a PD balance with dep8, cc=2048" by @ishandhanani in #89
  • refactor(aime): drop structured runner, ship configs/aime/{run.sh,rescore.py} by @ishandhanani in #91
  • Add the chat template to the glm5 tokenizer and apply that when sampling the requests by @richardhuo-nv in #65
  • feat(config): resolve container aliases for telemetry + preflight by @ishandhanani in #101
  • [codex] Add Dynamo nightly wheel install support by @alec-flowers in #99
  • feat(dynamo): cache hash-pinned source builds on /configs by @ishandhanani in #88
  • Add DeepSeek V4 Pro vLLM GB200 recipes by @alec-flowers in #102
  • feat(config): cluster-wide default_bash_preamble for ulimits and the like by @ishandhanani in #104
  • fix(nginx): raise file descriptor limit for nginx workers by @ishandhanani in #108
  • log: always set dyn skip log fmt by @ishandhanani in #109
  • [NOT FINAL] add wip DSv4 aggregate and disaggregate recipes by @ishandhanani in #85
  • nginx: rework to make ulimit optional by @ishandhanani in #110
  • log: demote per-srun command line to DEBUG by @cquil11 in #111
  • fix: using a setup script to install pip in trtllm venv by @richardhuo-nv in #116
  • default dyn log by @ishandhanani in #118
  • feat: Add live monitor to SRT-SLURM by @leo-cf-tian in #119
  • Pass in bootstrap port on prefill by @wenscarl in #121
  • Cherry-pick lm-eval benchmark runner from sa-submission-q2-2026 by @ishandhanani in #122
  • fix: preflight accepts hf:* model paths and Docker image URIs by @Thunderbeee in #125
  • Add GLM5 B200 FP8 disaggregated recipe by @weireweire in #50
  • [NOT FINAL] Qwen3.5 fp8 mtp-off recipes by @samuellees in #128
  • feat: live in-flight batch-metrics snapshotter (opt-in) by @YAMY1234 in #115
  • feat(profiling): add extra_nsys_args for optional nsys CLI flags by @zhengd-nv in #59
  • Handle null telemetry in live metrics startup by @weireweire in #135
  • Add GPT-OSS TRT-LLM aggregated recipe by @faradawn in #132
  • feat: peak gen throughput metric in sa-bench + server-side node metrics CSV export by @zhengd-nv in #93
  • feat: first-class mooncake KV store support for SGLang backend by @ishandhanani in #136
  • feat: SGLang decode slow_down for PD disagg nsys profiling (with skip-warmup workflow) by @zhengd-nv in #60
  • sglang: enable mooncake_master HTTP metadata server + auto-inject MOONCAKE_TE_META_DATA_SERVER by @ishandhanani in #138
  • recipes: update glm5 sglang to use faster weights loading by @weireweire in #137
  • sa-bench: make SGLangDeepseekV4Tokenizer callable by @ch-wan in #144
  • fix(batch-metrics): split agg logs by DP rank by @YAMY1234 in #145
  • Capture git state for extra mounts by @YAMY1234 in #146
  • Sglang port jitter by @nvjullin in #134
  • Default SA-Bench random workers to auto by @weireweire in #147
  • Update GB300 FP4 GLM-5 recipe by @weireweire in #152
  • Expand environment variables in extra_mount paths by @weireweire in #153
  • Make batch metrics legends translucent by @YAMY1234 in #151
  • Centralize safe runtime port allocation by @weireweire in #156
  • Support default sbatch directives in srtslurm config by @weireweire in #159
  • Update GB300 FP8 GLM-5 recipe by @weireweire in #160
  • Add Nemotron Super 120B recipes by @faradawn in #150
  • Add --no-preflight CLI flag to srtctl apply by @cquil11 in #162
  • Add Qwen3.5 DeepEP MTP recipes by @YAMY1234 in #163
  • Accept legacy token metric names in telemetry plots by @weireweire in #166
  • Fixing sweep submissions for 'sweep' block by @AlphaBladez in #170
  • Add DSV4 GB300 8k1k recipe by @weireweire in #173
  • Add GB300 FP8 GLM5 MTP recipes and update max-running-requests by @weireweire in #168
  • Add spread worker placement and vLLM colocation (PR against main) by @jasonlizhengjian in #182
  • Force-reinstall maturin in portable top_of_tree dynamo source build by @Ankur-singh in #183
  • feat(config): add default_health_check cluster-level default in srtslurm.yaml by @shljessie in #180
  • fix(slurm): use --key=value for srun options (Slurm 25.11 cpu-bind regression) by @shljessie in #179
  • Added heterogeneous job support by @nvjullin in #178
  • Copy resolved override/zip config into log dir for S3 upload by @KaunilD in #194
  • Add workflow: auto-release on PR merge to main by @Ankur-singh in #185
  • Fix release workflow 403 by using pull_request_target by @Ankur-singh in #200

New Contributors

Full Changelog: https://github.com/NVIDIA/srt-slurm/commits/v1.0.0