
Conversation


@TaekyungHeo TaekyungHeo commented Aug 11, 2025

Summary

Support for the DeepSeek-R1 model with SGLang / AI Dynamo

  1. Add a backend field to AIDynamoArgs.
  2. Add dynamo_repo as an installable package.
  3. Add support for a new backend, SGLang.
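
As a rough sketch, the new backend field from item 1 might look like the following (the dataclass shape, default value, and validation here are illustrative assumptions, not the actual AIDynamoArgs definition in the repo):

```python
from dataclasses import dataclass

# Hypothetical sketch only; the real AIDynamoArgs in cloudaix has more
# fields and may validate differently.
@dataclass
class AIDynamoArgs:
    backend: str = "vllm"  # "vllm" or "sglang" in this PR

    def __post_init__(self) -> None:
        # Reject backends this PR does not yet support.
        supported = {"vllm", "sglang"}
        if self.backend not in supported:
            raise ValueError(f"unsupported backend: {self.backend}")
```

Defaulting to vllm would keep existing vllm scenarios working unchanged.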

RM4572636

Test Plan

  1. CI passes
  2. Run on EOS

Checklist
1. Single-worker AI dynamo works

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/deepseek_r1_distill_llama_8b.toml   
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: deepseek_r1_distill_llama_8b
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: deepseek_r1_distill_llama_8b

Section Name: Tests.1
  Test Name: vllm
  Description: vllm
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3428521

https://drive.google.com/drive/folders/1kjIchFDYKUJTMzK5iCmcHgO_swJoS-M9?usp=drive_link

2. Multi-worker AI dynamo works

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/dsr1_70b_3k_150.toml
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: dsr1_70b_3k_150
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: dsr1_70b_3k_150

Section Name: Tests.1
  Test Name: vllm
  Description: vllm
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3429601
[INFO] Job completed: Tests.1 (iteration 1 of 1)
[INFO] All test scenario results stored at: results/dsr1_70b_3k_150_2025-08-12_12-00-05

https://drive.google.com/drive/folders/1CtcJK8JBqP8cjfGadyddSZxoh66Afip0?usp=drive_link

3. SGLang DSR1 works
Based on https://github.com/Mellanox/cloudaix/pull/329

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/deepseek_ai_DSR1.toml 
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: deepseek_ai_DeepSeek_R1
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: deepseek_ai_DeepSeek_R1

Section Name: Tests.1
  Test Name: sglang
  Description: sglang
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3427522

https://drive.google.com/drive/folders/1tGurw5xqoV8S3XWbkUBbmwuwvdi9ELOo?usp=drive_link

@TaekyungHeo force-pushed the rm-4572636 branch 23 times, most recently from 9dec6b4 to 640cd84 on August 12, 2025 at 17:53
@TaekyungHeo
Member Author

@karya0 , please review.


karya0 commented Aug 13, 2025

Quick note to make sure we are aligned. We eventually want to support all Dynamo backends (vllm, sglang, trtllm). We started with vllm to narrow the scope, given time limitations. For all backends, we want to support all supported models -- DSR1 Distill Llama 3.1 70B, DSR1 v3, etc.

The run.sh that I have in my branch is backend- and model-agnostic so far; any model- or backend-specific settings are handled in the toml files. That said, my run.sh doesn't support DSR1 v3, so we have to find a way to merge the two scripts in a reasonable manner.
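
A hedged sketch of what such a backend/model-specific TOML section could look like (all key names and values below are illustrative assumptions, not the actual cloudaix schema; the prefill-port/decode-port keys mirror the dynamo_args entries used in run.sh):

```toml
# Illustrative only: key names are assumptions, not the real schema.
[cmd_args]
backend = "sglang"

[cmd_args.dynamo]
model = "deepseek-ai/DeepSeek-R1"
prefill-port = 8100
decode-port = 8200
```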


TaekyungHeo commented Aug 13, 2025

TODOs for Taekyung

  1. Make tp-size and dp-size configurable.
  2. Support mounting any JSON files. This will be a follow-up PR. Kapil will provide additional example JSON files.

@TaekyungHeo
Member Author

Both TODOs above have been completed. The following PRs are blocked by this one. Please review and approve this PR, @karya0.

  1. Set tp-size and dp-size from args if provided, else use total_gpus #649
  2. Support mounting any JSON files for --dynamo-deepep-config #650
  3. Support the TRT-LLM backend in AI Dynamo #648
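
One plausible reading of the tp-size/dp-size fallback in #649, sketched in Python (the function and parameter names are hypothetical, and how dp is derived from the remaining GPUs is an assumption):

```python
# Hypothetical sketch of the #649 behavior: honor explicit tp/dp sizes
# when provided, otherwise derive them from the total GPU count.
def resolve_parallel_sizes(total_gpus, tp_size=None, dp_size=None):
    # Fall back to total_gpus for tp when no explicit size is given.
    tp = tp_size if tp_size is not None else total_gpus
    # Assumption: dp defaults to the GPUs left over after tp sharding.
    dp = dp_size if dp_size is not None else max(total_gpus // tp, 1)
    return tp, dp
```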


@karya0 karya0 left a comment


I have left several comments. Feel free to resolve them as you see fit. As discussed offline, there will be follow-on PRs to resolve some of these issues and/or improve the UX.

Comment on lines +191 to +192
prefill_args["--port"]=${dynamo_args["prefill-port"]}
decode_args["--port"]=${dynamo_args["decode-port"]}

Nit: I used patterns like %port% and %model% in the TOML files that get replaced with dynamo_args["port"] and dynamo_args["model"]. This might scale a bit better when dealing with multiple backends.
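
Such a placeholder pass could be sketched as follows (the function name and the exact placeholder grammar are assumptions, not the actual run.sh implementation):

```python
import re

# Hypothetical sketch of the %key% substitution described above:
# placeholders such as %port% and %model% in a TOML-sourced template
# are replaced with the matching dynamo_args values.
def expand_placeholders(template, dynamo_args):
    def repl(match):
        key = match.group(1)
        if key not in dynamo_args:
            raise KeyError(f"no value for placeholder %{key}%")
        return dynamo_args[key]
    # Allow hyphens so keys like prefill-port also work.
    return re.sub(r"%([A-Za-z0-9_-]+)%", repl, template)
```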

@TaekyungHeo TaekyungHeo merged commit 24774c5 into NVIDIA:main Aug 15, 2025
4 checks passed
@TaekyungHeo
Member Author

Thanks, @karya0. I will resolve them in follow-up PRs.


TaekyungHeo commented Aug 15, 2025

@karya0

Will be resolved by #653

  1. Use _gpus_per_node (link)
  2. prefill-initialized-regex from TOML (link1, link2)
  3. Partially resolved - Initializing prefill-cmd (link)
    • My response

TODOs in follow-up PRs

  1. Differentiate HF_PATH from MODEL_PATH (link1, link2). Currently, these two concepts are mixed up.
  2. Deprecate prefill.num_nodes and decode.num_nodes (link)
  3. Multi-node support for SGLang (link)

Potentially Rejected

  1. Mounting dynamo_repo_path (link)
    • My response: There is no strong reason to make this mount optional. I also need dynamo_repo for K8s support. Therefore, it should always be mounted. There is no downside.
  2. SGLang-specific container mounts (link)
    • My response: It is acceptable to use if-else statements to support different backends. There's no strong reason not to have backend-specific container mounts.
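
Such backend-specific mounts might be sketched like this (the helper name and mount paths are purely illustrative assumptions, not the actual cloudaix implementation):

```python
# Hypothetical sketch of backend-specific container mounts.
def container_mounts(backend, dynamo_repo_path):
    # dynamo_repo is always mounted (also needed for K8s support).
    mounts = [f"{dynamo_repo_path}:/workspace/dynamo"]
    if backend == "sglang":
        # SGLang-specific extras; this path is an assumption.
        mounts.append("/opt/sglang_configs:/sglang_configs")
    return mounts
```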

Discussion Needed
