
Conversation


@TaekyungHeo TaekyungHeo commented Aug 11, 2025

Summary

Support for the DeepSeek-R1 model with SGLang / AI Dynamo

  1. Add a backend field to AIDynamoArgs.
  2. Add dynamo_repo as an installable package.
  3. Add support for a new backend, SGLang.
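
As a rough sketch, the new backend field from item 1 might look like the following (the dataclass shape, default value, and validation here are illustrative assumptions, not the actual AIDynamoArgs definition in the repo):

```python
from dataclasses import dataclass

# Hypothetical sketch only; the real AIDynamoArgs in cloudaix has more
# fields and may validate differently.
@dataclass
class AIDynamoArgs:
    backend: str = "vllm"  # "vllm" or "sglang" in this PR

    def __post_init__(self) -> None:
        # Reject backends this PR does not yet support.
        supported = {"vllm", "sglang"}
        if self.backend not in supported:
            raise ValueError(f"unsupported backend: {self.backend}")
```

Defaulting to vllm would keep existing vllm scenarios working unchanged.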

RM4572636

Test Plan

  1. CI passes
  2. Run on EOS

Checklist
1. Single-worker AI dynamo works

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/deepseek_r1_distill_llama_8b.toml   
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: deepseek_r1_distill_llama_8b
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: deepseek_r1_distill_llama_8b

Section Name: Tests.1
  Test Name: vllm
  Description: vllm
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3428521

https://drive.google.com/drive/folders/1kjIchFDYKUJTMzK5iCmcHgO_swJoS-M9?usp=drive_link

2. Multi-worker AI dynamo works

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/dsr1_70b_3k_150.toml
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: dsr1_70b_3k_150
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: dsr1_70b_3k_150

Section Name: Tests.1
  Test Name: vllm
  Description: vllm
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3429601
[INFO] Job completed: Tests.1 (iteration 1 of 1)
[INFO] All test scenario results stored at: results/dsr1_70b_3k_150_2025-08-12_12-00-05

https://drive.google.com/drive/folders/1CtcJK8JBqP8cjfGadyddSZxoh66Afip0?usp=drive_link

3. SGLang DSR1 works
Based on https://github.com/Mellanox/cloudaix/pull/329

$ python cloudaix.py run --system-config conf/common/system/eos.toml --tests-dir conf/staging/ai_dynamo/test --test-scenario conf/staging/ai_dynamo/test_scenario/deepseek_ai_DSR1.toml 
[INFO] System Name: EOS
[INFO] Scheduler: slurm
[INFO] Test Scenario Name: deepseek_ai_DeepSeek_R1
[INFO] Checking if test templates are installed.
[INFO] Test Scenario: deepseek_ai_DeepSeek_R1

Section Name: Tests.1
  Test Name: sglang
  Description: sglang
  No dependencies
[INFO] Initializing Runner [RUN] mode
[INFO] Creating SlurmRunner
[INFO] Starting test: Tests.1
[INFO] Running test: Tests.1
[INFO] Submitted slurm job: 3427522

https://drive.google.com/drive/folders/1tGurw5xqoV8S3XWbkUBbmwuwvdi9ELOo?usp=drive_link

@TaekyungHeo force-pushed the rm-4572636 branch 23 times, most recently from 9dec6b4 to 640cd84 on August 12, 2025 at 17:53
@TaekyungHeo
Member Author

@karya0 , please review.


karya0 commented Aug 13, 2025

Quick note to make sure we are aligned. We eventually want to support all Dynamo backends (vllm, sglang, trtllm). We started with vllm to narrow the scope, given time limitations. For all backends, we want to support all supported models -- DSR1 Distill Llama 3.1 70B, DSR1 v3, etc.

The run.sh that I have in my branch is backend- and model-agnostic so far; any model- or backend-specific settings are handled in the toml files. That said, my run.sh doesn't support DSR1 v3, so we have to find a way to merge the two scripts in a reasonable manner.
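
A hedged sketch of what such a backend/model-specific TOML section could look like (all key names and values below are illustrative assumptions, not the actual cloudaix schema; the prefill-port/decode-port keys mirror the dynamo_args entries used in run.sh):

```toml
# Illustrative only: key names are assumptions, not the real schema.
[cmd_args]
backend = "sglang"

[cmd_args.dynamo]
model = "deepseek-ai/DeepSeek-R1"
prefill-port = 8100
decode-port = 8200
```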


TaekyungHeo commented Aug 13, 2025

TODOs for Taekyung

  1. Make tp-size and dp-size configurable.
  2. Support mounting any JSON files. This will be a follow-up PR. Kapil will provide additional example JSON files.

@TaekyungHeo
Member Author

Both TODOs above have been completed. The following PRs are blocked by this one. Please review and approve this PR, @karya0.

  1. Set tp-size and dp-size from args if provided, else use total_gpus #649
  2. Support mounting any JSON files for --dynamo-deepep-config #650
  3. Support the TRT-LLM backend in AI Dynamo #648
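
One plausible reading of the tp-size/dp-size fallback in #649, sketched in Python (the function and parameter names are hypothetical, and how dp is derived from the remaining GPUs is an assumption):

```python
# Hypothetical sketch of the #649 behavior: honor explicit tp/dp sizes
# when provided, otherwise derive them from the total GPU count.
def resolve_parallel_sizes(total_gpus, tp_size=None, dp_size=None):
    # Fall back to total_gpus for tp when no explicit size is given.
    tp = tp_size if tp_size is not None else total_gpus
    # Assumption: dp defaults to the GPUs left over after tp sharding.
    dp = dp_size if dp_size is not None else max(total_gpus // tp, 1)
    return tp, dp
```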


@karya0 karya0 left a comment


I have left several comments. Feel free to resolve them as you see fit. As discussed offline, there will be follow-on PRs to resolve some of these issues and/or improve the UX.

Comment on lines +191 to +192
prefill_args["--port"]=${dynamo_args["prefill-port"]}
decode_args["--port"]=${dynamo_args["decode-port"]}

Nit: I used patterns like %port% and %model% in the TOML files that get replaced with dynamo_args["port"] and dynamo_args["model"]. This might scale a bit better when dealing with multiple backends.
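
Such a placeholder pass could be sketched as follows (the function name and the exact placeholder grammar are assumptions, not the actual run.sh implementation):

```python
import re

# Hypothetical sketch of the %key% substitution described above:
# placeholders such as %port% and %model% in a TOML-sourced template
# are replaced with the matching dynamo_args values.
def expand_placeholders(template, dynamo_args):
    def repl(match):
        key = match.group(1)
        if key not in dynamo_args:
            raise KeyError(f"no value for placeholder %{key}%")
        return dynamo_args[key]
    # Allow hyphens so keys like prefill-port also work.
    return re.sub(r"%([A-Za-z0-9_-]+)%", repl, template)
```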

@TaekyungHeo TaekyungHeo merged commit 24774c5 into NVIDIA:main Aug 15, 2025
4 checks passed
@TaekyungHeo
Member Author

Thanks, @karya0. I will resolve them in follow-up PRs.


TaekyungHeo commented Aug 15, 2025

@karya0

Will be resolved by #653

  1. Use _gpus_per_node (link)
  2. prefill-initialized-regex from TOML (link1, link2)
  3. Partially resolved - Initializing prefill-cmd (link)
    • My response

TODOs in follow-up PRs

  1. Differentiate HF_PATH from MODEL_PATH (link1, link2). Currently, these two concepts are mixed up.
  2. Deprecate prefill.num_nodes and decode.num_nodes (link)
  3. Multi-node support for SGLang (link)

Potentially Rejected

  1. Mounting dynamo_repo_path (link)
    • My response: There is no strong reason to make this mount optional. I also need dynamo_repo for K8s support. Therefore, it should always be mounted. There is no downside.
  2. SGLang-specific container mounts (link)
    • My response: It is acceptable to use if-else statements to support different backends. There's no strong reason not to have backend-specific container mounts.
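
Such backend-specific mounts might be sketched like this (the helper name and mount paths are purely illustrative assumptions, not the actual cloudaix implementation):

```python
# Hypothetical sketch of backend-specific container mounts.
def container_mounts(backend, dynamo_repo_path):
    # dynamo_repo is always mounted (also needed for K8s support).
    mounts = [f"{dynamo_repo_path}:/workspace/dynamo"]
    if backend == "sglang":
        # SGLang-specific extras; this path is an assumption.
        mounts.append("/opt/sglang_configs:/sglang_configs")
    return mounts
```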

Discussion Needed
