Support for DeepSeekR1 model with SGLang / AI Dynamo #641
@karya0, please review.
Quick note to make sure we are aligned. We eventually want to support all Dynamo backends (vllm, sglang, trtllm). We started with vllm to narrow the scope because of time limitations. For all backends, we want to support all supported models: DSR1 Distill Llama 3.1 70B, DSR1 v3, etc. The run.sh that I have in my branch is backend- and model-agnostic so far. Any model- or backend-specific things are handled in the toml files. Having said that, my run.sh doesn't support DSR1 v3, so we have to find a way to merge the two scripts in a reasonable manner.
TODOs for Taekyung
Both TODOs above have been completed. The following PRs are blocked by this PR. Please review and confirm this PR, @karya0.
karya0
left a comment
I have left several comments. Feel free to resolve them as you see fit. As discussed offline, there will be follow-on PRs to resolve some of these issues and/or improve the UX.
prefill_args["--port"]=${dynamo_args["prefill-port"]}
decode_args["--port"]=${dynamo_args["decode-port"]}
Nit: I used some patterns like %port% and %model% in toml that get replaced with dynamo_args["port"] and dynamo_args["model"]. This might scale a bit better when dealing with multiple backends.
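The placeholder pattern described above could look roughly like the following bash sketch. It is not the code from either branch: the `expand_placeholders` helper, the template string, and the values in `dynamo_args` are all hypothetical and only illustrate how `%port%` and `%model%` might be filled in from `dynamo_args`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: expand %key% placeholders using the dynamo_args
# associative array. The values below are illustrative, not taken from the PR.
declare -A dynamo_args=(
  ["port"]="8000"
  ["model"]="example-model"
)

# expand_placeholders TEMPLATE
# Prints TEMPLATE with every %key% replaced by dynamo_args[key].
# Placeholders that have no matching key are left untouched.
expand_placeholders() {
  local text=$1 key token
  for key in "${!dynamo_args[@]}"; do
    token="%${key}%"
    # Quoting the pattern makes bash match it literally, not as a glob.
    text=${text//"$token"/${dynamo_args[$key]}}
  done
  printf '%s\n' "$text"
}

# Example: a command-line fragment as it might be written in a toml value.
expand_placeholders "--port %port% --model %model%"
```

With this approach the toml value carries the backend-specific flag names, while the script only needs to know the generic keys, which is the scaling benefit the comment points at.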
Thanks, @karya0. I will resolve them in follow-up PRs.
Will be resolved by #653
TODOs in follow-up PRs
Potentially Rejected
Discussion Needed
Summary
Support for DeepSeekR1 model with SGLang / AI Dynamo
Add backend field to AIDynamoArgs.
RM4572636
Test Plan
Checklist
1. Single-worker AI Dynamo works
https://drive.google.com/drive/folders/1kjIchFDYKUJTMzK5iCmcHgO_swJoS-M9?usp=drive_link
2. Multi-worker AI Dynamo works
https://drive.google.com/drive/folders/1CtcJK8JBqP8cjfGadyddSZxoh66Afip0?usp=drive_link
3. SGLang DSR1 works
Take https://github.com/Mellanox/cloudaix/pull/329
https://drive.google.com/drive/folders/1tGurw5xqoV8S3XWbkUBbmwuwvdi9ELOo?usp=drive_link