
[tx] Add EP axis to deepseek #993

Merged: pcmoritz merged 4 commits into NovaSky-AI:main from tanmaysachan:tanmay/ep_axis on Jan 30, 2026

@tanmaysachan (Contributor) commented Jan 30, 2026

Adds an expert-parallelism (EP) axis to the DeepSeek model and simplifies some variables to match the Qwen model's.

Testing...
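For readers unfamiliar with the term, here is a minimal sketch of what an "EP axis" means in a JAX model: a device-mesh axis along which the stacked expert weights of a Mixture-of-Experts layer are sharded, so each device holds only a subset of the experts. The axis name "ep", the tensor shapes, and the variable names below are illustrative assumptions, not code taken from this PR.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative sizes (assumptions): two experts per available device.
n_devices = len(jax.devices())
experts_per_device = 2
num_experts = experts_per_device * n_devices  # must divide evenly over "ep"
d_model, d_ff = 16, 32

# A 1-D device mesh whose single axis is named "ep" (expert parallelism).
mesh = Mesh(np.array(jax.devices()), axis_names=("ep",))

# Expert weights stacked along a leading expert dimension.
w_up = jnp.ones((num_experts, d_model, d_ff))

# Shard only the expert dimension over "ep"; the other dims stay replicated.
w_up = jax.device_put(w_up, NamedSharding(mesh, P("ep", None, None)))

# Each device now holds a contiguous slice of experts_per_device experts.
shard_shapes = [shard.data.shape for shard in w_up.addressable_shards]
print(shard_shapes)
```

On a single-device machine this prints one shard holding both experts; with more devices, each shard keeps the same per-device shape, which is what lets the per-step work on expert weights scale down as the "ep" axis grows.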

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces expert-parallelism (EP axis) support for the DeepSeek V3 model, a crucial feature for efficiently training Mixture-of-Experts models. A security review, however, identified a potential denial-of-service vulnerability: the adapter_indices parameter is not validated, so out-of-bounds indices could cause crashes. A minor suggestion is also made to improve code formatting for readability.
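The review does not show a proposed fix, so the following is only a sketch of the kind of bounds check it asks for. It assumes adapter_indices is an integer array with one adapter index per example and that num_adapters is the number of loaded adapters; both the function name and the num_adapters parameter are hypothetical.

```python
import numpy as np


def validate_adapter_indices(adapter_indices, num_adapters: int) -> np.ndarray:
    """Hypothetical host-side guard: every index must lie in [0, num_adapters)."""
    idx = np.asarray(adapter_indices)
    # An empty batch has nothing to check; give it an integer dtype.
    if idx.size == 0:
        return idx.astype(np.int32)
    if not np.issubdtype(idx.dtype, np.integer):
        raise TypeError(f"adapter_indices must be integers, got dtype {idx.dtype}")
    if idx.min() < 0 or idx.max() >= num_adapters:
        raise ValueError(
            f"adapter_indices must be in [0, {num_adapters}), "
            f"got min={idx.min()}, max={idx.max()}"
        )
    return idx


# Usage: a valid batch passes through, an out-of-range index is rejected.
ok = validate_adapter_indices([0, 2, 1], num_adapters=3)
try:
    validate_adapter_indices([0, 3], num_adapters=3)
except ValueError as err:
    print("rejected:", err)
```

The check runs on the host with NumPy because, inside a jax.jit-compiled function, array values are traced and a Python `if` on them raises an error; validating before dispatch keeps the compiled path unchanged.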

@pcmoritz (Collaborator)

Very nice ❤️! With this PR, the step time for SFT on GLM 4.7 Flash goes down from 60 s to 35 s :)

@pcmoritz pcmoritz merged commit ecc1b47 into NovaSky-AI:main Jan 30, 2026
4 of 6 checks passed
@tanmaysachan (Contributor, Author) commented Jan 30, 2026

Thanks for the review and merge!
Great to hear about the speedup 🚀
