
[pull] master from deepspeedai:master #110

Merged
pull[bot] merged 2 commits into QSLee-Net:master from deepspeedai:master
Oct 22, 2025
Conversation


@pull pull bot commented Oct 22, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

This PR fixes the following error:

```
[rank0]:   File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 985, in grad_handling_hook
[rank0]:     self.process_gradients(param, i)
[rank0]:   File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 1524, in process_gradients
[rank0]:     self.reduce_ready_partitions_and_remove_grads(param, i)
[rank0]:   File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 1528, in reduce_ready_partitions_and_remove_grads
[rank0]:     self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
[rank0]:   File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py", line 1006, in reduce_independent_p_g_buckets_and_remove_grads
[rank0]:     self.report_ipg_memory_usage("In ipg_remove_grads before reduce_ipg_grads", param.numel(), param.dtype)
[rank0]:   File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/base_optimizer.py", line 70, in report_ipg_memory_usage
[rank0]:     bucket = self.ipg_buckets[dt]
[rank0]:              ~~~~~~~~~~~~~~~~^^^^
[rank0]: KeyError: torch.bfloat16
```

The problem doesn't occur when `seq_parallel_communication_data_type: bf16` is used, but it fails with `fp32` (or when the setting is omitted).

In this PR I'm syncing with the ZeRO stage 3 (z3) implementation, which doesn't pass the `dtype` arg and instead traverses only the dtypes that actually exist in the buckets.


https://github.com/deepspeedai/DeepSpeed/blob/407708cdb6e48dbff971b0f03ec4613d0f084a4b/deepspeed/runtime/base_optimizer.py#L66-L75
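The gist of the failure and the fix can be sketched as follows. This is a minimal standalone illustration, not DeepSpeed's actual code: the `Bucket` class, string dtype keys, and both reporting helpers are stand-ins for the real `ipg_buckets` structure in `base_optimizer.py`.

```python
# Sketch of why indexing the bucket dict by a caller-supplied dtype
# raises KeyError, while traversing only the existing dtypes does not.

class Bucket:
    """Stand-in for DeepSpeed's per-dtype IPG bucket."""

    def __init__(self, elements):
        self.elements = elements


# Buckets are keyed only by the gradient dtypes actually seen during
# training -- here, just bf16.
ipg_buckets = {"bf16": Bucket(elements=1024)}


def report_by_dtype(dt):
    # Fragile: raises KeyError when `dt` (e.g. "fp32" coming from the
    # seq_parallel_communication_data_type setting) was never bucketed.
    return ipg_buckets[dt].elements


def report_all():
    # Robust (z3-style): iterate only over the dtypes that have
    # buckets, so no lookup can miss.
    return {dt: bucket.elements for dt, bucket in ipg_buckets.items()}


print(report_all())  # {'bf16': 1024}

try:
    report_by_dtype("fp32")  # never bucketed -> KeyError: 'fp32'
except KeyError as e:
    print("KeyError:", e)
```

The fix mirrors the second helper: drop the `dtype` argument and let the traversal of existing buckets do the work.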

Fixes: #7607
Ulysses/ALST integration with HF Accelerate:
- Allow `UlyssesSPAttentionHF.register_with_transformers` to get a
`model` obj as an argument, to match HF accelerate's workflow
- Fix existing Ulysses tests to test z2 instead of z1
- Improve documentation
- Add a defensive check

The HF Accelerate PR that depends on this PR: huggingface/accelerate#3817

---------

Signed-off-by: Stas Bekman <stas@stason.org>
@pull pull bot locked and limited conversation to collaborators Oct 22, 2025
@pull pull bot added the ⤵️ pull label Oct 22, 2025
@pull pull bot merged commit 64c0052 into QSLee-Net:master Oct 22, 2025
