Describe the bug
I know this sounds very weird, but when I use DeepSpeed to optimize a "Qwen/Qwen2.5-3B" model, the model does not update at all: after training, the parameters remain exactly the same. The same exact training code works with "Qwen/Qwen2.5-1.5B". I also checked "meta-llama/Llama-3.2-3B", and optimizing it does not work either. However, simply setting "torch_adam" to true makes the issue go away.
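For reference, the workaround is the `torch_adam` flag in the DeepSpeed optimizer config, which switches from DeepSpeed's fused Adam implementation to `torch.optim.Adam`. A minimal sketch of the relevant config section (the surrounding values like `lr` are placeholders, not my actual settings):

```json
{
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-5,
      "torch_adam": true
    }
  }
}
```

With `torch_adam: true`, the 3B models train normally; with the default fused optimizer, their parameters stay frozen.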