fix gemma4 num attention head bugs #7975

Open
mingxiang1006 wants to merge 2 commits into deepspeedai:master from mingxiang1006:master

Conversation

@mingxiang1006

An error occurs because the number of attention heads is not exposed directly on Gemma4Config; it lives under one of the nested modules (the Gemma4 text config, vision config, or audio config), so there is no top-level attribute to grab it from. This causes a runtime error during DeepSpeed launch.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@delock
Collaborator

delock commented Apr 15, 2026

Hi @mingxiang1006, thanks for the fix. I have two questions:

  1. Should text_config be used as a fallback when hf_model_config does not have the needed keys, instead of being consulted first?
  2. I saw that vision_config and text_config have different numbers of attention heads; if text_config is picked, does that mean only text-related weights are used during training/inference?

@mingxiang1006
Author


Hi @delock, this was a temporary fix. I agree with your suggestion: we should fall back to text_config if hf_model_config does not have the key.

Yes, this needs further thought on when to use the text config versus the vision config.

@delock
Collaborator

delock commented Apr 15, 2026


Hi, we can start by making text_config the fallback path.

As for picking between the text and vision configs, does the modeling code know which one is being used? It might be okay to stay with text_config for the time being, because Ulysses SP is more likely to be applied to text than vision, but I want a better understanding of the mechanism behind Gemma4.
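The fallback discussed above can be sketched as a small helper: read `num_attention_heads` from the top-level Hugging Face config first, and only fall back to the nested `text_config` when the attribute is missing. This is a minimal sketch of the suggested approach, not the actual DeepSpeed patch; the `get_num_attention_heads` helper name is hypothetical, and the exact Gemma4 attribute names are assumptions based on common Hugging Face config conventions.

```python
def get_num_attention_heads(hf_model_config):
    """Sketch of the suggested lookup order (helper name is hypothetical):
    prefer the top-level attribute, then fall back to the nested text_config."""
    # Top-level attribute wins when present (plain text-only models).
    heads = getattr(hf_model_config, "num_attention_heads", None)
    if heads is not None:
        return heads
    # Multimodal configs may nest the value under text_config (assumed layout).
    text_config = getattr(hf_model_config, "text_config", None)
    if text_config is not None:
        heads = getattr(text_config, "num_attention_heads", None)
        if heads is not None:
            return heads
    raise AttributeError(
        "num_attention_heads not found on the config or its text_config")
```

Choosing the vision config instead would need the caller to know which sub-model Ulysses SP is actually sharding, which is the open question above.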
