fix gemma4 num attention head bugs #7975

Open
mingxiang1006 wants to merge 2 commits into deepspeedai:master from mingxiang1006:master

Conversation

@mingxiang1006

An error occurs because the number of attention heads is not exposed directly on Gemma4Config; it lives under one of the nested modules (the Gemma4 text config, vision config, or audio config), so there is no top-level attribute to grab it from. This causes a runtime error during DeepSpeed launch.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@delock
Collaborator

delock commented Apr 15, 2026

Hi @mingxiang1006, thanks for the fix. I have two questions:

  1. Should text_config be used as a fallback when hf_model_config does not have the needed keys, instead of being consulted first?
  2. I saw that vision_config and text_config have different numbers of attention heads; if text_config is picked, does that mean only text-related weights are used during training/inference?

@mingxiang1006
Author


Hi @delock, this was a temporary fix. I agree with your suggestion: we should fall back to text_config if hf_model_config does not have the key.

Yes, this needs further thought on when to use the text config versus the vision config.

@delock
Collaborator

delock commented Apr 15, 2026


Hi, we can start by making text_config the fallback path.

As for picking between the text and vision configs, does the modeling code know which one is being used? It might be okay to stay with text_config for the time being, because Ulysses SP is more likely to be applied to text than vision, but I want a better understanding of the mechanism behind Gemma4.
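The fallback discussed above can be sketched as a small helper: read `num_attention_heads` from the top-level Hugging Face config first, and only fall back to the nested `text_config` when the attribute is missing. This is a minimal sketch of the suggested approach, not the actual DeepSpeed patch; the `get_num_attention_heads` helper name is hypothetical, and the exact Gemma4 attribute names are assumptions based on common Hugging Face config conventions.

```python
def get_num_attention_heads(hf_model_config):
    """Sketch of the suggested lookup order (helper name is hypothetical):
    prefer the top-level attribute, then fall back to the nested text_config."""
    # Top-level attribute wins when present (plain text-only models).
    heads = getattr(hf_model_config, "num_attention_heads", None)
    if heads is not None:
        return heads
    # Multimodal configs may nest the value under text_config (assumed layout).
    text_config = getattr(hf_model_config, "text_config", None)
    if text_config is not None:
        heads = getattr(text_config, "num_attention_heads", None)
        if heads is not None:
            return heads
    raise AttributeError(
        "num_attention_heads not found on the config or its text_config")
```

Choosing the vision config instead would need the caller to know which sub-model Ulysses SP is actually sharding, which is the open question above.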
