
head_size here comes from q.sizes()[3].
But in 'modeling_deepseek.py' of the DeepSeek-V3 model, q is reshaped as:
q = q.view(bsz, q_len, self.num_heads, self.q_head_dim).transpose(1, 2)
Here self.q_head_dim = config.qk_nope_head_dim + config.qk_rope_head_dim,
which is 128 + 64 = 192 according to 'config.json'.
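To make the shapes concrete, here is a minimal pure-Python sketch of what that view/transpose implies for dimension 3 of q. Only the values 128 and 64 come from 'config.json' as quoted above; bsz, q_len and num_heads are made-up placeholder values, and the helper name shape_after_view_transpose is hypothetical, not part of modeling_deepseek.py.

```python
# Values quoted from config.json in the question above.
qk_nope_head_dim = 128
qk_rope_head_dim = 64
q_head_dim = qk_nope_head_dim + qk_rope_head_dim


def shape_after_view_transpose(bsz, q_len, num_heads, head_dim):
    """Hypothetical helper: shape of q after .view(...).transpose(1, 2)."""
    viewed = (bsz, q_len, num_heads, head_dim)
    # transpose(1, 2) swaps the q_len and num_heads axes; the last axis is untouched.
    return (viewed[0], viewed[2], viewed[1], viewed[3])


# bsz, q_len and num_heads are placeholder values chosen only for illustration.
q_shape = shape_after_view_transpose(bsz=2, q_len=5, num_heads=4, head_dim=q_head_dim)

# What indexing dimension 3 of q (as in q.sizes()[3]) would give for this layout.
head_size = q_shape[3]
print(q_shape, head_size)
```

Under these assumptions the last dimension of q is q_head_dim, so reading dimension 3 of this tensor would yield 192.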
How should this be understood correctly?