Is Llama4TextL2Norm meant to be RMS norm? #37934

Open
0x6b64 opened this issue May 2, 2025 · 1 comment
0x6b64 commented May 2, 2025

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/modeling_llama4.py#L118

x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

This is just the rms norm?

@Rocketknight1
Member

That does look like the RMSNorm computation, yes. However, RMSNorm was only added to PyTorch in 2.3 or 2.4 I think, so we need to do it manually until our minimum supported torch version catches up!
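The equivalence described above can be sketched with a small NumPy example (helper names and the `eps` value are assumptions for illustration, not from the transformers source): the quoted line, `x * rsqrt(mean(x^2) + eps)`, is algebraically the same as dividing by the root mean square, i.e. RMSNorm without a learnable scale.

```python
import numpy as np

def l2norm_manual(x, eps=1e-5):
    # Mirrors the quoted Llama4TextL2Norm line:
    # x * rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return x * (1.0 / np.sqrt((x ** 2).mean(-1, keepdims=True) + eps))

def rms_norm(x, eps=1e-5):
    # Textbook RMSNorm with no learnable weight: x / RMS(x),
    # where RMS(x) = sqrt(mean(x^2) + eps) over the last axis.
    rms = np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x / rms

x = np.random.randn(2, 4)
# The two formulations agree elementwise.
assert np.allclose(l2norm_manual(x), rms_norm(x))
```

With `eps=0`, the normalized output has unit root mean square along the last axis, which is what makes this an RMS normalization rather than a full LayerNorm (no mean subtraction, no bias).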
