special_image_mask handling can get hit by accidental same embedding value at certain dims #38012


Closed
lancercat opened this issue May 8, 2025 · 5 comments


@lancercat

lancercat commented May 8, 2025

special_image_mask = inputs_embeds == self.get_input_embeddings()(
    torch.tensor(self.config.image_token_index, dtype=torch.long, device=inputs_embeds.device))

As far as I can see, this should test whether the whole vector is close enough to the special-token embedding,
instead of testing equality at each dim independently.
It should be something like:

            if input_ids is None:
                # Compare the whole embedding vector against the image-token embedding
                # instead of comparing each dim independently (the tolerance is illustrative).
                image_token_embed = self.get_input_embeddings()(
                    torch.tensor(self.config.image_token_index, dtype=torch.long, device=inputs_embeds.device)
                )
                special_image_mask = (
                    (inputs_embeds - image_token_embed).abs().sum(-1) < 0.009
                ).unsqueeze(-1)
            else:
                special_image_mask = (input_ids == self.config.image_token_index).unsqueeze(-1)
            special_image_mask = special_image_mask.expand_as(inputs_embeds).to(inputs_embeds.device)
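
For illustration, here is a minimal standalone sketch of the failure mode (toy tensors, not the model's real weights): a virtual/prefix embedding that happens to share a value with the image-token embedding at a single dim gets partially flagged by the element-wise `==` check, while a whole-vector comparison does not flag it:

    import torch

    hidden = 8
    image_embed = torch.randn(hidden)    # stand-in for get_input_embeddings()(image_token_index)
    prefix_embed = torch.randn(hidden)   # stand-in for a random/tuned virtual prefix embedding
    prefix_embed[3] = image_embed[3]     # accidental collision at one dim

    inputs_embeds = torch.stack([prefix_embed, image_embed])  # (seq_len=2, hidden)

    per_dim_mask = inputs_embeds == image_embed   # the current element-wise check
    print(per_dim_mask[0].any())                  # tensor(True): prefix row partially flagged as "image"

    whole_vector_mask = (inputs_embeds - image_embed).abs().sum(-1) < 1e-6
    print(whole_vector_mask)                      # tensor([False, True]): only the real image token matches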

@lancercat lancercat changed the title special_image_mask handling can get hit by accidental same embedding at certain dims special_image_mask handling can get hit by accidental same embedding value at certain dims May 8, 2025
@zucchini-nlp
Member

@lancercat It should not be an issue if embeds are obtained using the same model with the same dtype. Is it failing for you at inference time?

@lancercat
Author

No, because inference takes input ids and thus never compares embeddings.
I'm doing some prefix finetuning and it suddenly errored because a random virtual embedding hits the image-token embedding at one or two dims :)

@zucchini-nlp
Member

Ah, okay, so IIUC the issue: when doing prefix tuning, some virtual input embeds get mistaken for the image token idx. I still don't think this needs a fix, because comparing the diff against an arbitrary threshold like <0.009 will lead to more issues if the vocabulary has embeddings nearly identical to the image token.

If you are doing prefix tuning with PEFT, we can try to fix it on the PEFT side (though I think PEFT doesn't expand the embeds, but rather the cache). If it was a custom tuning script, I suggest initializing the virtual embeddings to be non-equal to the image token embedding.
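
In case it helps, a rough sketch of that workaround for a custom tuning script (the helper name `init_virtual_embeds` and the small jitter are purely illustrative, not an existing API):

    import torch

    def init_virtual_embeds(num_virtual, embed_layer, image_token_index, device=None):
        """Initialize virtual prefix embeddings so no dim exactly equals the image-token embedding."""
        image_embed = embed_layer(
            torch.tensor(image_token_index, dtype=torch.long, device=device)
        )
        virtual = torch.randn(num_virtual, image_embed.shape[-1], device=device) * 0.02
        # Nudge any dim that exactly matches the image-token embedding, so the model's
        # element-wise `==` check can never flag a virtual token as an image token.
        collisions = virtual == image_embed
        virtual[collisions] += 1e-4
        return torch.nn.Parameter(virtual)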

@lancercat
Author

lancercat commented May 8, 2025

I am already meddling with PEFT.
The problem is bigger than I thought due to Gemma's 4D attention mask and its right-alignment magic :)
I am currently writing a dedicated PEFT class for it...

Anyway, if the embedding comparison behaviour is not intended, maybe remove the logic?
As it stands, the comparison does not do what it appears to do either...

@zucchini-nlp
Member

The feature works when users pass embeds = get_input_embeddings()(input_ids) to forward along with the pixel values. It should not be removed.
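
For reference, the supported path looks roughly like this (the checkpoint id and the <start_of_image> placeholder in the prompt are assumptions for illustration): because inputs_embeds comes from the model's own embedding table, the image-token rows match the reference embedding exactly and the `==` check is reliable.

    import torch
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "google/gemma-3-4b-it"  # placeholder checkpoint
    model = AutoModelForImageTextToText.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)

    image = Image.new("RGB", (224, 224))  # dummy image for illustration
    inputs = processor(text="<start_of_image> Describe this image.", images=image, return_tensors="pt")

    # Embeds taken from the model's own embedding layer: image-token rows are bit-identical
    # to get_input_embeddings()(image_token_index), so the element-wise mask is exact.
    inputs_embeds = model.get_input_embeddings()(inputs["input_ids"])
    with torch.no_grad():
        out = model(inputs_embeds=inputs_embeds, pixel_values=inputs["pixel_values"])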
