fix gemma4 dtype mismatch by aireenmei · Pull Request #3746 · AI-Hypercomputer/maxtext

aireenmei · 2026-04-25T02:37:52Z

Description

While fixing another bug in #3727, I changed the initialization of layer_scalar to use config.weight_dtype instead of config.dtype, because it is a weight. This requires casting layer_scalar back to dtype at use time. Otherwise, when weight_dtype=float32 and dtype=bfloat16, it fails with the error "The input carry component c[1] has type bfloat16[2048,4096,2816] but the corresponding output carry component has type float32[2048,4096,2816], so the dtypes do not match."
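The mismatch described above can be reproduced outside MaxText with a minimal JAX sketch. It assumes the layers run under jax.lax.scan with the activations as the carry (the "carry component" wording in the error suggests this). The names ACT_DTYPE, WEIGHT_DTYPE, layer_without_cast and layer_with_cast, and the shapes, are hypothetical stand-ins and not taken from gemma4.py.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-ins for config.dtype and config.weight_dtype.
ACT_DTYPE = jnp.bfloat16
WEIGHT_DTYPE = jnp.float32

# Toy shapes; the real layer_scalar shape in gemma4.py is not assumed here.
layer_scalar = jnp.ones((1,), dtype=WEIGHT_DTYPE)  # weight kept in float32
x = jnp.ones((2, 4, 8), dtype=ACT_DTYPE)           # activations in bfloat16


def layer_without_cast(carry, _):
  # bfloat16 * float32 promotes to float32, so the output carry
  # no longer has the same dtype as the input carry.
  return carry * layer_scalar, None


def layer_with_cast(carry, _):
  # Casting the weight to the activation dtype keeps the carry in bfloat16.
  return carry * layer_scalar.astype(ACT_DTYPE), None


def run(layer_fn):
  out, _ = jax.lax.scan(layer_fn, x, None, length=3)
  return out


try:
  run(layer_without_cast)
  mismatch_raised = False
except TypeError:
  # scan rejects a carry whose dtype changes between input and output.
  mismatch_raised = True

fixed = run(layer_with_cast)
```

In this sketch only the multiplication is affected: the weight itself stays in float32, and the cast happens where it is used.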

Tests

log with error
working log

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-04-25T02:42:44Z

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/models/gemma4.py	0.00%	1 Missing ⚠️

shralex

typo in title dtye

gagika · 2026-04-25T05:27:57Z

    next_layer_addition = mlp_lnx + residual
    layer_output = next_layer_addition
-    layer_output = layer_output * self.layer_scalar.value
+    layer_output = layer_output * jnp.asarray(self.layer_scalar.value, cfg.dtype)


nit: self.layer_scalar.value.astype(cfg.dtype) probably better matches MaxText style.
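For reference, the two spellings in this thread should produce the same result for an array that already exists. The sketch below checks that on a toy float32 array; the variable names are hypothetical and the array only stands in for layer_scalar.value.

```python
import jax.numpy as jnp

# Toy float32 array standing in for layer_scalar.value (hypothetical).
scalar = jnp.full((1,), 0.5, dtype=jnp.float32)

via_asarray = jnp.asarray(scalar, jnp.bfloat16)  # spelling used in the diff
via_astype = scalar.astype(jnp.bfloat16)         # spelling suggested in the nit

same_dtype = via_asarray.dtype == via_astype.dtype
same_values = bool(jnp.array_equal(via_asarray, via_astype))
```

The choice between the two is therefore a matter of style rather than behavior in this case.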

fix gemma4 dtye mismatch

9184714

aireenmei marked this pull request as ready for review April 25, 2026 02:55

aireenmei requested review from NicoGrande, NuojCheng, RissyRan, bvandermoon, gagika, gobbleturk, jesselu-google, jiangjy1982, parambole, richjames0, shralex, shuningjin and suexu1025 as code owners April 25, 2026 02:55

NuojCheng approved these changes Apr 25, 2026

shralex approved these changes Apr 25, 2026

aireenmei changed the title ~~fix gemma4 dtye mismatch~~ fix gemma4 dtype mismatch Apr 25, 2026

aireenmei added the pull ready label Apr 25, 2026

gagika approved these changes Apr 25, 2026

copybara-service Bot merged commit 59e0f17 into main Apr 25, 2026
63 of 68 checks passed

copybara-service Bot deleted the aireen/fix_gemma4_dtype branch April 25, 2026 06:04
