Add Gemma 4 FLOPs & fix sliding window flops computations #3592
copybara-service[bot] merged 1 commit into main
Conversation
This Pull Request introduces Gemma 4 FLOPs calculations and significantly improves the accuracy of existing FLOPs math, particularly for sliding window attention and mixed attention architectures. The addition of a comprehensive test suite covering multiple model families is a major highlight and ensures the reliability of these critical metrics.
🔍 General Feedback
- Great Test Coverage: The new `maxtext_utils_flops_test.py` is excellent. It uses a robust `6 * params * tokens` verification strategy that provides high confidence in the computed TFLOPs across various architectures.
- Improved Accuracy: The fixes for sliding window area and vision encoder scaling (backward pass) are well-timed and correct.
- Inconsistency in Shared KV Projections: There is a potential logic error in how `share_kv_projections` is applied to mixed attention models in the main caller. One unit test specifically assumes local layers do not share KV projections even when the flag is True, but the code currently applies it to both.
- MoE Fallback Logic: The fallback for MoE layer detection is now more generalized, which is good, but might be too broad for future hybrid architectures.
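The `6 * params * tokens` verification strategy mentioned above can be sketched as follows (a minimal illustration; the function names and tolerance are hypothetical, not the test's actual API):

```python
def approx_train_flops(num_params: int, num_tokens: int) -> float:
    """Rule-of-thumb training FLOPs: roughly 2 FLOPs per parameter
    per token for the forward pass and 4 for the backward pass."""
    return 6.0 * num_params * num_tokens


def tflops_matches_estimate(computed_tflops: float, num_params: int,
                            num_tokens: int, rel_tol: float = 0.15) -> bool:
    """Check a computed per-step TFLOPs value against the 6ND estimate,
    leaving slack for attention FLOPs the rule of thumb ignores."""
    expected_tflops = approx_train_flops(num_params, num_tokens) / 1e12
    return abs(computed_tflops - expected_tflops) <= rel_tol * expected_tflops


# Example: a 1B-parameter model processing 4096 tokens per step
# should report roughly 6 * 1e9 * 4096 / 1e12 ≈ 24.6 TFLOPs.
```

A check like this catches gross errors (a dropped backward pass, a doubled layer count) while tolerating the attention-dependent terms that the simple estimate omits.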
This pull request significantly improves the accuracy of FLOPs and MFU (Model Flops Utilization) calculations across multiple architectures, with a focus on Gemma 4 and corrected sliding window logic. The implementation is thorough, including a new comprehensive test suite that validates calculations for 12 different model configurations.
🔍 General Feedback
- Accuracy Improvements: The switch to a precise triangular overlap formula for sliding window attention and the inclusion of backward pass FLOPs for vision encoders are excellent updates that prevent MFU over-estimation.
- Architectural Coverage: The addition of Gemma 4 specific logic and the generalization of MoE layer detection make the utilities much more robust for future model support.
- Testing: The new
maxtext_utils_flops_test.pyis a great addition, providing clear manual-calculation-based verification for various architectures. - Suggestions: I've provided a few suggestions to further generalize the MoE layer detection and ensure consistent dimension usage in MoE FFN calculations.
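For reference, the triangular overlap formula for sliding window attention (`max_target_length * window - 0.5 * window**2`) can be compared against an exact count of attended positions under causal sliding-window attention. A small sketch (variable names are illustrative; note the closed form undercounts the exact sum by `window / 2`, which is negligible when the sequence length dominates the window size):

```python
def exact_attended_positions(seq_len: int, window: int) -> int:
    """Exact number of (query, key) pairs: query i attends to
    min(i + 1, window) keys under a causal sliding window."""
    return sum(min(i + 1, window) for i in range(seq_len))


def closed_form_area(seq_len: int, window: int) -> float:
    """Closed form used for the FLOPs fix: the full rectangle of
    seq_len * window minus the triangle cut off at the start."""
    return seq_len * window - 0.5 * window**2


# For seq_len=8, window=3: exact is 21, closed form is 19.5;
# the gap is exactly window / 2 = 1.5.
```

Using the full `seq_len * window` rectangle instead would over-count attention FLOPs for the early positions, which is exactly the over-estimation the PR corrects.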
I love the test! I am not sure how we have gotten this far without testing our tflops calculation...
Description
Adds TFLOPs calculations for the Gemma 4 architecture (including MoE) and fixes several inaccuracies in existing FLOPs math (sliding window overlap, vision encoder scaling, and shared KV projections).
- MoE FLOPs now use `moe_mlp_dim` and generalized MoE layer detection (`num_experts > 1`).
- Sliding window attention FLOPs use the corrected overlap area (`max_target_length * window - 0.5 * window**2`).
- QKV FLOPs now account for `share_kv_projections`.

Tests

Added `maxtext_utils_flops_test.py` to validate FLOPs calculations across 12 model architectures.

Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.