Fix matmul precision and vmem in test scripts #2712

shuningjin · 2025-11-18T21:58:18Z

Description

Fix vmem limit for gpt-oss test

Problem: the gpt-oss test fails with RESOURCE_EXHAUSTED: Ran out of memory in memory space vmem in the megablox gmm operation during the forward pass.
Fix b/461483388
Solution: raise the scoped vmem limit with export LIBTPU_INIT_ARGS='--xla_tpu_scoped_vmem_limit_kib=81920' (81920 KiB = 80 MiB).
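A minimal Python sketch of applying the same flag from a launcher script instead of a shell export. The helper name add_libtpu_flag is hypothetical, not part of the repository. The sketch assumes that LIBTPU_INIT_ARGS only takes effect if it is set before JAX initializes the TPU backend, so in practice before importing jax. Existing flags in the variable are preserved, and an earlier value of the same flag is replaced.

```python
import os

# Flag taken from the PR description; the value is in KiB (81920 KiB = 80 MiB).
VMEM_FLAG = "--xla_tpu_scoped_vmem_limit_kib=81920"


def add_libtpu_flag(flag: str, env=os.environ) -> str:
    """Hypothetical helper: append `flag` to LIBTPU_INIT_ARGS, keeping other flags."""
    current = env.get("LIBTPU_INIT_ARGS", "").split()
    # Drop any earlier setting of the same flag so the new value is unambiguous.
    name = flag.split("=", 1)[0]
    current = [f for f in current if f.split("=", 1)[0] != name]
    current.append(flag)
    env["LIBTPU_INIT_ARGS"] = " ".join(current)
    return env["LIBTPU_INIT_ARGS"]


# Assumption: run this before `import jax`, since libtpu reads the variable
# when the TPU backend is initialized.
add_libtpu_flag(VMEM_FLAG)
```

Appending rather than overwriting avoids silently dropping other libtpu flags that a test script may already set.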

Fix matmul precision in multiple tests

Problem: matmul_precision=float32, which multiple test scripts use, previously worked but now fails with the error Input should be 'default', 'high' or 'highest' [type=enum, input_value='float32', input_type=str] (see DeepSeek run: cmd, log). This is caused by a recent commit.
Solution: add float32 as an accepted enum value. Note that it is equivalent to matmul_precision=highest; see reference.
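A minimal sketch of the enum change, using the standard-library enum module as a stand-in for the pydantic-validated config field. MatmulPrecision and parse_matmul_precision are hypothetical names, not the repository's. As an assumption, the sketch normalizes float32 to highest, mirroring how JAX's jax.lax.Precision interprets the string 'float32'.

```python
from enum import Enum


class MatmulPrecision(str, Enum):
    """Hypothetical stand-in for the matmul_precision config enum."""

    DEFAULT = "default"
    HIGH = "high"
    HIGHEST = "highest"
    FLOAT32 = "float32"  # newly accepted value


# Assumption: treat 'float32' as an alias of 'highest' before it reaches matmul code.
_ALIASES = {MatmulPrecision.FLOAT32: MatmulPrecision.HIGHEST}


def parse_matmul_precision(value: str) -> MatmulPrecision:
    """Validate a raw config string; unknown values raise ValueError."""
    member = MatmulPrecision(value)  # e.g. 'fp32' raises ValueError
    return _ALIASES.get(member, member)
```

Keeping float32 as its own enum member means existing test scripts validate unchanged, while the alias table keeps the downstream code dealing with only the three canonical values.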

Tests

Ran the forward pass for gpt-oss on v5p-8 (xpk) with export LIBTPU_INIT_ARGS='--xla_tpu_scoped_vmem_limit_kib=81920' and matmul_precision=float32.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Fix matmul precision and vmem in test scripts

5248e89

shuningjin force-pushed the shuningjin-fix branch from 8f195a1 to 5248e89, November 18, 2025 22:25

shuningjin marked this pull request as ready for review November 18, 2025 23:57

shuningjin requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners November 18, 2025 23:57

aireenmei approved these changes Nov 19, 2025

RissyRan approved these changes Nov 19, 2025

shuningjin added the pull ready label Nov 19, 2025

copybara-service bot merged commit 32380ea into main Nov 19, 2025
41 of 44 checks passed

copybara-service bot deleted the shuningjin-fix branch November 19, 2025 01:00

3 participants