Skip GCS-dependent tokenizer unit tests under decoupled offline mode #4001
Merged
601a77d to
541ce2b
…to eliminate 5 collection errors
541ce2b to
1ede2aa
Codecov Report: ✅ All modified and coverable lines are covered by tests.
igorts-git
approved these changes
May 28, 2026
bvandermoon
approved these changes
May 28, 2026
Description
This PR adds code-level skip decorators to TrainTokenizerTest, TikTokenTest, and HFTokenizerTest in tests/unit/tokenizer_test.py when running in decoupled offline mode (DECOUPLE_GCLOUD=TRUE).
Problem & Context
These unit tests are hardcoded to train and verify tokenizers on GCS parquet datasets (gs://maxtext-dataset/...) and to copy models using the gcloud CLI in a subprocess.
In an air-gapped GKE container running offline:
- A FileNotFoundError is raised when JAX tries to glob the buckets.
- The gcloud CLI is not available, causing FileNotFoundError: 'gcloud' when spawning the download subprocess.
Solution
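The approach in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the `is_decoupled()` defined here is a hypothetical stand-in that only reads the `DECOUPLE_GCLOUD` environment variable (the repository's own helper may behave differently), and the test body is a placeholder for the real GCS-dependent tests.

```python
import os
import unittest


def is_decoupled() -> bool:
    """Hypothetical stand-in for the repository's helper.

    Assumption: decoupled offline mode is signalled by DECOUPLE_GCLOUD=TRUE.
    """
    return os.environ.get("DECOUPLE_GCLOUD", "").upper() == "TRUE"


# The condition is evaluated once, when the class is defined (at import time),
# so the environment variable must be set before the test module is imported.
@unittest.skipIf(is_decoupled(), "Requires GCS and the gcloud CLI; skipped in decoupled mode.")
class TrainTokenizerTest(unittest.TestCase):
    """Placeholder class; the real tests train and verify tokenizers on GCS data."""

    def test_placeholder(self):
        # Stands in for a test that would read gs:// paths or call gcloud.
        self.assertTrue(True)
```

Decorating the class rather than individual methods skips every test in it, including any class-level setup that would otherwise touch GCS or the `gcloud` CLI.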
Bypass these cloud-dependent tests dynamically using @unittest.skipIf(is_decoupled(), ...) when offline decoupled mode is enabled. This lets the decoupled test suite pass out of the box in offline environments, without affecting standard online runs.
Tests
This change has been fully verified on a physical TPU7x VM slice.
Commands to Reproduce & Verify:
    export DECOUPLE_GCLOUD=TRUE
    python3 -m pytest tests/unit/tokenizer_test.py -vv
Expected Output:
All 5 collection errors are resolved and the classes are cleanly skipped:
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.