Fix decoupled-mode pytest collection: stub google.cloud.storage import #88
Merged
Wire ROCm (rocm-unit, rocm-decoupled) jobs into the existing test coordinator and package-test workflows, and add install_te_rocm_wheel.py to fetch the MI355 Transformer Engine wheel during container setup.
- build_and_test_maxtext.yml: add ROCm jobs and ROCM_ONLY gating on all sibling jobs; switch the daily schedule to 03:00 UTC; expand concurrency to cover manual dispatch per (branch + actor).
- run_tests_against_package.yml: select the ROCm base image for the rocm device_type; add decoupled_mode, requirements_file, and extra pip deps inputs; install the TE wheel and select the arch before running tests; add ulimit and libtpu init guards for rocm.
- run_tests_coordinator.yml: add rocm-unit / rocm-decoupled flavors with their pytest markers, runner labels, container options, and ROCm requirements files; route decoupled_mode for the decoupled flavor.
- install_te_rocm_wheel.py: download the MI355 TE wheel from the repo's te-rocm-wheels release, falling back to the pinned ROCm/maxtext release asset. MI355-only (no MI300 path, no arch detection).
Scheduled workflow that keeps rocm-main in sync with AI-Hypercomputer/main.
The top-level `from google.cloud.storage import Client, transfer_manager` in checkpoint_conversion/utils/utils.py broke pytest collection for the ROCm decoupled tests (DECOUPLE_GCLOUD=TRUE), since the package isn't installed in that environment.
- gcloud_stub.gcs_storage(): also import and attach the transfer_manager submodule (it isn't auto-imported by `from google.cloud import storage`); extend _gcs_stubs() with a no-op transfer_manager stub.
- checkpoint_conversion/utils/utils.py: drop the direct google.cloud import and bind Client/transfer_manager via gcs_storage(), matching the existing pattern in src/maxtext/utils/gcs_utils.py.
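The pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual contents of gcloud_stub: the helper `_is_decoupled()`, the case-insensitive check of `DECOUPLE_GCLOUD`, and the two stubbed `transfer_manager` functions are assumptions chosen for the example, and the real stub may expose a different surface.

```python
import importlib
import os
import types


def _is_decoupled() -> bool:
  # Assumption: decoupled mode is signalled by DECOUPLE_GCLOUD=TRUE.
  return os.environ.get("DECOUPLE_GCLOUD", "").upper() == "TRUE"


def _gcs_stubs() -> types.SimpleNamespace:
  """Build a stand-in for google.cloud.storage that does no I/O."""

  class Client:  # hypothetical no-op replacement for storage.Client
    def __init__(self, *args, **kwargs):
      pass

  def _noop(*args, **kwargs):
    # Returns an empty result list instead of touching GCS.
    return []

  # Illustrative subset of the transfer_manager API; the real stub may differ.
  transfer_manager = types.SimpleNamespace(
      upload_many_from_filenames=_noop,
      download_many_to_path=_noop,
  )
  return types.SimpleNamespace(Client=Client, transfer_manager=transfer_manager)


def gcs_storage():
  """Return the real storage module, or a stub in decoupled mode."""
  if _is_decoupled():
    return _gcs_stubs()
  storage = importlib.import_module("google.cloud.storage")
  # Import the submodule explicitly so storage.transfer_manager is available
  # to callers that only hold a reference to the storage module.
  storage.transfer_manager = importlib.import_module(
      "google.cloud.storage.transfer_manager"
  )
  return storage
```

A call site would then bind its names from the returned object, for example `_storage = gcs_storage()` followed by `Client = _storage.Client` and `transfer_manager = _storage.transfer_manager`, so importing the module never requires google-cloud-storage when decoupled mode is on.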
Description
- Replaces the top-level `from google.cloud.storage import Client, transfer_manager` in `checkpoint_conversion/utils/utils.py` with the centralized `gcloud_stub.gcs_storage()` pattern, so the module imports cleanly when `DECOUPLE_GCLOUD=TRUE` and `google-cloud-storage` isn't installed.
- Extends `gcloud_stub.gcs_storage()` to also attach the `transfer_manager` submodule (real path) and provides a no-op stub for it (decoupled path).
Tests
- `python3 -c 'from maxtext.checkpoint_conversion.utils import utils'` succeeds inside `.venv_rocm_decoupled` (no `google-cloud-storage` installed).
- `pytest --collect-only -m 'not cpu_only and not tpu_only and not post_training and decoupled and not scheduled_only' --ignore=tests/post_training` collects 719/1064 tests, exit 0.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.