
Fix decoupled-mode pytest collection: stub google.cloud.storage import #88

Merged
gulsumgudukbay merged 3 commits into rocm-main from fix-decoupled-gcs on May 28, 2026

@gulsumgudukbay
Collaborator

Description

  • This PR replaces the top-level `from google.cloud.storage import Client, transfer_manager` in `checkpoint_conversion/utils/utils.py` with the centralized `gcloud_stub.gcs_storage()` pattern, so the module imports cleanly when `DECOUPLE_GCLOUD=TRUE` and `google-cloud-storage` isn't installed.
  • Extends `gcloud_stub.gcs_storage()` to also import and attach the `transfer_manager` submodule when the real `google-cloud-storage` package is used, and provides a no-op `transfer_manager` stub for the decoupled path.

Tests

  • `python3 -c 'from maxtext.checkpoint_conversion.utils import utils'` succeeds inside `.venv_rocm_decoupled` (no `google-cloud-storage` installed).
  • `pytest --collect-only -m 'not cpu_only and not tpu_only and not post_training and decoupled and not scheduled_only' --ignore=tests/post_training` collects 719/1064 tests (the remainder are deselected by the marker expression) and exits with status 0.
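
The import check above can be wrapped in a small helper that runs the import in a fresh interpreter, so modules already loaded in the current process cannot hide a missing dependency. `run_import_check` is a hypothetical name used only for this sketch; it is not part of MaxText or its CI.

```python
# Hypothetical helper: import a module in a clean subprocess and report success.
import os
import subprocess
import sys
from typing import Dict, Optional


def run_import_check(module: str, extra_env: Optional[Dict[str, str]] = None) -> bool:
    """Return True if `import <module>` succeeds in a fresh Python subprocess."""
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Report the last traceback line, e.g. "ModuleNotFoundError: ...".
        lines = result.stderr.strip().splitlines()
        print(f"import {module} failed: {lines[-1] if lines else 'no stderr'}")
    return result.returncode == 0
```

Run from inside `.venv_rocm_decoupled`, `run_import_check("maxtext.checkpoint_conversion.utils.utils", {"DECOUPLE_GCLOUD": "TRUE"})` mirrors the first check in the list.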

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Wire ROCm (rocm-unit, rocm-decoupled) jobs into the existing test
coordinator and package-test workflows, and add install_te_rocm_wheel.py
to fetch the MI355 Transformer Engine wheel during container setup.
- build_and_test_maxtext.yml: add ROCm jobs and ROCM_ONLY gating on
  all sibling jobs; switch the daily schedule to 03:00 UTC; expand
  concurrency to cover manual dispatch per (branch + actor).
- run_tests_against_package.yml: select the ROCm base image for the
  rocm device_type; add decoupled_mode, requirements_file and extra
  pip deps inputs; install the TE wheel and select the arch before
  running tests; add ulimit and libtpu init guards for ROCm.
- run_tests_coordinator.yml: add rocm-unit / rocm-decoupled flavors
  with their pytest markers, runner labels, container options and
  ROCm requirements files; route decoupled_mode for the decoupled
  flavor.
- install_te_rocm_wheel.py: download the MI355 TE wheel from the
  repo's te-rocm-wheels release, falling back to the pinned
  ROCm/maxtext release asset. MI355-only (no MI300 path, no arch
  detection).
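
The primary-then-fallback download in `install_te_rocm_wheel.py` could look roughly like this sketch. The function name `download_first_available`, the injectable `fetch` callable and the placeholder URLs are assumptions for illustration; the real script's URLs and error handling are not shown in this PR.

```python
# Hypothetical sketch of "try the primary release, fall back to a pinned asset".
import urllib.request
from typing import Callable, List, Sequence


def _fetch_url(url: str) -> bytes:
    """Default fetcher: plain HTTPS GET with a timeout."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def download_first_available(
    urls: Sequence[str],
    dest: str,
    fetch: Callable[[str], bytes] = _fetch_url,
) -> str:
    """Try each URL in order; write the first successful download to dest.

    Returns the URL that succeeded. Raises RuntimeError if every source fails.
    """
    errors: List[str] = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # URLError/HTTPError and socket timeouts are OSError subclasses
            errors.append(f"{url}: {exc}")
            continue
        with open(dest, "wb") as f:
            f.write(data)
        return url
    raise RuntimeError("no wheel source succeeded:\n" + "\n".join(errors))
```

Injecting `fetch` keeps the ordering logic testable without network access; in CI the list would hold the te-rocm-wheels release URL first and the pinned ROCm/maxtext asset URL second.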

Scheduled workflow that keeps rocm-main in sync with
AI-Hypercomputer/main.

The top-level `from google.cloud.storage import Client, transfer_manager`
in checkpoint_conversion/utils/utils.py broke pytest collection for
the ROCm decoupled tests (DECOUPLE_GCLOUD=TRUE), since the package
isn't installed in that environment.
- gcloud_stub.gcs_storage(): also import and attach the
  transfer_manager submodule (it isn't auto-imported by
  `from google.cloud import storage`); extend _gcs_stubs() with a
  no-op transfer_manager stub.
- checkpoint_conversion/utils/utils.py: drop the direct google.cloud
  import and bind Client/transfer_manager via gcs_storage(), matching
  the existing pattern in src/maxtext/utils/gcs_utils.py.
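
The point about `transfer_manager` not being auto-imported is standard Python behavior: importing a package runs only its `__init__.py`, and a submodule becomes an attribute of the package only once something imports it. The self-contained demo here builds a throwaway package (the name `demo_pkg` is made up) to show this; it does not touch `google.cloud.storage` itself.

```python
# Demo: a package import does not bind submodules its __init__.py never imports.
import importlib
import pathlib
import sys
import tempfile
from typing import Tuple


def submodule_loaded_by_package_import() -> Tuple[bool, bool]:
    """Return (attached after package import, attached after submodule import)."""
    root = pathlib.Path(tempfile.mkdtemp())
    pkg_dir = root / "demo_pkg"
    pkg_dir.mkdir()
    # Empty __init__.py: the package does not import its submodule itself.
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "transfer_manager.py").write_text("def upload_many(*args, **kwargs):\n    return None\n")
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # files were created after interpreter start
    try:
        package = importlib.import_module("demo_pkg")
        before = hasattr(package, "transfer_manager")
        # Importing the submodule also binds it as an attribute of its parent package.
        importlib.import_module("demo_pkg.transfer_manager")
        after = hasattr(package, "transfer_manager")
        return before, after
    finally:
        sys.path.remove(str(root))
        # Forget the throwaway modules so the demo can be rerun cleanly.
        sys.modules.pop("demo_pkg.transfer_manager", None)
        sys.modules.pop("demo_pkg", None)
```

The function returns `(False, True)`, which is why a helper that hands out the package object has to import the submodule explicitly before callers can rely on `storage.transfer_manager`.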

gulsumgudukbay merged commit fa22ae6 into rocm-main on May 28, 2026
12 of 14 checks passed
gulsumgudukbay deleted the fix-decoupled-gcs branch on May 28, 2026 at 08:12

Labels

None yet

Projects

None yet

1 participant