Add ROCm benchmark workflow for MaxText by psanal35 · Pull Request #755 · ROCm/jax

psanal35 · 2026-04-23T17:32:18Z

No description provided.

mminutoli

Many files are missing the newline at the end of the file.

mminutoli · 2026-04-23T20:39:12Z

+            WHEELS_URL="${ROCM_WHEELS_BASE_URL}/${WHEELS_PATH%/}"
+            echo "Downloading ROCm wheels from ${WHEELS_URL}..."
+
+            LISTING=$(curl -fsSL "${WHEELS_URL}/")


Isn't it simpler to just download the wheels locally instead of checking whether they appear in the list of files available on the page?

If the download of the PJRT or plugin wheel fails, then the action fails.

That’s fair. The listing step is just for discovering the exact nightly filenames; it lets us fetch only the required PJRT + Python-specific plugin, instead of downloading all plugin wheels in that folder.
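As a rough illustration of the listing step described above, the sketch below extracts wheel filenames from an HTML directory listing and selects only the PJRT wheel and the plugin wheel for one Python tag. The helper names (`list_wheels`, `pick_wheel`), the `PY_TAG` variable, and the wheel filenames in the canned listing are assumptions made for this example; the actual workflow may parse the page differently.

```shell
set -eu

# Hypothetical helper: print the wheel filenames referenced by href="..." on stdin.
list_wheels() {
  grep -oE 'href="[^"]+\.whl"' | sed -E 's/^href="//; s/"$//; s#.*/##' | sort -u
}

# Hypothetical helper: print the last (sorted) filename matching the regex in $1.
pick_wheel() {
  grep -E -- "$1" | tail -n 1
}

# Canned listing standing in for: LISTING=$(curl -fsSL "${WHEELS_URL}/")
# The filenames are made up for illustration.
LISTING='<a href="jax_rocm7_pjrt-0.0.1-py3-none-manylinux_2_28_x86_64.whl">pjrt</a>
<a href="jax_rocm7_plugin-0.0.1-cp311-cp311-manylinux_2_28_x86_64.whl">plugin cp311</a>
<a href="jax_rocm7_plugin-0.0.1-cp312-cp312-manylinux_2_28_x86_64.whl">plugin cp312</a>'

PY_TAG="cp312"
PJRT_WHEEL=$(printf '%s\n' "${LISTING}" | list_wheels | pick_wheel '_pjrt-')
PLUGIN_WHEEL=$(printf '%s\n' "${LISTING}" | list_wheels | pick_wheel "_plugin-.*-${PY_TAG}-")

# Fail loudly if either wheel is missing from the listing.
if [ -z "${PJRT_WHEEL}" ] || [ -z "${PLUGIN_WHEEL}" ]; then
  echo "required wheel not found in listing" >&2
  exit 1
fi

echo "pjrt:   ${PJRT_WHEEL}"
echo "plugin: ${PLUGIN_WHEEL}"
# A real run would then fetch only these two files, for example:
#   curl -fsSLO "${WHEELS_URL}/${PJRT_WHEEL}"
```

With this shape, the plugin wheels for other Python versions in the same folder are never downloaded, and a missing wheel is reported before any download starts.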

mminutoli · 2026-04-23T20:42:35Z

+        run: |
+          set -euxo pipefail
+          chmod +x "./targets/${TARGET}/run.sh"
+          "./targets/${TARGET}/run.sh" --workload "${WORKLOAD}"


Maybe TARGET should be set through a matrix instead of an input. Thoughts?

Yes, we can pass a matrix from the nightly-benchmark workflow.

mminutoli · 2026-04-23T20:44:42Z

I have the feeling they won't like adding all of this stuff.

The configs folder will live under ROCm/maxtext, along with requirements.txt.

mminutoli · 2026-04-23T20:55:17Z

+export XLA_PYTHON_CLIENT_MEM_FRACTION=.97
+export LD_LIBRARY_PATH=/usr/local/lib/:/opt/rocm/lib:$LD_LIBRARY_PATH
+export NVTE_USE_HIPBLASLT=1
+export XLA_FLAGS="--xla_gpu_memory_limit_slop_factor=95 --xla_gpu_reduce_scatter_combine_threshold_bytes=8589934592 --xla_gpu_enable_command_buffer='' --xla_gpu_enable_latency_hiding_scheduler=True --xla_gpu_all_gather_combine_threshold_bytes=8589934592 --xla_gpu_enable_triton_gemm=False --xla_gpu_enable_cublaslt=True --xla_gpu_autotune_level=4 --xla_gpu_enable_all_gather_combine_by_dim=FALSE"


Are these specific to ROCm? I guess upstream might want to set their own flags.

Some of these are ROCm-related, but most are really model-run-specific. Since this lives under ROCm/maxtext, it might make sense to keep upstream‑level flags in run.sh or something like ci/envs/default.env, rather than in the config itself.
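One possible way to layer the settings, sketched under assumptions: shared defaults are sourced first from `ci/envs/default.env`, then an optional per-workload file overrides them. The `load_env` helper, the `demo` workload, and the particular split of variables between the two files are hypothetical; the sketch writes both files into a scratch directory only so that it can run on its own.

```shell
set -eu

# Scratch directory so the sketch is self-contained; a real repo would already hold these files.
DEMO_DIR=$(mktemp -d)
mkdir -p "${DEMO_DIR}/ci/envs" "${DEMO_DIR}/configs/models"

# Hypothetical shared defaults.
cat > "${DEMO_DIR}/ci/envs/default.env" <<'EOF'
export XLA_PYTHON_CLIENT_MEM_FRACTION=.97
export NVTE_USE_HIPBLASLT=1
EOF

# Hypothetical per-workload overrides (model-run-specific settings).
cat > "${DEMO_DIR}/configs/models/demo.env.sh" <<'EOF'
export XLA_PYTHON_CLIENT_MEM_FRACTION=.90
export XLA_FLAGS="--xla_gpu_autotune_level=4"
EOF

# Hypothetical helper: source defaults first, then the workload file,
# so the workload file wins when both set the same variable.
load_env() {
  repo_dir="$1"
  workload="$2"
  . "${repo_dir}/ci/envs/default.env"
  workload_env="${repo_dir}/configs/models/${workload}.env.sh"
  if [ -f "${workload_env}" ]; then
    . "${workload_env}"
  fi
}

load_env "${DEMO_DIR}" demo
echo "mem fraction: ${XLA_PYTHON_CLIENT_MEM_FRACTION}"
echo "hipblaslt:    ${NVTE_USE_HIPBLASLT}"
echo "xla flags:    ${XLA_FLAGS}"
rm -rf "${DEMO_DIR}"
```

Sourcing in this order keeps the model config free of platform-level flags while still letting a single workload override a default when it needs to.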

charleshofer · 2026-05-05T14:43:20Z

-      clone_main_xla: 0
-
-  run-pytest-rocm:
+  run-benchmark-rocm:


It will be easier to maintain the amd-main branch if we create a new workflow for downstream and run it with on: schedule instead of removing upstream's nightly workflow and replacing it with our benchmark. Also, I think it's a good idea to keep unit tests and performance benchmarks in separate workflows.

charleshofer · 2026-05-05T15:01:29Z

+      gcs_download_uri: ${{ inputs.gcs_download_uri }}
+      s3_download_uri: ${{ inputs.s3_download_uri }}
+      use-te: "1"
+      te-wheel-url: "https://github.com/ROCm/maxtext/releases/download/te-rocm-wheels-2026-05-04-86438dc3d04e/transformer_engine-2.12.0.dev0+86438dc3-1.mi355-cp312-cp312-linux_x86_64.whl"


Do we want to use the latest TE instead of hardcoding? We should use the gh CLI to do that: https://cli.github.com/manual/gh_release_download. I'm pretty sure that the jax-dev image has it installed.

Yes, I first updated the TE installation flow to resolve the wheel dynamically, using the GitHub API and curl in a bash script (see 9183f20). After testing, I removed the TE dependency entirely from this benchmark workflow because the model can run without TE. For the current lightweight benchmark, TE does not provide additional value for detecting performance regressions, and removing it keeps the workflow simpler and avoids extra debugging overhead.
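For reference, the gh-based alternative suggested in the review might look roughly like the sketch below. It relies on `gh release download` targeting the latest release when no tag is given, and it assumes that the latest release in the repository is the one carrying the TE wheels, which may not hold. The `download_te_wheel` helper, the `DRY_RUN` switch, and the glob pattern are hypothetical; the dry run only prints the command so the sketch runs without network access or a token.

```shell
set -eu

# Hypothetical helper: download the wheel for one Python tag from the latest release.
# With DRY_RUN=1 the command is printed instead of executed.
download_te_wheel() {
  repo="$1"
  py_tag="$2"
  dest="$3"
  # Reuse the positional parameters to hold the command and its arguments.
  set -- gh release download --repo "${repo}" \
    --pattern "transformer_engine-*-${py_tag}-*.whl" --dir "${dest}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

DRY_RUN=1
TE_CMD=$(download_te_wheel ROCm/maxtext cp312 ./wheels)
echo "${TE_CMD}"
```

In a GitHub Actions job, gh would also need a token in its environment (for example `GH_TOKEN`) before the non-dry-run path could work.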

charleshofer · 2026-05-05T15:40:58Z

+
+CFG_FILE="${REPO_DIR}/configs/models/${WORKLOAD}.yml"
+ENV_FILE="${REPO_DIR}/configs/models/${WORKLOAD}.env.sh"
+REQ_FILE="${WORK_DIR}/dependencies/requirements/requirements_rocm_jax_0.8.2.txt"


Is there a reason we're still using JAX 0.8.2? Do we want to move to something more recent?

Yes, at that time I reused the existing requirements file because it was already working for validation/testing. Even though the filename carries a 0.8.2 suffix, it actually used unpinned package versions. I aligned the setup with the planned configuration from ROCm/maxtext#87, so we can move forward with this. MaxText is also planning to provide rocm_extra, which can help remove the need for a requirements file later as well.

charleshofer · 2026-05-05T16:04:46Z

+BENCHMARK_JSON="${RUN_DIR}/benchmark.json"
+RESULT_JSON="${RUN_DIR}/result.json"
+
+PYTHON_BIN="${JAXCI_PYTHON:-python3}"


Same comment as below about passing things into scripts via environment variables and command line arguments. It breaks modularity and makes scripts hard to understand when we depend on environment variables rather than passing things in via command line arguments.

Passing in command line arguments is more of a pain in Bash, so I'm more okay with letting environment variables slide. But could we at least put all the environment variables that the script expects to use near the top of the script? It still breaks modularity, but at least it's easy for a person reading the script to see which environment variables need to be set.

Makes sense. I kept the existing CI/JAX environment variables in the bash script, but grouped the expected ones near the top so the dependencies are more visible.
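A minimal sketch of that grouping, assuming the script reads only `JAXCI_PYTHON` and `RUN_DIR` from the environment: every variable is resolved once in a commented block at the top, and the rest of the script uses only the resolved names. The default for `RUN_DIR` and the commented-out required-variable guard are hypothetical additions for illustration.

```shell
set -eu

# --- Environment variables read by this script --------------------------------
# JAXCI_PYTHON  optional  interpreter to use (default: python3)
# RUN_DIR       optional  output directory (hypothetical default below)
# A variable with no sensible default could be enforced here with:
#   : "${SOME_REQUIRED_VAR:?SOME_REQUIRED_VAR must be set}"
PYTHON_BIN="${JAXCI_PYTHON:-python3}"
RUN_DIR="${RUN_DIR:-${PWD}/benchmark-run}"
# -------------------------------------------------------------------------------

# Everything below depends only on the names resolved above.
BENCHMARK_JSON="${RUN_DIR}/benchmark.json"
RESULT_JSON="${RUN_DIR}/result.json"

echo "python:    ${PYTHON_BIN}"
echo "benchmark: ${BENCHMARK_JSON}"
echo "result:    ${RESULT_JSON}"
```

A reader can then see the script's external dependencies by reading the first block alone, which is the visibility the review asked for even without command line arguments.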

…tignore (#563) When jaxlib was built in debug mode, an assertion in LLVM code that lazy-loads the VHLO dialect could fire, since the code path could execute in a multi-threaded environment, and LLVM dialect repositories aren't thread safe to modify. This patch applies the same changes that upstream makes to fix this: jax-ml@48c8762 (this includes disabling a call to `jax_mlir_ext.enter_multi_threaded_execution(context)` in `mlir.py`). Presumably, the whole functionality related to the `enter_multi_threaded_execution()` multithreaded checks isn't ready yet, and it was prematurely rolled into the production code. Manual testing

psanal35 · 2026-05-20T03:24:08Z

No further action needed, closing the PR.

psanal35 force-pushed the add-rocm-model-benchmarks branch 5 times, most recently from 96a4966 to 5e8fd90 Compare April 23, 2026 18:58

mminutoli reviewed Apr 23, 2026


psanal35 force-pushed the add-rocm-model-benchmarks branch 6 times, most recently from fff9d8b to 517bb14 Compare April 24, 2026 20:07

psanal35 force-pushed the add-rocm-model-benchmarks branch 2 times, most recently from eedae29 to 6f08c2a Compare May 4, 2026 19:44

charleshofer requested changes May 5, 2026


charleshofer and others added 15 commits May 6, 2026 13:07

Remove nvidia_wheel_versions

43c0570

Make jaxlib targets visible

bcef89c

hipblas typedef fix

733b7bf

No GPU fail

793d312

Wrap HIP inline functions in anonymous namespaces in vendor.h

e3ad0ec

SWDEV-512768 - Replace hipGetLastError with hipExtGetLastError

a831ef2

Add shared utility function get_rocm_version to test_util.py

58249a4

Fix hipSparse CSR algorithm mappings for ROCm 7

e587f90

Fix v_pages quantization and adjust test params for ROCm compatibilit… (

8089947

#560)

Add skip of test_is_finite() on Cuda (#565)

42a3be6

(forgot this skip in the previous PR)

Add rocm test requirements file (#570)

544c6d4

Let the unit tests use build.py for setting up Bazel commands for uni…

4673584

…t tests (#582)

adding abort logic to rocm/jax (#590)

1c79814

Skip is_finite tests on ROCm (not in Triton lowering for jax 0.8.0) (#…

9b5d708

…597)

psanal35 added 3 commits May 10, 2026 14:26

Add placeholder for nightly benchmark workflow (#768)

8d4fbef

Rename the benchmarks workflow consistently (#770)

eb58ba2

Add ROCm benchmark workflow for MaxText

2483b46

psanal35 force-pushed the add-rocm-model-benchmarks branch 5 times, most recently from 7101e89 to 4a5afee Compare May 10, 2026 22:22

Resolve latest MaxText Transformer Engine wheel from ROCm MaxText rel…

1d25b87

…eases

psanal35 force-pushed the add-rocm-model-benchmarks branch 5 times, most recently from 555e4c2 to 8405330 Compare May 11, 2026 01:48

Load MaxText ROCm benchmark configs and requirements from ROCm/maxtext

24fb105

psanal35 force-pushed the add-rocm-model-benchmarks branch 6 times, most recently from 0892795 to cd5ad81 Compare May 11, 2026 15:07

Revisit ROCm benchmark results and run-manifest collection

34b2d0f

psanal35 force-pushed the add-rocm-model-benchmarks branch from cd5ad81 to 34b2d0f Compare May 11, 2026 17:35

psanal35 added 3 commits May 11, 2026 18:26

Revisit ROCm artifact upload to S3 for reusability

37fe3f7

Remove TE installation to keep the model lightweight

9183f20

Update benchmark target scripts for more generic use cases

182dd2a

mminutoli force-pushed the amd-main branch from eb58ba2 to fb48593 Compare May 19, 2026 19:39

psanal35 closed this May 20, 2026

psanal35 deleted the add-rocm-model-benchmarks branch May 20, 2026 03:24


11 participants
