test(async_ll): probe SDMA path via MORI_ENABLE_SDMA in CI#313

Closed
jhchouuu wants to merge 4 commits into main from jiahzhou/test-async-ll-sdma

@jhchouuu
Collaborator

Summary

Switches tests/python/ops/test_dispatch_combine_async_ll.py from MORI_DISABLE_P2P=1 to MORI_ENABLE_SDMA=1 so the async_ll kernel runs over the intra-node SDMA transport instead of the RDMA / IBGDA path.

This is a probe PR: the goal is to see whether CI can run the async_ll kernel end-to-end over SDMA on the current hardware/setup.

Why

The previous setting (MORI_DISABLE_P2P=1) gates Context::InitializePossibleTransports so same-host peers fall through to the RDMA branch. With MORI_ENABLE_SDMA=1 (and P2P NOT disabled), context.cpp instead picks TransportType::SDMA for same-host peers and exercises the SDMA signal-pointer exchange in symmetric_memory.cpp.
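The selection described above can be sketched as a small decision function. This is an illustrative Python model only, not the code in context.cpp: the function name `pick_transport`, the helper `flag_set`, the rule that a flag counts as set only when its value is exactly "1", and the rule that cross-host peers always use RDMA are all assumptions made for the example.

```python
from enum import Enum


class TransportType(Enum):
    P2P = "p2p"
    SDMA = "sdma"
    RDMA = "rdma"


def flag_set(env, name):
    # Assumption: a flag counts as set only when its value is exactly "1".
    return env.get(name) == "1"


def pick_transport(same_host, env):
    """Hypothetical model of the per-peer transport choice described in this PR."""
    if not same_host:
        # Assumption: peers on other hosts always go over RDMA.
        return TransportType.RDMA
    if flag_set(env, "MORI_DISABLE_P2P"):
        # Same-host peers fall through to the RDMA branch.
        return TransportType.RDMA
    if flag_set(env, "MORI_ENABLE_SDMA"):
        # P2P is not disabled and SDMA is requested.
        return TransportType.SDMA
    # Default intra-node path.
    return TransportType.P2P
```

Under this model, setting both flags still yields RDMA for same-host peers, which matches the "P2P NOT disabled" condition stated above.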

Test plan

  • CI runs tests/python/ops/test_dispatch_combine_async_ll.py and reports pass/fail under SDMA
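For reproducing the CI step locally, a minimal sketch of a launcher is shown below. It is not part of this PR: the helper names `sdma_env`, `pytest_command`, and `run_under_sdma` are hypothetical, and it assumes the test picks up MORI_ENABLE_SDMA from the process environment and that pytest is installed in the current interpreter.

```python
import os
import subprocess
import sys

TEST_PATH = "tests/python/ops/test_dispatch_combine_async_ll.py"


def sdma_env(base_env):
    """Return a copy of base_env with SDMA enabled and P2P not disabled."""
    env = dict(base_env)
    env.pop("MORI_DISABLE_P2P", None)  # the SDMA path requires P2P to stay enabled
    env["MORI_ENABLE_SDMA"] = "1"
    return env


def pytest_command():
    """Build the pytest invocation for the async_ll test."""
    return [sys.executable, "-m", "pytest", TEST_PATH]


def run_under_sdma():
    """Run the test in a child process and return its exit code."""
    result = subprocess.run(pytest_command(), env=sdma_env(os.environ))
    return result.returncode
```

Calling `run_under_sdma()` from the repository root returns pytest's exit code, so a zero result corresponds to a pass under SDMA.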

jhchouuu added 4 commits May 12, 2026 11:32
Replace MORI_DISABLE_P2P=1 (which routes async_ll over RDMA / IBGDA) with
MORI_ENABLE_SDMA=1 so the test exercises the SDMA intra-node path instead.
Probes whether CI can run the async_ll kernel end-to-end over SDMA.
…rdingly

Switches CI BASE_IMAGE to rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0
and renames the built CI image to rocm/mori:ci_rocm723 so the ROCm version is
visible in the tag.

The previously selected rocm7.2.3 base image is unavailable, so pin to
rocm/pytorch:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0 and rename
the built CI image to rocm/mori:ci_rocm722 so the ROCm version is visible
in the tag.

Forces the intranode async_ll pytest step to run over the SDMA transport
instead of the default P2P path, so the SDMA branch in
symmetric_memory.cpp's signal-pointer exchange is exercised in CI.
@jhchouuu jhchouuu closed this May 13, 2026
@jhchouuu jhchouuu deleted the jiahzhou/test-async-ll-sdma branch May 13, 2026 06:59