@okakarpa okakarpa commented Jul 30, 2025

Fix distributed failures

  • *_stress_cuda: skip these unit tests on all archs
  • Symmetric Memory: not yet supported on the rocm7.0_internal_testing branch
  • test_extra_cuda_context: add a barrier so that all nodes finish init_process_group before the test continues
  • test_sac_ilp: skip on all ROCm archs (was already skipped for MI300 and NAVI)
  • test_fsdp2_mem_tracker: update the tolerance
  • test_scaled_mm: depends on row-wise scaling, skipped for now
  • test_allreduce_inductor_cudagraph_trees: skipped, as it is flaky upstream as well
  • test_distributed_spawn: skipped, will be fixed in the next IFU
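
The barrier fix for test_extra_cuda_context can be illustrated with a minimal sketch. It is not the code from this PR: it uses threads and `threading.Barrier` from the standard library as a stand-in for ranks and for `torch.distributed.barrier()`, and the names `run_ranks` and `WORLD_SIZE` are hypothetical. The point it shows is that no rank reaches the test body until every rank has finished its init step.

```python
import threading
import time

WORLD_SIZE = 4  # hypothetical number of ranks, for illustration only


def run_ranks(world_size=WORLD_SIZE):
    """Simulate ranks that initialize at different speeds, then synchronize."""
    barrier = threading.Barrier(world_size)
    events = []  # ordered log of ("init" | "test", rank)
    lock = threading.Lock()

    def worker(rank):
        # Stand-in for dist.init_process_group(...): ranks finish at different times.
        time.sleep(0.01 * rank)
        with lock:
            events.append(("init", rank))
        # Stand-in for dist.barrier(): block until every rank has initialized.
        barrier.wait()
        # Only now does the actual test body start.
        with lock:
            events.append(("test", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for kind, rank in run_ranks():
        print(kind, rank)
```

In a real distributed test the same ordering would presumably come from calling `dist.barrier()` right after `dist.init_process_group(...)`; without it, a fast rank could start the test while a slow rank is still initializing.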

Also fixes: https://ontrack-internal.amd.com/browse/SWDEV-544875

Cherry-pick of #2425
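
For readers unfamiliar with how the platform-specific skips listed above are usually expressed, here is a hedged sketch using only the standard `unittest` module. The helper `skip_if_rocm`, the flag `TEST_WITH_ROCM`, and the way the flag is read from the `PYTORCH_TEST_WITH_ROCM` environment variable are assumptions for illustration; the PR itself may use different decorators.

```python
import os
import unittest

# Assumption: the suite learns it is running on ROCm from an environment variable.
TEST_WITH_ROCM = os.environ.get("PYTORCH_TEST_WITH_ROCM", "0") == "1"


def skip_if_rocm(reason, on_rocm=TEST_WITH_ROCM):
    """Hypothetical helper: skip the decorated test when running on ROCm."""
    return unittest.skipIf(on_rocm, f"skipped on ROCm: {reason}")


class ExampleDistributedTests(unittest.TestCase):
    # Test names mirror the PR description; the bodies are placeholders.
    @skip_if_rocm("depends on row-wise scaling")
    def test_scaled_mm(self):
        self.assertTrue(True)

    @skip_if_rocm("will be fixed in the next IFU")
    def test_distributed_spawn(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

Keeping the reason string next to each skip makes it easier to find and re-enable the tests once the underlying issue is resolved.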

rocm-repo-management-api bot commented Jul 30, 2025

Jenkins build for 0f0ba996d726635ababd38e08c5f9aa6574a7c18 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

…][MI355]

@pragupta pragupta force-pushed the autogenerated/release/2.7_cherry-pick_pr-2425 branch from 0f0ba99 to f25e60b Compare July 30, 2025 21:09
@pragupta pragupta marked this pull request as ready for review July 30, 2025 21:10
rocm-repo-management-api bot commented Jul 30, 2025

Jenkins build for f25e60ba1c7e68663117ab4863e8f25bf6df282c commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pragupta pragupta merged commit 44c0e44 into release/2.7 Jul 31, 2025
0 of 2 checks passed
@pragupta pragupta deleted the autogenerated/release/2.7_cherry-pick_pr-2425 branch July 31, 2025 14:47
@pragupta pragupta restored the autogenerated/release/2.7_cherry-pick_pr-2425 branch July 31, 2025 15:01
@jithunnair-amd jithunnair-amd deleted the autogenerated/release/2.7_cherry-pick_pr-2425 branch August 19, 2025 03:33
3 participants