Skip to content

Conversation

@pragupta
Copy link
Collaborator

@pragupta pragupta commented Aug 5, 2025

  • Remove building tensorpipe for ROCm by reverting 550bc77 as this support is going to get dropped upstream as well.
  • External/aotriton.cmake: remove use of __AOTRITON_VER_WITH_COMMIT
  • macros/Export.h: remove TORCH_HIP_CPP_API/TORCH_HIP_API and other hipified instances as CUDA ones get hipified and converted correctly (need to upstream this)
  • CUDALoops.cuh: Bad merge
  • Blas.cpp: remove MX patch (Blockwise support is not upstreamed)
  • cuda_vectorized_test.cu: remove ROCmloops specific test, this was removed in rocm7.0_internal_testing branch. I had incorrectly addressed the merge conflicts when merging with upstream
  • Update requirements-ci.txt to reflect both upstream and rocm/release/2.8 changes.

I tested this with the following docker image: registry-sc-harbor.amd.com/framework/compute-rocm-rel-7.0:24_ubuntu24.04_py3.12_pytorch_lw_release-2.7_faae1f39 and ran all the "core" UTs.

export TESTS_TO_INCLUDE="test_nn test_torch test_cuda test_ops test_unary_ufuncs test_binary_ufuncs test_autograd inductor/test_torchinductor"
export TESTS_TO_INCLUDE="distributed/test_c10d_common distributed/test_c10d_nccl distributed/test_distributed_spawn"

Only the following UTs failed with accuracy issues:

  • test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda_tf32
  • test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot
  • test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace
  • test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce

Fixes #ISSUE_NUMBER

- Fix the tensorpipe branch for ROCm
- External/aotriton.cmake: remove use of __AOTRITON_VER_WITH_COMMIT
- macros/Export.h: remove TORCH_HIP_CPP_API/TORCH_HIP_API as CUDA ones
  get hipified and converted correctly (need to upstream this)
- CUDALoops.cuh: Bad merge
- Blas.cpp: remove MX patch
- cuda_vectorized_test.cu: remove ROCmloops specific test, this was
  removed in rocm7.0_internal_testing branch. I had incorrectly
  addressed the merge conflicts when merging with upstream
@pragupta pragupta requested review from jithunnair-amd and pruthvistony and removed request for pruthvistony August 5, 2025 20:43
@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 5, 2025

Jenkins build for 13e472a3bca6121446e9911e3b6f1970f9a64f90 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pragupta pragupta changed the title Fix issues with merge conflicts [rocm7.1_internal_testing] Fix issues with merge conflicts Aug 6, 2025
- Update triton commit to point to ToT of release/internal/3.4.x
- Update related_commits file with ToT of respective repos
@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 6, 2025

Jenkins build for 7bd7047ad708ae240f538c3159fe2d5a5ab49a60 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 6, 2025

Jenkins build for 9025071507f9ca4ba1f9bfecf9c7630e0100930b commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

#else
TORCH_CHECK(false, "Block-wise scaling for Float8_e8m0fnu requires ROCm 7.0 or later");
#endif
"hipblaslt rowwise _scaled_mm only supports BFloat16 output but got ", out.scalar_type());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pragupta Sorry, it's not clear to me why we are removing these lines... Is the gfx950-specific code not needed?


// Enums only need to be exported on windows for non-CUDA files
#if defined(_WIN32) && defined(__CUDACC__)
#if defined(_WIN32) && defined(__HIPCC__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pragupta The changes to the comments in this file and the change in this line seem to contradict the PR description: "macros/Export.h: remove TORCH_HIP_CPP_API/TORCH_HIP_API as CUDA ones get hipified and converted correctly (need to upstream this)"?

Copy link
Collaborator

@jithunnair-amd jithunnair-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes look good, left questions on a couple of changes that weren't clear

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 8, 2025

Jenkins build for 8ba5af94c7932107119bb841a3d1ad4060ddbeef commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 8, 2025

Jenkins build for 8ba5af94c7932107119bb841a3d1ad4060ddbeef commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pragupta pragupta force-pushed the pg_rocm7.1_internal_testing_IFU_08052025 branch from 8ba5af9 to 9ce6b1a Compare August 8, 2025 21:04
@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Aug 8, 2025

Jenkins build for 1ee30ac39628b611d22a8fdf331e07b4a697ec5a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jithunnair-amd jithunnair-amd merged commit 36e36c8 into ROCm:rocm7.1_internal_testing Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants