Conversation

@tvukovic-amd

Change the TORCH_CUDA_CPP_API macros to TORCH_HIP_CPP_API in the MIOpen header files, following the closed PR in PyTorch upstream.

jithunnair-amd and others added 30 commits July 18, 2025 01:06
(cherry picked from commit e294d4d with
modifications for release/2.8)

Reintroduce CIRCLE_TAG to be able to set PYTORCH_BUILD_VERSION without date
Cherry-pick of #2130

Validation:
http://rocm-ci.amd.com/job/rocm-pytorch-manylinux-wheel-builder-lw/155/

---------

Co-authored-by: Ethan Wee <Ethan.Wee@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit 2c220b2)
…ersion

Since we need to rebuild magma for all supported architectures, we
cannot use the upstream magma tarball anyway.
…for py3.9;

upgrade tensorboard to be compatible with numpy 2

Co-authored-by: Ethan Wee <Ethan.Wee@amd.com>
(cherry picked from commit e867a3d)
(cherry picked from commit c7a1e32)
(cherry picked from commit 2a215e4)
(cherry picked from commit 866cc1d)
From upstream PR pytorch#154900
Resolves: https://ontrack-internal.amd.com/browse/SWDEV-536994

After following the steps to reproduce in container
**registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16231_ubuntu22.04_py3.10_pytorch_lw_release2.7_no_user_66a18277**:
```
root@ubb4-rack-22:/var/lib/jenkins/pytorch# history
    1  cd /var/lib/jenkins/pytorch
    2  TEST_CONFIG=default CONTINUE_THROUGH_ERROR=True .ci/pytorch/test.sh

 Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X/MI250
+ MAYBE_ROCM=rocm/
+ [[ rocm == *xpu* ]]
+ [[ rocm != *-bazel-* ]]
+ pip_install ninja==1.10.2
+ pip_install_pkg='python3 -m pip install --progress-bar off'
+ python3 -m pip install --progress-bar off ninja==1.10.2
Collecting ninja==1.10.2
  Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB)
Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Installing collected packages: ninja
  Attempting uninstall: ninja
    Found existing installation: ninja 1.11.1.4
    Uninstalling ninja-1.11.1.4:
      Successfully uninstalled ninja-1.11.1.4
Successfully installed ninja-1.10.2
+ export PATH=/root/.local/bin:/opt/venv/bin:/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/root/.local/bin:/opt/venv/bin:/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ [[ rocm == *aarch64* ]]
+ [[ rocm == *asan* ]]
+ [[ rocm == *-debug* ]]
+ [[ rocm != *-bazel-* ]]
+ echo 'We are not in debug mode: rocm. Expect the assertion to pas
```
http://rocm-ci.amd.com/job/mainline-pytorch2.7-manylinux-wheels/126/

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit 0bd4030)
Cherry-pick of #2328

Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit fe3d37a)
(cherry picked from commit ba1ba26)
(cherry picked from commit 4e3462e)
This PR enables NHWC batchnorm on MIOpen in the release/2.6 branch.

ROCm >= 6.5 and the `PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1` environment
variable are required to enable NHWC batchnorm.
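
A minimal sketch of how the NHWC path might be exercised, assuming a ROCm build of PyTorch with MIOpen; the shapes and module below are illustrative, not taken from this PR:

```python
import os
# Opt-in gate described above; set before running the model.
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM"] = "1"

import torch
import torch.nn as nn

# BatchNorm over a channels_last (NHWC) tensor. With the env var set and a
# sufficiently new ROCm, the MIOpen NHWC batchnorm kernels can be selected.
bn = nn.BatchNorm2d(64).cuda()
x = torch.randn(8, 64, 32, 32, device="cuda").to(memory_format=torch.channels_last)
out = bn(x)
print(out.is_contiguous(memory_format=torch.channels_last))  # expected: True
```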

This PR branch for `release/2.6` was built and tested using docker
image:
`compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-dkms-no-npi-hipclang:15845_ubuntu22.04_py3.10_pytorch_rocm6.4_internal_testing_8190c80`.

New batchnorm tests introduced:
train:
```
test_batchnorm_train_NCHW_vs_cpu_float32 (__main__.TestNN) ... ok (0.040s)
test_batchnorm_train_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN) ... ok (0.007s)
test_batchnorm_train_NCHW_vs_cpu_mixed_float16 (__main__.TestNN) ... ok (0.005s)
test_batchnorm_train_NCHW_vs_native_float32 (__main__.TestNN) ... ok (0.089s)
test_batchnorm_train_NCHW_vs_native_mixed_float16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_NCHW_float32 (__main__.TestNN) ... ok (0.020s)
test_batchnorm_train_NHWC_vs_NCHW_mixed_bfloat16 (__main__.TestNN) ... ok (0.006s)
test_batchnorm_train_NHWC_vs_NCHW_mixed_float16 (__main__.TestNN) ... ok (0.006s)
test_batchnorm_train_NHWC_vs_cpu_float32 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_cpu_mixed_bfloat16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_cpu_mixed_float16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_native_float32 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_native_mixed_bfloat16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_train_NHWC_vs_native_mixed_float16 (__main__.TestNN) ... ok (0.004s)
```

inference:
```
test_batchnorm_inference_NCHW_vs_cpu_float32 (__main__.TestNN) ... ok (0.025s)
test_batchnorm_inference_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN) ... ok (0.005s)
test_batchnorm_inference_NCHW_vs_cpu_mixed_float16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NCHW_vs_native_float32 (__main__.TestNN) ... ok (0.102s)
test_batchnorm_inference_NCHW_vs_native_mixed_float16 (__main__.TestNN) ... ok (0.003s)
test_batchnorm_inference_NHWC_vs_NCHW_float32 (__main__.TestNN) ... ok (0.018s)
test_batchnorm_inference_NHWC_vs_NCHW_mixed_bfloat16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NHWC_vs_NCHW_mixed_float16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NHWC_vs_cpu_float32 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NHWC_vs_cpu_mixed_bfloat16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NHWC_vs_cpu_mixed_float16 (__main__.TestNN) ... ok (0.004s)
test_batchnorm_inference_NHWC_vs_native_float32 (__main__.TestNN) ... ok (0.003s)
test_batchnorm_inference_NHWC_vs_native_mixed_bfloat16 (__main__.TestNN) ... ok (0.003s)
test_batchnorm_inference_NHWC_vs_native_mixed_float16 (__main__.TestNN) ... ok (0.003s)
```

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit 45896ac)
(cherry picked from commit 7010d60)
NHWC batchnorm is enabled by default if ROCm >= 7.0.

(cherry picked from commit e0afc3a)
…m specific skips to generalized conditions (#2126)

Cherry-pick of #2100
Need to resolve conflicts

---------

Co-authored-by: iupaikov-amd <Iurii.Paikov@amd.com>
(cherry picked from commit f0c1ce8)
…and batchnorm (#2232)

Cherry-pick of #2209

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit bf0079d)
Cherry-pick of #2214

Co-authored-by: Dmitry Nikolaev <139769634+dnikolaev-amd@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit 5631e07)
…_rcpf(x) instead of 1.f/x (#1800)

Cherry-pick of #1688

Co-authored-by: Michael Halkenhäuser <michaelhalk@web.de>
Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>
(cherry picked from commit f8544af)
(cherry picked from commit ed48754)
(cherry picked from commit d62a39e)
Remove `xfail` from the `batch_norm_with_update` op in `test_grad` and
`test_vmap_autograd_grad`.

These tests have passed since ROCm 6.4.

Fixes https://ontrack-internal.amd.com/browse/SWDEV-529820

(cherry picked from commit 99b0758)
(cherry picked from commit a7044a4)
(cherry picked from commit 3fc00a8)
Ported mx fp8 part from #2046

Current test stats (counting only blockwise scale tests):
PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k
test_blockwise -v

Ran 225 tests in 8.256s
FAILED (failures=1, skipped=150)
_74 tests pass_

**fp8 mx data type sample test case.**

test_blockwise_mxfp8_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_cuda
(__main__.TestFP8MatmulCudaCUDA)
hipblaslt-bench --api_method c -m 256 -n 512 -k 128 --lda 128 --ldb 128
--ldc 256 --ldd 256 --stride_a 0 --stride_b 0 --stride_c 0 --stride_d 0
--alpha 1 --beta 0 --transA T --transB N --batch_count 1 --scaleA 3
--scaleB 3 --a_type f8_r --b_type f8_r --c_type bf16_r --d_type bf16_r
--compute_type f32_r --algo_method index --solution_index -2146957310
--rotating 0 --cold_iters 0 --iters 0

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
(cherry picked from commit d17e222)
Update CK for gfx950 (#49)

(cherry picked from commit 8ccfc47)
(cherry picked from commit b5d5987)
…ilure (#2204)

- Previously, expected values were calculated on the GPU using the same dtype
as the result values.
- Now, expected values are calculated on the CPU using the Float32 dtype.
- This fixes a test failure observed on Navi48 where the difference between
Eager mode (expected) and Inductor/Triton (result) did not meet the error
tolerance when the sum was evaluated on an array of Float16 values.
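
A hedged sketch of the comparison pattern described above; the shapes, tolerances, and op are illustrative, not the actual Inductor test:

```python
import torch

# Stand-in input; the real test compares Eager (expected) vs Inductor/Triton (result).
x = torch.randn(4096, dtype=torch.float16)

# Old approach: reference computed on the GPU in the same low-precision dtype,
# so the reference itself carried Float16 rounding error.
# expected = x.cuda().sum()

# New approach: reference computed on the CPU in Float32.
expected = x.to(device="cpu", dtype=torch.float32).sum()
result = x.cuda().sum()  # stands in for the compiled/Triton result

torch.testing.assert_close(result.cpu().to(torch.float32), expected, rtol=1e-2, atol=1e-2)
```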

Co-authored-by: pnikolic-amd <pnikolic@amd.com>
(cherry picked from commit 8fe3cdd)
(cherry picked from commit 34f3b3e)
…2259)

* thread_work_size of 16 gives better performance with many workloads

(cherry picked from commit 7edf50c)
Related to
c7a1e32
Fixes https://ontrack-internal.amd.com/browse/SWDEV-537835

Not a Navi-specific failure:
```
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1412, in only_fn
    return fn(slf, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1671, in test_cuda_tensor_pow_scalar_tensor
    self._test_pow(base, exp)
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1482, in _test_pow
    self.assertEqual(actual, expected)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4052, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: The values for attribute 'dtype' do not match: torch.float32 != torch.float64.
```

Using .to(actual) without specifying dtype/device assumes actual is a
tensor or tensor-like, which may fail silently or promote. Fixed by
explicitly matching dtype and device, following
pytorch#107302.
Fix:
```
root@ubb4-rack-22:/var/lib/jenkins/pytorch# TEST_CONFIG=default HIP_VISIBLE_DEVICES=0 PYTORCH_TEST_WITH_ROCM=1 python test/test_binary_ufuncs.py TestBinaryUfuncsCUDA.test_cuda_tensor_pow_scalar_tensor_cuda
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

Running tests...
----------------------------------------------------------------------
.
----------------------------------------------------------------------
Ran 1 test in 0.141s

OK

Generating XML reports...
root@ubb4-rack-22:/var/lib/jenkins/pytorch# pip list | grep numpy
numpy                   2.1.2

```
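
A minimal illustration of the pitfall and the fix described above, with hypothetical tensors rather than the actual test code:

```python
import torch

actual = torch.tensor(8.0, device="cuda", dtype=torch.float32)  # e.g. GPU pow result
expected = 2.0 ** 3  # Python float reference, i.e. double precision

# Pitfall: wrapping the reference without pinning dtype/device leaves it as a
# float64 CPU tensor, and the strict dtype check then fails:
#   torch.float32 != torch.float64
bad = torch.tensor(expected)
print(bad.dtype)  # torch.float64

# Fix: explicitly match the dtype and device of the actual result.
good = torch.tensor(expected, dtype=actual.dtype, device=actual.device)
torch.testing.assert_close(actual, good)
```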

(cherry picked from commit a4d60fa)
Cherry-pick of #2319

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit e725e2e)
This PR fixes the following unit test failure:

test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED
[0.1163s]

```
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```
This error occurs only on gfx1101 arch.

This error comes from an integer overflow: another unit test,
test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel,
creates a tensor with a huge numel, which inflates
torch.cuda.max_memory_reserved() and overflows the size computation when
test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction
runs afterward. To avoid this, we introduced torch.cuda.empty_cache() and
torch.cuda.reset_peak_memory_stats() calls to clean up the CUDA state.
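
A hedged sketch of the cleanup pattern, assuming it runs at the start of the memory-fraction test; the surrounding test body is illustrative:

```python
import torch

def _reset_cuda_memory_state() -> None:
    # Release cached blocks left behind by earlier tests (e.g. the huge-numel
    # randint test) and reset the peak-memory counters so that
    # torch.cuda.max_memory_reserved() reflects only the current test.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

_reset_cuda_memory_state()
torch.cuda.set_per_process_memory_fraction(0.5)  # illustrative fraction
```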

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295
(cherry picked from commit f86d184)
jataylo and others added 24 commits August 20, 2025 20:19
Relands #2416 with caching fix

Upstream equivalent pytorch#159146

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit f0aebdc)
… Fix warps runtime part 2 (#2455)

Cherry-pick of #2442

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
…d for reductions on three dimensions (#2469)

Cherry-pick of #2460

Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>
Fixes SWDEV-543698

Cherry-picked from #2468
This PR fixes errors like the one below:

```
[rank7]: RuntimeError: /tmp/comgr-c3c81b/input/CompileSourceejOPx6:34:8:
error: unknown type name 'uint64_t'; did you mean '__hip_internal::uint64_t'?
[rank7]: 34 | if(((uint64_t) t0.data) % (4 * sizeof(half)) != 0) flag_vec4 = false;
```

Previously, uint64_t was defined in the std namespace in the HIP headers. It has
been moved to the __hip_internal namespace; this change was made in ROCm 7.0.

Fixes https://ontrack-internal.amd.com/browse/SWDEV-543698
mx fp8 is enabled through a cherry-picked patch from release/2.7. This patch
adds support to enable mx fp4.

PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k
test_blockwise -v

Ran 452 tests in 23.776s
OK (skipped=340)
Passed 112

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ansformer_req_grad on Navi32/Navi4x (#2464)

Cherry-pick of #2385

Co-authored-by: Dmitry Nikolaev <139769634+dnikolaev-amd@users.noreply.github.com>
Docker image used:

registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16510_ubuntu24.04_py3.12_pytorch_release-2.8_b4af472d

Keeping cmake at 3.31.4 or greater
…ersistent reduction and no_x_dim removal (#2454)

Cherry-pick of #2417 
Need to resolve conflicts

---------

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
fp8 rowwise scaling is not supported on ROCm 7.0 with gfx950, but works on
mainline. Skip the test for now.
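
A hedged sketch of how such a skip might be written; the helper, test class, and version check are illustrative, not the exact code in this PR:

```python
import unittest
import torch

def _rocm7_gfx950() -> bool:
    # ROCm build, HIP major version >= 7, gfx950 device (assumed guard).
    if not torch.version.hip or not torch.cuda.is_available():
        return False
    hip_major = int(torch.version.hip.split(".")[0])
    arch = torch.cuda.get_device_properties(0).gcnArchName
    return hip_major >= 7 and "gfx950" in arch

class TestScaledMatmul(unittest.TestCase):
    @unittest.skipIf(_rocm7_gfx950(),
                     "fp8 rowwise scaling not supported on ROCm 7.0 with gfx950")
    def test_fp8_rowwise(self):
        ...  # rowwise-scaled matmul checks would go here
```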

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
Commit Messages:
- update the param_id calculation so that it works on both CPX and SPX
modes (#271) (#272)
- reset parameters for FusedDenseGeluDense similar to FusedDense to make
the test_gelu pass (#269) (#270)


PRs:
- ROCm/apex#272
- ROCm/apex#269

Fixes:
- https://ontrack-internal.amd.com/browse/SWDEV-540029
- https://ontrack-internal.amd.com/browse/SWDEV-548434
…tions on MIOpen (#2410)

Cherry-pick of #2405

Co-authored-by: Dmitry Nikolaev <139769634+dnikolaev-amd@users.noreply.github.com>
#2505)

Cherry-pick of #2492

Co-authored-by: Jerry Mannil <65309407+jerrymannil@users.noreply.github.com>
… test_c10d_nccl.py (#2522)

Cherry-pick of #2447

Co-authored-by: akashveramd <Akash.Verma3@amd.com>
… rocm version (#2529)

Cherry-pick of #2518

Co-authored-by: Ethan Wee <Ethan.Wee@amd.com>
Cherry-pick of #2450

---------

Co-authored-by: iupaikov-amd <iurii.paikov@amd.com>
Fixes SWDEV-543698
(https://ontrack-internal.amd.com/browse/SWDEV-543698)

Cherry-picked from #2502

This PR fixes errors like the one below:
```
[rank3]: RuntimeError: The following operation failed in the TorchScript interpreter.
[rank3]: Traceback of TorchScript (most recent call last):
[rank3]: RuntimeError: /tmp/comgr-28f951/input/CompileSourceACC062:67:7: error: unknown type name 'uint32_t'; did you mean '__hip_internal::uint32_t'?
[rank3]:    67 |       uint32_t int32;
[rank3]:       |       ^~~~~~~~
[rank3]:       |       __hip_internal::uint32_t
```
Previously, uint32_t was defined in the std namespace in the HIP headers. It has
been moved to the __hip_internal namespace; this change was made in ROCm 7.0.
Cherry-pick of #2535

---------

Co-authored-by: Ethan Wee <Ethan.Wee@amd.com>
@jeffdaily (Collaborator) left a comment:

Remove the triton change. Otherwise LGTM.

```
@@ -1 +1 @@
-f9e5bf54a2fe1a6262a41b27b38180cdb6fae6a2
+f9e5bf54a2fe1a6262a41b27b38180cdb6fae6a2
\ No newline at end of file
```
A collaborator replied:

Nope, don't do this. Remove the triton change.


rocm-repo-management-api bot commented Aug 25, 2025

Jenkins build for 52684a65990781e936fb17fd1276480392e70d5d commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jeffdaily merged commit 1a9ca16 into release/2.8 on Aug 25, 2025.
1 of 3 checks passed.
@jeffdaily deleted the fix_torch_macros_for_miopen branch on August 25, 2025 at 16:10.