[release/2.11] Fix Windows access violation in MIOpen CTC loss dispatch #3161

Merged
jeffdaily merged 1 commit into release/2.11 from fix_windows_access_violation_miopen_ctc
Apr 27, 2026

@tvukovic-amd

Cherry pick of pytorch#178284

Fixes ROCm/TheRock#3987

@rocm-repo-management-api

rocm-repo-management-api Bot commented Apr 16, 2026

Jenkins build for 2529af76adf7e0fe36ddfe12cca198d44b3e1aba commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

      |                                            ^
1 warning generated when compiling for gfx908.
[7308/8165] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_WeightNorm.hip.o
[7309/8165] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_NaiveConvolutionTranspose3d.hip.o
[7310/8165] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/. && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ck_gemm_float.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o.cmake
sccache: encountered fatal error
sccache: error: Failed to parse included file path
sccache: caused by: Failed to parse included file path
failed to execute:/opt/rocm/llvm/bin/clang++  --offload-arch=gfx90a --offload-arch=gfx908 --offload-arch=gfx942 -O3  -c -x hip /var/lib/jenkins/pytorch/aten/src/ATen/native/hip/ck_gemm_float.hip -o "/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ck_gemm_float.hip.o" --offload-compress -std=c++17 --rocm-device-lib-path=/opt/rocm/amdgcn/bitcode -fclang-abi-compat=17 -DUSE_NCCL -DUSE_ROCM -D__HIP_PLATFORM_AMD__ -DUSE_FLASH_ATTENTION -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASH_NAMESPACE=pytorch_flash -DUNFUSE_FMA -DUSE_MEM_EFF_ATTENTION -DUSE_C10D_NCCL -DTORCH_CUDA_BUILD_MAIN_LIB -DROCM_VERSION=70202 -DTORCH_HIP_VERSION=702 -DUSE_LAYERNORM_FAST_RECIPROCAL -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -DIDEEP_USE_MKL -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_POSIX_FALLOCATE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -D__HIP_PLATFORM_AMD__=1 -DUSE_PROF_API=1 -DAT_PER_OPERATOR_HEADERS -DUSE_DISTRIBUTED -DUSE_C10D_GLOO -DUSE_RPC -DUSE_TENSORPIPE -D__HIP_PLATFORM_AMD__ -DHIPBLASLT_USE_ROCROLLER -DFMT_HEADER_ONLY=1 -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=702 -Wno-shift-count-negative -Wno-shift-count-overflow -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIPBLASLT_OUTER_VEC -DUSE_ROCM_CK_GEMM -fno-gpu-rdc -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.2/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include 
-I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/opt/rocm-7.2.2/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/concurrentqueue -I/var/lib/jenkins/pytorch/aten/src/THH -I/var/lib/jenkins/pytorch/third_party/mslk/include/ -I/var/lib/jenkins/pytorch/aten/src/ATen/hip -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/library/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/example/ck_tile/01_fmha -I/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/composable_kernel -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/aiter/csrc/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/torch/include -I/opt/rocm-7.2.2/include -I/opt/rocm/include -I/var/lib/jenkins/pytorch/c10/hip/../.. 
-I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch/c10/../ -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/opt/rocm-7.2.2/include -I/opt/rocm/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include/hiprand -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include/rocrand -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm-7.2.2/include -I/opt/rocm/include -I/usr/include/libdrm -I/opt/rocm/include -I/var/lib/jenkins/pytorch/build/third_party/gloo/hip -I/opt/rocm/magma/include -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.2/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/nlohmann 
-I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/third_party/concurrentqueue

…ytorch#178284)

### Summary

Add the missing `#include <ATen/ops/_use_miopen_ctc_loss_native.h>` to `LossCTC_miopen.cpp`. Without this include, the `_use_miopen_ctc_loss` and `_use_miopen_ctc_loss_tensor` functions are defined without DLL linkage attributes on Windows. The result is an unresolved Import Address Table (IAT) entry, and calling CTC loss with CUDA tensors crashes with an access violation (0xC0000005) at the `torch_hip.dll` base address.

### Problem

On Windows ROCm builds, calling `torch.nn.functional.ctc_loss` with CUDA tensors crashes with a fatal access violation:

Windows fatal exception: access violation
Exception Code: 0xC0000005
torch_hip.dll + 0x0 byte(s)

The crash occurs in `test_CTCLoss_critical_target_len` and any other test that invokes `ctc_loss` with `cudnn.flags(enabled=True)` on CUDA tensors.

### Root Cause

The issue is a Windows DLL linkage mismatch between the caller and the callee of `at::native::_use_miopen_ctc_loss`.

The caller (`RegisterCUDA_0.cpp`, auto-generated, compiled into `torch_hip.dll`):

The generated CUDA dispatch wrapper includes `<ATen/ops/_use_miopen_ctc_loss_native.h>`, which declares the function with `TORCH_API`. When building `torch_hip.dll`, `TORCH_API` expands to `__declspec(dllimport)`. MSVC generates an indirect call through the Import Address Table: `call [__imp_?_use_miopen_ctc_loss@native@at@@...]`.

The callee (`LossCTC_miopen.cpp`, compiled into `torch_hip.dll`):

The implementation file does not include `<ATen/ops/_use_miopen_ctc_loss_native.h>`. The functions are therefore defined without any DLL linkage attribute, as plain `bool _use_miopen_ctc_loss(...)`. The compiler does not generate an `__imp_` thunk for these definitions.
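The general pattern behind the fix can be sketched in a single translation unit: the definition is compiled after it has seen the same attributed declaration that callers see. Everything named `DEMO_API`, `demo::native`, and `use_fast_path` below is hypothetical and chosen for illustration. `DEMO_API` is a simplified stand-in for an export macro and does not reproduce how `TORCH_API` is actually defined or how it expands inside `torch_hip.dll`.

```cpp
#include <cassert>

// ---- Part 1: what a declaring header would contain (hypothetical) ----------
// Simplified stand-in for an export macro. Assumption: the library exports its
// own function on Windows; on other platforms the macro expands to nothing.
#if defined(_WIN32)
#  define DEMO_API __declspec(dllexport)
#else
#  define DEMO_API
#endif

namespace demo { namespace native {
// Attributed declaration: this is what every caller compiles against.
DEMO_API bool use_fast_path(int input_len, int target_len);
}}  // namespace demo::native

// ---- Part 2: the implementation file ---------------------------------------
// In a real project, Part 1 would be pulled in here via #include. Because the
// declaration above is visible, the definition below inherits its linkage
// attribute instead of silently becoming a plain, unattributed function.
namespace demo { namespace native {
bool use_fast_path(int input_len, int target_len) {
    // Hypothetical predicate, unrelated to the real MIOpen eligibility check.
    return target_len <= input_len;
}
}}  // namespace demo::native
```

Keeping the declaration visible at the point of definition also lets the compiler reject a definition whose signature has drifted from the header, which is a side benefit independent of DLL linkage.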

### The linker mismatch

When linking `torch_hip.dll`, the linker needs to resolve the `__imp_?_use_miopen_ctc_loss@native@at@@...` symbol (referenced by `RegisterCUDA_0.cpp.obj`). This is a different symbol from `?_use_miopen_ctc_loss@native@at@@...` (provided by `LossCTC_miopen.cpp.obj`). Since no import library (`.lib`) exports this function, the `__imp_` IAT entry remains unresolved at RVA=0. At runtime, the indirect call jumps to DLL_base + 0x0 (the PE header), which is not executable code, causing the access violation.
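The reasoning above can be mirrored in a toy model. This is not a real PE/COFF linker or loader; `ToyImage`, `resolve_import_slot`, `indirect_call_target`, and the symbol name `demo_fn` used with it are all hypothetical. The sketch only encodes the two claims from the description: a request for `__imp_<name>` is not satisfied by a plain `<name>` definition, and a slot left at RVA 0 makes the indirect call land on the image base.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only: a DLL image reduced to a load address and a symbol table.
struct ToyImage {
    std::uint64_t base = 0;                         // load address of the DLL
    std::map<std::string, std::uint32_t> defined;   // symbol name -> RVA
};

// Returns the RVA stored in the import slot for `name`.
// The caller was compiled against "__imp_<name>", so only that exact symbol
// satisfies the request. A plain "<name>" definition is a different symbol
// and is ignored, which leaves the slot at 0.
std::uint32_t resolve_import_slot(const ToyImage& img, const std::string& name) {
    auto it = img.defined.find("__imp_" + name);
    return it == img.defined.end() ? 0u : it->second;
}

// Address an indirect call through the slot would jump to in this model.
std::uint64_t indirect_call_target(const ToyImage& img, const std::string& name) {
    return img.base + resolve_import_slot(img, name);
}
```

With only a plain `demo_fn` entry in `defined`, `indirect_call_target` returns `img.base`, which corresponds to the `torch_hip.dll + 0x0 byte(s)` line in the crash report.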

### Test plan

- `test_CTCLoss_critical_target_len` passes on Windows
- `test_CTCLoss_cudnn_cuda` no longer crashes on Windows
- Linux ROCm builds are unaffected

Pull Request resolved: pytorch#178284
Approved by: https://github.com/Skylion007
@tvukovic-amd
Author

tvukovic-amd commented Apr 22, 2026

This appears to be the same Jenkins error as on #3160. It is unrelated to the changes in this PR.
@ScottTodd

jeffdaily changed the title from "[ROCm][CI] Fix Windows access violation in MIOpen CTC loss dispatch (…" to "[release/2.11] Fix Windows access violation in MIOpen CTC loss dispatch" on Apr 27, 2026
jeffdaily merged commit 48211a7 into release/2.11 on Apr 27, 2026
0 of 2 checks passed
jeffdaily deleted the fix_windows_access_violation_miopen_ctc branch on April 27, 2026 16:03
3 participants