Skip to content

[Issue]: llvm Compile bug: HIP Debug build fails on Windows (gfx1100) with illegal DPP instruction, Release wo #5826

@yanite

Description

@yanite

Problem Description

Git commit
commit ggml-org/llama.cpp@b96b82f

Operating systems
Windows

GGML backends
HIP

Problem description & steps to reproduce
Environment

OS: Windows 11
GPU: AMD Radeon RX 7900 XTX (gfx1100 / RDNA3)
HIP SDK / ROCm: 6.4
Compiler: clang++ from ROCm 6.4
Build system: CMake + Ninja
Summary

When building llama.cpp with HIP enabled on Windows using Debug configuration, compilation fails with an illegal DPP instruction error targeting gfx1100.
The same configuration builds successfully in Release mode.

This appears to be a compiler/backend issue triggered only in Debug builds (-O0), where LLVM emits unsupported DPP wavefront shift instructions for GFX10+ GPUs.

CMake Configuration

cmake .. -G Ninja ^
-DCMAKE_BUILD_TYPE=Debug ^
-DLLAMA_CURL=OFF ^
-DCMAKE_C_COMPILER="D:/AMD/ROCm/6.4/bin/clang.exe" ^
-DCMAKE_CXX_COMPILER="D:/AMD/ROCm/6.4/bin/clang++.exe" ^
-DCMAKE_CXX_FLAGS="-Og -g -Wno-everything" ^
-DCMAKE_C_FLAGS="-Og -g -Wno-everything" ^
-DGGML_HIP=ON ^
-DGGML_HIPBLAS=ON ^
-DGGML_HIP_ROCWMMA_FATTN=OFF ^
-DGGML_HIP_CXX_FLAGS="-w" ^
-DGPU_TARGETS="gfx1100"
Build Failure

The build fails while compiling a HIP source file:

ggml/src/ggml-cuda/mmf.cu

with the following error:

error: Illegal instruction detected: Invalid dpp_ctrl value:
wavefront shifts are not supported on GFX10+

renamable $vgpr2 = V_MOV_B32_dpp $vgpr2(tied-def 0), $vgpr1,
dpp_ctrl:304, row_mask:15, bank_mask:15

16 warnings and 1 error generated when compiling for gfx1100.

This happens only in Debug builds.
Release builds succeed without any issues.

Additional Observations

GGML_HIP_ROCWMMA_FATTN is explicitly disabled, so this does not appear to be related to ROCWMMA attention kernels.

The error is triggered during compilation, not at runtime.

The generated instruction uses a DPP wavefront shift, which is not supported on GFX10+ (including GFX11 / gfx1100).

This strongly suggests a LLVM/HIP codegen issue under Debug / -O0, where unsupported DPP instructions are emitted.

Using optimization levels like -Og or -O1 avoids similar issues elsewhere, but the Debug configuration still ends up passing -O0 to the HIP compilation.

Expected Behavior

Debug builds with HIP enabled should not emit illegal DPP instructions for GFX10+ GPUs, and should compile successfully, similar to Release builds.

Actual Behavior

Debug build fails during HIP compilation due to illegal DPP wavefront shift instructions.

Possible Cause (Speculation)

LLVM HIP backend appears to emit legacy DPP wavefront shift instructions when compiling with -O0 for gfx1100, which are invalid on GFX10+ architectures.
Release builds likely optimize away or rewrite the affected IR paths.

Please let me know if additional logs, compiler flags, or a minimal reproduction would be helpful.

Thank you for your work on llama.cpp.

First Bad Commit
No response

Compile command
cmake .. -G Ninja ^
-DCMAKE_BUILD_TYPE=Debug ^
-DLLAMA_CURL=OFF ^
-DCMAKE_C_COMPILER="D:/AMD/ROCm/6.4/bin/clang.exe" ^
-DCMAKE_CXX_COMPILER="D:/AMD/ROCm/6.4/bin/clang++.exe" ^
-DCMAKE_CXX_FLAGS="-Og -g -Wno-everything" ^
-DCMAKE_C_FLAGS="-Og -g -Wno-everything" ^
-DGGML_HIP=ON ^
-DGGML_HIPBLAS=ON ^
-DGGML_HIP_ROCWMMA_FATTN=OFF ^
-DGGML_HIP_CXX_FLAGS="-w" ^
-DGPU_TARGETS="gfx1100"
Relevant log output
[383/462] Building CXX object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir//ggml-cuda/mmf.cu.obj
FAILED: ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/
/ggml-cuda/mmf.cu.obj
ccache D:\AMD\ROCm\6.4\bin\clang++.exe -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_HIP_NO_VMM -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_HIP -DUSE_PROF_API=1 -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -Dggml_hip_EXPORTS -IU:/cpp/llama.cpp/ggml/src/ggml-hip/.. -IU:/cpp/llama.cpp/ggml/src/../include -isystem D:/AMD/ROCm/6.4/include -Og -g -Wno-everything -O0 -g -Xclang -gcodeview -D_DEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrtd -std=gnu++17 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -x hip --offload-arch=gfx1100 -MD -MT ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir//ggml-cuda/mmf.cu.obj -MF ggml\src\ggml-hip\CMakeFiles\ggml-hip.dir__\ggml-cuda\mmf.cu.obj.d -o ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir//ggml-cuda/mmf.cu.obj -c U:/cpp/llama.cpp/ggml/src/ggml-cuda/mmf.cu
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda/mmf.cu:2:
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda\mmf.cuh:3:
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:19:
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda\common.cuh:21:
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:180:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
180 | struct {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:208:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
208 | struct {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:229:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
229 | struct {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:270:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
270 | struct {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:297:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
297 | struct {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-hip/..\ggml-common.h:314:9: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
314 | struct {
| ^
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda/mmf.cu:2:
In file included from U:/cpp/llama.cpp/ggml/src/ggml-cuda\mmf.cuh:3:
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:754:87: warning: function 'load_ldmatrix' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
754 | tile<32, 4, half2> & t, const half2 * restrict xs0, const int stride) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:779:90: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
779 | tile<16, 8, int> & D, const tile<16, 4, int> & A, const tile<8, 4, int> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:801:90: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
801 | tile<16, 8, int> & D, const tile<16, 8, int> & A, const tile<8, 8, int> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:829:96: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
829 | tile<16, 4, half2> & D, const tile<16, 8, half2> & A, const tile<8, 8, half2> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:854:97: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
854 | tile<16, 8, half2> & D, const tile<16, 8, half2> & A, const tile<16, 8, half2> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:925:96: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
925 | tile<16, 8, float> & D, const tile<16, 8, half2> & A, const tile<8, 8, half2> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:950:110: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
950 | tile<16, 8, float> & D, const tile<16, 8, nv_bfloat162> & A, const tile<8, 8, nv_bfloat162> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:1127:92: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
1127 | tile<32, 32, int> & D, const tile<32, 4, int> & A, const tile<32, 4, int> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:1163:126: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
1163 | tile<32, 8, float> & D, const tile<32, 4, half2> & A, const tile<8, 4, half2, DATA_LAYOUT_I_MAJOR_MIRRORED> & B) {
| ^
U:/cpp/llama.cpp/ggml/src/ggml-cuda\mma.cuh:1183:126: warning: function 'mma' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
1183 | tile<32, 4, half2> & D, const tile<32, 4, half2> & A, const tile<8, 4, half2, DATA_LAYOUT_J_MAJOR_MIRRORED> & B) {
| ^
error: Illegal instruction detected: Invalid dpp_ctrl value: wavefront shifts are not supported on GFX10+
renamable $vgpr2 = V_MOV_B32_dpp $vgpr2(tied-def 0), $vgpr1, 304, 15, 15, -1, implicit $exec
16 warnings and 1 error generated when compiling for gfx1100.
ninja: build stopped: subcommand failed.

Operating System

Windows

CPU

AMD 7950x

GPU

AMD Radeon RX 7900xtx

ROCm Version

ROCm 6.4

ROCm Component

llvm-project

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions