ROCm Build Error #32

lufixSch · 2023-12-28T14:30:42Z

Are you planning on adding ROCm support or did you already test it on AMD?

I just tried building the package and it crashes with the following error:

FAILED: /data/linux_data/AI/LLM/WebUI/repositories/quip-sharp/quiptools/build/temp.linux-x86_64-cpython-310/quiptools_e8p_gemv.o 
/opt/rocm/bin/hipcc  -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/include -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/include/TH -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/include/THC -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/include -I/home/lukas/.pyenv/versions/3.10.12/include/python3.10 -c -c /data/linux_data/AI/LLM/WebUI/repositories/quip-sharp/quiptools/quiptools_e8p_gemv.hip -o /data/linux_data/AI/LLM/WebUI/repositories/quip-sharp/quiptools/build/temp.linux-x86_64-cpython-310/quiptools_e8p_gemv.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O2 -g -Xcompiler -rdynamic -lineinfo -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=quiptools_cuda -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -fno-gpu-rdc -std=c++17
clang++: warning: -lineinfo: 'linker' input unused [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-Xcompiler' [-Wunused-command-line-argument]
clang++: warning: argument unused during compilation: '-rdynamic' [-Wunused-command-line-argument]
/data/linux_data/AI/LLM/WebUI/repositories/quip-sharp/quiptools/quiptools_e8p_gemv.hip:11:10: fatal error: 'mma.h' file not found
#include <mma.h>
         ^~~~~~~
1 error generated when compiling for gfx1030.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/data/linux_data/AI/LLM/WebUI/.venvs/llm1100/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2099, in _run_ninja_build
    subprocess.run(
  File "/home/lukas/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

tsengalb99 · 2023-12-28T20:07:49Z

We do not have access to AMD hardware to develop on. You are welcome to modify the code to run on ROCm devices yourself, though. Most of the kernel should port over pretty easily. I think the main differences are intrinsic names.

lufixSch · 2023-12-28T20:18:23Z

Unfortunately I'm not experienced enough with HIP (at the moment) to modify anything.

If you're OK with it, I would like to keep this issue open (maybe renamed to 'ROCm Support') to discuss this topic.
Maybe someone more experienced with ROCm will find this and can help me figure it out.

tsengalb99 · 2023-12-28T20:34:46Z

@chaosagent might know ROCm and have access to an AMD GPU.
