Rotary Emb compile fail under Ubuntu 22.04 with gcc/g++ v12 installed #484

Open
Qubitium opened this issue Aug 24, 2023 · 7 comments

@Qubitium
Contributor

The flash_attn core compiled correctly, but a runtime error asks me to compile the rotary module for Llama 2. That compilation fails on Ubuntu 22.04 with CUDA 12.1, the PyTorch nightly for CUDA 12.1, and gcc/g++ 12.

Thanks for any pointers. I am scratching my head on this one.

Collecting git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
  Cloning https://github.com/HazyResearch/flash-attention.git to /tmp/pip-req-build-jtrc0_ne
  Running command git clone --filter=blob:none --quiet https://github.com/HazyResearch/flash-attention.git /tmp/pip-req-build-jtrc0_ne
  Resolved https://github.com/HazyResearch/flash-attention.git to commit 6711b3bc40073e7ced2a4c7d8266feec7e6e137f
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rotary-emb
  Building wheel for rotary-emb (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [87 lines of output]


      torch.__version__  = 2.1.0.dev20230824+cu121


      running bdist_wheel
      running build
      running build_ext
      /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.1
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'rotary_emb' extension
      creating /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build
      creating /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10
      Emitting ninja build file /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/2] c++ -MMD -MF /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary.cpp -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary.o -g -march=native -funroll-loops -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [2/2] /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary_cuda.cu -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      FAILED: /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o
      /usr/local/cuda/bin/nvcc  -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pip-req-build-jtrc0_ne/csrc/rotary/rotary_cuda.cu -o /tmp/pip-req-build-jtrc0_ne/csrc/rotary/build/temp.linux-x86_64-3.10/rotary_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=rotary_emb -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                        ^
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                           ^
      /usr/local/lib/python3.10/dist-packages/torch/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
         45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
            |                                                                                                                              ^
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-jtrc0_ne/csrc/rotary/setup.py", line 120, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
          objects = self.compiler.compile(sources,
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]
@tridao
Contributor

tridao commented Aug 24, 2023

Yeah idk, maybe the compiler version is too new. It's erroring on some pybind11 code :D
You can try gcc 11 or lower.

@Saimshaikh8297

$ sudo apt install gcc-10 g++-10
$ export CC=/usr/bin/gcc-10
$ export CXX=/usr/bin/g++-10
$ export CUDA_ROOT=/usr/local/cuda
$ ln -s /usr/bin/gcc-10 $CUDA_ROOT/bin/gcc
$ ln -s /usr/bin/g++-10 $CUDA_ROOT/bin/g++

I installed gcc-10 as per NVlabs/instant-ngp#119.

This worked for me
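
For anyone scripting the `export CC` / `export CXX` part of this workaround, here is a minimal Python sketch. The helper name `host_compiler_env` and its parameters are hypothetical, not part of flash-attention or PyTorch. It covers only the environment variables; the `ln -s` step that places gcc-10 next to nvcc is not reproduced, and whether a given build honors `CC`/`CXX` depends on its setup.

```python
import os
import shutil


def host_compiler_env(version="10", base_env=None, which=shutil.which):
    """Return a copy of the environment with CC/CXX set to gcc-<version>/g++-<version>.

    Hypothetical helper: `which` is injectable so the lookup can be faked in tests.
    Raises FileNotFoundError if either compiler is not found.
    """
    env = dict(os.environ if base_env is None else base_env)
    for var, tool in (("CC", f"gcc-{version}"), ("CXX", f"g++-{version}")):
        path = which(tool)
        if path is None:
            raise FileNotFoundError(f"{tool} not found on PATH; install it first")
        env[var] = path
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the `pip install` command, so the override applies to that one build rather than to the whole shell session.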

@andersonbcdefg

FYI, this happens on Lambda stack which means anyone trying to use rotary on Lambda H100s is in for a bad time. I tried the above ^ and it didn't fix the problem for me (the ln commands didn't work because stuff wasn't where they expected it to be).

@Birch-san

Birch-san commented Sep 5, 2023

Yes, the nvcc compiler shipped with CUDA 12.0 and 12.1 cannot compile pybind11 2.11.1:
pybind/pybind11#4606
Note: fixed in CUDA 12.2.

Specifically, it cannot compile pybind11/cast.h#L45.
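
To check whether a machine falls in the affected range, the release can be read from the output of `nvcc --version`. This is a rough sketch: the function names are hypothetical, and the affected set simply encodes the versions named in this comment (12.0 and 12.1), not an independently verified list.

```python
import re
import shutil
import subprocess

# Releases reported in this thread as unable to compile pybind11's cast.h.
AFFECTED_CUDA_RELEASES = {(12, 0), (12, 1)}


def parse_nvcc_release(version_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_affected(version_output):
    """True if the parsed release is one of the versions reported as affected."""
    return parse_nvcc_release(version_output) in AFFECTED_CUDA_RELEASES


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print(parse_nvcc_release(out), "affected" if is_affected(out) else "not in the affected list")
```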

The fix is simple (thanks @archibate):

-    return caster.operator typename make_caster<T>::template cast_op_type<T>();
+    return caster;

So, you just need to find the pybind11/cast.h used by your current Python environment, and modify it as above.
Check which file is cited in your compile error:

/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();

In this example it's:
/home/birch/anaconda3/envs/p311-cu121-bnb-opt/lib/python3.11/site-packages/torch/include/pybind11/cast.h.
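
If the build log is long, the header path can be pulled out of it mechanically. The sketch below is a hypothetical helper, not part of pip or PyTorch: it looks for the first `cast.h:<line>:<col>: error:` location and collapses the `detail/../` segment. It assumes POSIX-style paths and does not resolve symlinks.

```python
import posixpath
import re

# Matches the "<path>/cast.h:<line>:<col>: error:" prefix of a compiler diagnostic.
ERROR_LOCATION = re.compile(r"^\s*(?P<path>/\S*cast\.h):(?P<line>\d+):(?P<col>\d+): error:")


def find_cast_header(build_log):
    """Return the normalized path of the first cast.h cited in an error line, or None."""
    for line in build_log.splitlines():
        match = ERROR_LOCATION.match(line)
        if match:
            # normpath collapses "detail/../cast.h" to "cast.h" purely textually.
            return posixpath.normpath(match.group("path"))
    return None
```

Passing the captured `pip install` output to `find_cast_header` returns the file to edit, or `None` when no such error is present.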

So, we modify this cast.h as above.
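
The edit can also be scripted. This is a minimal sketch with a hypothetical function name: it replaces the first occurrence of the exact line from the diff above, keeps a `.orig` backup next to the header, and does nothing if the line is not found (for example, on a pybind11 version whose code differs). Note that it modifies a file inside the installed torch package, so reinstalling torch will restore the original.

```python
import shutil
from pathlib import Path

# The two sides of the one-line change quoted in the diff.
OLD = "return caster.operator typename make_caster<T>::template cast_op_type<T>();"
NEW = "return caster;"


def patch_cast_header(path):
    """Apply the one-line workaround to the given cast.h; return True if it changed."""
    header = Path(path)
    text = header.read_text(encoding="utf-8")
    if OLD not in text:
        return False  # already patched, or the header does not contain this line
    backup = header.with_name(header.name + ".orig")
    if not backup.exists():
        shutil.copy2(header, backup)  # keep the untouched original for easy rollback
    header.write_text(text.replace(OLD, NEW, 1), encoding="utf-8")
    return True
```

To undo the change, copy `cast.h.orig` back over `cast.h`.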

Then try compiling rotary-emb again:

MAX_JOBS=2 pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

And voilà:

Successfully built rotary-emb

@andersonbcdefg

Thanks for that! Is this something you'd expect one of pybind11 or CUDA to fix at some point?

@brettbj

brettbj commented Nov 21, 2023

@Birch-san thank you! fixed it for me as well, really appreciate it

@Birch-san

> Thanks for that! Is this something you'd expect one of pybind11 or CUDA to fix at some point?

@andersonbcdefg yes, CUDA 12.2's nvcc compiler can now compile pybind11.
