
Building wheel for flash-attn (pyproject.toml) did not run successfully #224

Closed
jesswhitts opened this issue May 16, 2023 · 25 comments

@jesswhitts

Hello,

I am trying to install via pip into a conda environment, on an A100 GPU with CUDA 11.6.2.
I get the following, not very informative, error:

Building wheels for collected packages: flash-attn
error: subprocess-exited-with-error

× Building wheel for flash-attn (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for flash-attn (pyproject.toml) ... error
ERROR: Failed building wheel for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Many thanks,

Jess

@tridao
Contributor

tridao commented May 16, 2023

There should be a longer log than that, do you have it?
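For anyone else who only sees the short message above: pip hides most of the wheel-build output by default. A minimal way to capture the full log (a sketch; with --no-build-isolation, torch, packaging and ninja must already be installed in the environment):

pip install packaging ninja
pip install flash-attn --no-build-isolation -v 2>&1 | tee flash-attn-build.log
# the real compiler error is usually a few hundred lines above the final "Failed building wheel" message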

@Jingsong-Yan

In my case, apt install g++ fixed it.

@jesswhitts
Author

Seems to have been an incompatibility with the g++ version. Thanks @Jingsong-Yan!

@ShoufaChen

Hi, @jesswhitts

How do you determine which g++ version is compatible?

@jesswhitts
Author

I got the following error which states the compatible version:

RuntimeError: The current installed version of g++ (4.8.5) is less than the minimum required version by CUDA 11.6 (6.0.0). Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
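The fix that worked here is simply installing a g++ inside CUDA 11.6's supported range (>=6.0, <12). A sketch for Debian/Ubuntu-style systems; on other distros the same idea applies via devtoolset or conda-forge's gxx_linux-64 package:

apt-get update && apt-get install -y g++   # current Ubuntu releases ship g++ 9-11, inside the range
g++ --version                              # confirm the version the extension build will pick up
pip install flash-attn --no-build-isolation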

@jackaihfia2334

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [127 lines of output]

  torch.__version__  = 2.1.0.dev20230621+cu117
  
  
  fatal: detected dubious ownership in repository at '/data/llm/code/Qwen-7B/flash-attention'
  To add an exception for this directory, call:
  
      git config --global --add safe.directory /data/llm/code/Qwen-7B/flash-attention
  running bdist_wheel
  /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:478: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.10
  creating build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn
  creating build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  creating build/lib.linux-x86_64-3.10/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
  creating build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/llama.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  creating build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  creating build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  creating build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  running build_ext
  building 'flash_attn_cuda' extension
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/csrc
  creating build/temp.linux-x86_64-3.10/csrc/flash_attn
  creating build/temp.linux-x86_64-3.10/csrc/flash_attn/src
  x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-3.10/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:42,
                   from csrc/flash_attn/fmha_api.cpp:33:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h: In function ‘void set_alpha(uint32_t&, float, Data_type)’:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:63:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     63 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:68:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     68 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:70:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     70 |         alpha = reinterpret_cast<const uint32_t &>( norm );
        |                                                     ^~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘void set_params_fprop(FMHA_fprop_params&, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, void*, float, float, bool, int)’:
  csrc/flash_attn/fmha_api.cpp:64:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct FMHA_fprop_params’; use assignment or value-initialization instead [-Wclass-memaccess]
     64 |     memset(&params, 0, sizeof(params));
        |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from csrc/flash_attn/fmha_api.cpp:33:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:75:8: note: ‘struct FMHA_fprop_params’ declared here
     75 | struct FMHA_fprop_params : public Qkv_params {
        |        ^~~~~~~~~~~~~~~~~
  csrc/flash_attn/fmha_api.cpp:60:15: warning: unused variable ‘acc_type’ [-Wunused-variable]
     60 |     Data_type acc_type = DATA_TYPE_FP32;
        |               ^~~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd(const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, bool, int, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:208:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    208 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd_block(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:533:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    533 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  /usr/local/cuda/bin/nvcc -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o build/temp.linux-x86_64-3.10/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
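Two things stand out in this log: the "could not find ninja" warning (so the build fell back to the slow distutils path) and the fatal "cutlass/cutlass.h: No such file or directory" error, which means the cutlass sources were never checked out in this source tree. A sketch of a fix for that checkout, assuming cutlass is tracked as a git submodule (the -I paths in the compile command already point at it):

cd /data/llm/code/Qwen-7B/flash-attention
git config --global --add safe.directory "$(pwd)"   # clears the 'dubious ownership' warning
git submodule update --init --recursive             # fetches the missing cutlass sources
pip install ninja packaging                         # lets the build use ninja instead of distutils
python setup.py install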

@gpww

gpww commented Aug 8, 2023

Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [105 lines of output]

  torch.__version__  = 2.0.1+cu117


  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-310
  creating build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  creating build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  creating build/lib.linux-x86_64-cpython-310/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
  creating build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  creating build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  creating build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  creating build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/modelscope/flash-attention/setup.py", line 175, in <module>

File "/root/miniconda3/envs/Modelscope/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

@gpww

gpww commented Aug 8, 2023

Doesn't work; still getting the error:
g++ is already the newest version (4:11.2.0-1ubuntu1).
g++ set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 71 not upgraded.

@UCC-team

[image attachment]

@lonngxiang

Same error here with g++ 10.2.

@nahidalam

Still the same error. I have g++ 11.4 on an Ubuntu system with CUDA 11.5.

@wbbeyourself

torch 2.1.0
cuda 12.1
g++ 10.2.1

Ran:

apt-get update && apt-get install -y g++
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation

The error is as follows:

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): still running...
Building wheel for flash-attn (setup.py): finished with status 'error'
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
fatal: not a git repository (or any of the parent directories): .git

torch.__version__  = 2.1.0.dev20230815+cu121


running bdist_wheel
Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.1.1/flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
error: Remote end closed connection without response
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
ERROR: executor failed running [/bin/sh -c pip install flash-attn --no-build-isolation]: runc did not terminate successfully: exit status 1
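This failure is different from the compiler errors earlier in the thread: "Guessing wheel URL ... Remote end closed connection without response" means setup.py tried to download a prebuilt wheel from the GitHub releases page and the connection was cut off, which is common behind proxies or inside Docker builds. Two hedged workarounds, using the exact URL from the log; the FLASH_ATTENTION_FORCE_BUILD switch is an assumption that the installed flash-attn version's setup.py supports it:

# Option 1: download the wheel yourself (retryable, honors any proxy you configure), then install it
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.1.1/flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
pip install ./flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

# Option 2: skip the wheel download entirely and compile from source
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn --no-build-isolation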

@WangSheng21s

conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
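Installing the toolkit inside the conda environment provides nvcc, which the source build needs. A quick sanity check afterwards (a sketch; the toolkit version and the CUDA version PyTorch was built against should agree, at least on the major version):

nvcc --version                                        # toolkit compiler, should report 11.8 here
python -c "import torch; print(torch.version.cuda)"   # CUDA version PyTorch was built with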

@shuyhere

shuyhere commented Oct 4, 2023

I ran into the same problem; it seemed to be getting stuck while a redirect was downloading something.
I did not change g++ or CUDA. The following steps worked for me:

git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install

Note that this may fail with an error saying flash-attention/csrc/cutlass cannot be found, because git failed to download cutlass.
In that case, cd into flash-attention/csrc/ and run git clone git@github.com:NVIDIA/cutlass.git

Re-run python setup.py install and it compiles successfully.
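An equivalent sequence that avoids the manual cutlass clone, and that does not require GitHub SSH keys, is to clone over HTTPS with submodules pulled in from the start (a sketch):

git clone --recursive https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install ninja packaging
python setup.py install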

@hsingyu-chou

Hi @shuyhere,

After trying your solution and using flash-attn ver. 1.0.5, it works.
Thank you.

(Remark: I use ver. 1.0.5 because I use a T4 GPU.)

@SunLemuria

Installing the pre-built wheel listed on the releases page works for me; in my case:
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
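When picking a wheel from the releases page, every field in the filename has to match the environment: cp38 is the CPython version, torch2.0 the PyTorch major.minor, cu118 the CUDA version that PyTorch build was compiled against, and the abi flag its C++ ABI. A quick way to read those values off your setup (a sketch):

python -c "import sys; print('python:', sys.version_info[:2])"
python -c "import torch; print('torch:', torch.__version__, '| built with CUDA', torch.version.cuda)"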

@YundongGai

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

It works for me.

@terry-for-github

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

It works for me. Thanks!

@lihanghang

Hi @shuyhere,

After trying your solution and using flash-attn ver. 1.0.5, it works. Thank you.

(Remark: I use ver. 1.0.5 because I use a T4 GPU.)

Thank you! It works for me.

@tongjingqi

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

Thank you! It works for me.

@tiansiyuan

I ran into the same problem; it seemed to be getting stuck while a redirect was downloading something. I did not change g++ or CUDA. The following steps worked for me:

git clone git@github.com:Dao-AILab/flash-attention.git; cd flash-attention; python setup.py install

Note that this may fail with an error saying flash-attention/csrc/cutlass cannot be found, because git failed to download cutlass. In that case, cd into flash-attention/csrc/ and run git clone git@github.com:NVIDIA/cutlass.git

Re-run python setup.py install and it compiles successfully.

This is great, works for me.

Thanks a lot!

@liuyongqiangjava


Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

Thank you! It works for me.

@baihuier

In my case, apt install g++ fixed it.

Thank you! It works for me.

@Hansyvea

Hansyvea commented Apr 30, 2024

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

This works, thanks!
Note: it has to be abiFALSE rather than abiTRUE.
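The abiTRUE/abiFALSE part of the wheel name has to match how the installed PyTorch was compiled; the PyTorch wheels distributed on PyPI at the time of this thread were built without the new C++ ABI, which is why the abiFALSE builds are usually the right ones. One way to check (a sketch):

python -c "import torch; print(torch.compiled_with_cxx11_abi())"
# False -> pick a cxx11abiFALSE wheel, True -> pick cxx11abiTRUE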

@d-kleine

I have this issue on Windows, any fix for that?
