
Building wheel for flash-attn (pyproject.toml) did not run successfully #224

Closed
jesswhitts opened this issue May 16, 2023 · 25 comments

@jesswhitts

Hello,

I am trying to install via pip into a conda environment, on an A100 GPU with CUDA 11.6.2.
I get the following, not very informative, error:

Building wheels for collected packages: flash-attn
error: subprocess-exited-with-error

× Building wheel for flash-attn (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for flash-attn (pyproject.toml) ... error
ERROR: Failed building wheel for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Many thanks,

Jess

@tridao
Contributor

tridao commented May 16, 2023

There should be a longer log than that, do you have it?
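For anyone else who only sees the short message above: pip hides most of the wheel-build output by default. A minimal way to capture the full log (a sketch; with --no-build-isolation, torch, packaging and ninja must already be installed in the environment):

pip install packaging ninja
pip install flash-attn --no-build-isolation -v 2>&1 | tee flash-attn-build.log
# the real compiler error is usually a few hundred lines above the final "Failed building wheel" message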

@Jingsong-Yan

In my case, apt install g++ fixed it.

@jesswhitts
Author

Seems to have been an incompatibility with the g++ version. Thanks @Jingsong-Yan!

@ShoufaChen

Hi, @jesswhitts

How do you determine which g++ version is compatible?

@jesswhitts
Author

I got the following error which states the compatible version:

RuntimeError: The current installed version of g++ (4.8.5) is less than the minimum required version by CUDA 11.6 (6.0.0). Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
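The fix that worked here is simply installing a g++ inside CUDA 11.6's supported range (>=6.0, <12). A sketch for Debian/Ubuntu-style systems; on other distros the same idea applies via devtoolset or conda-forge's gxx_linux-64 package:

apt-get update && apt-get install -y g++   # current Ubuntu releases ship g++ 9-11, inside the range
g++ --version                              # confirm the version the extension build will pick up
pip install flash-attn --no-build-isolation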

@jackaihfia2334

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [127 lines of output]

  torch.__version__  = 2.1.0.dev20230621+cu117
  
  
  fatal: detected dubious ownership in repository at '/data/llm/code/Qwen-7B/flash-attention'
  To add an exception for this directory, call:
  
      git config --global --add safe.directory /data/llm/code/Qwen-7B/flash-attention
  running bdist_wheel
  /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:478: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.10
  creating build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-3.10/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn
  creating build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/layers
  creating build/lib.linux-x86_64-3.10/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/losses
  creating build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/llama.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/models
  creating build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/modules
  creating build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/ops
  creating build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-3.10/flash_attn/utils
  running build_ext
  building 'flash_attn_cuda' extension
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/csrc
  creating build/temp.linux-x86_64-3.10/csrc/flash_attn
  creating build/temp.linux-x86_64-3.10/csrc/flash_attn/src
  x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-3.10/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:42,
                   from csrc/flash_attn/fmha_api.cpp:33:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h: In function ‘void set_alpha(uint32_t&, float, Data_type)’:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:63:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     63 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:68:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     68 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha_utils.h:70:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     70 |         alpha = reinterpret_cast<const uint32_t &>( norm );
        |                                                     ^~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘void set_params_fprop(FMHA_fprop_params&, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, void*, float, float, bool, int)’:
  csrc/flash_attn/fmha_api.cpp:64:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct FMHA_fprop_params’; use assignment or value-initialization instead [-Wclass-memaccess]
     64 |     memset(&params, 0, sizeof(params));
        |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from csrc/flash_attn/fmha_api.cpp:33:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha.h:75:8: note: ‘struct FMHA_fprop_params’ declared here
     75 | struct FMHA_fprop_params : public Qkv_params {
        |        ^~~~~~~~~~~~~~~~~
  csrc/flash_attn/fmha_api.cpp:60:15: warning: unused variable ‘acc_type’ [-Wunused-variable]
     60 |     Data_type acc_type = DATA_TYPE_FP32;
        |               ^~~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd(const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, bool, int, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:208:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    208 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd_block(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:533:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    533 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  /usr/local/cuda/bin/nvcc -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src -I/data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o build/temp.linux-x86_64-3.10/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  In file included from /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/smem_tile.h:32,
                   from csrc/flash_attn/src/fmha_kernel.h:34,
                   from csrc/flash_attn/src/fmha_fprop_kernel_1xN.h:31,
                   from csrc/flash_attn/src/fmha_block_dgrad_kernel_1xN_loop.h:6,
                   from csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:5:
  /data/llm/code/Qwen-7B/flash-attention/csrc/flash_attn/src/fmha/gemm.h:32:10: fatal error: cutlass/cutlass.h: No such file or directory
     32 | #include "cutlass/cutlass.h"
        |          ^~~~~~~~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
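Two things stand out in this log: the "could not find ninja" warning (so the build fell back to the slow distutils path) and the fatal "cutlass/cutlass.h: No such file or directory" error, which means the cutlass sources were never checked out in this source tree. A sketch of a fix for that checkout, assuming cutlass is tracked as a git submodule (the -I paths in the compile command already point at it):

cd /data/llm/code/Qwen-7B/flash-attention
git config --global --add safe.directory "$(pwd)"   # clears the 'dubious ownership' warning
git submodule update --init --recursive             # fetches the missing cutlass sources
pip install ninja packaging                         # lets the build use ninja instead of distutils
python setup.py install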

@gpww

gpww commented Aug 8, 2023

Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [105 lines of output]

  torch.__version__  = 2.0.1+cu117


  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-310
  creating build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn
  creating build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
  creating build/lib.linux-x86_64-cpython-310/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
  creating build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
  creating build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
  creating build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
  creating build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/modelscope/flash-attention/setup.py", line 175, in <module>

File "/root/miniconda3/envs/Modelscope/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

@gpww

gpww commented Aug 8, 2023

Doesn't work; still getting the error:
g++ is already the newest version (4:11.2.0-1ubuntu1).
g++ set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 71 not upgraded.

@UCC-team

[image attachment]

@lonngxiang

Same error here with g++ 10.2.

@nahidalam

Still the same error. I have g++ 11.4 on an Ubuntu system with CUDA 11.5.

@wbbeyourself

torch 2.1.0
cuda 12.1
g++ 10.2.1

Ran:

apt-get update && apt-get install -y g++
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation

The error is as follows:

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py): started
Building wheel for flash-attn (setup.py): still running...
Building wheel for flash-attn (setup.py): finished with status 'error'
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
fatal: not a git repository (or any of the parent directories): .git

torch.__version__  = 2.1.0.dev20230815+cu121


running bdist_wheel
Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.1.1/flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
error: Remote end closed connection without response
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
ERROR: executor failed running [/bin/sh -c pip install flash-attn --no-build-isolation]: runc did not terminate successfully: exit status 1
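This failure is different from the compiler errors earlier in the thread: "Guessing wheel URL ... Remote end closed connection without response" means setup.py tried to download a prebuilt wheel from the GitHub releases page and the connection was cut off, which is common behind proxies or inside Docker builds. Two hedged workarounds, using the exact URL from the log; the FLASH_ATTENTION_FORCE_BUILD switch is an assumption that the installed flash-attn version's setup.py supports it:

# Option 1: download the wheel yourself (retryable, honors any proxy you configure), then install it
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.1.1/flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
pip install ./flash_attn-2.1.1+cu121torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

# Option 2: skip the wheel download entirely and compile from source
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn --no-build-isolation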

@WangSheng21s

conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
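Installing the toolkit inside the conda environment provides nvcc, which the source build needs. A quick sanity check afterwards (a sketch; the toolkit version and the CUDA version PyTorch was built against should agree, at least on the major version):

nvcc --version                                        # toolkit compiler, should report 11.8 here
python -c "import torch; print(torch.version.cuda)"   # CUDA version PyTorch was built with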

@shuyhere

shuyhere commented Oct 4, 2023

I ran into the same problem; it seemed to be getting stuck while a redirect was downloading something.
I did not change g++ or CUDA. The following steps worked for me:

git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install

Note that this may fail with an error saying flash-attention/csrc/cutlass cannot be found, because git failed to download cutlass.
In that case, cd into flash-attention/csrc/ and run git clone git@github.com:NVIDIA/cutlass.git

Re-run python setup.py install and it compiles successfully.
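An equivalent sequence that avoids the manual cutlass clone, and that does not require GitHub SSH keys, is to clone over HTTPS with submodules pulled in from the start (a sketch):

git clone --recursive https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install ninja packaging
python setup.py install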

@hsingyu-chou

Hi @shuyhere,

After trying your solution and using flash-attn ver. 1.0.5, it works.
Thank you.

(Remark: I use ver. 1.0.5 because I use a T4 GPU.)

@SunLemuria

Installing the pre-built wheel listed on the releases page works for me; in my case:
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
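When picking a wheel from the releases page, every field in the filename has to match the environment: cp38 is the CPython version, torch2.0 the PyTorch major.minor, cu118 the CUDA version that PyTorch build was compiled against, and the abi flag its C++ ABI. A quick way to read those values off your setup (a sketch):

python -c "import sys; print('python:', sys.version_info[:2])"
python -c "import torch; print('torch:', torch.__version__, '| built with CUDA', torch.version.cuda)"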

@YundongGai

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

It works for me.

@terry-for-github

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

It works for me. Thanks!

@lihanghang

Hi @shuyhere,

After trying your solution and using flash-attn ver. 1.0.5, it works. Thank you.

(Remark: I use ver. 1.0.5 because I use a T4 GPU.)

Thank you! It works for me.

@tongjingqi

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

Thank you! It works for me.

@tiansiyuan

I ran into the same problem; it seemed to be getting stuck while a redirect was downloading something. I did not change g++ or CUDA. The following steps worked for me:

git clone git@github.com:Dao-AILab/flash-attention.git; cd flash-attention; python setup.py install

Note that this may fail with an error saying flash-attention/csrc/cutlass cannot be found, because git failed to download cutlass. In that case, cd into flash-attention/csrc/ and run git clone git@github.com:NVIDIA/cutlass.git

Re-run python setup.py install and it compiles successfully.

This is great, works for me.

Thanks a lot!

@liuyongqiangjava


Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

Thank you! It works for me.

@baihuier

In my case, apt install g++ fixed it.

Thank you! It works for me.

@Hansyvea

Hansyvea commented Apr 30, 2024

Installing the pre-built wheel listed on the releases page works for me; in my case: pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl

This works, thanks!
Note: it has to be abiFALSE rather than abiTRUE.
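The abiTRUE/abiFALSE part of the wheel name has to match how the installed PyTorch was compiled; the PyTorch wheels distributed on PyPI at the time of this thread were built without the new C++ ABI, which is why the abiFALSE builds are usually the right ones. One way to check (a sketch):

python -c "import torch; print(torch.compiled_with_cxx11_abi())"
# False -> pick a cxx11abiFALSE wheel, True -> pick cxx11abiTRUE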

@d-kleine

I have this issue on Windows, any fix for that?
