
Errors when building the latest flash-attn (2023-07-29) with Ninja #391

Open

tongyx361 opened this issue Jul 28, 2023 · 4 comments

@tongyx361
Errors when building flash-attn with Ninja

I've been installing the latest flash-attn (as of 2023-07-29) with pip install flash-attn --no-build-isolation

Related environment information:

  • CUDA version: 11.7
  • PyTorch version: 2.0.1

Reading the installation instructions together with the error message, I realized that the failure happened during the Ninja compilation step.

According to a reply under Issue #358, if you don't set MAX_JOBS, Ninja's parallel build is very likely to consume more resources than the system can handle.

In fact, the installation instructions also point this out:

If your machine has less than 96GB of RAM and lots of CPU cores, ninja might run too many parallel compilation jobs that could exhaust the amount of RAM. To limit the number of parallel compilation jobs, you can set the environment variable MAX_JOBS: `MAX_JOBS=4 pip install flash-attn --no-build-isolation`

Information about the devices on the server I'm using:

  • 72 CPU cores
  • RAM: 197G/376G free/total

              total        used        free      shared  buff/cache   available
Mem:           376G         62G        197G         12G        116G        298G
Swap:           29G         24G        5.8G

But the server I'm using has well over 96GB of free RAM, so in theory leaving MAX_JOBS unset shouldn't cause an error.
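For reference, here is a rough sanity check of that reasoning. The ~4 GB-per-compile-job figure below is my own assumption for illustration, not a number from the flash-attn docs:

```python
import os

# Assumption: each parallel nvcc job peaks at very roughly 4 GB of RAM.
# This figure is a guess for illustration, not an official flash-attn number.
GB_PER_JOB = 4

def suggest_max_jobs(available_gb: float) -> int:
    """Suggest a MAX_JOBS value bounded by both available RAM and CPU cores."""
    by_ram = max(1, int(available_gb // GB_PER_JOB))
    by_cpu = os.cpu_count() or 1
    return min(by_ram, by_cpu)

# With ~298 GB available (from free -h above) and 72 cores, even the
# RAM-based bound allows all 72 cores -- which makes the failures below
# with MAX_JOBS as low as 1 or 2 look unrelated to memory.
print(suggest_max_jobs(298))  # -> 72
```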

ChatGPT:
ninja is an efficient parallel build system that can start multiple compilation tasks at the same time to speed up builds.
By default, ninja attempts to use all available CPU cores to build projects in parallel, which can lead to excessive consumption of system resources, especially on systems with low memory.
MAX_JOBS is an environment variable that limits the number of parallel build tasks for ninja; setting it avoids excessive consumption of system resources. Typically, a reasonable value would be twice the number of CPU cores.
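The tracebacks below show where that value ends up: the build harness hands it to ninja as the -j flag. Here is a minimal sketch of that mechanism (an illustration, not torch.utils.cpp_extension's actual code):

```python
import os
import subprocess

def run_ninja(build_dir: str) -> None:
    """Sketch: turn the MAX_JOBS environment variable into ninja's -j flag.

    Mirrors the command visible in the errors below:
    Command '['ninja', '-v', '-j', '1']'.
    """
    max_jobs = os.environ.get("MAX_JOBS")
    # With MAX_JOBS unset, fall back to all CPU cores -- the behavior the
    # installation instructions warn about on high-core-count machines.
    jobs = int(max_jobs) if max_jobs else (os.cpu_count() or 1)
    subprocess.run(["ninja", "-v", "-j", str(jobs)], cwd=build_dir, check=True)
```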

In practice, however, error messages do vary with the MAX_JOBS setting.

Errors with different MAX_JOBS settings

MAX_JOBS=1

MAX_JOBS=1 pip install flash-attn --no-build-isolation

Key error message:

/tmp/pip-install-npwm31a8/flash-attn_fe2073454d9241589566a66867aa6cfb/csrc/flash_attn/src/flash_fwd_launch_template.h(118): here
          instantiation of "void run_mha_fwd_hdim128<T>(Flash_fwd_params &, cudaStream_t) [with T=cutlass::bfloat16_t]"
/tmp/pip-install-npwm31a8/flash-attn_fe2073454d9241589566a66867aa6cfb/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.cu(18): here

26 errors detected in the compilation of "/tmp/pip-install-npwm31a8/flash-attn_fe2073454d9241589566a66867aa6cfb/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.cu".
ninja: build stopped: subcommand failed.
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.

A similar error was reported in Issue #358.

I tried with MAX_JOBS=1 and saw:

      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_launch_template.h(179): error: expression must have a constant value
      Note #2767-D: the value of *this cannot be used as a constant
                detected during instantiation of "void run_mha_bwd_hdim32<T>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with T=cutlass::bfloat16_t]"
      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu(15): here

      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_launch_template.h(179): error: expression must have a constant value
      Note #2767-D: the value of *this cannot be used as a constant
                detected during instantiation of "void run_mha_bwd_hdim32<T>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with T=cutlass::bfloat16_t]"
      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu(15): here

      2 errors detected in the compilation of "/tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu".

EDIT: That looks like #343. However, the PR proposed in Issue #343 has already been merged, so it shouldn't be the problem here.

MAX_JOBS=2/4/72

The following shows MAX_JOBS=2 as an example; MAX_JOBS=4 and MAX_JOBS=72 fail similarly.

Command run:

MAX_JOBS=2 pip install flash-attn --no-build-isolation

Key error message:

...
      ptxas info : Function properties for _Z25flash_bwd_dot_do_o_kernelILb1E23Flash_bwd_kernel_traitsILi96ELi64ELi128ELi8ELi2ELi4ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi96ELi64ELi128ELi8ES2_EEEv16Flash_bwd_params
          0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
      ptxas info : used 48 registers, 696 bytes cmem[0]
      ninja: build stopped: subcommand failed.
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '2']' returned non-zero exit status 1.

ChatGPT:
This error comes from NVIDIA's CUDA compiler, ptxas, and suggests that the compiler encountered a problem compiling the CUDA code. Specifically, the error message indicates that when compiling a CUDA function named _Z25flash_bwd_dot_do_o_kernel, the compiler was unable to satisfy the function's resource requirements.
According to the message, the function uses 48 registers and 696 bytes of constant memory (cmem). Registers store variables and state within the function, while constant memory holds read-only data such as kernel parameters.
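(Note that those ptxas info lines are per-kernel resource reports, not errors in themselves; the failing diagnostic is further up in the log. To see the same report for a single file in isolation, one can compile it directly with a verbose ptxas flag. The path, -arch value, and include flags below are illustrative assumptions:)

```python
import subprocess

# Compile one CUDA translation unit and ask ptxas for a verbose resource
# report (registers, cmem, spill stores/loads). Path, -arch, and includes
# are placeholders -- use the flags from your actual build log.
subprocess.run(
    [
        "nvcc",
        "-arch=sm_80",                 # target architecture (assumed)
        "-Xptxas", "-v",               # print per-kernel resource usage
        "-I", "csrc",                  # include paths as used by the build
        "-c", "csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu",
        "-o", "/tmp/flash_bwd_test.o",
    ],
    check=True,
)
```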

It looks as if even MAX_JOBS=2 exceeds the system's resource limits? (weird)

@ari9dam

ari9dam commented Aug 4, 2023

Did you find any solution?

@mattiamazzari

Any solution?

@ari9dam

ari9dam commented Oct 6, 2023

I tried to build the Docker image on a different machine and it worked.

@jianzi123

I found that compiling flash-attn 2 required a lot of compute resources, so I later switched to an NGC image. If you want to compile from source, you'll probably need a well-resourced machine; otherwise you'll get a ninja error.
