
Not able to install 2.0 #358

Open
akanyaani opened this issue Jul 20, 2023 · 33 comments

@akanyaani commented Jul 20, 2023

Tried both pip install and setup.py install.

/home/yellow/flash-attention/csrc/cutlass/include/cute/stride.hpp(112): warning: calling a __host__ function("__builtin_unreachable") from a __host__ __device__ function("cute::crd2idx< ::cute::tuple< ::cute::Underscore,  ::cute::Underscore, int > ,  ::cute::tuple< ::cute::tuple< ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2>  > ,  ::cute::constant<int, (int)2> ,  ::cute::tuple< ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2>  >  > ,  ::cute::tuple< ::cute::tuple< ::cute::constant<int, (int)1> ,  ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)4>  > ,  ::cute::constant<int, (int)8> ,  ::cute::tuple< ::cute::constant<int, (int)16> ,  ::cute::constant<int, (int)32>  >  > > ") is not allowed

Killed

ptxas info    : Used 255 registers, 576 bytes cmem[0]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yellow/flash-attention/setup.py", line 201, in <module>
    setup(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install.py", line 80, in run
    self.do_egg_install()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install.py", line 129, in do_egg_install
    self.run_command('bdist_egg')
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
@lms21a commented Jul 20, 2023

Same issue; not a one-off.

@tmm1 (Contributor) commented Jul 20, 2023

I'm seeing a similar issue:

      ptxas info    : Function properties for _Z25flash_bwd_dot_do_o_kernelILb1E23Flash_bwd_kernel_traitsILi128ELi64ELi128E
Li8ELi2ELi4ELi2ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi128ELi64ELi128ELi8ES2_EEEv16Flash_bwd_params              
          0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads                                                   
      ptxas info    : Used 49 registers, 688 bytes cmem[0]                                                                 
      ninja: build stopped: subcommand failed.                                                                             
      Traceback (most recent call last):                                                                                   
        File "/home/tmm1/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build 
          subprocess.run(                                                                                                  
        File "/usr/lib/python3.10/subprocess.py", line 524, in run                                                         
          raise CalledProcessError(retcode, process.args,                                                                  
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.   

@timmoon10

I also reproduce the error. Setting MAX_JOBS=1 in the environment fixes it for me, so it seems that compilation has become resource-intensive enough for a parallel Ninja build to overwhelm many systems. It's a long-term question, but I wonder if the current approach of statically-compiled CUDA kernels is sustainable. Perhaps there is value to considering JIT compilation, e.g. with Triton or NVRTC?
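For anyone who wants to try this, a minimal sketch of what that looks like (assuming a flash-attn install from PyPI or from a source checkout; MAX_JOBS is read by PyTorch's cpp_extension build and caps the number of parallel ninja jobs):

# cap the build at one parallel compile job to keep memory usage down
MAX_JOBS=1 pip install flash-attn

# or, from a source checkout
MAX_JOBS=1 python setup.py install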

@tridao (Contributor) commented Jul 20, 2023

Yeah I personally don't like the fact that we're templating so heavily (for dropout / no dropout, causal / not causal, different head dimensions, whether seqlen is divisible by 128 or not, different GPU types). The goal has been to get maximum performance, perhaps at the expense of compilation time.

  • Agree with you that JIT compilation is interesting. I don't have any experience there however.
  • Another way is to have pre-built wheels that folks can just download. I'll get to that once I'm done fixing some of the edge cases with the backward pass.

@timmoon10

Yeah, it's definitely a hard problem that we've also been hitting in Transformer Engine.

It'll take some engineering effort, but I've found NVRTC to be a nice way to avoid the combinatorial explosion from templating. With the right wrapper classes (see rtc.h and rtc.cpp), it can be straightforward to write and launch JIT kernels (see how transpose.cu calls the kernel in rtc/transpose.cu). It does impose compilation time the first time each kernel is called, though, and it is quite annoying to include external headers (including the CUDA Toolkit and the C++ Standard Library).

@lms21a commented Jul 20, 2023

I also reproduce the error. Setting MAX_JOBS=1 in the environment fixes it for me, so it seems that compilation has become resource-intensive enough for a parallel Ninja build to overwhelm many systems. It's a long-term question, but I wonder if the current approach of statically-compiled CUDA kernels is sustainable. Perhaps there is value to considering JIT compilation, e.g. with Triton or NVRTC?

Unfortunately for me, I still get the same error even with MAX_JOBS=1 set. I also tried building from source, with the same error. Any temporary solution available?

@tmm1 (Contributor) commented Jul 20, 2023

I tried with MAX_JOBS=1 and see:

      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_launch_template.h(179): error: expression must have a constant value
      Note #2767-D: the value of *this cannot be used as a constant
                detected during instantiation of "void run_mha_bwd_hdim32<T>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with T=cutlass::bfloat16_t]"
      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu(15): here

      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_launch_template.h(179): error: expression must have a constant value
      Note #2767-D: the value of *this cannot be used as a constant
                detected during instantiation of "void run_mha_bwd_hdim32<T>(Flash_bwd_params &, cudaStream_t, __nv_bool) [with T=cutlass::bfloat16_t]"
      /tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu(15): here

      2 errors detected in the compilation of "/tmp/pip-install-mvmpcn8m/flash-attn_39e3b6d4eaad444b8beae75cad8aecd2/csrc/flash_attn/src/flash_bwd_hdim32_bf16_sm80.cu".

EDIT: That looks like #343

@MiladInk commented Jul 21, 2023

seeing the same problem here:

4, in run
    _build_ext.run(self)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

I am setting MAX_JOBS=1 to see if it helps.

Update: setting MAX_JOBS=1 solved it for me. It installed, albeit slowly.

@Desein-Yang

I got the same error, and I modified setup.py to change ninja -v to ninja --version. Then I hit /cognitive_comp/yangqi/images/flash-attention-main/build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.o: No such file or directory

@mtisz commented Jul 26, 2023

Same issue here.

@ilikenwf commented Jul 27, 2023

I got the same error, and I modified setup.py to change ninja -v to ninja --version. Then I hit /cognitive_comp/yangqi/images/flash-attention-main/build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.o: No such file or directory

Same here.

Despite even attempting git submodule update --init --recursive --force, I still run into this issue, so if anyone finds anything, thank you very much.

@mtisz commented Jul 27, 2023

Pretty sure this is an issue with CUDA 12, but I don't know whether there's a timeline for supporting CUDA 12.

@BoxiangW (Contributor)

Same issue here

@BoxiangW (Contributor)

Same issue here

I solved it by updating the CUDA version; hope it helps.

@ilikenwf

I'm running 12.1, which works fine with an older commit.

@BaldStrong

You can reinstall Python from conda-forge (no need to change the Python version):
conda install python=3.x.xx --channel conda-forge

@mtisz commented Jul 27, 2023

I'm running 12.1, which works fine with an older commit.

Can you link the commit/hash here?

@ilikenwf

As it turns out, I may be wrong about that due to a submodule issue.

@mtisz commented Jul 27, 2023

It would be great if the devs could chip in here. Is there a timeline for supporting CUDA 12?

@ilikenwf

Downgrading all the CUDA stuff to 11.8, and gcc to v11, appears to work.

@ilikenwf

I spoke too soon. Is this a compatibility issue? Not sure why those kernels aren't building.

@ilikenwf

I need to try with CUDA 12.x again just for fun, but it appears that ninja itself, even if one modifies the ninja -v call in torch, is causing those files not to build. Uninstalling ninja makes the build take longer, but those objects do appear to get created.

@mtisz

mtisz commented Jul 27, 2023

Can you build it without Ninja? I thought you couldn't.

@ilikenwf

I'm building xformers, which pulls in flash-attention via submodule, which may make a difference, but it seems you can, at the cost of a slower build.

@tmm1 (Contributor) commented Jul 27, 2023

It works with MAX_JOBS=1 if you install from git. It will take a long time so be patient.

You can select specific cuda version using conda: https://hamel.dev/notes/cuda.html
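For example, something like this (a sketch, assuming you want the 11.8 toolkit from NVIDIA's conda channel; see the link above for details and other versions):

# install a specific CUDA toolkit into the active conda environment
conda install cuda -c nvidia/label/cuda-11.8.0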

@ilikenwf

It works with MAX_JOBS=1 if you install from git. It will take a long time so be patient.

You can select specific cuda version using conda: https://hamel.dev/notes/cuda.html

It did not work here, but maybe I needed to blow away my build directory first.

@tmm1 (Contributor) commented Jul 27, 2023

Changing ninja -v to ninja --version makes no sense. You need it to build code, not print its version number:

  --version      print ninja version ("1.11.1")
  -v, --verbose  show all command lines while building

@ilikenwf

I was following some half-baked advice from a similar issue on GitHub. Regardless, something odd is going on, and MAX_JOBS doesn't really seem to help (although it may be getting ignored despite my exporting it in the console, since I'm building as part of xformers).

@FerdinandZhong

I encountered the same issue when I built with nvcc 11.6. The package builds with nvcc 11.8.
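
If it's unclear which toolkit the build is picking up, it's worth checking before installing (this only inspects the environment; the nvcc on PATH is typically what setup.py will find unless CUDA_HOME points elsewhere):

# confirm which nvcc the build will use and its version
which nvcc
nvcc --version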

@PhenixZhang

Same issue here. Could anybody tell me what will happen if I set MAX_JOBS=1?

@NeedsMoar

In case anyone was wondering: since I have enough RAM and cores to test, I let Ninja do whatever it wanted with the thread count and didn't change the setup file, except to build for sm_89 in case it worked.

RAM usage maxed out at something over half of what I have available before gradually sloping back down; 268GB was the max, I think. All 32/64 cores were at 100%, and I'm pretty sure Ninja had at least one process running for every HT core. I didn't count them, but the CPU went from basically parked to 4.3GHz and stayed there. As soon as compiles started finishing (erroring, I mean) with constexpr issues, due to the use of a non-constexpr variable set by passing a reference to another method to instantiate a template, cmd.exe began hanging for extended periods under the sheer volume of output spam it was trying to queue up for display. I've spammed it pretty impressively on purpose before (with the entire printable Unicode range at 2MB/s, no less) and have never seen it hang on output until now.

Had it not been throwing hundreds of thousands of template errors, I'd wager it would have been quite a bit faster and not used so much memory, but that amount of memory usage is still insane, and I suspect something is broken with the template instantiation even when it builds OK. This probably shouldn't be happening:

E:\code\flash-attention\csrc\flash_attn\src\flash_bwd_kernel.h(783): error: no instance of overloaded function "cute::copy" matches the argument list
            argument types are: (
cute::TiledCopy
   <cute::Copy_Atom
      <cute::SM75_U32x4_LDSM_N, cutlass::half_t>, 
   cute::Layout
      <cute::tuple
         <cute::tuple<cute::C<4>, cute::_8, cute::_2, cute::_4>, 
          cute::tuple
         <cute::tuple<cute::_2, cute::_2>, 
           cute::tuple<cute::_2, cute::_1>
        >
        >, cute::tuple
          <cute::tuple<cute::_128, cute::_1, cute::_0, cute::_8>, 
            cute::tuple
            <cute::tuple<cute::_64, cute::_512>, 
            cute::tuple<cute::C<32>, cute::_0>>
            >
          >, 
        cute::tuple
          <cute::Layout
             <cute::tuple<cute::C<8>, cute::C<4>, cute::_2>, 
              cute::tuple<cute::_1, cute::_32, cute::_8>
              >, 
            cute::Layout<cute::C<16>, cute::_1>
            >
          >, 
          cute::Tensor
           <cute::ViewEngine
               <cute::smem_ptr<cutlass::half_t>>, 
           cute::Layout
                <cute::tuple
                   <cute::tuple<cute::_8, cute::_1>, cute::_2, 
                    cute::tuple
                        <cute::tuple<cute::_2, cute::_2>, cute::_2>>, 
                               cute::tuple
                                   <cute::tuple<cute::_1, cute::_0>, 
                                 cute::_1024, 
                               cute::tuple<cute::tuple<int, int>, 
                               cute::C<8192>
                             >
                           >
                         >
                     >, 
    <error-type>)

          cute::copy(smem_tiled_copy_KV, tdPsV, tdPrV_copy_view);

Ignoring how bizarre that is, why is an SM7.5 template being instantiated? I know that wasn't on the command line. Is it a leftover? Is every possible arch being built at once, and is that the delay and the source of the enormous error count?

A long time ago I used to have to rebuild the entire LLVM compiler suite plus our product's integration with it multiple times a day; it totalled something like 50k C++ source files. My work machine had 24GB of RAM and 6 cores, and I could still play video games and run a Linux VM during the 10 minutes it took Visual Studio to crank out a full rebuild with the source on a regular SATA2 SSD. While trying to build this didn't make my computer unresponsive or even slow it down, I have a uniquely gigantic amount of memory installed for a home workstation. Your average current-ish 16-core Ryzen with 32GB of DDR5 (because that's as much as you can install without crippling its speed, thanks to the scam of XMP) would have gone into swap-file territory almost instantly and been difficult to kill all the tasks on, since extra processes were being spawned for everything. I'd strongly suggest just killing off the ninja part of the build until it's un-screwed-up, and maybe look at:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2280r4.html

Which might have something to do with all the template constexpr errors, although MS has been pretty on top of new stuff lately. I did try setting the standard to C++20, but it was unclear whether it "took", since most command-line options were being specified multiple times and things like HALF_FLOAT were being both defined and undefined on the same command line. It might be worth looking into clang-cl from the VS build tools to see if they've implemented something (right or wrong) that makes it work for now.

I'm also not super familiar with CUDA, but something doesn't sit quite right about building for SM90 automatically based on CUDA version when the 4090 is SM8.9 (90 is Hopper, I guess?), especially since two different versions get passed on the command line for the build (80 and 90, or in my case the 8.9 I shoved into setup.py to make it build the version I needed).

I'd also strongly suggest installing a Windows VM on your Linux install, sucking it up, and learning how to use the Visual Studio compilation tools. They're significantly easier to deal with than clang or GCC as far as that goes, and this kind of template-spaghetti-factory explosion isn't something you can sit around and say "I don't know Windows, somebody else can help" about forever if you want a Windows build. If you can't or aren't willing to do that, there's nothing wrong with just announcing that the Windows build is dead. I'd rather it weren't, but realistically projects have to be worked on by their maintainers, or at least be understandable by new people, and I'd personally rather fix bugs in boost::spirit for years. A good first step might be deleting the hundreds of commented-out lines of code doing slightly different things with unlabelled values; one of the nice things about source control is that you don't have to keep five years of commented-out code, with no notes on what it did, why it was changed, or what was better or worse, lying around in the source tree. That at least gives people something clean to look at if you find somebody willing to take this on.

I'm not trying to be rude, but man... Somebody from NVIDIA or Hugging Face might be willing to help you, since this gets used with xformers, which apparently doesn't have as much oomph without it, and Triton can't easily be built on Windows either.

@tridao (Contributor) commented Oct 8, 2023

This is a free and open source project, and I'm maintaining it in my free time. My expertise is not in compiling or building packages.
We have a PyPI package because of community contribution, and we now have prebuilt CUDA wheels for Linux because of community contribution.
I've recently clarified in the README:
"Requirement: Linux. Windows is not supported for now. If you have ideas on how to modify the code to support Windows, please reach out via Github issue."

@rybakov commented May 30, 2024

I had a similar error with TransformerEngine: during installation there is a moment when it needs around 70GB of RAM (total).
A simple solution is to increase the swap size (so that free RAM + swap > 70GB).

sudo swapoff /swapfile

# resize the file, in this instance to create a 128GB swap file
sudo fallocate -l 128G /swapfile

# "mark" the file as a swapfile
sudo mkswap /swapfile

# enable the swap file with swapon
sudo swapon /swapfile

# verify the original and new swap file size.
swapon --show

Now install it.
