installing dropout_layer_norm #131
It takes about 5-6 minutes to install if ninja is installed.
Thanks for the very quick response. I do have ninja installed, so I'm not sure where the issue is. Here are the specs: ninja --version, gcc --version, nvcc --version. The installation of the main package works fine, though.
Are multiple CPU cores being used to compile?
Yup, all 24 cores seem to be fully occupied.
When I run with pip install -v, it appears to be stuck processing the ln_fwd_xxx.cu and ln_bwd_xxx.cu files. It's still progressing, but very slowly.
You can comment out these 2 lines in setup.py if you're only using Ampere; that should reduce compilation time by about 2x.
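For reference, a sketch of the kind of change being suggested, assuming the extension's setup.py selects GPU architectures with nvcc -gencode flags (the exact lines and flag values below are illustrative, not copied from the repo):

```python
# Hypothetical excerpt of a setup.py that builds the CUDA extension
# for several GPU generations. Each -gencode pair makes nvcc compile
# every kernel again for that architecture, so dropping the entries
# you don't need cuts compile time roughly proportionally.
cc_flag = []
# cc_flag.append("-gencode")                    # pre-Ampere (sm_70);
# cc_flag.append("arch=compute_70,code=sm_70")  # comment out if unused
cc_flag.append("-gencode")
cc_flag.append("arch=compute_80,code=sm_80")    # Ampere only
```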
Thanks, I'll try this. One of the warnings I get is this, but it's a little beyond me to know whether it's an issue.
Even with the modification, it has been running for over an hour now. Btw, do you have a Docker image with all the flash-attn packages compiled?
I'm facing the same issue. It gets stuck while installing dropout_layer_norm.
I figured out that it is faster with CUDA 11.8 but gets stuck on lower versions; not sure why, though. Another hack is to go through the list of .cu files in setup.py and only build the fwd/bwd libs for the dimensions you care about, as sketched below.
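A minimal sketch of that hack, assuming the sources list contains one ln_fwd_<dim>.cu / ln_bwd_<dim>.cu pair per supported hidden dimension (the file names and the helper below are illustrative):

```python
# Hypothetical filter over the extension's source list in setup.py:
# keep non-kernel sources, and keep kernel files only for the hidden
# dimensions your model actually uses.
needed_dims = {1024, 2048}  # e.g. your model's hidden size(s)

all_sources = [
    "ln_api.cpp",
    "ln_fwd_256.cu", "ln_bwd_256.cu",
    "ln_fwd_1024.cu", "ln_bwd_1024.cu",
    "ln_fwd_2048.cu", "ln_bwd_2048.cu",
    # ... one fwd/bwd pair per supported dimension ...
]

def keep(src: str) -> bool:
    if not src.endswith(".cu"):
        return True  # API/glue code is always needed
    dim = int(src.rsplit("_", 1)[1].removesuffix(".cu"))
    return dim in needed_dims

sources = [s for s in all_sources if keep(s)]
```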
@tridao Out of curiosity, could you share some insight on the performance impact of fused_dropout_add_ln?
@shijie-wu I've seen a performance improvement on the order of 1-4% with fused_dropout_add_ln.
Is there any rule of thumb you could share on the relationship between improvement and model size?
Larger models will get less improvement, since layer norm (and dropout & residual) will take less time relative to matrix multiply in the MLP and attention. You can profile your model to see what fraction of time is spent on layer norm etc. |
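A minimal profiling sketch with torch.profiler; the toy model below is just a placeholder for your own:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for one transformer-style block; swap in your model.
model = nn.Sequential(
    nn.LayerNorm(1024),
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024),
).cuda()
x = torch.randn(8, 512, 1024, device="cuda", requires_grad=True)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x).sum().backward()

# Compare layer_norm / dropout rows against matmul (gemm) rows to see
# what fraction of GPU time a fused kernel could actually save.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```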
On an unrelated note, what is the typical speedup with …?
I don't remember exactly, maybe on the order of 2-7%. Larger models will get less improvement, for the same reason as above.
Thanks for sharing!
To add another data point: I am seeing exactly the same issue when compiling via …
I can confirm I was seeing the same result, stuck on the exact same line. After switching to the container mentioned above, the problem went away. Not sure if this is helpful for anyone, but thought I'd share.
I am trying to install dropout_layer_norm to use fused_dropout_add_ln, but the pip installation has been running for over an hour (not done yet, but also no errors). Is this normal?
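For context, once the extension builds, the fused op is typically called something like this (a minimal sketch; the module path and argument names reflect my reading of flash-attn v1 and should be treated as assumptions):

```python
import torch
from flash_attn.ops.layer_norm import dropout_add_layer_norm

hidden = 1024
x0 = torch.randn(8, 512, hidden, device="cuda", dtype=torch.float16)
residual = torch.randn_like(x0)
weight = torch.ones(hidden, device="cuda", dtype=torch.float16)
bias = torch.zeros(hidden, device="cuda", dtype=torch.float16)

# Fused equivalent of: layer_norm(dropout(x0) + residual)
out = dropout_add_layer_norm(x0, residual, weight, bias,
                             dropout_p=0.1, epsilon=1e-5)
```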