Build error "fatal error: ATen/cuda/CUDAGraphsUtils.cuh: No such file or directory" #1043

Closed
mindmapper15 opened this issue Feb 5, 2021 · 4 comments

Comments

@mindmapper15

I'm trying to build and install apex for use with fairseq, and I got the error below.
It seems a file required to build the fast multi-head attention feature is missing.
Has anyone else run into the same issue?

My environment:
Python 3.8 (Anaconda)
torch 1.7.0
CUDA Version 11.0

In file included from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3:0,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout.cpp:1:
/tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:277:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^
/usr/local/cuda-11.0/bin/nvcc -I/tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include -I/tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/TH -I/tls/john/anaconda3/envs/john/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.0/include -I/tls/john/anaconda3/envs/john/include/python3.8 -c apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu -o build/temp.linux-x86_64-3.8/apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 -I./apex/contrib/csrc/multihead_attn/cutlass/ -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -gencode arch=compute_80,code=sm_80 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fast_additive_mask_softmax_dropout -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /usr/local/bin/gcc -std=c++14
In file included from apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu:14:0:
apex/contrib/csrc/multihead_attn/softmax.h:3:41: fatal error: ATen/cuda/CUDAGraphsUtils.cuh: No such file or directory
compilation terminated.
error: command '/usr/local/cuda-11.0/bin/nvcc' failed with exit status 1
Running setup.py install for apex: finished with status 'error'

mindmapper15 commented Feb 5, 2021

OK, I think I may have found the problem.

https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/cuda

The master branch of PyTorch contains CUDAGraphsUtils.cuh, the file the apex build looks for and fails to find.

https://github.com/pytorch/pytorch/tree/1.7/aten/src/ATen/cuda
The 1.7 branch of PyTorch, on the other hand, does not contain that file.

So I may have to build PyTorch from source, or manually copy the file into the installed PyTorch package path...


I also checked the 1.7.1 source files, and the file is not there either.
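
The same check can be made against the installed package instead of the GitHub tree. Below is a minimal sketch of that idea, not something taken from apex or PyTorch: `find_header` and `installed_torch_include_dirs` are hypothetical helper names. The only PyTorch call used is `torch.utils.cpp_extension.include_paths()`, which returns the include directories used when building extensions.

```python
import os

# Header that apex's softmax.h tries to include, relative to an include dir.
HEADER = os.path.join("ATen", "cuda", "CUDAGraphsUtils.cuh")


def find_header(include_dirs, relative_path=HEADER):
    """Return the first full path where the header exists, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


def installed_torch_include_dirs():
    """Include directories of the installed torch (hypothetical helper)."""
    # Imported lazily so find_header can be used without torch installed.
    from torch.utils.cpp_extension import include_paths

    return include_paths()
```

Calling `find_header(installed_torch_include_dirs())` in the environment that runs the apex build should return `None` when the header is missing from the installed torch, which is the situation the error message describes.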

mindmapper15 changed the title on Feb 5, 2021
from: Build error with --global-option="--fast_multihead_attn"
to: Build error "fatal error: ATen/cuda/CUDAGraphsUtils.cuh: No such file or directory"
mindmapper15 commented Feb 5, 2021

I eventually found the solution.

The CUDA header file CUDAGraphsUtils.cuh was added in torch==1.8.0a, which is still in the alpha stage.

The current stable version of PyTorch is 1.7.1, which does not include this header file.
Neither do earlier versions such as 1.7, 1.6, 1.5, etc.

All you have to do is roll the apex source back to an earlier commit with the following command.

git reset --hard 3fe10b5597ba14a748ebb271a6ab97c09c5701ac

Problem solved, so I'll close this issue.
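
Whether the rollback is needed can be decided from the installed torch version. The sketch below makes that decision explicit; `torch_has_cuda_graphs_utils` is a hypothetical helper name, and the 1.8 threshold is taken from the observation above rather than verified by the code.

```python
import re


def torch_has_cuda_graphs_utils(version):
    """Guess from a torch version string whether CUDAGraphsUtils.cuh ships with it.

    Assumption: the header first appears in the 1.8 series (including
    1.8.0a pre-releases), as reported in this thread.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 8)
```

With torch installed, `torch_has_cuda_graphs_utils(torch.__version__)` returning `False` would suggest building apex from the earlier commit.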

ijpq commented Oct 20, 2021

This also solved the problem for me on torch==1.4.0.

rxmao commented Oct 27, 2021

@ijpq, could you explain how you solved this problem on torch==1.4.0?
