no cuda/cuda_config.h #1

Open
lshiwjx opened this issue Jul 23, 2017 · 25 comments

Comments

@lshiwjx

lshiwjx commented Jul 23, 2017

When compiling with g++:
/python3.4/site-packages/tensorflow/include/tensorflow/stream_executor/dso_loader.h:32:30: fatal error: cuda/cuda_config.h: No such file or directory
compilation terminated.

Solved by copying a cuda_config.h file from https://insight.io/github.com/tensorflow/tensorflow/blob/master/third_party/toolchains/gpus/cuda/cuda/cuda_config.h?line

@lshiwjx
Author

lshiwjx commented Jul 23, 2017

But then many other errors appear.

@Zardinality
Owner

Normally, when you compile TensorFlow from source, cuda_config.h is generated automatically after you run ./configure and set it up properly. I will add this to the README in case anyone else encounters it. What errors do you have right now? Feel free to paste them here!

Zardinality added a commit that referenced this issue Jul 23, 2017
@lshiwjx
Author

lshiwjx commented Jul 23, 2017

Thanks for your reply!
I don't know why there is no cuda_config.h; maybe it is because I installed TensorFlow with conda.

Then I compiled it with the recommended method:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
nvcc -std=c++11 -c -o deform_conv.cu.o deform_conv.cu.cc -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -L /usr/local/cuda-8.0/lib64/ --expt-relaxed-constexpr
g++ -std=c++11 -shared -o deform_conv.so deform_conv.cc deform_conv.cu.o -I $TF_INC -fPIC -lcudart -L /usr/local/cuda-8.0/lib64 -D GOOGLE_CUDA=1 -Wfatal-errors -I /usr/local/cuda-8.0/include -D_GLIBCXX_USE_CXX11_ABI=0

But when I load deform_conv.so, there is an error:
tensorflow.python.framework.errors_impl.NotFoundError: /home/sl/Project/TENSORFLOW/TF-deformable-conv-master/lib/deform_conv.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

gcc 5.4.0

@Zardinality
Owner

The same compile commands work on my machine. According to this link, gcc 5 can build a compatible library when -D_GLIBCXX_USE_CXX11_ABI=0 is passed. Maybe you could try the advice from the link:

Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi. Furthermore if you are using TensorFlow package created from source remember to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" as bazel command to compile the Python package.

or simply switch to gcc4 to see if it'll work.
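
The undefined symbol reported above ends in B5cxx11Ev. In the Itanium C++ name mangling used by gcc, B5cxx11 encodes the cxx11 ABI tag that gcc >= 5 attaches to functions built against the new libstdc++ string ABI, so its presence suggests that some part of the op library was compiled without the old-ABI flag. A rough sketch for spotting the tag in a mangled name (the helper name uses_cxx11_abi is made up for illustration, and a plain substring test could in principle give false positives):

```python
# Itanium mangling writes an ABI tag as "B" + <length> + <tag name>,
# so the "cxx11" tag shows up as the substring "B5cxx11".
CXX11_ABI_TAG = "B5cxx11"


def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if the mangled name carries the cxx11 ABI tag."""
    return CXX11_ABI_TAG in mangled_symbol


# Symbol copied from the NotFoundError earlier in this thread.
symbol = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
print(uses_cxx11_abi(symbol))
```

If the check is true for a symbol that the loader cannot resolve, the library asking for it was most likely built with the new ABI while the installed TensorFlow binary exports only the old-ABI variant.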

@lshiwjx
Author

lshiwjx commented Jul 23, 2017

Thanks, but I had already used that flag and it didn't help.
In the end I reinstalled gcc and g++ at version 4.8, and the demo runs now.
But there is a bug:

File "demo.py", line 39, in deform_conv_2d
    strides=[1, 1, stride, stride], num_groups=1)
TypeError: deform_conv_op() missing 1 required positional argument: 'deformable_group'

It seems that something like deformable_group=1 needs to be added after num_groups=1.

@Zardinality
Owner

Right, just before your reply I realized that I hadn't updated the demo and the benchmark in the second-to-last commit. It should work now.

@zhihengli-UR

I used g++-4.8 to compile the source file but got the same error:

fatal error: cuda/cuda_config.h: No such file or directory
 #include "cuda/cuda_config.h"

I installed TensorFlow via pip.

I ran find . -name 'cuda_config.h' in TensorFlow's site-packages directory, but it did not find the header file.
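
The same search can be scripted so that it always starts from the include directory TensorFlow reports. This is a minimal sketch; the function name find_header is illustrative and not part of this repo:

```python
from pathlib import Path
from typing import List


def find_header(include_root: str, name: str = "cuda_config.h") -> List[Path]:
    """Return every regular file called `name` below `include_root`, sorted."""
    return sorted(p for p in Path(include_root).rglob(name) if p.is_file())
```

With TensorFlow importable, find_header(tf.sysconfig.get_include()) returns an empty list when the header is missing from the installed package, which matches what was seen with the pip install here.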

@Zardinality
Owner

@hubertlee915 I have already fixed it by adding cuda_config.h to lib, as I did in another repo. Can you confirm that this is fixed?

@zhihengli-UR

It works! Thank you!

@zhihengli-UR

I encountered the same problem (the undefined symbol error) that @GitHubShily described before. I also reinstalled g++-4.8, but it didn't help.

@Zardinality
Owner

@hubertlee915 Sorry, I have no idea how this happens. Apart from installing TensorFlow and this op from source using bazel, or double-checking the filename of that *.so, I have no other advice to give.

@cotrane

cotrane commented Aug 4, 2017

I had the same issue as @GitHubShily and @hubertlee915.
I solved it by:

  • installing gcc-4.9 and g++-4.9
  • changing nvcc_compile.sh to:
    nvcc -std=c++11 -ccbin=/usr/bin/g++-4.9 -c -o deform_conv.cu.o deform_conv.cu.cc -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -L /usr/local/cuda-8.0/lib64/ --expt-relaxed-constexpr
  • and changing g++_complie.sh to:
    g++-4.9 -std=c++11 -shared -o deform_conv.so deform_conv.cc deform_conv.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_HOME/lib64 -D GOOGLE_CUDA=1 -Wfatal-errors -I $CUDA_HOME/include -D_GLIBCXX_USE_CXX11_ABI=0

It seems that the flag -D_GLIBCXX_USE_CXX11_ABI=0 in g++_complie.sh is not enough on its own. There is probably a similar flag that needs to be added to nvcc_compile.sh, but I haven't tried further.

Thanks for the code anyways!

@zhihengli-UR

It works! Thank you! @cotrane

@Zardinality
Owner

@cotrane Thank you for those tips! In that case I will suggest that everyone who wants to try this repo use gcc-4.9.

@zhihengli-UR

gcc-4.8 works as well. Adding the -ccbin flag to nvcc is the key.

@tyyyang

tyyyang commented Sep 6, 2017

Just as @cotrane said, the flag -D_GLIBCXX_USE_CXX11_ABI=0 still needs to be added to nvcc_compile.sh, right after $TF_INC:

nvcc -std=c++11 -c -o deform_conv.cu.o deform_conv.cu.cc -I $TF_INC -D_GLIBCXX_USE_CXX11_ABI=0 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -L /usr/local/cuda-8.0/lib64/ --expt-relaxed-constexpr

This should work fine with gcc 5.
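
The two fixes reported in this thread (pointing nvcc at a specific host compiler with -ccbin, and passing -D_GLIBCXX_USE_CXX11_ABI=0 to nvcc as well as to g++) can be combined. The sketch below only assembles the two command lines as argument lists; the function names, parameters and default paths are illustrative and are not part of the repo's scripts:

```python
from typing import List, Optional

ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def nvcc_command(tf_inc: str,
                 host_compiler: Optional[str] = None,
                 old_abi: bool = True,
                 cuda_lib: str = "/usr/local/cuda-8.0/lib64/") -> List[str]:
    """Build the nvcc step, optionally with -ccbin and the old-ABI define."""
    cmd = ["nvcc", "-std=c++11"]
    if host_compiler:
        # e.g. /usr/bin/g++-4.9, as in the fix reported above
        cmd.append("-ccbin=" + host_compiler)
    cmd += ["-c", "-o", "deform_conv.cu.o", "deform_conv.cu.cc", "-I", tf_inc]
    if old_abi:
        cmd.append(ABI_FLAG)
    cmd += ["-D", "GOOGLE_CUDA=1", "-x", "cu", "-Xcompiler", "-fPIC",
            "-L", cuda_lib, "--expt-relaxed-constexpr"]
    return cmd


def gxx_command(tf_inc: str,
                compiler: str = "g++",
                old_abi: bool = True,
                cuda_home: str = "/usr/local/cuda-8.0") -> List[str]:
    """Build the link step that produces deform_conv.so."""
    cmd = [compiler, "-std=c++11", "-shared", "-o", "deform_conv.so",
           "deform_conv.cc", "deform_conv.cu.o", "-I", tf_inc, "-fPIC",
           "-lcudart", "-L", cuda_home + "/lib64", "-D", "GOOGLE_CUDA=1",
           "-Wfatal-errors", "-I", cuda_home + "/include"]
    if old_abi:
        cmd.append(ABI_FLAG)
    return cmd


# "$TF_INC" is left as a placeholder string; substitute the real include path.
print(" ".join(nvcc_command("$TF_INC", host_compiler="/usr/bin/g++-4.9")))
print(" ".join(gxx_command("$TF_INC", compiler="g++-4.9")))
```

Keeping the ABI define in one constant makes it harder for the nvcc and g++ steps to drift apart, which is the mismatch behind the undefined symbol error discussed above.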

@tygrer

tygrer commented Oct 30, 2017

Please tell me where nvcc_compile.sh and g++_complie.sh are; I can't find them.

@tygrer

tygrer commented Oct 30, 2017

@hubertlee915 @cotrane @Zardinality @skyoung
Please tell me where nvcc_compile.sh and g++_complie.sh are; I can't find them. Thank you very much!

@Zardinality
Owner

They are right in ./lib. @tygrer

@John1231983

John1231983 commented Jan 1, 2018

Hello all, I am using g++ 4.9, CUDA 8.0 and cuDNN 6.0. I installed TensorFlow with pip3 install tensorflow-gpu==1.3. When I run ./nvcc_compile.sh, I get this error:


nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/platform/default/stream_executor.h:26:0,
                 from /usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/platform/stream_executor.h:24,
                 from /usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h:26,
                 from deform_conv.cu.cc:70:
/usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/stream_executor/dso_loader.h:32:30: fatal error: cuda/cuda_config.h: No such file or directory
 #include "cuda/cuda_config.h"
                              ^
compilation terminated.

I followed the suggestion above and copied cuda_config.h, but then I got new errors:

/usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(620): error: identifier "__shfl" is undefined

/usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(640): error: identifier "__shfl_up" is undefined

/usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(660): error: identifier "__shfl_down" is undefined

/usr/local/lib/python3.5/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(680): error: identifier "__shfl_xor" is undefined

Do I need to reinstall cuDNN 5.0?

@Zardinality
Owner

@John1231983 I found a similar issue where your last error was fixed by using TF 1.2.1.

@John1231983

Hi. Thanks for your reply. Which TF version did you use?

@Zardinality
Owner

@John1231983 1.2.0, but I installed it from source.

@John1231983

Hi @Zardinality: Your method worked fine on my personal computer, by copying the header like this:

if [ ! -f $TF_INC/tensorflow/stream_executor/cuda/cuda_config.h ]; then
    cp ./cuda_config.h $TF_INC/tensorflow/stream_executor/cuda/
fi

However, on the server I cannot copy it into that folder because of a permission issue. Is there another way, such as pointing the build at a copy of the header in another folder that does not require special permissions?

@Zardinality
Owner

Zardinality commented Jan 10, 2018

@John1231983 You might be using a TensorFlow installation owned by root. Why not install it in your own home directory?
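
For the permission problem raised above, another option that may avoid writing into site-packages is to stage the header in a user-writable directory and add that directory to the include path. This relies on the compiler falling back to -I directories when a quoted include such as "cuda/cuda_config.h" is not found next to the including file, which is how gcc documents its search order; it is untested against this repo's build, and the function name and directory layout are assumptions for illustration:

```python
import shutil
from pathlib import Path
from typing import List


def stage_cuda_config(header: str, staging_root: str) -> List[str]:
    """Copy `header` to <staging_root>/cuda/cuda_config.h.

    Returns the extra include flags to append to the nvcc and g++ commands,
    so that #include "cuda/cuda_config.h" can resolve via the staging root.
    """
    target_dir = Path(staging_root) / "cuda"
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(header, str(target_dir / "cuda_config.h"))
    return ["-I", str(Path(staging_root))]
```

For example, stage_cuda_config("./cuda_config.h", "/home/me/tf_extra_include") would return ["-I", "/home/me/tf_extra_include"], where the home directory path is a placeholder for any location the user can write to.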


7 participants