fatal error: cuda_runtime.h: No such file or directory #131

Closed
gr8Adakron opened this issue Jul 5, 2018 · 15 comments
Comments

@gr8Adakron

gr8Adakron commented Jul 5, 2018

I have built and installed NCCL successfully and added all the environment paths. After that, I tried to compile this test program:


#include <nccl.h>
#include <stdlib.h>  // malloc

typedef struct {
  double* sendBuff;
  double* recvBuff;
  int size;
  cudaStream_t stream;
} PerThreadData;

int main(int argc, char* argv[])
{
  int nGPUs;
  cudaGetDeviceCount(&nGPUs);
  ncclComm_t* comms = (ncclComm_t*)malloc(sizeof(ncclComm_t)*nGPUs);
  // Initialize the communicators. A NULL device list selects the
  // first nGPUs CUDA devices.
  ncclCommInitAll(comms, nGPUs, NULL);

  PerThreadData* data;

  ... // Allocate data and issue work to each GPU's
      // perDevStream to populate the sendBuffs.

  for(int i=0; i<nGPUs; ++i) {
    cudaSetDevice(i); // Correct device must be set
                      // prior to each collective call.
    // The element count comes from the per-device struct.
    ncclAllReduce(data[i].sendBuff, data[i].recvBuff, data[i].size,
        ncclDouble, ncclSum, comms[i], data[i].stream);
  }

  ... // Issue work into data[*].stream to consume buffers, etc.
}

and it keeps giving me this error:

$ g++ nccl_temp.cpp

In file included from nccl_temp.cpp:1:0:
/usr/local/include/nccl.h:10:26: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.

When I run locate cuda_runtime.h, it returns:
/usr/local/cuda-9.0/targets/x86_64-linux/include/cuda_runtime.h

This is my LD_LIBRARY_PATH variable:
LD_LIBRARY_PATH=:./build/lib:/home/afzal/nickel/lib:/usr/local/cuda/lib64:/usr/local/cuda-9.0/targets/x86_64-linux/include/

This is my PATH variable:
PATH=/home/afzal/.virtualenvs/tensorflow_py36/bin:/home/afzal/bin:/home/afzal/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:/home/afzal/.fzf/bin

Any help? I am actually trying to install tf-serving, but that also fails with errors about the NCCL libraries, so I thought I would solve the NCCL issues first and that would eventually fix the tf-serving problems.

-Thanks.

@gr8Adakron
Author

gr8Adakron commented Jul 5, 2018

@kerrmudgeon @tfogal @dholt @jaredcasper I am stuck on this installation process. Can anyone help me out, please?

Thanks in advance.

@gr8Adakron
Author

And I am getting these errors during the tf-serving installation:


nccl_manager.cc:(.text._ZN10tensorflow11NcclManager18LoopKernelLaunchesEPNS0_10NcclStreamE+0x386): undefined reference to `ncclBcast'
nccl_manager.cc:(.text._ZN10tensorflow11NcclManager15GetCommunicatorEPNS0_10CollectiveE+0x53a): undefined reference to `ncclCommInitAll'
nccl_manager.cc:(.text._ZN10tensorflow11NcclManager15GetCommunicatorEPNS0_10CollectiveE+0xf21): undefined reference to `ncclGetErrorString'

@kwen2501
Contributor

kwen2501 commented Jul 5, 2018

Hi, you need to use the CUDA compiler, nvcc, instead of g++ to compile your CUDA program. See, for example, here: https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/
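
As a rough sketch of that suggestion, the helper below only assembles an nvcc command line for the test file from the original post; it prints the command rather than running it. The function name nvcc_cmd is made up for illustration, and -lnccl assumes libnccl is installed somewhere the linker already searches.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an nvcc command for one source file.
# nvcc adds the CUDA toolkit include directory and links the CUDA runtime
# on its own, so only NCCL has to be named explicitly.
nvcc_cmd() {
    src="$1"
    out="${src%.*}"    # strip the extension: nccl_temp.cpp -> nccl_temp
    printf 'nvcc %s -o %s -lnccl\n' "$src" "$out"
}

nvcc_cmd nccl_temp.cpp
```

If the printed command looks right for your setup, it can be pasted into the shell as-is.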

@gr8Adakron
Author

Thanks! Somehow I solved it by running this command:

sudo apt-get install nvidia-cuda-toolkit

As far as I knew, everything was already installed, so I don't know why nvcc was not present.

But I am still getting the tensorflow-serving error. Can you help me out with this?

Everything seems to be in place, yet after building and compiling, the final command reports all of these undefined references. It is weird and it's giving me a headache.

undefined reference to `ncclCommInitAll'

Please help me.

@sjeaugey
Member

It looks like -lnccl is missing from the link command.

Still, I think this no longer applies to the current version. Please re-open if needed.
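
For readers hitting the same undefined references with a plain g++ build, here is a hedged sketch of a link line that names the libraries explicitly. The helper gxx_cmd is hypothetical, it only prints the command, and it assumes the usual include and lib64 layout under the CUDA directory and that libnccl is on the default linker search path.

```shell
#!/bin/sh
# Hypothetical helper: print a g++ compile-and-link command with explicit flags.
# The CUDA directory is taken from the second argument, then from CUDA_HOME,
# then falls back to /usr/local/cuda (an assumed default).
gxx_cmd() {
    src="$1"
    cuda="${2:-${CUDA_HOME:-/usr/local/cuda}}"
    # Libraries come after the source file so the linker can resolve
    # ncclCommInitAll and friends.
    printf 'g++ %s -o %s -I%s/include -L%s/lib64 -lnccl -lcudart\n' \
        "$src" "${src%.*}" "$cuda" "$cuda"
}

gxx_cmd nccl_temp.cpp /usr/local/cuda-9.0
```

The -I flag addresses the missing header and -lnccl addresses the undefined references; how tf-serving's own build passes such flags is outside this sketch.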

@victorhcm

You can also fix it by using CPATH to point to where the header files are:

export CPATH=/usr/local/cuda-10.1/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.1/bin:$PATH
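
To check whether exports like these actually make the header visible, a small sketch such as the following may help. The function find_header is hypothetical; it simply walks a colon-separated list (CPATH by default) and reports the first directory containing the header. Note that LD_LIBRARY_PATH only affects shared-library lookup at run time, so it has no influence on the header search.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory in a colon-separated
# search path (CPATH by default) that contains the given header.
# Empty entries in the list are skipped.
find_header() {
    header="$1"
    search="${2:-${CPATH:-}}"
    old_ifs="$IFS"
    IFS=:
    for dir in $search; do
        if [ -n "$dir" ] && [ -f "$dir/$header" ]; then
            IFS="$old_ifs"
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

find_header cuda_runtime.h || echo "cuda_runtime.h not found via CPATH"
```

If nothing is found, the CUDA version in the exported paths probably does not match the directory that is actually installed.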

@jefflgaol

Good solution there, mate!

@sisrfeng

sisrfeng commented Apr 5, 2020

While running CenterNet, I hit a similar problem. Using the Docker image from https://hub.docker.com/r/frt03/centernet solved it.
I think the cause was that my CUDA version (9.0) was too old.

@DineshRajanT

I tried the CPATH, LD_LIBRARY_PATH and PATH exports suggested by @victorhcm above, but they did not solve the problem for me.

@sakex

sakex commented Sep 9, 2020

The CPATH, LD_LIBRARY_PATH and PATH exports suggested by @victorhcm above worked for me. Thank you, sir.

I'd like to add that it also works with CUDA 11; just change the paths to:

export CPATH=/usr/local/cuda-11.0/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-11.0/bin:$PATH

@chenQ1114

chenQ1114 commented Jul 29, 2021

I got the same error: cuda_runtime.h No such file or directory.

gmake -C src build BUILDDIR=/home/mwp141/Tool/nccl-2.8.3-1/build
which: no nvcc in (/usr/local/cuda-9.2/bin)
which: no nvcc in (/usr/local/cuda-9.2/bin)
which: no nvcc in (/usr/local/cuda-9.2/bin)
which: no nvcc in (/usr/local/cuda-9.2/bin)
gmake[1]: Entering directory `/home/mwp141/Tool/nccl-2.8.3-1/src'
Generating nccl.h.in > /home/mwp141/Tool/nccl-2.8.3-1/build/include/nccl.h
Grabbing include/nccl_net.h > /home/mwp141/Tool/nccl-2.8.3-1/build/include/nccl_net.h
Compiling init.cc > /home/mwp141/Tool/nccl-2.8.3-1/build/obj/init.o
In file included from init.cc:7:0:
/home/mwp141/Tool/nccl-2.8.3-1/build/include/nccl.h:10:26: fatal error: cuda_runtime.h: No such file or directory
 #include <cuda_runtime.h>
                          ^
compilation terminated.
gmake[1]: *** [/home/mwp141/Tool/nccl-2.8.3-1/build/obj/init.o] Error 1
gmake[1]: Leaving directory `/home/mwp141/Tool/nccl-2.8.3-1/src'
gmake: *** [src.build] Error 2

export CPATH=/usr/local/cuda-9.2/targets/x86_64-linux/include:$CPATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-9.2/bin:$PATH

These exports did not solve it for me. Do you know how to fix it? Thanks!

@sjeaugey
Member

sjeaugey commented Jul 29, 2021

There seems to be no nvcc in /usr/local/cuda-9.2/bin. Is CUDA installed in /usr/local/cuda-9.2 ? If not, did you set CUDA_HOME to /usr/local/cuda-9.2 by mistake? Otherwise you can set CUDA_HOME to a path where CUDA is installed.

Also note that the latest version of NCCL will probably not compile with an old CUDA 9.2. I'd advise upgrading to at least CUDA 10.2, and preferably 11.4.
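
One way to act on this advice is to look for a directory that really contains bin/nvcc before setting CUDA_HOME. The sketch below is only an illustration: pick_cuda_home is a hypothetical name, and the candidate paths are examples rather than a complete list.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that contains
# an executable bin/nvcc, i.e. a plausible value for CUDA_HOME.
pick_cuda_home() {
    for dir in "$@"; do
        if [ -x "$dir/bin/nvcc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate paths are examples only; adjust them to the local machine.
if cuda_home="$(pick_cuda_home /usr/local/cuda /usr/local/cuda-11.4 /usr/local/cuda-10.2)"; then
    echo "export CUDA_HOME=$cuda_home"
else
    echo "no nvcc found in the candidate directories"
fi
```

The printed export line can then be used before rebuilding NCCL, so that the build no longer reports "which: no nvcc".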

@ArchanaShinde1

ArchanaShinde1 commented Feb 17, 2023

Setting CUDA_HOME as @sjeaugey suggested above works for me:

export CUDA_HOME=/usr/local/cuda-11.4

Thanks!

@amughrabi

If you are using Anaconda, the following line works like a charm:

conda install -c nvidia cuda-toolkit

@mmehedin

mmehedin commented Feb 1, 2024

I can confirm @amughrabi's suggestion above. For Anaconda users, this is the solution for the cuda_runtime.h error:

conda install -c nvidia cuda-toolkit
