Could not find CUDA (found 11.7) #966

Closed
Qiangong2 opened this issue Apr 20, 2023 · 12 comments
Comments

@Qiangong2

Trying to compile a GPU-accelerated version of RELION 4 with the Intel oneAPI 2023.1.0 toolkit. An ALTCPU build compiles fine, but a CUDA-accelerated build fails at the cmake step with:

-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version "11.7")
-- Using non-cuda compilation....

We have CUDA 11.7 installed and it is fully functional with our RTX A5000 GPUs

The cmake statement is:

cmake -DCUDA_ARCH=86 -DMKLFFT=ON -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DMPI_C_COMPILER=mpiicc -DMPI_CXX_COMPILER=mpiicpc -DCMAKE_C_FLAGS="-O3 -ip -g -restrict " -DCMAKE_CXX_FLAGS="-O3 -ip -g -restrict "  -DFORCE_OWN_FLTK=ON  .. 

Any ideas?

@biochem-fan
Member

Do you have nvcc in the PATH?

@Qiangong2
Author

Yes, it's in the path

@biochem-fan
Member

Aren't you building it within an activated conda environment?
Please deactivate conda.

@Qiangong2
Author

I'm not using conda. The only thing different than normal is that I'm having to source intel's setvars file if I want to use the compilers (without explicit paths). I think it may be overwriting my PATH and LD_LIBRARY_PATH
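
One way to test that suspicion is to compare the variables before and after sourcing the script. The following is only a sketch: the helper names `split_path` and `lost_entries` are made up for illustration, and the `setvars.sh` location in the usage comment is an assumption based on a default oneAPI install.

```shell
# Sketch: report entries of a colon-separated list (PATH, LD_LIBRARY_PATH)
# that were present before sourcing a script but are missing afterwards.

# Print each entry of a colon-separated list on its own line.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Print entries of $1 (before) that no longer appear in $2 (after).
lost_entries() {
    split_path "$1" | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        case ":$2:" in
            *":$entry:"*) ;;             # still present
            *) printf '%s\n' "$entry" ;; # dropped by the script
        esac
    done
}

# Usage (commented out because it changes the current shell; path is assumed):
# before_path=$PATH; before_ld=$LD_LIBRARY_PATH
# . /opt/intel/oneapi/setvars.sh
# lost_entries "$before_path" "$PATH"
# lost_entries "$before_ld" "$LD_LIBRARY_PATH"
```

If nothing is printed, the script only added entries and did not drop the CUDA directories.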

@biochem-fan
Member

That should not be a problem; I also source setvars.sh.

@Qiangong2
Author

Qiangong2 commented Apr 22, 2023

Alright, new error:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cufft_LIBRARY (ADVANCED)
    linked by target "tiltpair_plot" in directory /root/relion/src/apps
    linked by target "star_handler" in directory /root/relion/src/apps
(more lines)
    linked by target "particle_reposition" in directory /root/relion/src/apps
CUDA_curand_LIBRARY (ADVANCED)
    linked by target "relion_lib" in directory /root/relion/src/apps

Definitely something to do with the libraries. To get this far, I added -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.1 to the cmake command.

EDIT: We use the offline run file to install CUDA, if that makes any difference.

@Qiangong2
Author

If I append -LAH to cmake, it says that the libraries are not found, but when I try to add them manually in the cmake command (like -DCUDA_cufft_LIBRARY:FILEPATH=/usr/local/cuda-12.1/lib64/libcufft.so), it just ignores the options.
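
Before passing paths by hand, it may help to confirm that the files exist where expected. A minimal sketch, assuming the libraries sit in a `lib64` subdirectory of the toolkit root; `check_cuda_libs` and the `CUDA_ROOT` variable are hypothetical names, and the library list simply mirrors the variables reported as missing in this thread.

```shell
# Sketch: list which CUDA libraries exist under a toolkit root.
# The lib64 subdirectory is an assumption about the install layout.
check_cuda_libs() {
    root=$1
    for name in cudart cufft curand; do
        if [ -e "$root/lib64/lib$name.so" ]; then
            echo "found   lib$name.so"
        else
            echo "missing lib$name.so"
        fi
    done
}

# CUDA_ROOT is a hypothetical override; the default is the path used above.
check_cuda_libs "${CUDA_ROOT:-/usr/local/cuda-12.1}"
```

A "missing" line here would point at the installation layout rather than at CMake.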

@biochem-fan
Member

To confirm CUDA installation, can you build it with GCC?

@Qiangong2
Author

Yes, it finds the CUDA libraries just fine and builds with standard gcc

@Qiangong2
Author

Qiangong2 commented Apr 24, 2023

Same issue with CUDA 12.1. Here is what cmake -LAH says for everything CUDA-related:

CMake Generate step failed.  Build files cannot be regenerated correctly.
// Enable CUDA GPU acceleration
CUDA:BOOL=ON
CUDA_64_BIT_DEVICE_CODE:BOOL=ON
// Attach the build rule to the CUDA source file.  Enable only when the CUDA source file is added to at most one target.
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE:BOOL=ON
CUDA_BUILD_CUBIN:BOOL=OFF
CUDA_BUILD_EMULATION:BOOL=OFF
CUDA_CUDART_LIBRARY:FILEPATH=CUDA_CUDART_LIBRARY-NOTFOUND
CUDA_CUDA_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libcuda.so
CUDA_GENERATED_OUTPUT_DIR:PATH=
CUDA_HOST_COMPILATION_CPP:BOOL=ON
CUDA_HOST_COMPILER:FILEPATH=/opt/intel/oneapi/compiler/2023.0.0/linux/bin/intel64/icc
CUDA_NVCC_EXECUTABLE:FILEPATH=/usr/local/cuda/bin/nvcc
CUDA_NVCC_FLAGS:STRING=
CUDA_NVCC_FLAGS_BENCHMARKING:STRING=-arch=sm_61  -D__INTEL_COMPILER --default-stream per-thread 
CUDA_NVCC_FLAGS_DEBUG:STRING=-arch=sm_61  -D__INTEL_COMPILER --default-stream per-thread
CUDA_NVCC_FLAGS_MINSIZEREL:STRING=
CUDA_NVCC_FLAGS_PROFILING:STRING=-arch=sm_61  -D__INTEL_COMPILER --default-stream per-thread -lineinfo
CUDA_NVCC_FLAGS_RELEASE:STRING=-arch=sm_61  -D__INTEL_COMPILER --default-stream per-thread --disable-warnings
CUDA_NVCC_FLAGS_RELWITHDEBINFO:STRING=-arch=sm_61  -D__INTEL_COMPILER --default-stream per-thread
CUDA_OpenCL_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libOpenCL.so
CUDA_PROPAGATE_HOST_FLAGS:BOOL=ON
CUDA_SDK_ROOT_DIR:PATH=/usr/local/cuda-12.1
// Compile CUDA objects with separable compilation enabled.  Requires CUDA 5.0+
CUDA_SEPARABLE_COMPILATION:BOOL=OFF
CUDA_TOOLKIT_INCLUDE:PATH=/usr/local/cuda/include
// Use the static version of the CUDA runtime library if available
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON
// Print out the commands run while compiling the CUDA source file.  With the Makefile generator this defaults to VERBOSE variable specified on the command line, but can be forced on with this option.
CUDA_VERBOSE_BUILD:BOOL=OFF
// Version of CUDA as computed from nvcc.
CUDA_VERSION:STRING=12.1
CUDA_cublas_LIBRARY:FILEPATH=CUDA_cublas_LIBRARY-NOTFOUND
CUDA_cudadevrt_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudadevrt.a
// static CUDA runtime library
CUDA_cudart_static_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudart_static.a
CUDA_cufft_LIBRARY:FILEPATH=CUDA_cufft_LIBRARY-NOTFOUND
CUDA_cupti_LIBRARY:FILEPATH=/usr/local/cuda/extras/CUPTI/lib64/libcupti.so
CUDA_curand_LIBRARY:FILEPATH=CUDA_curand_LIBRARY-NOTFOUND
CUDA_cusolver_LIBRARY:FILEPATH=CUDA_cusolver_LIBRARY-NOTFOUND
CUDA_cusparse_LIBRARY:FILEPATH=CUDA_cusparse_LIBRARY-NOTFOUND
CUDA_nppc_LIBRARY:FILEPATH=CUDA_nppc_LIBRARY-NOTFOUND
CUDA_nppial_LIBRARY:FILEPATH=CUDA_nppial_LIBRARY-NOTFOUND
CUDA_nppicc_LIBRARY:FILEPATH=CUDA_nppicc_LIBRARY-NOTFOUND
CUDA_nppidei_LIBRARY:FILEPATH=CUDA_nppidei_LIBRARY-NOTFOUND
CUDA_nppif_LIBRARY:FILEPATH=CUDA_nppif_LIBRARY-NOTFOUND
CUDA_nppig_LIBRARY:FILEPATH=CUDA_nppig_LIBRARY-NOTFOUND
CUDA_nppim_LIBRARY:FILEPATH=CUDA_nppim_LIBRARY-NOTFOUND
CUDA_nppist_LIBRARY:FILEPATH=CUDA_nppist_LIBRARY-NOTFOUND
CUDA_nppisu_LIBRARY:FILEPATH=CUDA_nppisu_LIBRARY-NOTFOUND
CUDA_nppitc_LIBRARY:FILEPATH=CUDA_nppitc_LIBRARY-NOTFOUND
CUDA_npps_LIBRARY:FILEPATH=CUDA_npps_LIBRARY-NOTFOUND
CUDA_nvToolsExt_LIBRARY:FILEPATH=CUDA_nvToolsExt_LIBRARY-NOTFOUND
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/librt.a

This is the output of: cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDA_SDK_ROOT_DIR=/usr/local/cuda-12.1 -DCUDA_ARCH=61 -DMKLFFT=ON -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DMPI_C_COMPILER=mpiicc -DMPI_CXX_COMPILER=mpiicpc -DCMAKE_C_FLAGS="-O3 -ip -g -xCOMMON-AVX512 -restrict " -DCMAKE_CXX_FLAGS="-O3 -ip -g -xCOMMON-AVX512 -restrict " -DFORCE_OWN_FLTK=ON .. -LAH | grep CUDA
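
To pull just the unresolved entries out of a listing like this, a small filter can help. This is a sketch only; `notfound_vars` is a made-up helper name, and it assumes the `NAME:TYPE=VALUE` line format shown above.

```shell
# Sketch: print the names of cache variables whose value ends in -NOTFOUND.
# Reads `cmake -LAH` style lines (NAME:TYPE=VALUE) on stdin.
notfound_vars() {
    grep -e '-NOTFOUND$' | cut -d: -f1
}

# Example with two lines taken from the listing above:
printf '%s\n' \
    'CUDA_cufft_LIBRARY:FILEPATH=CUDA_cufft_LIBRARY-NOTFOUND' \
    'CUDA_cudadevrt_LIBRARY:FILEPATH=/usr/local/cuda/lib64/libcudadevrt.a' \
    | notfound_vars
```

On a real build directory the same filter can be fed from `cmake -LAH .. | notfound_vars`.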

@khal3dj

khal3dj commented Jul 27, 2023

Hi, what is your CMake version? If you're using CMake >= 3.10, then I just solved a similar issue with CUDA 11.2 installed in a non-default directory structure. I'm not sure if you got the same warning:

-- CUDA enabled - Building CUDA-accelerated version of RELION
-- Setting cpu precision to double
-- Setting accelerated code precision to single
CMake Warning (dev) at CMakeLists.txt:142 (FIND_PACKAGE):
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version "11.2")
-- Using non-cuda compilation....

If you did, then that would be potential issue #1:
Apparently the FindCUDA module (used in CMakeLists.txt, line 142) has been deprecated since CMake 3.10, and it can have trouble locating the right CUDA directories even if all environment variables are set correctly. Passing -DCUDA_CUDART_LIBRARY=/usr/lib/cuda/targets/x86_64-linux/lib/libcudart.so to CMake took care of CUDA_CUDART_LIBRARY, but nothing would convince it to accept CUDA_INCLUDE_DIRS (like you mentioned).

What I did after some research (and experimentation) was edit CMakeLists.txt and add a FIND_PACKAGE(CUDAToolkit) call BEFORE line 142, so that it looks as follows:

FIND_PACKAGE(CUDAToolkit)
FIND_PACKAGE(CUDA)

If your CUDA installation has the include folder where it should be (e.g. ../cuda/include ), then hopefully you should be good to go, because that's where the module is going to look. Hope that helps.

P.S. You probably won't have the next problems, but I'll include them just in case someone else with these symptoms reads this.

If you can't find the include folder in the root CUDA directory where bin/nvcc is (maybe with your 11.7 installation?), then potential issue #2:
My CUDA installation/directory structure is a little weird, so I gave CMake the path to my CUDA include dir like so:
-DCUDAToolkit_INCLUDE_DIR=/usr/lib/cuda/targets/x86_64-linux/include
which was ok for FIND_PACKAGE(CUDAToolkit) and sufficient in theory, but it didn't pass if(CUDA_FOUND) in CMakeLists.txt, so I ended up just creating a symbolic link (careful here):
sudo ln -s /usr/lib/cuda/targets/x86_64-linux/include /usr/lib/cuda/include
Then cleared my build dir, ran cmake again, and it went through with no issues.
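
To see which of the two layouts described above an installation has, a quick check like the following may help. It is a sketch under assumptions: `find_cuda_include` and `CUDA_ROOT` are hypothetical names, and it only looks for `cuda_runtime.h` in the two directories mentioned in this comment.

```shell
# Sketch: report where cuda_runtime.h lives relative to a toolkit root.
# Checks <root>/include first, then the targets/x86_64-linux layout.
find_cuda_include() {
    root=$1
    for dir in "$root/include" "$root/targets/x86_64-linux/include"; do
        if [ -f "$dir/cuda_runtime.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# CUDA_ROOT is a hypothetical override; the default is the root used above.
find_cuda_include "${CUDA_ROOT:-/usr/lib/cuda}" || echo "cuda_runtime.h not found"
```

If only the targets/x86_64-linux path is printed, that is the situation where I needed the symbolic link.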

If you're using CUDA 11 on an up-to-date system, there is a good chance your default gcc/g++ compilers are too new as well (for CUDA 11.2, they must be version <= 10; the exact limit depends on the CUDA release).
In that case, potential issue #3 could arise:
CMake will complain with something like error -- unsupported GNU version! gcc versions later than 10 are not supported!
I didn't try compiling with Intel's icc, but it uses the same headers/libs/etc. as the default gcc I believe.
For that error, make sure you have gcc-10 and g++-10 (or below) installed, then provide them to CMake when you execute:
cmake -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_CXX_COMPILER=g++-10
and for make :
make CC=gcc-10 CXX=g++-10
or the Intel compiler equivalents. And make sure your mpicc/mpicxx are using the same compiler version, of course, or make will fail when linking.
Alternatively, you could also create symlinks in cuda/bin to the right executables.
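
A quick way to check the default compiler against that limit is sketched below. The helper names `gcc_major` and `host_gcc_ok` are made up for illustration, and the default maximum of 10 is the value that applied to my CUDA 11.2 setup; pass a different second argument for other CUDA releases.

```shell
# Sketch: decide whether a GCC version string is acceptable as a CUDA host
# compiler, given a maximum supported major version (default 10, assumed).
gcc_major() {
    # "10.2.1" -> "10"; a bare "10" stays "10"
    printf '%s\n' "${1%%.*}"
}

host_gcc_ok() {
    [ "$(gcc_major "$1")" -le "${2:-10}" ]
}

# Check the default gcc if one is installed.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if host_gcc_ok "$ver"; then
        echo "gcc $ver: within the assumed limit"
    else
        echo "gcc $ver: too new, try gcc-10"
    fi
fi
```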

Happy compiling and thanks to the devs!

@Qiangong2
Author

Closing as affected machine has been retired and there is no way to continue testing
