-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[Build] Add a reasonable default for CMAKE_CUDA_COMPILER in *nix #17293
Conversation
@mxnet-label-bot add [pr-awaiting-review] |
@mxnet-label-bot add [Build] |
Why would nvcc not be on the PATH? Could you provide an example system for reference that comes with this setup? |
nvcc is normally not in the PATH in ubuntu nvidia packages it goes in the path that you can see in this PR. |
/usr/local/cuda/bin/nvcc is usually NOT in the path. |
It's expected that to compile software, the compiler must be available. For that, it must be either on With respect to C and C++ compilers, this is what the CC and CXX environment variables are for. Alike, if users want to use non-standard nvcc (ie nvcc that is not on PATH), they can set CUDACXX. I think it's better to follow standard practice instead of taking additional assumptions. For example, we may want to use clang to compile the cuda files instead of nvcc. If nvcc is not on path, users may reasonably expect that clang will be used. What do you think? |
CMakeLists.txt
Outdated
@@ -84,6 +84,10 @@ message(STATUS "CMake version '${CMAKE_VERSION}' using generator '${CMAKE_GENERA | |||
project(mxnet C CXX) | |||
if(USE_CUDA) | |||
cmake_minimum_required(VERSION 3.13.2) # CUDA 10 (Turing) detection available starting 3.13.2 | |||
if (NOT MSVC AND (NOT DEFINED CMAKE_CUDA_COMPILER OR "${CMAKE_CUDA_COMPILER}" STREQUAL "CMAKE_CUDA_COMPILER-NOTFOUND")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be required to check CUDACXX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked fine with this, could you expand on your concern / question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your current logic may break users from setting CUDACXX environment variable? That's the standard way of defining Cuda compiler. It's good to follow standards to avoid technical debt
@larroy In which environment did you see this problem? Could you paste the diagnose.py result here? |
@apeforest ubuntu 18.04 with nvidia machine learning APT repositories, pretty standard. I will update with diagnose.py as requested, thanks. |
@larroy when adding the APT, users should add the respective folder to the PATH. Is it not documented on the Nvidia page? |
249babf
to
b9ad4b4
Compare
We never had to do such a thing. This is happening due to CMake changes. I applied your suggestion. I would suggest to apply my proposed patch which makes it smoother in 99% of the cases for users. |
|
|
@mxnet-label-bot add [breaking] |
This fixes #15492 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify if this breaks CUDACXX env variable.
Further, this only helps users that didn't install Cuda correctly. See the mandatory actions in the Cuda installation guide https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#mandatory-post
I don't think we should include logic into our build logic to handle systems in broken state.
What do you suggest wrt https://cmake.org/cmake/help/v3.13/envvar/CUDACXX.html should we check if this is set? I understand your concern. Please suggest a better approach, I'm not a CMake expert, but this was working before without needing to set any paths. Even though you are right about the documentation from Nvidia. I can compile pytorch just fine without needing to do any additional changes to PATHs, or environments. When there's a single cuda version installed or symlinked to /usr/local/cuda we should just pick up that one unless specified otherwise. Please propose changes or alternatives. |
This is the output from pytorch build:
|
@larroy, yes it's required to check the environment variables (both Thus I suggest the following approach instead check_language(CUDA)
if (NOT CMAKE_CUDA_COMPILER_LOADED AND UNIX AND EXISTS "/usr/local/cuda/bin/nvcc")
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")
message(WARNING "CMAKE_CUDA_COMPILER guessed: ${CMAKE_CUDA_COMPILER}")
endif() It should be placed at the same position as the changes done in this PR. My concern is that if nvcc is not on the PATH, users may also have forgotten to set |
Thanks for the clarifications, make sense. I don't think LD_LIBRARY_PATH will be an issue in this case. As the mxnet so points to the right library, I can show that this is the case. I disagree with you regarding "broken system". /usr/local/cuda is the convention for default cuda installation, even though is not really in the nvidia documentation. I see your point, but we should just work by default, as before. |
LD_LIBRARY_PATH will not cause issues in this case, but it's a related problem source. It will cause problems if users install cuda via runfile and forget to set LD_LIBRARY_PATH. In any case, it's just an example for why we can't handle all kinds of broken systems. Given the updated strategy using check_language(CUDA) it should be fine to merge this PR |
@leezu then approve please |
@larroy why not use the approach in #17293 (comment) I don't think the PR handles the case described correctly yet. |
b9ad4b4
to
664d44b
Compare
@larroy why close this issues? You can just copy the suggested code change and push, then it can be merged:
|
I don't have much bandwidth left, but if the change is this small I can finish the PR. Seems Linux GPU is timeouting often though. |
9024af7
to
d92f769
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Fixes a bug in #17293 causing an infinite loop on some systems.
Fixes a bug in apache#17293 causing an infinite loop on some systems.
* [Build] Add a reasonable default for CMAKE_CUDA_COMPILER in *nix * CR * CR * Update as per CR comments * include(CheckLanguage) Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Fixes a bug in apache#17293 causing an infinite loop on some systems.
Description
After recent changes to CMake, CMAKE_CUDA_COMPILER is not picked up automatically, as nvcc is not usually on the PATH. This sets a reasonable default which is also used by convention by NVidia tooling which symlinks /usr/local/cuda to the default cuda version.