Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn CUDA off by default #4406

Merged
merged 1 commit into from Feb 19, 2020
Merged

Turn CUDA off by default #4406

merged 1 commit into from Feb 19, 2020

Conversation

rwy7
Copy link
Member

@rwy7 rwy7 commented Oct 3, 2019

  • Stop automatically enabling OMR_OPT_CUDA, so a user has to specifically opt-in for CUDA support.
  • Stop using the built-in FindCUDA package, because it requires a complete installation of CUDA. We only need a small subset of the CUDA SDK.
  • Enable OMR_OPT_CUDA explicitly in X86-64 builds

Copy link
Contributor

@mstoodle mstoodle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in principle. One question and a few hopefully quick changes requested before we merge.

CMakeLists.txt Outdated
@@ -68,15 +68,22 @@ set(OMR_INSTALL_DATA_DIR ${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME} CACHE PATH "In
###

if (OMR_ENV_DATA64)
find_package(Threads)
find_package(Threads QUIET)
if(Threads_FOUND)
# Threads mush be found before we can look for CUDA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe correct this typo "mush" while you're here?

CMakeLists.txt Outdated
@@ -68,15 +68,22 @@ set(OMR_INSTALL_DATA_DIR ${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME} CACHE PATH "In
###

if (OMR_ENV_DATA64)
find_package(Threads)
find_package(Threads QUIET)
if(Threads_FOUND)
# Threads mush be found before we can look for CUDA.
# FindCuda will error out if Threads is missing, even though CUDA itself is optional.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also should correct this comment as it no longer matches what will happen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what needs to be corrected here.

Copy link
Contributor

@mstoodle mstoodle Oct 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you added "QUIET" it's no longer true that "FindCuda will error out" is it? Or am I reading too much into this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you're saying. FindCUDA is already optional, now we're just making it quiet too. The comment is documenting what I consider a bug in FindCUDA module. FindCUDA treats FindThreads as required, even if FindCUDA itself is optional. So we have to ensure that FindThreads exists before we use FindCUDA.

CMakeLists.txt Outdated
set(CUDA_FOUND OFF CACHE BOOL "CUDA is disabled (threads not found)")
endif()
else()
message(STATUS "OMR: CUDA disabled (unsupported platform)")
set(CUDA_FOUND OFF CACHE BOOL "CUDA is disabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this message also have the "(unsupported platform)" added to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

CMakeLists.txt Outdated
find_package(CUDA)
find_package(CUDA QUIET)
if (CUDA_FOUND)
message(STATUS "OMR: CUDA enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone requests to explicitly disable CUDA support, will this code still run (and print "OMR: CUDA enabled" if all the libraries are present)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Want that changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CUDA is explicitly disabled, there should be no messages about any attempts to find it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the flip side, if CUDA is explicitly enabled and directed to use a specific version, this should not fail if nvcc is not found (OMR doesn't need it). See [1] which only makes the necessary header files available.

[1] eclipse-openj9/openj9#7280

Copy link
Member Author

@rwy7 rwy7 Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, is there a specific CUDA header I can check for in the build system? And I suppose this means there are no CUDA libraries we need to link? On what platforms do we support CUDA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMR uses few of the header files from CUDA: I think checking for include/cuda.h is probably sufficient.

@rwy7
Copy link
Member Author

rwy7 commented Oct 16, 2019

Based on feedback on this PR, and in the OMR architecture meeting, I'm planning on making the following change: OMR_OPT_CUDA will be off by default, and must be explicitly enabled by the user. Since OMR doesn't require a complete CUDA SDK installed, I'm going to stop using the built-in FindCUDA package, and only search for what we need.

@rwy7 rwy7 changed the title Soften message when CUDA isn't supported Turn CUDA off by default Oct 16, 2019
@rwy7
Copy link
Member Author

rwy7 commented Oct 28, 2019

OK I've pushed a new version of this commit, that just checks for cuda.h and cuda_runtime.h. This new version no longer automatically enables OMR_OPT_CUDA. If a user does turn this flag on, but the dependencies are missing, it's treated as a hard error.

@rwy7
Copy link
Member Author

rwy7 commented Oct 28, 2019

@genie-omr build all

@rwy7 rwy7 force-pushed the gentle-not-found branch 2 times, most recently from 31f4f6c to e19d2be Compare October 28, 2019 16:44
Comment on lines 46 to 63
find_path(OmrCuda_CUDA_H_DIR
NAMES cuda.h
PATHS
${OmrCuda_SEARCH_PATH}
PATH_SUFFIXES
include
DOC "The cuda.h include directory"
)

find_path(OmrCuda_CUDA_RUNTIME_H_DIR
NAMES cuda_runime.h
PATHS
${OmrCuda_SEARCH_PATH}
PATH_SUFFIXES
include
DOC "The cuda_runtime.h include directory"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OmrCuda_CUDA_H_DIR and OmrCuda_CUDA_RUNTIME_H_DIR should always be the same: I think it more appropriate to check for both header files in a single directory (and just add that one to the include path).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I've made it so we look for cuda_runtime.h in whichever directory cuda.h is in.

@rwy7
Copy link
Member Author

rwy7 commented Oct 28, 2019

@genie-omr build xlinux

@rwy7
Copy link
Member Author

rwy7 commented Oct 28, 2019

@genie-omr build xlinux

@rwy7
Copy link
Member Author

rwy7 commented Oct 28, 2019

@genie-omr build xlinux

#############################################################################

#
# This package locates a minimum set of cuda resources, required by the OMR_OPT_CUDA flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'CUDA' is an acronym which should be in upper-case.

@rwy7 rwy7 requested a review from mstoodle November 23, 2019 00:11
@rwy7
Copy link
Member Author

rwy7 commented Dec 3, 2019

@dnakamura do you want to review this?
@mstoodle are you okay with the new (much quieter) behaviour?
@genie-omr build all

"$ENV{CUDA_HOME}"
"$ENV{CUDA_PATH}"
"$ENV{CUDA_INC_PATH}"
"$ENV{NVSDKCOMPUTE_ROOT}/C"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the /C suffix here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stolen from FindCUDA.cmake. Apparently, after CUDA 3.0, the C headers were moved under the C directory. If you know this is wrong, please let me know--I am not familiar with CUDA, I just tried to replicate the FindCUDA search order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it makes sense to include NVSDKCOMPUTE_ROOT; if it's empty, the effect is to consider /C which probably isn't what we want.

I'd suggest we simplify this so it just checks that the user has specified (a reasonable value for) OMR_CUDA_HOME if OMR_OPT_CUDA is set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or make it behave like the documentation for OMR_OPT_CUDA says:

Path to the CUDA SDK. Takes precedence over CUDA_HOME in OMR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to try to automatically locate CUDA. I would rather improve the search path than remove the functionality outright.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a cmake build on zLinux yesterday: the search for CUDA failed. CMake is configured explicitly with -DJ9VM_OPT_CUDA=OFF which OpenJ9 maps to OMR_OPT_CUDA=OFF. We shouldn't even be trying to find CUDA in that case. I don't know if there's a way to tell the difference between OFF by default and OFF on the command-line: if not, then I suggest we have to scale this back so only validate a path supplied in CUDA_BIN_PATH (or whatever CMake variables we settle on).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, with this PR, cmake will search for CUDA only if OMR_OPT_CUDA is enabled. We can still provide a default search path for CUDA headers, and there is no need to differentiate why the feature is off. Does that sound good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller specifies OMR_CUDA_HOME, this should not consider any other location that might happen to be an acceptable CUDA installation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me 👍

@rwy7 rwy7 force-pushed the gentle-not-found branch 2 times, most recently from f129bc9 to 6e3011b Compare January 22, 2020 19:19
elseif(ENV{CUDA_HOME})
set(OmrCuda_SEARCH_DIR "$ENV{CUDA_HOME}")
else()
message(WARNING "CUDA support requested, but OMR_CUDA_HOME/CUDA_HOME are not set.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn. If we make this an error, and the user doesn't specify REQUIRED in find_package(OmrCuda REQUIRED), then we will have a hard error when there shouldn't be. However, this package doesn't function as a traditional "find package" module, and I find it hard to believe anyone would use this package without OMR_OPT_CUDA. Making this an error is probably fine.

If we leave this as a warning, the build will still error out when it's misconfigured, because OMR makes OmrCuda REQUIRED when OMR_OPT_CUDA is enabled.

I could go either way, so let me know what you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about the (optional)REQUIRED argument to FindOmrCuda. I think it should be an error in the required case and I'm fine with a warning otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@@ -202,6 +202,6 @@ set(OMR_NOTIFY_POLICY_CONTROL OFF CACHE BOOL "TODO: Document")

set(OMR_ENV_GCC OFF CACHE BOOL "TODO: Document")

set(OMR_OPT_CUDA ${CUDA_FOUND} CACHE BOOL "Enable CUDA support in OMR")
set(OMR_OPT_CUDA OFF CACHE BOOL "Enable CUDA support in OMR")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps OMR_CUDA_HOME should be set (and documented) here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to keep OMR_CUDA_HOME in the FindOmrCuda module, but I could add a note to OMR_OPT_CUDA saying users should set OMR_CUDA_HOME. Would that be OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the set(OMR_CUDA_HOME) call that would appear here would also specify the type (directory or path?), but mentioning it in the description of OMR_OPT_CUDA works for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've added the mention, and I've changed the type of OMR_CUDA_HOME to PATH (that means "path to a directory", it was just STRING before).

@DanHeidinga
Copy link
Contributor

@rwy0717 @mstoodle What's the next step on this PR? There's an OpenJ9 cmake PR that depends on this so I'd like to see it merged unless there's something that still needs to be done

@keithc-ca
Copy link
Member

When this is merged, the Java extension repos will need to change as well. Previously, CUDA_BIN_PATH would identify the CUDA installation location; with this OMR_CUDA_HOME should be used in places like https://github.com/ibmruntimes/openj9-openjdk-jdk8/blob/openj9/closed/OpenJ9.gmk#L497

1. Stop automatically enabling OMR_OPT_CUDA. This flag must now be
   enabled by the user.
2. Only search for CUDA resources if the feature has been enabled.
3. Only search for CUDA headers, not all resources.
4. Require that the user specify where CUDA is located via the
   OMR_CUDA_HOME directory. Do not search standard paths for CUDA
   resources.

Signed-off-by: Robert Young <rwy0717@gmail.com>
@rwy7
Copy link
Member Author

rwy7 commented Feb 10, 2020

Just rebased.
@genie-omr build all

@rwy7
Copy link
Member Author

rwy7 commented Feb 10, 2020

Here are two PRs updating the extension repos.

There isn't anything to change in jdk-next, yet, but let me know if I'm missing any other places.

@DanHeidinga
Copy link
Contributor

+1 The extensions repos changes look to cover all repos with cmake so far.

@keithc-ca Any further concerns?

Copy link
Member

@keithc-ca keithc-ca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go.

@dnakamura
Copy link
Contributor

LGTM

Copy link
Contributor

@mstoodle mstoodle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants