Turn CUDA off by default #4406

rwy7 · 2019-10-03T19:16:55Z

Stop automatically enabling OMR_OPT_CUDA, so a user has to explicitly opt in to CUDA support.
Stop using the built-in FindCUDA package, because it requires a complete installation of CUDA. We only need a small subset of the CUDA SDK.
Enable OMR_OPT_CUDA explicitly in X86-64 builds.

mstoodle

Looks good in principle. One question and a few hopefully quick changes requested before we merge.

mstoodle · 2019-10-03T19:45:28Z

CMakeLists.txt

@@ -68,15 +68,22 @@ set(OMR_INSTALL_DATA_DIR ${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME} CACHE PATH "In
 ###

 if (OMR_ENV_DATA64)
-	find_package(Threads)
+	find_package(Threads QUIET)
 	if(Threads_FOUND)
 		# Threads mush be found before we can look for CUDA.


maybe correct this typo "mush" while you're here?

mstoodle · 2019-10-03T19:45:55Z

CMakeLists.txt

@@ -68,15 +68,22 @@ set(OMR_INSTALL_DATA_DIR ${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME} CACHE PATH "In
 ###

 if (OMR_ENV_DATA64)
-	find_package(Threads)
+	find_package(Threads QUIET)
 	if(Threads_FOUND)
 		# Threads mush be found before we can look for CUDA.
 		# FindCuda will error out if Threads is missing, even though CUDA itself is optional.


also should correct this comment as it no longer matches what will happen

I'm not sure what needs to be corrected here.

Once you added "QUIET" it's no longer true that "FindCuda will error out" is it? Or am I reading too much into this comment?

Oh, I see what you're saying. FindCUDA is already optional; now we're just making it quiet too. The comment documents what I consider a bug in the FindCUDA module: FindCUDA treats FindThreads as required, even if FindCUDA itself is optional. So we have to ensure that Threads is found before we use FindCUDA.

mstoodle · 2019-10-03T19:47:19Z

CMakeLists.txt

 		set(CUDA_FOUND OFF CACHE BOOL "CUDA is disabled (threads not found)")
 	endif()
 else()
+	message(STATUS "OMR: CUDA disabled (unsupported platform)")
 	set(CUDA_FOUND OFF CACHE BOOL "CUDA is disabled")


should this message also have the "(unsupported platform)" added to it?

mstoodle · 2019-10-03T19:49:16Z

CMakeLists.txt

-		find_package(CUDA)
+		find_package(CUDA QUIET)
+		if (CUDA_FOUND)
+			message(STATUS "OMR: CUDA enabled")


If someone requests to explicitly disable CUDA support, will this code still run (and print "OMR: CUDA enabled" if all the libraries are present)?

Yep. Want that changed?

If CUDA is explicitly disabled, there should be no messages about any attempts to find it.

On the flip side, if CUDA is explicitly enabled and directed to use a specific version, this should not fail if nvcc is not found (OMR doesn't need it). See [1] which only makes the necessary header files available.

[1] eclipse-openj9/openj9#7280

OK, is there a specific CUDA header I can check for in the build system? And I suppose this means there are no CUDA libraries we need to link? On what platforms do we support CUDA?

OMR uses only a few of the CUDA header files; I think checking for include/cuda.h is probably sufficient.
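
A minimal sketch of the behaviour requested in this thread, not the code that was merged. It assumes the user-supplied location is held in a variable named OMR_CUDA_HOME (the name the PR settles on later), and it treats a missing header as a hard error. Nothing CUDA-related runs or prints unless OMR_OPT_CUDA is on, and only include/cuda.h is checked, so nvcc is never required.

```cmake
# Sketch only: skip all CUDA detection (and all messages) unless the user opted in.
if(OMR_OPT_CUDA)
	# Only the header is checked; nvcc and the CUDA libraries are not needed.
	# Guard against an empty OMR_CUDA_HOME so we never probe "/include/cuda.h".
	if(OMR_CUDA_HOME AND EXISTS "${OMR_CUDA_HOME}/include/cuda.h")
		message(STATUS "OMR: CUDA enabled")
	else()
		message(FATAL_ERROR "OMR: OMR_OPT_CUDA is ON, but include/cuda.h was not found under '${OMR_CUDA_HOME}'")
	endif()
endif()
```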

rwy7 · 2019-10-16T14:37:35Z

Based on feedback on this PR, and in the OMR architecture meeting, I'm planning on making the following change: OMR_OPT_CUDA will be off by default, and must be explicitly enabled by the user. Since OMR doesn't require a complete CUDA SDK installed, I'm going to stop using the built-in FindCUDA package, and only search for what we need.

rwy7 · 2019-10-28T15:47:15Z

OK, I've pushed a new version of this commit that just checks for cuda.h and cuda_runtime.h. This new version no longer automatically enables OMR_OPT_CUDA. If a user does turn this flag on but the dependencies are missing, it's treated as a hard error.

rwy7 · 2019-10-28T15:52:34Z

@genie-omr build all

keithc-ca · 2019-10-28T16:47:26Z

cmake/modules/FindOmrCuda.cmake

+find_path(OmrCuda_CUDA_H_DIR
+	NAMES cuda.h
+	PATHS
+		${OmrCuda_SEARCH_PATH}
+	PATH_SUFFIXES
+		include
+	DOC "The cuda.h include directory"
+)
+
+find_path(OmrCuda_CUDA_RUNTIME_H_DIR
+	NAMES cuda_runime.h
+	PATHS
+		${OmrCuda_SEARCH_PATH}
+	PATH_SUFFIXES
+		include
+	DOC "The cuda_runtime.h include directory"
+)


OmrCuda_CUDA_H_DIR and OmrCuda_CUDA_RUNTIME_H_DIR should always be the same: I think it's more appropriate to check for both header files in a single directory (and just add that one to the include path).

OK I've made it so we look for cuda_runtime.h in whichever directory cuda.h is in.
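
One way the single-directory check could look, as a hedged sketch rather than the code in the PR. OmrCuda_INCLUDE_DIR and OmrCuda_HEADERS_FOUND are hypothetical names introduced for illustration; OmrCuda_SEARCH_PATH is taken from the hunk above.

```cmake
# Sketch only: locate cuda.h once, then require cuda_runtime.h beside it.
find_path(OmrCuda_INCLUDE_DIR
	NAMES cuda.h
	PATHS
		${OmrCuda_SEARCH_PATH}
	PATH_SUFFIXES
		include
	DOC "The CUDA include directory"
)

# Both headers must live in the same directory for CUDA to count as found.
set(OmrCuda_HEADERS_FOUND FALSE)
if(OmrCuda_INCLUDE_DIR AND EXISTS "${OmrCuda_INCLUDE_DIR}/cuda_runtime.h")
	set(OmrCuda_HEADERS_FOUND TRUE)
endif()
```

With this shape only one directory needs to be added to the include path.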

rwy7 · 2019-10-28T16:48:34Z

@genie-omr build xlinux

rwy7 · 2019-10-28T16:51:54Z

@genie-omr build xlinux

rwy7 · 2019-10-28T23:19:36Z

@genie-omr build xlinux

keithc-ca · 2019-10-29T20:36:43Z

cmake/modules/FindOmrCuda.cmake

+#############################################################################
+
+#
+# This package locates a minimum set of cuda resources, required by the OMR_OPT_CUDA flag.


nit: 'CUDA' is an acronym which should be in upper-case.

rwy7 · 2019-12-03T19:35:00Z

@dnakamura do you want to review this?
@mstoodle are you okay with the new (much quieter) behaviour?
@genie-omr build all

keithc-ca · 2019-12-03T21:12:36Z

cmake/modules/FindOmrCuda.cmake

+	"$ENV{CUDA_HOME}"
+	"$ENV{CUDA_PATH}"
+	"$ENV{CUDA_INC_PATH}"
+	"$ENV{NVSDKCOMPUTE_ROOT}/C"


Why the /C suffix here?

Stolen from FindCUDA.cmake. Apparently, after CUDA 3.0, the C headers were moved under the C directory. If you know this is wrong, please let me know. I am not familiar with CUDA; I just tried to replicate the FindCUDA search order.

I'm not sure it makes sense to include NVSDKCOMPUTE_ROOT; if it's empty, the effect is to consider /C which probably isn't what we want.

I'd suggest we simplify this so it just checks that the user has specified (a reasonable value for) OMR_CUDA_HOME if OMR_OPT_CUDA is set.

Or make it behave like the documentation for OMR_OPT_CUDA says:

Path to the CUDA SDK. Takes precedence over CUDA_HOME in OMR

I think it's reasonable to try to automatically locate CUDA. I would rather improve the search path than remove the functionality outright.

I tried a cmake build on zLinux yesterday: the search for CUDA failed. CMake is configured explicitly with -DJ9VM_OPT_CUDA=OFF, which OpenJ9 maps to OMR_OPT_CUDA=OFF. We shouldn't even be trying to find CUDA in that case. I don't know if there's a way to tell the difference between OFF by default and OFF on the command line: if not, then I suggest we scale this back so that we only validate a path supplied in CUDA_BIN_PATH (or whatever CMake variables we settle on).

Yes, with this PR, cmake will search for CUDA only if OMR_OPT_CUDA is enabled. We can still provide a default search path for CUDA headers, and there is no need to differentiate why the feature is off. Does that sound good?

If the caller specifies OMR_CUDA_HOME, this should not consider any other location that might happen to be an acceptable CUDA installation.

Sounds good to me 👍
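
A sketch of how "consider only OMR_CUDA_HOME" might be expressed with find_path, under the assumption that the result is stored in a hypothetical cache variable OmrCuda_INCLUDE_DIR. NO_DEFAULT_PATH restricts the search to the listed PATHS, so a CUDA installation elsewhere on the system is never picked up.

```cmake
# Sketch only: search exactly the directory the caller named, and nowhere else.
if(OMR_CUDA_HOME)
	find_path(OmrCuda_INCLUDE_DIR
		NAMES cuda.h
		PATHS "${OMR_CUDA_HOME}"
		PATH_SUFFIXES include
		NO_DEFAULT_PATH
		DOC "The CUDA include directory"
	)
endif()
```

Note that find_path caches its result, so if OMR_CUDA_HOME changes between configure runs, the cached OmrCuda_INCLUDE_DIR would have to be cleared before the new location is searched.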

keithc-ca · 2020-01-22T20:50:53Z

cmake/modules/FindOmrCuda.cmake

+elseif(ENV{CUDA_HOME})
+	set(OmrCuda_SEARCH_DIR "$ENV{CUDA_HOME}")
+else()
+	message(WARNING "CUDA support requested, but OMR_CUDA_HOME/CUDA_HOME are not set.")


Should this not be an error?

I was torn. If we make this an error, and the user doesn't specify REQUIRED in find_package(OmrCuda REQUIRED), then we will have a hard error when there shouldn't be. However, this package doesn't function as a traditional "find package" module, and I find it hard to believe anyone would use this package without OMR_OPT_CUDA. Making this an error is probably fine.

If we leave this as a warning, the build will still error out when it's misconfigured, because OMR makes OmrCuda REQUIRED when OMR_OPT_CUDA is enabled.

I could go either way, so let me know what you prefer.

I forgot about the (optional) REQUIRED argument to FindOmrCuda. I think it should be an error in the required case and I'm fine with a warning otherwise.
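
A hedged sketch of the outcome agreed here: a fatal error when the module is invoked with REQUIRED, a warning otherwise. It relies on the standard find-module convention that find_package(OmrCuda REQUIRED) sets OmrCuda_FIND_REQUIRED. The first branch (testing OMR_CUDA_HOME) is an assumption, since the hunk above starts at the elseif. The environment test is written as a string comparison because if(ENV{CUDA_HOME}) does not test an environment variable and always evaluates to false.

```cmake
# Sketch only: pick the search directory, and scale the diagnostic to REQUIRED.
if(OMR_CUDA_HOME)
	set(OmrCuda_SEARCH_DIR "${OMR_CUDA_HOME}")
elseif(NOT "$ENV{CUDA_HOME}" STREQUAL "")
	set(OmrCuda_SEARCH_DIR "$ENV{CUDA_HOME}")
elseif(OmrCuda_FIND_REQUIRED)
	# Caller used find_package(OmrCuda REQUIRED): misconfiguration is fatal.
	message(FATAL_ERROR "CUDA support requested, but OMR_CUDA_HOME/CUDA_HOME are not set.")
else()
	# Optional lookup: report the problem and let the caller decide.
	message(WARNING "CUDA support requested, but OMR_CUDA_HOME/CUDA_HOME are not set.")
endif()
```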

keithc-ca · 2020-01-23T14:57:33Z

cmake/config.cmake

@@ -202,6 +202,6 @@ set(OMR_NOTIFY_POLICY_CONTROL OFF CACHE BOOL "TODO: Document")

 set(OMR_ENV_GCC OFF CACHE BOOL "TODO: Document")

-set(OMR_OPT_CUDA ${CUDA_FOUND} CACHE BOOL "Enable CUDA support in OMR")
+set(OMR_OPT_CUDA OFF CACHE BOOL "Enable CUDA support in OMR")


Perhaps OMR_CUDA_HOME should be set (and documented) here?

I think it makes more sense to keep OMR_CUDA_HOME in the FindOmrCuda module, but I could add a note to OMR_OPT_CUDA saying users should set OMR_CUDA_HOME. Would that be OK?

I was thinking the set(OMR_CUDA_HOME) call that would appear here would also specify the type (directory or path?), but mentioning it in the description of OMR_OPT_CUDA works for me.

OK, I've added the mention, and I've changed the type of OMR_CUDA_HOME to PATH (that means "path to a directory", it was just STRING before).
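
For illustration, the two cache entries discussed in this thread might be declared roughly as follows. The description strings are placeholder wording, not the text that was committed.

```cmake
# Sketch only: the description strings are illustrative wording.
set(OMR_OPT_CUDA OFF CACHE BOOL "Enable CUDA support in OMR. Requires OMR_CUDA_HOME to be set")

# PATH marks the entry as a path to a directory (cmake-gui offers a directory chooser).
set(OMR_CUDA_HOME "" CACHE PATH "Path to the CUDA SDK")
```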

DanHeidinga · 2020-02-08T18:51:40Z

@rwy0717 @mstoodle What's the next step on this PR? There's an OpenJ9 cmake PR that depends on this, so I'd like to see it merged unless there's something that still needs to be done.

keithc-ca · 2020-02-10T14:19:42Z

When this is merged, the Java extension repos will need to change as well. Previously, CUDA_BIN_PATH would identify the CUDA installation location; with this change, OMR_CUDA_HOME should be used in places like https://github.com/ibmruntimes/openj9-openjdk-jdk8/blob/openj9/closed/OpenJ9.gmk#L497

1. Stop automatically enabling OMR_OPT_CUDA. This flag must now be enabled by the user.
2. Only search for CUDA resources if the feature has been enabled.
3. Only search for CUDA headers, not all resources.
4. Require that the user specify where CUDA is located via the OMR_CUDA_HOME directory. Do not search standard paths for CUDA resources.

Signed-off-by: Robert Young <rwy0717@gmail.com>

rwy7 · 2020-02-10T16:12:37Z

Just rebased.
@genie-omr build all

rwy7 · 2020-02-10T21:58:23Z

Here are two PRs updating the extension repos.

jdk 8
jdk 11

There isn't anything to change in jdk-next, yet, but let me know if I'm missing any other places.

DanHeidinga · 2020-02-13T14:44:06Z

+1 The extension repo changes look to cover all repos with cmake so far.

@keithc-ca Any further concerns?

keithc-ca

I think we're good to go.

dnakamura · 2020-02-13T17:14:15Z

LGTM

mstoodle

lgtm

rwy7 added the cmake label Oct 3, 2019

rwy7 requested review from charliegracie and youngar as code owners October 3, 2019 19:16

rwy7 assigned mstoodle Oct 3, 2019

rwy7 mentioned this pull request Oct 3, 2019

Revert "Soften message when CUDA isn't supported" #4405

Merged

mstoodle requested changes Oct 3, 2019

rwy7 changed the title ~~Soften message when CUDA isn't supported~~ Turn CUDA off by default Oct 16, 2019

rwy7 force-pushed the gentle-not-found branch from 8ecd87c to ef21c8c Compare October 28, 2019 15:43

rwy7 force-pushed the gentle-not-found branch 2 times, most recently from 31f4f6c to e19d2be Compare October 28, 2019 16:44

keithc-ca reviewed Oct 28, 2019

rwy7 force-pushed the gentle-not-found branch from e19d2be to 3e5958a Compare October 28, 2019 16:51

rwy7 force-pushed the gentle-not-found branch from 3e5958a to ae9c3d4 Compare October 28, 2019 23:17

keithc-ca reviewed Oct 29, 2019

rwy7 requested a review from mstoodle November 23, 2019 00:11

keithc-ca reviewed Dec 3, 2019

rwy7 force-pushed the gentle-not-found branch 2 times, most recently from f129bc9 to 6e3011b Compare January 22, 2020 19:19

keithc-ca reviewed Jan 22, 2020

rwy7 force-pushed the gentle-not-found branch from 6e3011b to 8f92d94 Compare January 22, 2020 22:29

keithc-ca reviewed Jan 23, 2020

rwy7 force-pushed the gentle-not-found branch from 8f92d94 to 42321d0 Compare January 24, 2020 16:02

dnakamura approved these changes Jan 28, 2020

dnakamura mentioned this pull request Jan 29, 2020

Add plinux cmake jenkins build eclipse-openj9/openj9#8113

Merged

rwy7 force-pushed the gentle-not-found branch from 42321d0 to 91890a1 Compare February 10, 2020 16:10

keithc-ca approved these changes Feb 13, 2020

mstoodle approved these changes Feb 19, 2020

rwy7 assigned youngar and unassigned mstoodle Feb 19, 2020

youngar merged commit 0fcb146 into eclipse:master Feb 19, 2020

rwy7 deleted the gentle-not-found branch February 19, 2020 21:02

This was referenced Feb 20, 2020

Remove obsolete export of CUDA_BIN_PATH ibmruntimes/openj9-openjdk-jdk8#375

Merged

Remove obsolete export of CUDA_BIN_PATH ibmruntimes/openj9-openjdk-jdk11#266

Merged
