Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Rebase) Partial fix to compile time issues w/nvcc + Kokkos_ENABLE_DEBUG_BOUNDS_CHECK #7013

Merged
merged 6 commits into from
May 29, 2024

Conversation

nmm0
Copy link
Contributor

@nmm0 nmm0 commented May 15, 2024

Rebase of #6596

For reference when configured with:

  CMAKE_BUILD_TYPE="Debug"
  CMAKE_COMPILE_WARNING_AS_ERROR="ON"
  CMAKE_CXX_COMPILER="kokkos/bin/nvcc_wrapper"
  CMAKE_CXX_STANDARD="17"
  CMAKE_EXE_LINKER_FLAGS="-rdynamic"
  CMAKE_EXPORT_COMPILE_COMMANDS="ON"
  CMAKE_INSTALL_PREFIX:PATH="..."
  CMAKE_POSITION_INDEPENDENT_CODE="ON"
  CMAKE_VERBOSE_MAKEFILE:BOOL="OFF"
  CUDAToolkit_BIN_DIR=".../spack-env/view/gcc-12.2.0/cuda/11.6.0/bin"
  CUDAToolkit_ROOT=".../spack-env/view/gcc-12.2.0/cuda/11.6.0"
  Kokkos_ENABLE_CUDA="ON"
  Kokkos_ENABLE_CUDA_CONSTEXPR="OFF"
  Kokkos_ENABLE_CUDA_LAMBDA="ON"
  Kokkos_ENABLE_DEBUG="ON"
  Kokkos_ENABLE_DEBUG_BOUNDS_CHECK="ON"
  Kokkos_ENABLE_DEPRECATED_CODE_3:BOOL="OFF"
  Kokkos_ENABLE_DEPRECATED_CODE_4:BOOL="OFF"
  Kokkos_ENABLE_OPENMP:BOOL="OFF"
  Kokkos_ENABLE_SERIAL:BOOL="ON"
  Kokkos_ENABLE_TESTS="ON"

This (rebased) PR builds on my machine with -j 16 in 20 mins. Current develop takes 70 mins.

I have tested this on a Pascal arch machine, all tests pass (except for the 1gb memory test since I don't have that on my machine. I specifically tested the random number tests 100 times in a row with no failure.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't pass Windows CUDA testing (doesn't like noreturn on something)

core/src/Cuda/Kokkos_Cuda_abort.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_Cuda_abort.hpp Outdated Show resolved Hide resolved
Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
@masterleinad masterleinad requested a review from crtrott May 16, 2024 17:36
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need

#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
// required to workaround failures in random number generator unit tests with
// pre-volta architectures
#define KOKKOS_IMPL_ABORT_NORETURN
#else
// cuda_abort aborts when building for other platforms than macOS
#define KOKKOS_IMPL_ABORT_NORETURN [[noreturn]]
#endif

then?

Can/should we handle the [[noreturn]] part elsewhere?
Isn't this fix mostly about using the __noinline__ specifier?

extern __device__ void __assertfail(const void *message, const void *file,
unsigned int line, const void *function,
size_t charsize);
__device__ [[noreturn]] void __assertfail(const void *message, const void *file,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I can see how the extern specifier didn't make much sense but, out of curiosity, is that change necessary (not asking to change anything at this time).
Also, for consistency I would have slightly preferred if the [[noreturn]] attribute came first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if the extern change was necessary -- but looks like clang cuda also chokes on the [[noreturn]] not being first so I moved it

@dalg24
Copy link
Member

dalg24 commented May 22, 2024

Retest this please

@crtrott
Copy link
Member

crtrott commented May 22, 2024

Clang CUDA doesn't work:

/var/jenkins/workspace/Kokkos_PR-7013/core/src/Cuda/Kokkos_Cuda_abort.hpp:31:12: error: an attribute list cannot appear here [clang-diagnostic-error]
__device__ [[noreturn]] void __assertfail(const void *message, const void *file,

@nmm0
Copy link
Contributor Author

nmm0 commented May 22, 2024

Clang CUDA doesn't work:

/var/jenkins/workspace/Kokkos_PR-7013/core/src/Cuda/Kokkos_Cuda_abort.hpp:31:12: error: an attribute list cannot appear here [clang-diagnostic-error]
__device__ [[noreturn]] void __assertfail(const void *message, const void *file,

@crtrott let me see if reversing the order of [[noreturn]] will do it. It's more consistent anyway

@nmm0
Copy link
Contributor Author

nmm0 commented May 22, 2024

Do we still need

#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
// required to workaround failures in random number generator unit tests with
// pre-volta architectures
#define KOKKOS_IMPL_ABORT_NORETURN
#else
// cuda_abort aborts when building for other platforms than macOS
#define KOKKOS_IMPL_ABORT_NORETURN [[noreturn]]
#endif

then?

Can/should we handle the [[noreturn]] part elsewhere? Isn't this fix mostly about using the __noinline__ specifier?

@dalg24 interestingly I get Kokkos_Abort.hpp(99): warning #1305-D: function declared with "noreturn" does return when trying to remove so I guess in some code paths it can return

@nmm0
Copy link
Contributor Author

nmm0 commented May 22, 2024

Do we still need

#if defined(KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK)
// required to workaround failures in random number generator unit tests with
// pre-volta architectures
#define KOKKOS_IMPL_ABORT_NORETURN
#else
// cuda_abort aborts when building for other platforms than macOS
#define KOKKOS_IMPL_ABORT_NORETURN [[noreturn]]
#endif

then?
Can/should we handle the [[noreturn]] part elsewhere? Isn't this fix mostly about using the __noinline__ specifier?

@dalg24 interestingly I get Kokkos_Abort.hpp(99): warning #1305-D: function declared with "noreturn" does return when trying to remove so I guess in some code paths it can return

Also, just tested and (maybe unsurprisingly) changing this doesn't change compilation time.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are errors with illegal instructions. Is that real?

@nmm0
Copy link
Contributor Author

nmm0 commented May 28, 2024

There are errors with illegal instructions. Is that real?

I suspect maybe they are... I'm looking at it this week

@nmm0
Copy link
Contributor Author

nmm0 commented May 28, 2024

Retest this please

@crtrott crtrott merged commit d65b67b into kokkos:develop May 29, 2024
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants