Support CUDA compilation with Clang compiler #161

Merged 19 commits on Mar 9, 2021

@jrmadsen (Collaborator) commented Mar 9, 2021

  • support setting CMAKE_CUDA_COMPILER=clang++ or (via env) CUDACXX=clang++
  • improved automatic cuda arch detection
  • increased kokkos testing
  • renamed generic macros like GLOBAL_CALLABLE to TIMEMORY_GLOBAL_FUNCTION, etc.
  • fixed kokkosp weak binding symbols on macOS
  • renamed generic macros _UNIX, _MACOS, etc. to TIMEMORY_UNIX, TIMEMORY_MACOS, etc.
    • the previous macros are still defined but will eventually be removed
  • replaced __NVCC__ with __CUDACC__
    • define _TIMEMORY_OPENMP_TARGET when compiling with timemory for an OpenMP target, since OpenMP target compilation may define __CUDACC__ without allowing raw CUDA code to compile
  • TIMEMORY_FOLD_EXPRESSION(...) is now implemented with C++17 fold expressions instead of the C++14 emulation via an initializer list
  • miscellaneous CI changes to reduce turnover
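
The difference between the two fold-expression strategies mentioned above can be sketched as follows. This is an illustration only, not timemory's definition of TIMEMORY_FOLD_EXPRESSION: the macro name SKETCH_FOLD_EXPRESSION, the helper collect_doubled, and the `__cplusplus` switch between the two forms are all assumptions made for the example.

```cpp
#include <initializer_list>
#include <vector>

// Hypothetical macro for illustration; NOT timemory's TIMEMORY_FOLD_EXPRESSION.
#if __cplusplus >= 201703L
// C++17: a fold over the comma operator evaluates the expression once per
// pack element, left to right.
#    define SKETCH_FOLD_EXPRESSION(...) ((__VA_ARGS__), ...)
#else
// Pre-C++17 emulation: expand the pack inside a braced initializer list,
// which also guarantees left-to-right evaluation. The trailing ", 0" lets
// expressions returning void take part in the int list.
#    define SKETCH_FOLD_EXPRESSION(...)                                        \
        (void) ::std::initializer_list<int>{ ((__VA_ARGS__), 0)... }
#endif

// Hypothetical helper: doubles every argument and appends it, in order.
template <typename... Args>
std::vector<int> collect_doubled(Args... args)
{
    std::vector<int> out;
    SKETCH_FOLD_EXPRESSION(out.push_back(2 * args));
    return out;
}
```

Calling `collect_doubled(1, 2, 3)` yields a vector holding 2, 4, 6 under either branch; an empty call yields an empty vector.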

- protect NVCC flags with gen-expr
- GPU "CALLABLE" -> TIMEMORY macros
- Use __CUDACC__ instead of __NVCC__
- _TIMEMORY_OPENMP_TARGET guards
- disable cupti_counters for CUDA 11
- define malloc_gotcha types explicitly
- threading::affinity::set
- CI testing for cuda + clang
- _UNIX == TIMEMORY_UNIX
- _LINUX == TIMEMORY_LINUX
- _WINDOWS == TIMEMORY_WINDOWS
- _MACOS == TIMEMORY_MACOS
- Added tests when kokkos sample is built
- Updated kokkos-common memory_entry_t to use user_kokkosp_bundle
- TIMEMORY_LIBRARY_SOURCE
- Fix visibility for kp_timemory on linux
- Fix weak binding on macOS
- Better data_tracker_tests
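
A minimal sketch of the kind of guard the macro changes above describe, assuming the intent is to enable CUDA-only code whenever any CUDA compiler is active (both nvcc and clang define `__CUDACC__` during CUDA compilation, while `__NVCC__` identifies nvcc specifically) and to disable it under OpenMP target compilation. Every name prefixed with SKETCH_ and the helper functions are hypothetical; only `__CUDACC__`, `_TIMEMORY_OPENMP_TARGET`, and `_LINUX` come from the list above, and the actual timemory headers may be organized differently.

```cpp
// Hypothetical SKETCH_* names for illustration; not timemory's macros.
#if defined(__CUDACC__) && !defined(_TIMEMORY_OPENMP_TARGET)
// A CUDA compiler (nvcc or clang) is compiling this translation unit.
#    define SKETCH_CUDA_SOURCE 1
#    define SKETCH_GLOBAL_FUNCTION __global__
#    define SKETCH_HOST_DEVICE_FUNCTION __host__ __device__
#else
// Plain host compilation, or OpenMP target compilation where raw CUDA
// code must not be emitted: the attributes expand to nothing.
#    define SKETCH_CUDA_SOURCE 0
#    define SKETCH_GLOBAL_FUNCTION
#    define SKETCH_HOST_DEVICE_FUNCTION
#endif

// Prefixed platform macro plus a legacy alias kept for older code.
#if defined(__linux__)
#    define SKETCH_LINUX 1
#    if !defined(_LINUX)
#        define _LINUX SKETCH_LINUX
#    endif
#endif

// Compiles for host and device under CUDA, and as ordinary C++ otherwise.
SKETCH_HOST_DEVICE_FUNCTION inline int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Reports which branch of the guard was taken.
inline const char* compilation_mode()
{
    return SKETCH_CUDA_SOURCE ? "cuda" : "host";
}
```

With a host-only compiler the attributes vanish and `clamp_index(5, 3)` returns 2; the same source should also build as CUDA with either compiler, though that path is not exercised here.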

codecov bot commented Mar 9, 2021

Codecov Report

Merging #161 (5277e0f) into develop (0653050) will increase coverage by 0.11%.
The diff coverage is 95.66%.

@@             Coverage Diff             @@
##           develop     #161      +/-   ##
===========================================
+ Coverage    82.60%   82.70%   +0.11%     
===========================================
  Files          240      240              
  Lines        16238    16245       +7     
===========================================
+ Hits         13411    13434      +23     
+ Misses        2827     2811      -16     
Impacted Files                            Coverage Δ
source/kokkosp.cpp                        96.08% <ø> (+6.54%) ⬆️
source/library.cpp                        75.62% <ø> (ø)
source/pthread.cpp                        15.79% <ø> (ø)
source/timemory/api/kokkosp.hpp           100.00% <ø> (ø)
source/timemory/backends/cpu.hpp          85.19% <ø> (ø)
source/timemory/backends/cupti.hpp        70.17% <ø> (ø)
source/timemory/backends/memory.hpp       100.00% <ø> (ø)
source/timemory/backends/papi.hpp         83.55% <ø> (ø)
source/timemory/backends/process.hpp      100.00% <ø> (ø)
source/timemory/backends/threading.hpp    77.03% <ø> (ø)
... and 41 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jrmadsen merged commit 63f2106 into develop on Mar 9, 2021
@jrmadsen deleted the clang-cuda-support branch on March 9, 2021 20:58
@jrmadsen added a commit that referenced this pull request on Jun 28, 2021:
* Added support for cuda compilation with clang

* Miscellaneous updates for clang + cuda

- protect NVCC flags with gen-expr
- GPU "CALLABLE" -> TIMEMORY macros
- Use __CUDACC__ instead of __NVCC__
- _TIMEMORY_OPENMP_TARGET guards
- disable cupti_counters for CUDA 11
- define malloc_gotcha types explicitly
- threading::affinity::set
- CI testing for cuda + clang

* Removed error check

* Macro updates + kokkos tests

- _UNIX == TIMEMORY_UNIX
- _LINUX == TIMEMORY_LINUX
- _WINDOWS == TIMEMORY_WINDOWS
- _MACOS == TIMEMORY_MACOS
- Added tests when kokkos sample is built
- Updated kokkos-common memory_entry_t to use user_kokkosp_bundle

* Update add_secondary.hpp

* CI updates

* Find launch_compiler component when finding Kokkos

* Fix to TIMEMORY_PYTHON_PLOTTER pre-processor def

* Use separable compilation for kokkos instead of launch_compiler

* cmake_defines fixes for default

* Update kokkos_compilation usage

* Enable HW counters for CUDA 11

* Kokkos-connector updates

* Remove kokkos_compilation macro usage

* More kokkos-connector updates

* Testing updates

* Miscellaneous testing fixes

- TIMEMORY_LIBRARY_SOURCE
- Fix visibility for kp_timemory on linux
- Fix weak binding on macOS
- Better data_tracker_tests