Conversation
The error appeared to be benign, but it generated several warnings. To get around it, we added the MathFcn class, which dispatches the function call to the proper implementation (cuda vs std).
For some reason this never caused compilation errors, but it started after the previous commit.
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
|
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
This moves some code and then for testing only replaces a macro with a function? Or is there more? |
I moved the pack math overloads to their own file, cause ekat_pack.hpp was a bit long. For testing, nothing has changed, it's the same as before. What changed is how the different math fcns overloads are generated: the macro instantiating the pack overloads now calls |
|
Sorry for being dense, but is this introducing one extra level of function call for things like pow? |
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
@ambrad yes, although it should be inlined (I may replace |
|
Ok. Yes, please at least use KOKKOS_FORCEINLINE_FUNCTION. We have clear evidence from studies ~2 years ago that if one isn't very careful, Also, I don't see why an extra fn call is necessary. Why not just use macros whose def depends on CUDA_ARCH? This is super low-level code, so worrying about things like fn calls destroying vectorization is important. |
|
To give more context: things were also (almost) working with something like this since cuda overloads Edit: that's why I was asking on slack about ints. The only workaround I could find was to explicitly cast integers to double, but to do that without messing up the sp build, I had to do multiple impl. And partial specialization wasn't possible, so in the end I used a helper struct for the specialization of the core function on integer/floating. |
|
Just a semi-related question, if KOKKOS_FORCEINLINE_FUNCTION is more reliable than KOKKOS_INLINE_FUNCTION, should we just always use KOKKOS_FORCEINLINE? If not, when to prefer one over the other? |
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ jgfouca ]! |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
I think the point is that you don't want to inline the whole library. For key functions, you want to be sure it is inlined, e.g., for vectorization reasons, as Andrew pointed out. But in other less crucial cases, it's probably best to let the compiler decide. |
|
@jgfouca, I agree strongly with Luca. FORCEINLINE must be used very carefully. It's pretty much meant only for dealing with things like this. |
- use FORCEINLINE instead of INLINE (crucial for vectorization) - use only 2 specializations, since blah(x) on CUDA works for for both double and float.
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
|
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
Ok, I removed testing of math functions for int/long. Math fcns calls on packs of integers will result in a compiler error on cuda (except for abs, which is overloaded for int/long). This also removed the need for MathFcn, and the discussion on FORCEINLINE. Note: I did however replace INLINE with FORCEINLINE on the max/min functions in ekat_math_utils.hpp, since it is possible they might get called inside vectorizable loops. |
This comes at the (small) price of not providing overloads of standard math functions for packs of integers.
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Mappy
Jenkins Parameters
|
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ jgfouca ambrad ]! |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
Motivation
On CUDA, the compilation of pack math functions caused a bunch of warnings, regarding std::blah functions not being a device fcn but being called inside a
__host__ __device__function. I am not 100% sure why this warning generated, and yet was not a real problem.This PR fixes the warning, by providing 3 implementation of all the math functions, for double, float, and integer types.
Related Issues
Testing
These modification affect only pack operations implementations, which continue to be tested as they were before.