-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add parallel_reduce #34
Conversation
Thanks for opening this pull request! Please check out our contributing guidelines. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, begin, end), lambda, result); | ||
} | ||
|
||
template<typename PackType, typename ValueType, bool Serialize=false> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't address the primary need. Luca suggested looking at https://github.com/E3SM-Project/E3SM/blob/b7b269b3cfaca869e861fa01e6d65e201dc588a3/components/homme/src/share/cxx/ColumnOps.hpp#L474, and I agree that's the best approach. The minor issue of providing a reduction over a single pack can wait for a future PR, and when it's done, it will live in ekat_pack.hpp, since it has nothing to do with Kokkos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yes I wasn't sure since the suggestion in the issue listed only the single pack version, but I will implement this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #26 (comment) and subsequent comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being moved to a new PR
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Blake # 23 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Bowman # 20 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Weaver # 24 (click to expand)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementations look good, but I have a few requests. Feel free to push back on what you disagree.
|
||
template<typename PackType, typename ValueType, bool Serialize=false> | ||
static KOKKOS_INLINE_FUNCTION | ||
void reduce_add (const PackType& p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be in the ekat_pack.hpp file. I would also add it as a method of the pack class, actually, but free function is probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll implement this in ekat_pack.hpp (is it ok as part of this pull request?) and then add the column_reduce()
function here as discussed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can go both ways (single PR vs two PRs). Multiple PRs make code review faster, but I don't feel strongly either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being moved to a new PR
else | ||
{ | ||
typename PackType::scalar sum = p[0]; | ||
vector_simd for (int i = 1; i < PackType::n; ++i) sum += p[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting to write a micro reproducer for a simd reduction (without ekat, just plain cpp), and check if vector_simd vectorizes. If not, try with something like
#define vector_simd_reduce(op,result) _Pragma("omp simd reduction(op:result)")
(if openmp is enabled, and expand to nothing otherwise). I suspect vector_simd
might not always vectorize, while the omp simd reduction will. Or, at least, the latter will vectorize more times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a new PR
tests/kokkos/kokkos_utils_tests.cpp
Outdated
ekat::ExeSpaceUtils<ExeSpace>::parallel_reduce(team, begin, end, | ||
[&] (const int k, Scalar& reduction_value) { | ||
reduction_value += data[k]; | ||
}, true/*serialize*/, team_result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider adding a short "fake" test for serialize=false. Not to check the answer (will be non deterministic), but to check stuff compiles and runs without issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this test, I added a template parameter for whether or not the REQUIRE(results_h(0) == serial_result);
is run based on EKAT_TEST_STRICT_FP
. So on a non-bfb run the test runs just without the failing assertion. Is this acceptable? I'm can definitely change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is outdated now. I will run a test with both Serialize=true/false
and only test values when true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of EKAT_TEST_STRICT_FP
. Ekat offers some features, that should work (perhaps following different code paths) regardless. We should test both a "strict" code flow, as well as a "release" code flow. Obviously, the checks that need to occur are different (like you suggested, not running the REQUIRE clause if not a strict run).
That said, I don't see where the extra tepmlate parameter is. Maybe you haven't pushed yet? Either way, let's make sure that the function is executed with both serialize true and false (and also check results for the true case).
tests/kokkos/kokkos_utils_tests.cpp
Outdated
} | ||
|
||
ekat::ExeSpaceUtils<ExeSpace>::reduce_add<PackType,Scalar,true/*serialize*/>(pack,reduce_add_result); | ||
REQUIRE(reduce_add_result == serial_result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test both serialize and not serialize. However, is the vector reduction guaranteed to give the same result if serialize=false? I suspect not. However, if your input contains powers of two only (of not too different exponents), then the floating point ops are commutative. You could for instance check that a pack containing [1, 0.5, 0.25, ..., 0.5^N-1]
sums up to (1-0.5^N)/(1-0.5)
(I hope I didn't mess up the math formula, but you get the gist). This sum should always be exact (at least in single/double precision, if we ever add half precision, we may have to limit pack size N to something like 10).
{ | ||
for (int i = 0; i < PackType::n; ++i) | ||
result += p[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is me being pedantic (read: feel free to ignore this), but in the rest of scream/ekat, we put opening braces on the same level of the if/for statements, and else statements appear as } else {
(one line).
Not a big deal, feel free to disregard.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Blake # 24 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Bowman # 21 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Weaver # 25 (click to expand)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Blake # 25 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Bowman # 22 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Weaver # 26 (click to expand)
|
@bartgol @ambrad: Talking with Luca, I decided to move the column reduction to a new PR (I will make later today). This is now only for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm late to the party on this one. I didn't have much to add to what Andrew and Luca already said. Nice work.
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ bartgol jeff-cohere ambrad ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
32611a9
@bartgol @ambrad @jeff-cohere I decided to make The only issue is that, because the ExeSpace template is defaulted, |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Blake # 28 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Bowman # 25 (click to expand)
Console Output (last 100 lines) : EKAT_PullRequest_Autotester_Weaver # 29 (click to expand)
|
@tcclevenger I like this.
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: EKAT_PullRequest_Autotester_Blake
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Bowman
Jenkins Parameters
Build InformationTest Name: EKAT_PullRequest_Autotester_Weaver
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
I don't have a strong opinion on defaulting Serialize to false. It is definitely the safest default, I guess, but it would not solve the problem of downstream apps having to specify the argument everywhere, or, at least, in every use case that will be part of a BFB test, no? I mean, if you are using Besides, given the current feedback on having EKAT_BFB_DEFAULTS (or similar), I think we will end up with a default that is set at config-time. Imho, let's leave no default now, and probably bring configurable default in soon in another PR. |
Sounds good. It looks like the it needs to be approved again before being able to merge. Also, do you like to squash your commits, or it good as is? |
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ ambrad ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
Congrats on merging your first pull request! We hope this is only the first of many to come. |
Motivation
ExeSpaceUtils::parallel_reduce()
discussed in Issue #32. Allows for a serial implementation forparallel_reduce()
for the purpose of matching bfb to fortran.ExeSpaceUtils::reduce_add()
discussed in Issue #26. Provides a function which gives the sum over anekat::Pack
. Also allows for a serial implementation for the purpose of matching bfb to fortran.Testing
I created 2 tests in tests/kokkos/kokkos_utils_tests.cpp:
parallel_reduce
andreduce_add
, which compare some simple serial computation done in for loops (like in fortran) with the serialized version of the implemented functions. Each test, when tested on the GPU, passes only whenserialize=true
. On the CPU,parallel_reduce
passes withserialize=false
, but that may be expected.Additional Information
A few comments/questions:
bool serialize
would be a template parameter in both functions, however forparallel_reduce()
I was unsure how I would call the function given that I do not know the explicit Lambda type and am relying on the compiler to detect it for me. Also, the src directory does not seem to know the correct value ofEKAT_TEST_STRICT_FP
, so I couldn't use that as in this example (line 321). Open to suggestions.reduce_add()
, I'm guessing thatPackType::n
return the potential size of the pack, and does not account for garbage if it is not completely full? Which could be an issue if I'm summing up all entries.. From here, they were able to accessColInfo<LENGTH>::LastPackLen
to test if the final pack was full. Do I have something similar available here?