-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New BVH::query() overload that only takes predicates and callback #329
Conversation
a672f6b
to
5366adf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The atomic thing in the example can be done in a separate PR, if desired.
I tried implementing a benchmark examples using this traversal type. It does not compile:
Essentially, if additional argument (such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix overloads choices. Definitly need a test that runs query(space, predicates, callback, traversal_policy)
and chooses the correct overload. The test may just be that it compiles.
I pushed a benchmark for this type of callback. It does not compile yet, but it should once the issue with overloads is fixed. |
0b8a5b9
to
b5ad61e
Compare
Well, maybe I was to fast to say my patch would compile. Needs a bit of tweaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think permutation is broken for type 1 travesal. The callback(query_index, primitive_index)
is called for non-permuted query_index
while queries are permuted. So either we need to wrap the callback, or we need to merge #305 first.
I discovered this issue by a quick modification of halo finder example, trying to see how doing type 1 traversal affects its performance. It produced completely wrong halos.
Yeah, we need tests.
I did the following change to the example to try it out with HACC: diff --git a/examples/halo_finder/ArborX_HaloFinder.hpp b/examples/halo_finder/ArborX_HaloFinder.hpp
index 7afb9d7..2c92dac 100644
--- a/examples/halo_finder/ArborX_HaloFinder.hpp
+++ b/examples/halo_finder/ArborX_HaloFinder.hpp
@@ -196,7 +196,6 @@ bool verifyCC(ExecutionSpace exec_space, IndicesView indices, OffsetView offset,
template <typename MemorySpace>
struct CCSCallback
{
- using tag = ArborX::Details::InlineCallbackTag;
Kokkos::View<int *, MemorySpace> stat_;
// Per [1]:
@@ -240,9 +239,8 @@ struct CCSCallback
return curr;
}
- template <typename Query, typename Insert>
- KOKKOS_FUNCTION void operator()(Query const &query, int j,
- Insert const &) const
+ template <typename Query>
+ KOKKOS_FUNCTION void operator()(Query const &query, int j) const
{
int const i = ArborX::getData(query);
@@ -353,14 +351,11 @@ void findHalos(ExecutionSpace exec_space, Primitives const &primitives,
// insert() will not be called
start = clock::now();
Kokkos::Profiling::pushRegion("ArborX:HaloFinder:ccs");
- Kokkos::View<int *, MemorySpace> indices("indices", 0);
- Kokkos::View<int *, MemorySpace> offset("offset", 0);
Kokkos::View<int *, MemorySpace> stat(
Kokkos::ViewAllocateWithoutInitializing("stat"), n);
ArborX::iota(exec_space, stat);
Kokkos::Profiling::pushRegion("ArborX:HaloFinder:ccs:query");
- bvh.query(exec_space, predicates, CCSCallback<MemorySpace>{stat}, indices,
- offset);
+ bvh.query(exec_space, predicates, CCSCallback<MemorySpace>{stat});
Kokkos::Profiling::popRegion();
// Per [1]:
//
@@ -393,6 +388,8 @@ void findHalos(ExecutionSpace exec_space, Primitives const &primitives,
start = clock::now();
Kokkos::Profiling::pushRegion("ArborX:HaloFinder:verify");
+ Kokkos::View<int *, MemorySpace> indices("indices", 0);
+ Kokkos::View<int *, MemorySpace> offset("offset", 0);
bvh.query(exec_space, predicates, indices, offset);
auto passed = verifyCC(exec_space, indices, offset, ccs);
printf("Verification %s\n", (passed ? "passed" : "failed")); This does not compile with
|
Will fix |
With 7bf380d and HACC patch, HACC' query runs about 7% faster. Summit results are pending. |
Summit results (59cbe5c vs 7bf380d)Serial
OpenMP
Cuda
|
I'm not ready to accept this yet. Will go through the code to see if I can identify the reason for slowdown and fix that. Will have to wait for a few days, though. |
2695781
to
15d50db
Compare
…passed to the callback in the new overload
15d50db
to
7d4ebbe
Compare
Current version (master ffed0fe vs this branch 7d4ebbe). Only posting radius search results, as construction and knn did not change (here are raw logs: 202008071619_master_ffed0fe.txt Serial
OpenMP
Cuda
So, Serial ~10% slower everywhere. OpenMP ~6%. Cuda unaffected. After looking exhaustively, I did not find a solution. The only thing that comes to mind is the way I removed the new test from benchmark during rebase, and will introduce it in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.