Implement nearest query for BruteForce #1053
Kokkos::View<PairIndexDistance *, MemorySpace> _buffer;
Kokkos::View<int *, MemorySpace> _offset;

NearestBufferProvider() = default;
Why do we need a default constructor?
Technically, we don't need to allocate the storage when called for an empty tree. So we could avoid doing the scan over the predicates' k's in that case.
But we never call it, do we?
We implicitly do, in the TreeTraversal constructor.
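To illustrate the empty-tree point, here is a minimal serial sketch, assuming std::vector in place of Kokkos::View. `BufferProvider` and `makeProvider` are hypothetical names, not ArborX API. The default constructor allocates nothing, so an empty tree skips the pass over the predicates' k's entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical, serial stand-in for NearestBufferProvider.
struct BufferProvider
{
  // (index, distance) slots for all queries, laid out back to back.
  std::vector<std::pair<int, float>> buffer;

  // Leaves the buffer empty: nothing is allocated and no k is inspected.
  BufferProvider() = default;

  // Allocates one slot per requested neighbor across all queries.
  explicit BufferProvider(std::vector<int> const &ks)
      : buffer(static_cast<std::size_t>(std::accumulate(ks.begin(), ks.end(), 0)))
  {}
};

// Sketch of the caller's choice: an empty tree gets the default-constructed
// provider, so the pass over the k's is never performed.
BufferProvider makeProvider(std::size_t n_primitives, std::vector<int> const &ks)
{
  if (n_primitives == 0)
    return BufferProvider{};
  return BufferProvider(ks);
}
```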
if (k < 1)
  return;
Note to self: this really should be a precondition.
This was brought up in the past, but I see we still do not enforce it in TreeTraversal.
using PairIndexDistance =
    typename NearestBufferProvider<MemorySpace>::PairIndexDistance;
As a follow-up, we should think of making this a struct with named members.
The code below is really ugly where we refer to "second" to mean the distance.
That would be fine with me. The thing I am starting to dislike is having all those PairValueIndex, PairIndexRank, PairIndexDistance thingies floating around. Wonder if there's a better way to handle that.
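A hedged sketch of the suggested follow-up: a small struct with named members, so call sites read `.distance` instead of `.second`. `IndexDistance`, `CompareByDistance` and `CandidateHeap` are hypothetical names, and std::priority_queue stands in for ArborX's own PriorityQueue.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical replacement for PairIndexDistance with named members.
struct IndexDistance
{
  int index;
  float distance;
};

// Orders by distance so that the farthest candidate sits on top of the heap
// (a max-heap on distance, as a bounded k-nearest buffer needs).
struct CompareByDistance
{
  bool operator()(IndexDistance const &a, IndexDistance const &b) const
  {
    return a.distance < b.distance;
  }
};

using CandidateHeap =
    std::priority_queue<IndexDistance, std::vector<IndexDistance>,
                        CompareByDistance>;
```

Code that previously read `heap.top().second` would then read `heap.top().distance`.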
while (!heap.empty())
{
  callback(predicate, values(heap.top().first));
  heap.pop();
}
We should probably add a comment that this is sorting the heap.
Technically, we did not intend to guarantee any order for nearest queries, but I suppose we do sort in TreeTraversal as well.
Except it does not go through the heap in increasing order but in decreasing order, since each pop returns the farthest remaining candidate. So the callbacks here would be called in a different order than in BVH (where they are called in order of increasing distance).
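A small standalone check of the ordering argument, assuming std::priority_queue with a distance comparator as a stand-in for ArborX's PriorityQueue (the real container and comparator are not shown in this excerpt). `LessByDistance` and `popOrder` are hypothetical. Draining a max-heap on distance hands back the farthest candidate first.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using PairIndexDistance = std::pair<int, float>;

// Max-heap on distance: the farthest candidate is at the top.
struct LessByDistance
{
  bool operator()(PairIndexDistance const &a, PairIndexDistance const &b) const
  {
    return a.second < b.second;
  }
};

// Drains the heap like the loop above and records the order in which the
// indices would be passed to the callback.
std::vector<int> popOrder(std::vector<PairIndexDistance> const &candidates)
{
  std::priority_queue<PairIndexDistance, std::vector<PairIndexDistance>,
                      LessByDistance>
      heap(LessByDistance{}, candidates);
  std::vector<int> order;
  while (!heap.empty())
  {
    order.push_back(heap.top().first);
    heap.pop();
  }
  return order;
}
```

For candidates at distances 1, 3 and 2 (indices 0, 1, 2), the callback order is 1, 2, 0, i.e. farthest first.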
Right...
Why did you choose to do that instead of just looping over the elements of the underlying storage?
I know this escapes the control of the data structure but it is more efficient because it skips the heap operations.
(not blocking, and not asking you to change it at this time; just trying to figure out why you did it this way)
Wonder if we should do:
sortHeap(heap.data(), heap.data() + heap.size(), heap.valueComp());
for (decltype(heap.size()) i = 0; i < heap.size(); ++i)
  _callback(predicate, values((heap.data() + i)->first));
We could skip sortHeap, but I wonder if we should. If we keep it, we replicate the behavior of the BVH, where the callback is called in order from the nearest to the farthest.
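The standard library has a direct analogue of the suggested sortHeap: std::sort_heap. This sketch assumes (index, distance) pair storage that is already a max-heap on distance; `sortedVisitOrder` is a hypothetical name. After sorting in place, a plain loop over the storage visits candidates from nearest to farthest without any per-element pop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using PairIndexDistance = std::pair<int, float>;

// Visits the candidates in increasing distance order, as the BVH does,
// by sorting the heap storage in place instead of popping it.
std::vector<int> sortedVisitOrder(std::vector<PairIndexDistance> storage)
{
  auto const comp = [](PairIndexDistance const &a,
                       PairIndexDistance const &b) { return a.second < b.second; };
  // In the real code the storage would already be a max-heap with respect
  // to comp; make_heap only establishes that for this standalone example.
  std::make_heap(storage.begin(), storage.end(), comp);
  // sort_heap turns the max-heap into a range sorted in ascending order
  // of distance, so element 0 is the nearest candidate.
  std::sort_heap(storage.begin(), storage.end(), comp);
  std::vector<int> order;
  for (std::size_t i = 0; i < storage.size(); ++i)
    order.push_back(storage[i].first); // stands in for the callback
  return order;
}
```

Sorting costs O(k log k) per query, the same bound as popping, but it touches the storage directly, which is the efficiency point raised above.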
Implemented with sorting.
{

template <typename MemorySpace>
struct NearestBufferProvider
Note to self to get back to this.
Kokkos::parallel_for(
    "ArborX::NearestBufferProvider::scan_queries_for_numbers_of_neighbors",
    Kokkos::RangePolicy<ExecutionSpace>(space, 0, n_queries),
    KOKKOS_CLASS_LAMBDA(int i) { _offset(i) = getK(predicates(i)); });
KokkosExt::exclusive_scan(space, _offset, _offset, 0);
int const buffer_size = KokkosExt::lastElement(space, _offset);
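A serial sketch of what the kernel above computes. It assumes (the allocation is not shown in this excerpt) that _offset holds n_queries + 1 entries with the last one left at zero, so that after the exclusive scan its last element is the total buffer size. std::exclusive_scan stands in for KokkosExt::exclusive_scan; `BufferLayout` and `computeLayout` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct BufferLayout
{
  std::vector<int> offset; // offset[i] is where query i's slice starts
  int buffer_size;         // total number of (index, distance) slots
};

// Step 1: write each query's k. Step 2: exclusive-scan in place.
// The extra trailing entry (an assumption here) makes the last element of
// the scan equal to the sum of all k's, i.e. the buffer size.
BufferLayout computeLayout(std::vector<int> const &ks)
{
  std::vector<int> offset(ks.size() + 1, 0);
  for (std::size_t i = 0; i < ks.size(); ++i)
    offset[i] = ks[i];
  std::exclusive_scan(offset.begin(), offset.end(), offset.begin(), 0);
  int const buffer_size = offset.back();
  return {offset, buffer_size};
}
```

Query i then owns the slots `[offset[i], offset[i + 1])` of the shared buffer.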
Any good reason not to do all of this in one parallel_scan call? Do we expect getK to be more expensive than launching another kernel?
We would need to measure whether the performance gain is worth the added code complexity, but yes, using a parallel_scan with a trailing return value argument is a good suggestion.
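For comparison, a serial sketch of the fused variant: a single pass evaluates getK inside the scan body and returns the final running sum, which is the role the trailing return value of parallel_scan would play. `NearestPredicate` and `fusedOffsets` are hypothetical, and this is not the Kokkos interface, only the shape of the computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate type; only its k matters here.
struct NearestPredicate
{
  int k;
};

int getK(NearestPredicate const &p) { return p.k; }

// One pass instead of parallel_for + exclusive_scan + lastElement:
// each step stores the running sum before adding this query's k
// (an exclusive scan), and the final sum is returned as the total.
int fusedOffsets(std::vector<NearestPredicate> const &predicates,
                 std::vector<int> &offset)
{
  offset.assign(predicates.size(), 0);
  int update = 0;
  for (std::size_t i = 0; i < predicates.size(); ++i)
  {
    offset[i] = update;
    update += getK(predicates[i]);
  }
  return update; // total buffer size
}
```

The trade-off discussed in the thread is visible here: the fused form saves one kernel launch, but it needs a scan primitive that exposes the total, which is the interface question raised next.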
The main thing is that we don't have a function with a good interface that returns the trailing value. It certainly is not a performance critical thing.
If we do decide to do something about it, we should talk about the interface. I would propose not doing it in this PR.
Fine by me
I think it is good enough
650d077 to 7e95982
I'm not sure why CUDA-Clang failed. It seems totally unrelated.

@masterleinad Right. I saw that, and it does not make sense to me. It does not have anything to do with this PR.
It was likely introduced in e21c55a. This is a failure already in
This is a straightforward, unoptimized implementation of the nearest query for BruteForce. I think the k = 1 case should be separated out in the future, as it can be performed in a tiled manner similar to the spatial search. The same, however, cannot be said about the k > 1 case.
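A sketch of why k = 1 is special: a single running minimum replaces the heap entirely, which is what makes a tiled, reduction-style implementation plausible. Only the per-query loop is shown, not the tiling. `Point`, `distanceBetween` and `nearestOne` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Point
{
  float x, y, z;
};

float distanceBetween(Point const &a, Point const &b)
{
  float const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// k = 1: keep a single (index, distance) candidate instead of a heap.
// Returns -1 when there are no indexables.
int nearestOne(Point const &query, std::vector<Point> const &points)
{
  int best = -1;
  float best_distance = std::numeric_limits<float>::infinity();
  for (int j = 0; j < static_cast<int>(points.size()); ++j)
  {
    float const d = distanceBetween(query, points[j]);
    if (d < best_distance)
    {
      best_distance = d;
      best = j;
    }
  }
  return best;
}
```

Because the state is just one candidate, partial results from different tiles can be merged with a simple min, which is not possible for a heap of size k > 1 without extra work.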
Right now, a single thread is assigned to each predicate and goes through all indexables. This limitation is due to the lack of a multi-threaded PriorityQueue.
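A serial sketch of the strategy described above: one loop iteration per predicate (standing in for one thread) scans every indexable while keeping a bounded max-heap of size k. It assumes 1D points and std::priority_queue for brevity; `FartherOnTop` and `bruteForceKNN` are hypothetical, and this is not the ArborX implementation. Results are reversed at the end so they come out nearest first, matching the BVH order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using PairIndexDistance = std::pair<int, float>;

// Max-heap on distance: the current worst of the k candidates is on top.
struct FartherOnTop
{
  bool operator()(PairIndexDistance const &a, PairIndexDistance const &b) const
  {
    return a.second < b.second;
  }
};

std::vector<std::vector<int>> bruteForceKNN(std::vector<float> const &queries,
                                            std::vector<float> const &points,
                                            int k)
{
  std::vector<std::vector<int>> results(queries.size());
  if (k < 1)
    return results; // same early return as in the code reviewed above
  for (std::size_t i = 0; i < queries.size(); ++i) // "one thread per predicate"
  {
    std::priority_queue<PairIndexDistance, std::vector<PairIndexDistance>,
                        FartherOnTop>
        heap;
    for (int j = 0; j < static_cast<int>(points.size()); ++j)
    {
      float const d = std::abs(queries[i] - points[j]);
      if (static_cast<int>(heap.size()) < k)
        heap.push({j, d});
      else if (d < heap.top().second)
      {
        heap.pop(); // drop the current farthest candidate
        heap.push({j, d});
      }
    }
    // Popping yields farthest first; reverse to report nearest first.
    while (!heap.empty())
    {
      results[i].push_back(heap.top().first);
      heap.pop();
    }
    std::reverse(results[i].begin(), results[i].end());
  }
  return results;
}
```

Each query costs O(n log k) for n indexables, with no sharing of work between queries, which is why this is the unoptimized baseline.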
I re-enabled the nearest query tests in the test suite.
My overall motivation for implementing this is to try using BruteForce as the top tree in the DistributedTree.