Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement nearest query for BruteForce #1053

Merged
merged 8 commits into from
Apr 5, 2024

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented Apr 3, 2024

This is a straightforward not-optimized version of the nearest query for BruteForce. I think, k=1 case should be separated out in the future, as it can be performed in a tiling manner similar to the spatial search. The same, however, can't be said about k > 1 case.

Right now, a single thread is allocated per predicate, that goes through all indexables. This limitation is due to the fact that we don't have a multi-thread PriorityQueue.

I reenabled the nearest queries tests in the tests.

My overall motivation for implementing this is to try using BruteForce as the top tree in the DistributedTree.

@aprokop aprokop added the enhancement New feature or request label Apr 3, 2024
@aprokop aprokop requested a review from dalg24 April 3, 2024 21:05
@aprokop aprokop mentioned this pull request Apr 3, 2024
test/tstKokkosToolsAnnotations.cpp Outdated Show resolved Hide resolved
test/CMakeLists.txt Show resolved Hide resolved
Kokkos::View<PairIndexDistance *, MemorySpace> _buffer;
Kokkos::View<int *, MemorySpace> _offset;

NearestBufferProvider() = default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a default constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we don't need to allocate the storage when called for an empty tree. So, we could avoid doing the scan over primitives k's in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we never call it do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implicitly do, in the TreeTraversal constructor.

src/details/ArborX_DetailsNearestBufferProvider.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsBruteForceImpl.hpp Outdated Show resolved Hide resolved
Comment on lines +170 to +171
if (k < 1)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this really should be a precondition
This was brought up in the past but I see we still do not enforce in TreeTraversal.

Comment on lines +173 to +174
using PairIndexDistance =
typename NearestBufferProvider<MemorySpace>::PairIndexDistance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up we should think of making this a struct with named parameters.
This is really ugly below when we refer to the "second" to signify the distance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fine with me. The thing I am starting to dislike is having all those PairValueIndex, PairIndexRank, PairIndexDistance thingies floating around. Wonder if there's a better way to handle that.

Comment on lines 210 to 214
while (!heap.empty())
{
callback(predicate, values(heap.top().first));
heap.pop();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably comment that this is sorting the heap.
We technically did not intend to guarantee any order for nearest queries but I suppose we do sort as well in TreeTraversal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except it does not sort the heap in the increasing order. Rather, it is in decreasing order. So, the callbacks here would be called in a different order than in BVH (where they would be called in increasing distance order).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...
Why did you choose to do that instead of just looping over the elements of the underlying storage?
I know this escapes the control of the data structure but it is more efficient because it skips the heap operations.
(not blocking nor asking you to change at this time, just trying to figure out why you did it this way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should do

    sortHeap(heap.data(), heap.data() + heap.size(), heap.valueComp());
    for (decltype(heap.size()) i = 0; i < heap.size(); ++i)
      _callback(predicate, values(heap.data() + i)->first);

We could skip sortHeap, but I wonder if we should. If we don't, we would replicate behavior of the BVH in that the callback will be called in the order from the nearest to further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented with sorting.

{

template <typename MemorySpace>
struct NearestBufferProvider
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self to get back to this.

Comment on lines +60 to +65
Kokkos::parallel_for(
"ArborX::NearestBufferProvider::scan_queries_for_numbers_of_neighbors",
Kokkos::RangePolicy<ExecutionSpace>(space, 0, n_queries),
KOKKOS_CLASS_LAMBDA(int i) { _offset(i) = getK(predicates(i)); });
KokkosExt::exclusive_scan(space, _offset, _offset, 0);
int const buffer_size = KokkosExt::lastElement(space, _offset);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good reason not to do all of this in one parallel_scan call? Do we expect getK to be more expensive than launching another kernel?

Copy link
Contributor

@dalg24 dalg24 Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to measure that the performance gain is worth the added code complexity but yes that is a good suggestion to use a parallel_scan with a trailing return value argument.

Copy link
Contributor Author

@aprokop aprokop Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is that we don't have a function with a good interface that returns the trailing value. It certainly is not a performance critical thing.

If we do decide to do something about it, we should talk about the interface. I would propose not doing it in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good enough

@aprokop
Copy link
Contributor Author

aprokop commented Apr 4, 2024

I'm not sure why CUDA-Clang failed. Seems totally unrelated.

@masterleinad
Copy link
Collaborator

/opt/boost/include/boost/mpl/assert.hpp:83:5: note: candidate function template not viable: no known conversion from 'mpl_::failed ************(boost::mpl::is_sequence<std::tuple<Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::Device<Kokkos::Threads, Kokkos::HostSpace>>>::************)' to 'typename assert<false>::type' (aka 'mpl_::assert<false>') for 1st argument
int assertion_failed( typename assert<C>::type );
    ^
1 error generated when compiling for sm_70.
test/CMakeFiles/ArborX_Test_DetailsTreeConstruction.exe.dir/build.make:91: recipe for target 'test/CMakeFiles/ArborX_Test_DetailsTreeConstruction.exe.dir/tstIndexableGetter.cpp.o' failed
make[2]: *** [test/CMakeFiles/ArborX_Test_DetailsTreeConstruction.exe.dir/tstIndexableGetter.cpp.o] Error 1
make[2]: Leaving directory '/var/jenkins/workspace/ArborX_PR-1053/build'

@aprokop
Copy link
Contributor Author

aprokop commented Apr 4, 2024

@masterleinad Right. I saw that and it does not make sense to me. It dowa not have to do anything with this PR.

@aprokop
Copy link
Contributor Author

aprokop commented Apr 5, 2024

I'm not sure why CUDA-Clang failed. Seems totally unrelated.

It was likely introduced in e21c55a. This is a failure already in master.

@aprokop aprokop merged commit d48fe3d into arborx:master Apr 5, 2024
1 of 2 checks passed
@aprokop aprokop deleted the brute_force_nearest branch April 5, 2024 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants