Update DBSCAN post-processing to avoid calling sort #453

aprokop · 2021-01-06T18:00:43Z

We know that Kokkos::BinSort struggles on arrays with duplicated
indices, which is certainly the case here, as the sort is called on
cluster indices.

The new approach avoids the sort entirely and, in fact, seems to be
faster.
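
The description does not spell out the replacement, so the following is only a sketch of one common way to group points by cluster label without sorting: count, prefix-sum, scatter. The names `ClusterCRS` and `groupByCluster` are hypothetical and not part of ArborX, and the labels are assumed to be already compacted to `[0, num_clusters)`.

```cpp
#include <cassert>
#include <vector>

// CRS-like storage: the members of cluster c are
// indices[offsets[c]] .. indices[offsets[c + 1] - 1].
struct ClusterCRS
{
  std::vector<int> offsets;
  std::vector<int> indices;
};

// Hypothetical helper (not the ArborX API): group point indices by cluster
// label using counting instead of sorting.
ClusterCRS groupByCluster(std::vector<int> const &labels, int num_clusters)
{
  ClusterCRS crs;
  crs.offsets.assign(num_clusters + 1, 0);

  // Pass 1: count the size of every cluster.
  for (int label : labels)
  {
    assert(0 <= label && label < num_clusters);
    ++crs.offsets[label + 1];
  }

  // Pass 2: a prefix sum turns the counts into starting offsets.
  for (int c = 0; c < num_clusters; ++c)
    crs.offsets[c + 1] += crs.offsets[c];

  // Pass 3: scatter each point into the next free slot of its cluster.
  std::vector<int> next(crs.offsets.begin(), crs.offsets.end() - 1);
  crs.indices.resize(labels.size());
  for (int i = 0; i < static_cast<int>(labels.size()); ++i)
    crs.indices[next[labels[i]]++] = i;

  return crs;
}
```

Duplicated keys are harmless here because nothing is compared or sorted. In a parallel version the scatter step would presumably claim slots with an atomic counter, in which case the order of indices within a cluster would depend on scheduling.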

examples/dbscan/ArborX_DBSCAN.hpp

aprokop · 2021-01-06T18:02:53Z

I tried this on V100, and it dropped the postprocessing time from 0.27 to 0.15. In serial, I was actually able to run the 1M problem, whereas previously it just hung.

@dalg24 Can you please try it on HIP to make sure it does not introduce a regression?
@masterleinad Could you please give it a whirl with SYCL?

aprokop · 2021-01-06T18:05:09Z

One detail that the current version of the patch introduces is that the resulting CRS-like storage is not deterministic, while previously it was (in most cases, due to Thrust sort being stable).

It's not a problem per se, as the result is still valid, but it is hard to test against a golden file when printing cluster centers, as they are subject to summation order.

I'd like to be able to sort the indices corresponding to the same offset entry prior to center computation, but I am not sure how to do so efficiently. Any thoughts?

Update: Thanks to a hint from @dalg24, this was resolved by using a heap to order the indices before the printout.
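
As an illustration of the heap-based ordering mentioned in the update, here is a minimal sketch that sorts each cluster's segment in place. It uses `std::make_heap` and `std::sort_heap` as stand-ins for whatever heap routines the patch actually relies on, and the helper name `orderClusterIndices` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: put the indices of every cluster in ascending order
// so that later reductions visit them in a reproducible order.
void orderClusterIndices(std::vector<int> const &offsets,
                         std::vector<int> &indices)
{
  assert(!offsets.empty() &&
         offsets.back() == static_cast<int>(indices.size()));
  for (std::size_t c = 0; c + 1 < offsets.size(); ++c)
  {
    auto first = indices.begin() + offsets[c];
    auto last = indices.begin() + offsets[c + 1];
    std::make_heap(first, last); // build a max-heap over the segment
    std::sort_heap(first, last); // pop the max repeatedly: ascending order
  }
}
```

Floating-point addition is not associative, so summing the same coordinates in a different order can change the last bits of a cluster center. Fixing the order within each segment makes the printed centers comparable against a golden file.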

masterleinad · 2021-01-06T18:37:18Z

> @masterleinad Could you please give it a whirl with SYCL?

This pull request more or less eliminates postprocessing for SYCL.

examples/dbscan/ArborX_DBSCAN.hpp

aprokop · 2021-01-06T23:04:58Z

OK, this is ready for a review.

examples/dbscan/dbscan.cpp

examples/dbscan/ArborX_DetailsDBSCANVerification.hpp

test/tstHeapOperations.cpp

src/details/ArborX_DetailsHeap.hpp

aprokop · 2021-01-07T14:27:58Z

test/tstHeapOperations.cpp

 BOOST_AUTO_TEST_CASE(sort_heap)
 {
-  for (auto heap : {std::vector<int>{36, 19, 25, 17, 3, 7, 1, 2, 9},
+  for (auto heap : {std::vector<int>{}, std::vector<int>{3},


This does not strictly belong to this PR, but I think it's helpful to slightly expand the testing.

src/details/ArborX_DetailsHeap.hpp

masterleinad

Some comments but looks good to me.

examples/dbscan/ArborX_DBSCAN.hpp

masterleinad · 2021-01-08T19:34:44Z

examples/dbscan/ArborX_DetailsDBSCANVerification.hpp

+  // FIXME we don't want to modify the clusters view in this check. What we
+  // want here is to create a view on the host, and deep_copy into it.
+  // create_mirror_view_and_copy won't work, because it is a no-op if clusters
+  // is already on the host.
+  decltype(Kokkos::create_mirror_view(Kokkos::HostSpace{},
+                                      std::declval<ClusterView>()))
+      clusters_host(Kokkos::ViewAllocateWithoutInitializing(
+                        "ArborX::DBSCAN::clusters_host"),
+                    clusters.size());
+  Kokkos::deep_copy(exec_space, clusters_host, clusters);


Would be useful to have in Kokkos.

Sure would be nice if some kind soul could introduce it there. Definitely not doing it myself.

aprokop · 2021-01-08T22:56:52Z

@dalg24 I removed the heapSort. This should have addressed all your comments.

The passed `clusters` view was unintentionally modified during verification, causing issues for post-processing when verify was enabled. This problem has been around for a long time, but had not been uncovered before because the output results were never checked when run with verify.
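
The aliasing behind this bug can be reproduced without Kokkos. The sketch below uses a hypothetical `IntView` handle that, like `Kokkos::View`, shares its storage when copied. The verification routine and what it does to the labels are invented purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a reference-counted view: copying the handle shares the data.
struct IntView
{
  std::shared_ptr<std::vector<int>> data;
  int &operator[](int i) const { return (*data)[i]; }
  int size() const { return static_cast<int>(data->size()); }
};

// Explicit deep copy into freshly allocated storage.
IntView deepCopy(IntView const &v)
{
  return IntView{std::make_shared<std::vector<int>>(*v.data)};
}

// Hypothetical check that uses the labels as scratch space: it collapses
// every negative label to -1 and reports whether any valid label remains.
bool verifyInPlace(IntView clusters)
{
  assert(clusters.data != nullptr);
  bool has_cluster = false;
  for (int i = 0; i < clusters.size(); ++i)
  {
    if (clusters[i] < 0)
      clusters[i] = -1; // visible to the caller: the handle shares storage
    else
      has_cluster = true;
  }
  return has_cluster;
}

// Safe variant: run the same check on a private copy.
bool verifyOnCopy(IntView const &clusters)
{
  return verifyInPlace(deepCopy(clusters));
}
```

Passing the handle by value does not protect the caller, which is why the fix in the review thread allocates a separate host view and calls `deep_copy` unconditionally.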

aprokop · 2021-01-12T02:00:35Z

Rebased on master, as it was conflicting after merging #450.

test/tstHeapOperations.cpp

aprokop added the performance label ("Something is slower than it should be") Jan 6, 2021

aprokop commented Jan 6, 2021

examples/dbscan/ArborX_DBSCAN.hpp

masterleinad reviewed Jan 6, 2021

examples/dbscan/ArborX_DBSCAN.hpp

aprokop force-pushed the dbscan_postprocess branch 4 times, most recently from 3afcbcd to 0461dc9 on January 6, 2021 22:44

dalg24 reviewed Jan 6, 2021

examples/dbscan/dbscan.cpp

dalg24 reviewed Jan 6, 2021

examples/dbscan/ArborX_DetailsDBSCANVerification.hpp

dalg24 reviewed Jan 7, 2021

test/tstHeapOperations.cpp

dalg24 reviewed Jan 7, 2021

src/details/ArborX_DetailsHeap.hpp

aprokop commented Jan 7, 2021

dalg24 requested changes Jan 7, 2021

src/details/ArborX_DetailsHeap.hpp

masterleinad approved these changes Jan 8, 2021

aprokop added 7 commits January 11, 2021 20:54

Update DBSCAN post-processing to avoid calling sort

7c31eb6

We know that Kokkos::BinSort struggles on arrays with duplicated indices, which is certainly the case here, as the sort is called on cluster indices. The new approach completely avoids that. And, in fact, seems to be faster.

Use heap to order cluster indices for reproducibility

c65dfa8

Silence clang-tidy warnings

02a86a9

Introduce heapSort

1d7b0b1

Switch to using heapSort in DBSCAN center calculations

5e2154f

Added a comment requested in the review

8dbcf73

aprokop force-pushed the dbscan_postprocess branch from b9b2e2c to 52376ca on January 12, 2021 02:00

aprokop mentioned this pull request Jan 12, 2021

DBSCAN algorithm improvements #417

Closed

dalg24 approved these changes Jan 12, 2021

test/tstHeapOperations.cpp

aprokop added 2 commits January 11, 2021 23:06

Add makeHeap

7c6845f

Get rid of heapSort, because it's not in the C++ stl standard

929c1c9

aprokop force-pushed the dbscan_postprocess branch from 9382827 to 929c1c9 on January 12, 2021 04:07

aprokop merged commit f39640a into arborx:master Jan 12, 2021

aprokop deleted the dbscan_postprocess branch January 12, 2021 04:29

dalg24 mentioned this pull request Jan 21, 2021

Move comparison function objects out of priority queue header #466

Merged
