-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update DBSCAN post-processing to avoid calling sort #453
Conversation
I tried this on V100, and it dropped the postprocessing time from 0.27 to 0.15. In serial, I actually was able to run the 1M problem while it just hanged previously. @dalg24 Can you please try it on HIP to make sure it does not introduce a regression? |
One details that the current version of the patch introduces is that the resulting crs-like storage is not deterministic, while previously it was (in most cases, due to Thrust sort being stable). It's not a problem per say, as the result is still valid, but it is hard to test against a golden file when spitting out cluster centers, as they are subject to summation order. I'd like to be able to sort indices corresponding to the same offset entry prior to center computation, but not sure how to efficiently. Any thoughts? Update: Thanks to a hint from @dalg24, this was resolved by using a heap to order the indices before the printout. |
This pull request more or less eliminates postprocessing for |
3afcbcd
to
0461dc9
Compare
OK, this is ready for a review. |
BOOST_AUTO_TEST_CASE(sort_heap) | ||
{ | ||
for (auto heap : {std::vector<int>{36, 19, 25, 17, 3, 7, 1, 2, 9}, | ||
for (auto heap : {std::vector<int>{}, std::vector<int>{3}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not strictly belong to this PR, but I think it's helpful to slightly expand the testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments but looks good to me.
// FIXME we don't want to modify the clusters view in this check. What we | ||
// want here is to create a view on the host, and deep_copy into it. | ||
// create_mirror_view_and_copy won't work, because it is a no-op if clusters | ||
// is already on the host. | ||
decltype(Kokkos::create_mirror_view(Kokkos::HostSpace{}, | ||
std::declval<ClusterView>())) | ||
clusters_host(Kokkos::ViewAllocateWithoutInitializing( | ||
"ArborX::DBSCAN::clusters_host"), | ||
clusters.size()); | ||
Kokkos::deep_copy(exec_space, clusters_host, clusters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be useful to have in Kokkos
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure would be nice if some kind soul could introduce it there. Definitely not doing it myself.
@dalg24 I removed the |
We know that Kokkos::BinSort struggles on arrays with duplicated indices, which is certainly the case here, as the sort is called on cluster indices. The new approach completely avoids that. And, in fact, seems to be faster.
The passed `clusters` view was unintentionally modified during verification, causing issues for post-processing when verify was enabled. This problem has been around for a long time, and was uncovered before because the output results were never checked when run with verify.
b9b2e2c
to
52376ca
Compare
Rebased on master, as it was conflicting after merging #450. |
9382827
to
929c1c9
Compare
We know that Kokkos::BinSort struggles on arrays with duplicated
indices, which is certainly the case here, as the sort is called on
cluster indices.
The new approach completely avoids that. And, in fact, seems to be
faster.