Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch HalfTraversal to APIv2 #1022

Merged
merged 5 commits into from
Apr 10, 2024
Merged

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented Jan 10, 2024

The main thing is to change half-traversal callback from callback(int, int) to callback(value, value).

Note that this change is not backwards compatible. However, HalfTraversal is experimental, so it should be fine.

@aprokop aprokop added the refactoring Code reorganization label Jan 10, 2024
@aprokop aprokop changed the title Switch HalfTraversal to using values instead of int Switch HalfTraversal to APIv2 Jan 10, 2024
@aprokop aprokop mentioned this pull request Jan 10, 2024
@aprokop
Copy link
Contributor Author

aprokop commented Jan 10, 2024

I need to check that FDBSCAN is not affected. We don't have benchmark for neighbor lists, so can't say how much it affected it. But I expect that there is some benefit as I switched to APIv2 with points instead of boxes for the tree construction.

@aprokop aprokop added performance Something is slower than it should be API User visible interface modifications labels Jan 10, 2024
src/ArborX_DBSCAN.hpp Outdated Show resolved Hide resolved
@aprokop
Copy link
Contributor Author

aprokop commented Jan 11, 2024

saturn dbscan $ ./ArborX_master_7abc0268 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose
-- construction     :      0.371
-- query+cluster    :      3.143
-- postprocess      :      0.516
total time          :      4.106
-----------------------------------------------
saturn dbscan $ ./ArborX_half_cfb2d3e2 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose
-- construction     :      0.375
-- query+cluster    :      3.358
-- postprocess      :      0.529
total time          :      4.340

About 6% slowdown in query+cluster on A100 for 497M problem (eps=0.014, minPts=2).

saturn dbscan $ ./ArborX_master_7abc0268 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose
-- construction     :      0.379
-- query+cluster    :      4.487
---- neigh          :      0.836
---- query          :      3.408
-- postprocess      :      0.423
total time          :      5.368
-----------------------------------------------
saturn dbscan $ ./ArborX_half_cfb2d3e2 --binary --filename
  ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose
-- construction     :      0.376
-- query+cluster    :      4.796
---- neigh          :      0.843
---- query          :      3.701
-- postprocess      :      0.407
total time          :      5.657

About 8.5% slowdown in query on A100 for 497M problem (eps=0.014, minPts=2).

The cost is likely passing values by const ref to the FDBSCANCallback, and then unwrapping it. I don't know how I feel about it. On one hand, that's a pretty hefty penalty for such a small change. On the other, the interface for half traversal seems to be more flexible.

Maybe need to explore other cases where half traversal is useful but maybe not possible with the current callback interface.

@aprokop
Copy link
Contributor Author

aprokop commented Jan 11, 2024

Move a part of the PR (the one that isn't related to switching HalfTraversal API to v2) to #1024.

@aprokop
Copy link
Contributor Author

aprokop commented Jan 12, 2024

Rebased on #1024.

@aprokop aprokop marked this pull request as draft January 26, 2024 17:00
@aprokop
Copy link
Contributor Author

aprokop commented Apr 9, 2024

The following change seems to have fixed the performance issue:

--- a/src/details/ArborX_DetailsHalfTraversal.hpp
+++ b/src/details/ArborX_DetailsHalfTraversal.hpp
@@ -52,7 +52,7 @@ struct HalfTraversal

   KOKKOS_FUNCTION void operator()(int i) const
   {
-    auto const &leaf_value = HappyTreeFriends::getValue(_bvh, i);
+    auto const leaf_value = HappyTreeFriends::getValue(_bvh, i);
     auto const predicate = _get_predicate(leaf_value);

     int node = HappyTreeFriends::getRope(_bvh, i);

While it may have negative consequences for very large size values, this is acceptable for our common use cases. In the future, we may specialize on sizeof(Value).

saturn dbscan ((fd39c111...)) $ for i in ./ArborX_master_466efb7c ./ArborX_half_fd39c111; do $i --binary --filename ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 2 --impl fdbscan --verbose --kokkos-device-id=2 --max-num-points 300000000; done
ArborX version    : 1.6 (dev)
ArborX hash       : 466efb7c
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 2
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.207
-- query+cluster    :      1.894
-- postprocess      :      0.202
total time          :      2.308

#clusters       : 18287721
#cluster points : 226764477 [75.59%]
#noise   points : 73235523 [24.41%]
ArborX version    : 1.6 (dev)
ArborX hash       : fd39c111
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 2
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.242
-- query+cluster    :      1.882
-- postprocess      :      0.203
total time          :      2.332

#clusters       : 18287721
#cluster points : 226764477 [75.59%]
#noise   points : 73235523 [24.41%]
saturn dbscan ((fd39c111...)) $ for i in ./ArborX_master_466efb7c ./ArborX_half_fd39c111; do $i --binary --filename ~/data/hacc_497M.arborx --eps 0.014 --core-min-size 10 --impl fdbscan --verbose --kokkos-device-id=2 --max-num-points 300000000; done
ArborX version    : 1.6 (dev)
ArborX hash       : 466efb7c
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 10
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.241
-- query+cluster    :      2.641
---- neigh          :      0.449
---- query          :      2.098
-- postprocess      :      0.146
total time          :      3.034

#clusters       : 791387
#cluster points : 148503760 [49.50%]
#noise   points : 151496240 [50.50%]
ArborX version    : 1.6 (dev)
ArborX hash       : fd39c111
Kokkos version    : 4.1.0
algorithm         : dbscan
eps               : 0.014000
cluster min size  : 1
implementation    : fdbscan
verify            : false
minpts            : 10
filename          : /home/users/aprokop/data/hacc_497M.arborx [binary, max_pts = 300000000]
samples           : -1
verbose           : true
Reading in "/home/users/aprokop/data/hacc_497M.arborx" in binary mode...done
Read in 300000000 3D points
-- construction     :      0.179
-- query+cluster    :      2.643
---- neigh          :      0.448
---- query          :      2.104
-- postprocess      :      0.146
total time          :      2.973

#clusters       : 791387
#cluster points : 148503760 [49.50%]
#noise   points : 151496240 [50.50%]

@aprokop aprokop marked this pull request as ready for review April 9, 2024 20:32
@aprokop
Copy link
Contributor Author

aprokop commented Apr 9, 2024

@dalg24 ready for re-review

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised by the variability of the construction timings in the results you posted. Is it something you have been seeing before?

@@ -122,7 +127,9 @@ void findFullNeighborList(ExecutionSpace const &space,
Points points{primitives}; // NOLINT
int const n = points.size();

BoundingVolumeHierarchy<MemorySpace, PairValueIndex<Point>> bvh(
using Value = PairValueIndex<Point>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of that naming

@aprokop
Copy link
Contributor Author

aprokop commented Apr 10, 2024

I am a bit surprised by the variability of the construction timings in the results you posted. Is it something you have been seeing before?

Something else was running which may have interfered at the moment. I rerun today a few times and saw consistent 0.207-0.210.

@aprokop aprokop merged commit 908f934 into arborx:master Apr 10, 2024
2 checks passed
@aprokop aprokop deleted the half_traversal_apiv2 branch April 10, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API User visible interface modifications performance Something is slower than it should be refactoring Code reorganization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants