Use lower bounds to avoid traversals in MST #631

aprokop · 2022-02-15T23:46:46Z

No description provided.

dalg24 · 2022-02-16T03:19:12Z

src/details/ArborX_MinimumSpanningTree.hpp

+    if (radius < _lower_bounds(i - n + 1))
+      return;


What happened to the version with the scan you showed me? I thought you said it was faster.

I thought so too. But I repeated the experiments on Summit yesterday and could not reproduce the speedup. Maybe I ran so many experiments that the different results got mixed up in my head. So the current patch does not actually seem to help on GPU. On CPU it is better than what I showed you for two reasons: a) we don't do a parallel_scan, and b) more importantly, radius may have already been updated by another thread that converged, so even more threads drop out.
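For readers following the thread, the check under discussion can be sketched in plain C++ without Kokkos. This is only an illustration of the idea as read from the diff: `PointQuery`, `canSkipTraversal`, and `countTraversals` are hypothetical names, and the interpretation of `radius` (current upper bound on a component's shortest outgoing edge) and of the lower bound (no closer point in another component can exist) is an assumption, not the ArborX implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-point state in one Boruvka iteration.
// radius:      assumed upper bound on the shortest outgoing edge found so far
// lower_bound: assumed lower bound on the distance from this point to any
//              point in a different component
struct PointQuery
{
  float radius;
  float lower_bound;
};

// True if the tree traversal for this point cannot improve the current edge.
inline bool canSkipTraversal(PointQuery const &q)
{
  assert(q.lower_bound >= 0.f); // distances are non-negative
  return q.radius < q.lower_bound;
}

// Counts how many traversals would still be executed.
inline std::size_t countTraversals(std::vector<PointQuery> const &queries)
{
  std::size_t executed = 0;
  for (auto const &q : queries)
  {
    if (canSkipTraversal(q))
      continue; // mirrors the early `return` in the patch
    ++executed; // a real implementation would traverse the tree here
  }
  return executed;
}
```

In a multithreaded run, `radius` may shrink while other threads are still working, which is why the check inside the functor can drop more threads than a filter computed up front.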

aprokop · 2022-02-16T19:33:52Z

I changed to using lower bound only for Serial. I investigated using it on Nvidia A100 (I used OACISS Saturn, where I get consistent timings; on Perlmutter, the timings from run to run are very inconsistent). The summary is:

If I use it inside operator() (same as the current version for Serial), it runs ~6+% slower
If I use pre-filtering, it runs ~2% faster

In short, there is no benefit on GPU, so I disabled it there.
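The pre-filtering variant mentioned above is not shown in the thread. A minimal serial sketch of what such a scan-based filter could look like follows; it assumes "pre-filtering" means compacting the indices of points that pass the lower-bound check before launching the traversal. The function name `prefilter` and the per-point `radii` array are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of the points that still need a traversal.
// Assumption: radii[i] is the radius relevant to point i.
inline std::vector<int> prefilter(std::vector<float> const &radii,
                                  std::vector<float> const &lower_bounds)
{
  assert(radii.size() == lower_bounds.size());
  std::size_t const n = radii.size();

  // Step 1: flag the points that survive the lower-bound check.
  std::vector<int> flags(n);
  for (std::size_t i = 0; i < n; ++i)
    flags[i] = (radii[i] < lower_bounds[i]) ? 0 : 1;

  // Step 2: an exclusive scan turns the flags into output offsets.
  // (A parallel version would use a parallel scan here.)
  std::vector<int> offsets(n);
  int running = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    offsets[i] = running;
    running += flags[i];
  }

  // Step 3: scatter the surviving indices into a compact array.
  std::vector<int> active(static_cast<std::size_t>(running));
  for (std::size_t i = 0; i < n; ++i)
    if (flags[i])
      active[static_cast<std::size_t>(offsets[i])] = static_cast<int>(i);
  return active;
}
```

The traversal would then be launched only over `active`, at the cost of the extra flag, scan, and scatter passes.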

On CPU (AMD EPYC 7763) in Serial it speeds up the MST construction by 25%.

dalg24 · 2022-02-16T21:45:47Z

src/details/ArborX_MinimumSpanningTree.hpp

@@ -482,6 +513,16 @@ struct MinimumSpanningTree
        Kokkos::view_alloc(Kokkos::WithoutInitializing, "ArborX::MST::radii"),
        n);

+    bool const use_lower_bounds =
+        (std::is_same<ExecutionSpace, Kokkos::Serial>{});


Kokkos::Serial is undefined when the serial backend is not enabled. It is worrisome that the CI passed...
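The problem can be illustrated without Kokkos. In the sketch below, `demo::Serial`, `demo::Device`, `DEMO_ENABLE_SERIAL`, and `useLowerBounds` are stand-ins invented for this example. The real guard would presumably test the macro Kokkos defines when the Serial backend is enabled (`KOKKOS_ENABLE_SERIAL`), but the exact form used in the PR is not shown in this thread.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a "backend enabled" macro; comment it out to simulate a
// build without the serial backend.
#define DEMO_ENABLE_SERIAL

namespace demo
{
#ifdef DEMO_ENABLE_SERIAL
struct Serial {}; // only declared when the backend is enabled
#endif
struct Device {};
} // namespace demo

template <class ExecutionSpace>
constexpr bool useLowerBounds()
{
#ifdef DEMO_ENABLE_SERIAL
  return std::is_same<ExecutionSpace, demo::Serial>::value;
#else
  // demo::Serial does not exist in this configuration, so it must not be
  // named at all; the optimization is simply turned off.
  return false;
#endif
}
```

Without the guard, any mention of the type fails to compile in configurations where it is not declared, which is why a CI that only tests serial-enabled builds would not catch it.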

aprokop · 2022-02-16T22:21:29Z

@dalg24 Did you mean to push seemingly unrelated stuff in example_callback.cpp?

dalg24 · 2022-02-16T22:22:20Z

@dalg24 Did you mean to push seemingly unrelated stuff in example_callback.cpp?

No. Will fix.

aprokop · 2022-02-16T22:59:44Z

This does not compile:

arborx/src/details/ArborX_MinimumSpanningTree.hpp(559): error:
The enclosing parent function ("doBoruvka") for an extended __host__ __device__
lambda cannot have private or protected access within its class

aprokop · 2022-02-17T05:19:07Z

I was sure that having if (radius < lower_bound) inside traverse was fine for GPU with a noop functor. But it's not. I may be hallucinating, but I reran all commits in this PR and they all introduce a 12% perf regression on A100.

So I did the final salute by explicitly stripping the check completely out of the functor by introducing tags, and dispatching based on the execution space.

I did not guard recomputing lower bounds for GPU, as it takes < 0.1%.
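The tag-based approach described above can be sketched as follows. All names (`LowerBoundsEnabledTag`, `LowerBoundsDisabledTag`, `SerialLike`, `DeviceLike`, `FindEdge`, `run`) are hypothetical and chosen for illustration; the actual tags and functor in ArborX may look different. The point is that the overload selected for the device path contains no comparison at all, rather than a comparison that is expected to be optimized away.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tags selecting the functor variant.
struct LowerBoundsEnabledTag {};
struct LowerBoundsDisabledTag {};

// Hypothetical stand-ins for execution spaces.
struct SerialLike {};
struct DeviceLike {};

// Pick the tag from the execution space at compile time (C++14, no
// `if constexpr` needed).
template <class ExecutionSpace>
using LowerBoundsTagFor = typename std::conditional<
    std::is_same<ExecutionSpace, SerialLike>::value, LowerBoundsEnabledTag,
    LowerBoundsDisabledTag>::type;

struct FindEdge
{
  float radius;
  float lower_bound;

  // Placeholder for the tree traversal; returns true to signal it ran.
  bool traverse() const { return true; }

  // Variant with the early-out check.
  bool operator()(LowerBoundsEnabledTag) const
  {
    if (radius < lower_bound)
      return false; // skipped
    return traverse();
  }

  // Variant with the check stripped out entirely.
  bool operator()(LowerBoundsDisabledTag) const { return traverse(); }
};

// Dispatch on the execution space.
template <class ExecutionSpace>
bool run(FindEdge const &f)
{
  return f(LowerBoundsTagFor<ExecutionSpace>{});
}
```

With this layout, `run<DeviceLike>` always traverses, while `run<SerialLike>` skips the traversal whenever the radius is already below the lower bound.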


aprokop added the performance label Feb 15, 2022

aprokop mentioned this pull request Feb 15, 2022

Couple perf improvements for MST and integration with the driver #628

Closed

dalg24 reviewed Feb 16, 2022

aprokop force-pushed the use_lower_bound_in_mst branch from a52eb33 to c82c9c8 Compare February 16, 2022 19:30

aprokop force-pushed the use_lower_bound_in_mst branch from c82c9c8 to e5b7dad Compare February 16, 2022 19:34

dalg24 reviewed Feb 16, 2022

aprokop and others added 4 commits February 16, 2022 17:23

Use lower bounds to avoid traversals in MST

4484fd8

Use a lambda in place of the struct that behaves as a rank-1 view

bbab4a5

Guard comparison of exec space with Kokkos::Serial

a397907

Avoid initializing the lower bounds view twice

a40981a

dalg24 force-pushed the use_lower_bound_in_mst branch from 1ed7ef4 to a40981a Compare February 16, 2022 22:24

dalg24 and others added 2 commits February 16, 2022 18:06

Fixup extended __host__ __device__ lambda with NVCC

73a3439

Final attempt in resolving the perf regression

ad52344

dalg24 reviewed Feb 17, 2022

Suggested improvements to previous commit

33ab347

dalg24 reviewed Feb 17, 2022

Drop use of if constexpr

9f640ce

aprokop merged commit dc9d59d into arborx:master Feb 17, 2022

aprokop deleted the use_lower_bound_in_mst branch February 17, 2022 14:58
