-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly compute tree bounds with indexables #894
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See aprokop#16
@dalg24 integrated your changes and merged bounds with the hierarchy generation. |
6d46363
to
6559ec0
Compare
6559ec0
to
0f3d411
Compare
5% slowdown on Cuda for the smallest (10K) tree construction. I think need to revert some part of the patch to not make it part of the hierarchy generation for loop. Pretty annoyed, this is taking so much longer than expected. |
Co-authored-by: Damien L-G <dalg24@gmail.com>
c41fc42
to
22bf054
Compare
Latest version has no slowdown. |
// Skip initialization so that we don't execute a kernel launch | ||
Kokkos::View<BoundingVolume, typename Nodes::memory_space> bounding_volume( | ||
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | ||
"ArborX::BVH::getSingleLeafBounds::bounding_volume")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Skip initialization so that we don't execute a kernel launch | |
Kokkos::View<BoundingVolume, typename Nodes::memory_space> bounding_volume( | |
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | |
"ArborX::BVH::getSingleLeafBounds::bounding_volume")); | |
using MemorySpace = typename Nodes::memory_space; | |
auto bounding_volume = Kokkos::create_mirror_view( | |
Kokkos::view_alloc(space, Kokkos::WithoutInitializing, | |
"ArborX::BVH::getSingleLeafBounds::bounding_volume", | |
MemorySpace()), | |
Kokkos::View<BoundingVolume, Kokkos::HostSpace, Kokkos::MemoryUnmanaged>( | |
&bounds)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this does not work because of the labels. view_alloc
in create_mirror_view
does not allow a label. Which results in bounding_volume
having a label _mirror
, which fails out labels tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug on the Kokkos side. I am tempted to just filter out "_mirror"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the bug that it does not allow a label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, can just do what I did before.
Kokkos::deep_copy( | ||
space, | ||
Kokkos::View<BoundingVolume, Kokkos::HostSpace, | ||
Kokkos::MemoryUnmanaged>(&bounds), | ||
Kokkos::View<BoundingVolume const, MemorySpace, | ||
Kokkos::MemoryUnmanaged>(getRootBoundingVolumePtr())); | ||
} | ||
|
||
KOKKOS_FUNCTION | ||
BoundingVolume const *getRootBoundingVolumePtr() const | ||
{ | ||
// Need address of the root node's bounding box to copy it back on the host, | ||
// but can't access node elements from the constructor since the data is on | ||
// the device. | ||
return &_internal_nodes.data()->bounding_volume; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you prefer this approach as opposed to the one in the 3rd round of aprokop#16
(_bounds
member which is a mirror view of the bounds
argument)?
Was the extra allocation causing slowdowns? Is that why we are still using this hack to get the address of the bounding volume of the root node.
I am a bit surprised and if the performance is the same I would prefer the extra member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using any extra variable with an assignment inside the operator()
was slowing things down for the smallest problem. We had 10K the smallest, but the relative slowdown would have been worse for smaller. Essentially, the thread taking the longest (i.e., traversing all the way to the root), was necessarily doing an extra Box assignment, which was non-trivial.
The only way to avoid that was to not use any extra variables. In which case, I ended up just deep copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. It might be worth expanding/updating the comment to explain that we chose to be naughty although we had alternatives for performance reasons.
74c7ddd
to
22bf054
Compare
Reverted the last changes. |
Leaf nodes will not always store bounding volume inside. We have to use the indexable for a degenerate tree.