[WIP] Externalize buffer in BVH callback benchmark #416
The callbacks simply count the number of found neighbors.
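Here is a minimal, self-contained sketch of what a counting callback does. Everything in it is hypothetical. A brute-force 1D search stands in for the BVH traversal, and `Point`, `for_each_neighbor` and `count_neighbors` are illustrative names, not ArborX API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D point; the real benchmark uses ArborX/Kokkos geometry types.
struct Point
{
  double x;
};

// Brute-force stand-in for the BVH traversal (assumption, not ArborX code):
// invokes callback(query_index, primitive_index) for every primitive
// within `radius` of the query.
template <typename Callback>
void for_each_neighbor(std::vector<Point> const &primitives,
                       std::vector<Point> const &queries, double radius,
                       Callback &&callback)
{
  assert(radius >= 0.0);
  for (std::size_t q = 0; q < queries.size(); ++q)
    for (std::size_t p = 0; p < primitives.size(); ++p)
    {
      double const d = primitives[p].x - queries[q].x;
      if (d * d <= radius * radius)
        callback(q, p);
    }
}

// The callback only increments a per-query counter; no neighbor indices
// are stored. Serial sketch: a parallel version would need atomics or
// per-query ownership of the counters.
std::vector<int> count_neighbors(std::vector<Point> const &primitives,
                                 std::vector<Point> const &queries,
                                 double radius)
{
  std::vector<int> counts(queries.size(), 0);
  for_each_neighbor(primitives, queries, radius,
                    [&counts](std::size_t q, std::size_t) { ++counts[q]; });
  return counts;
}
```

In this sketch the callback needs no index storage at all. How the benchmark in this PR combines the callbacks with external buffers is not shown here.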
Current benchmark results:
I will have a look at where the discrepancy for large sizes in
For
for the callback version (with external buffers) and
for the regular version (with internal buffers). So for this problem
I did not think of that. It is certainly clear to me that the permutation computation should be exposed to a user, or at least to the outer part of ArborX where this buffer optimization would reside. We have several use cases where it would also make sense for a user to have that information, e.g., self-collision problems. We could talk about the best way to achieve this.
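The following is a hypothetical sketch of what exposing the permutation could look like. It computes, once, the order that sorts the queries by a spatial key. It then gathers the queries in that order and scatters per-query results back to the user's original order. A plain `double` key stands in for a space-filling-curve code such as a Morton code. The function names are illustrative, not ArborX API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the permutation that orders `keys` ascending.
// In 3D the key would be e.g. a Morton code of each query (assumption).
std::vector<std::size_t> compute_permutation(std::vector<double> const &keys)
{
  std::vector<std::size_t> perm(keys.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  // stable_sort keeps equal keys in their original relative order.
  std::stable_sort(perm.begin(), perm.end(),
                   [&keys](std::size_t a, std::size_t b) {
                     return keys[a] < keys[b];
                   });
  return perm;
}

// Gather: sorted[i] = values[perm[i]].
template <typename T>
std::vector<T> apply_permutation(std::vector<T> const &values,
                                 std::vector<std::size_t> const &perm)
{
  assert(values.size() == perm.size());
  std::vector<T> sorted;
  sorted.reserve(perm.size());
  for (std::size_t i : perm)
    sorted.push_back(values[i]);
  return sorted;
}

// Scatter results computed in sorted order back to the original order.
template <typename T>
std::vector<T> undo_permutation(std::vector<T> const &sorted_results,
                                std::vector<std::size_t> const &perm)
{
  assert(sorted_results.size() == perm.size());
  std::vector<T> results(sorted_results.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    results[perm[i]] = sorted_results[i];
  return results;
}
```

Keeping the permutation as a separate object is what would let a user reuse it, for example across repeated queries or in a self-collision setting.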
Results for the case where we do not sort:
and
I am not quite sure I understand the huge discrepancy between
It's really hard to compare these two: don't they run a different number of times (186 vs. 106)? The times would have to be normalized first.
Not sure; I have never seen a discrepancy this large.
My understanding has always been that these times are already normalized.
In benchmark timers, sure. But not in Kokkos profiling, right? I think the latter is simply cumulative.
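If the profiling totals are cumulative, as suggested above, they must be divided by the number of calls before two runs with different call counts (e.g. 186 vs. 106) can be compared. The function names below are hypothetical. Whether a particular Kokkos tool reports totals, averages, or both depends on the tool, so its output should be checked.

```cpp
#include <cassert>

// Convert a cumulative time (summed over all invocations) into a
// per-call average.
double per_call_seconds(double cumulative_seconds, long calls)
{
  assert(calls > 0 && "need at least one call to normalize");
  return cumulative_seconds / static_cast<double>(calls);
}

// Ratio of two per-call averages; a value > 1 means `a` is slower per call.
double per_call_ratio(double total_a, long calls_a, double total_b,
                      long calls_b)
{
  return per_call_seconds(total_a, calls_a) /
         per_call_seconds(total_b, calls_b);
}
```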
I still need to understand this better, but the results with presorted queries are:
So the
Results with
Not much of a difference.
Looking at the Kokkos profiling output for the non-callback version
and the callback version
the average runtime reported by Kokkos is 0.0109 s for the non-callback version and 0.0118 s for the callback version.
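Taking the two Kokkos-reported averages at face value, the relative overhead of the callback version works out to roughly 8%. Here $t_\text{cb}$ and $t_\text{reg}$ denote the callback and non-callback averages:

```latex
\frac{t_\text{cb} - t_\text{reg}}{t_\text{reg}}
  = \frac{0.0118\,\text{s} - 0.0109\,\text{s}}{0.0109\,\text{s}}
  = \frac{0.0009}{0.0109}
  \approx 0.083
```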
Closing this in favor of #425.
This pull request is meant to explore how much externalizing the buffer logic really costs (somewhat based on #412). Initial tests on my laptop look reasonable, at least:
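For context, one common way to fill a caller-owned ("external") result buffer is a two-pass scheme:

1. Count the matches per query.
2. Turn the counts into offsets with a prefix sum.
3. Size the buffer exactly.
4. Fill it.

The sketch below is serial and hypothetical. It uses a brute-force 1D search and illustrative names, and it is not the code from this PR or from #412.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 1D point; stands in for the benchmark's real geometry types.
struct Point
{
  double x;
};

// Brute-force predicate standing in for a BVH traversal (assumption).
bool within(Point const &a, Point const &b, double radius)
{
  double const d = a.x - b.x;
  return d * d <= radius * radius;
}

// Compressed-row result: the neighbors of query q are
// indices[offsets[q]] .. indices[offsets[q + 1] - 1].
// `offsets` and `indices` are owned by the caller (the "external" buffer).
void query_into_external_buffer(std::vector<Point> const &primitives,
                                std::vector<Point> const &queries,
                                double radius, std::vector<int> &offsets,
                                std::vector<int> &indices)
{
  assert(radius >= 0.0);
  std::size_t const n = queries.size();
  offsets.assign(n + 1, 0);

  // Pass 1: count matches per query, stored shifted by one slot.
  for (std::size_t q = 0; q < n; ++q)
    for (std::size_t p = 0; p < primitives.size(); ++p)
      if (within(queries[q], primitives[p], radius))
        ++offsets[q + 1];

  // Prefix sum turns the shifted counts into offsets (offsets[0] stays 0).
  std::partial_sum(offsets.begin(), offsets.end(), offsets.begin());

  // Size the caller's buffer exactly; resize() reallocates only if the
  // new size exceeds the current capacity, so reusing the buffer is cheap.
  indices.resize(static_cast<std::size_t>(offsets[n]));

  // Pass 2: write each match at the next free slot of its query's row.
  std::vector<int> cursor(offsets.begin(), offsets.end() - 1);
  for (std::size_t q = 0; q < n; ++q)
    for (std::size_t p = 0; p < primitives.size(); ++p)
      if (within(queries[q], primitives[p], radius))
        indices[static_cast<std::size_t>(cursor[q]++)] = static_cast<int>(p);
}
```

Because the caller owns `offsets` and `indices`, repeated queries can reuse the same allocation. The price is traversing the tree twice.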