Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up DBSCAN post-processing #430

Merged
merged 2 commits into from
Dec 17, 2020
Merged

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented Dec 11, 2020

I'm still not sure if this is the best way to do, but it speeds up postproccessing on V100 for halo finding from 0.050 to 0.027. 3% overall time improvement for the win :) The improvements are better on AMD.

A side note: --verify option does not test this. The only way right now to validate this patch is to run with --output-sizes-and-centers and compare the output to master.

@aprokop aprokop added the performance Something is slower than it should be label Dec 11, 2020
examples/dbscan/ArborX_DBSCAN.hpp Show resolved Hide resolved
examples/dbscan/ArborX_DBSCAN.hpp Show resolved Hide resolved
examples/dbscan/ArborX_DBSCAN.hpp Show resolved Hide resolved
@aprokop aprokop changed the title Speedup DBSCAN post-processing Speed up DBSCAN post-processing Dec 11, 2020
@aprokop aprokop requested a review from dalg24 December 12, 2020 01:15
@aprokop
Copy link
Contributor Author

aprokop commented Dec 16, 2020

@dalg24 Added more comments.

@aprokop
Copy link
Contributor Author

aprokop commented Dec 16, 2020

PGI failed in the distributed tree, but it has nothing to do with this PR.

@aprokop
Copy link
Contributor Author

aprokop commented Dec 17, 2020

retest this please

Comment on lines +268 to +272
// truncate the permutation array, see comment above
reallocWithoutInitializing(cluster_indices, num_cluster_indices);
Kokkos::deep_copy(
exec_space, cluster_indices,
Kokkos::subview(permute, std::make_pair(0, num_cluster_indices)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a Kokkos::resize()isn't it? Did you chose do it this way because Kokkos initializes the array before copying entries? If so please comment.

If I understand it right, this is something that could be reported to Kokkos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this to my attention. Now that I look at the complete code, I realize that we no longer use permute in any of the following kernels, and thus don't have to be careful with it anymore. Thus, you are right, it could be done as

Kokkos::resize(permute, num_cluster_indices));
cluster_indices = permute;

At this moment, I'm not sure about the complexity of the first line. Is it 1 allocation, 1 initialization of size num_cluster_indices and 1 deep_copy? If yes, that it indeed has an extra initialization cost that we don't need. I think I tried a solution proposed in this comment (Kokkos::resize( Kokkos::view_resize (WithoutInitializing), A, 2*N );) but it did not compile, because there is no view_resize.

Copy link
Contributor Author

@aprokop aprokop Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems to have been fixed in kokkos/kokkos#2285. So it almost works with this:

Kokkos::resize(Kokkos::WithoutInitializing, permute, num_cluster_indices);                                                                                                                                                                                                                                                                                             
cluster_indices = permute;

except they have different value types:

error: no operator "=" matches these operands
            operand types are:
  Kokkos::View<int *, Kokkos::Cuda::memory_space> =
  Kokkos::View<unsigned int *, Kokkos::Device<Kokkos::CudaSpace::execution_space, Kokkos::CudaSpace::memory_space>>

Copy link
Contributor

@dalg24 dalg24 Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try Kokkos::resize(Kokkos::WithoutInitializing, A, 2*N);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway I missed that you were copying into another view.

@aprokop aprokop merged commit 9338ba9 into arborx:master Dec 17, 2020
@aprokop aprokop deleted the dbscan_postprocessing branch December 17, 2020 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Something is slower than it should be
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants