Introduce new algorithm (FDBSCAN-DenseBox) for DBSCAN #508

Merged 9 commits on May 28, 2021

Conversation

Contributor

@aprokop aprokop commented Apr 23, 2021

No description provided.

@aprokop aprokop added the performance (Something is slower than it should be), refactoring (Code reorganization), and clustering (Anything to do with clustering algorithms) labels Apr 23, 2021
@aprokop aprokop force-pushed the dbscan_densebox branch 2 times, most recently from 1d518c9 to c3af33b May 3, 2021 23:33
Contributor Author

aprokop commented May 3, 2021

Rebased on master. Now includes #506. If you'd like, I could create a separate PR with the first three commits.

examples/dbscan/dbscan.cpp (outdated, resolved)
src/ArborX_DBSCAN.hpp (resolved)
nx, ny, nz, bounds);
auto permute = Details::sortObjects(exec_space, cell_indices);

auto mixed_offsets = Details::computeMixedOffsets(exec_space, core_min_size,
Contributor

Why "mixed" offsets? I don't understand the terminology.

Contributor Author

@aprokop aprokop May 4, 2021

It's a bit hard to explain. Essentially, we want to construct a tree on a combination of individual points (belonging to non-dense cells) and boxes of dense cells. In this sense, the primitives are mixed. Mixed offsets are the offsets into the sorted cell indices array that delimit these mixed primitives. For example, if you have cell indices [a a b b b] and minpts = 3, the resulting mixed offsets are [0, 1, 2, 5]: each point of the non-dense cell a is its own primitive, while the dense cell b forms a single one. So, it is not guaranteed that the values of the cell indices in the interval [mixed_offsets(i), mixed_offsets(i+1)) are different from those in [mixed_offsets(i+1), mixed_offsets(i+2)).
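To make the [a a b b b] example concrete, here is a minimal serial sketch of how such offsets could be computed. It is not the ArborX implementation (which runs in parallel through Kokkos); the function name `computeMixedOffsetsSketch` is hypothetical, and it assumes a cell counts as dense once it holds at least `core_min_size` points, which matches the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in, not the ArborX routine.
// sorted_cell_indices: cell index of every point, sorted so that points of
//                      the same cell are contiguous.
// core_min_size:       assumed density threshold (minpts).
std::vector<std::size_t>
computeMixedOffsetsSketch(std::vector<int> const &sorted_cell_indices,
                          std::size_t core_min_size)
{
  assert(core_min_size > 0);
  std::vector<std::size_t> offsets;
  std::size_t const n = sorted_cell_indices.size();
  std::size_t begin = 0;
  while (begin < n)
  {
    // Find the end of the run of points sharing this cell index.
    std::size_t end = begin + 1;
    while (end < n && sorted_cell_indices[end] == sorted_cell_indices[begin])
      ++end;

    if (end - begin >= core_min_size)
    {
      // Dense cell: the whole run becomes one primitive (a box).
      offsets.push_back(begin);
    }
    else
    {
      // Non-dense cell: every point is its own primitive.
      for (std::size_t i = begin; i < end; ++i)
        offsets.push_back(i);
    }
    begin = end;
  }
  offsets.push_back(n); // closing offset, so primitive i spans [offsets[i], offsets[i+1])
  return offsets;
}
```

With cell indices {0, 0, 1, 1, 1} and a threshold of 3, the first two intervals both refer to cell 0, which is exactly why consecutive intervals are not guaranteed to hold different cell indices.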

src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
Comment on lines 309 to 372
// NOTE: This is pretty bad. A single thread will scan a linear range
// corresponding to a dense cell. There is no upper limit on the
// number of points in such a cell. In the pathological case, all
// points may be contained in a single cell, making this completely
// serial. Is there a way to do better?
Contributor

Does this dominate the execution time? Is it a "it would be nice" or a "must improve"?

Contributor Author

I have no quantitative answer. I think it falls under "would be nice to improve, if we knew how".

src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
Comment on lines 407 to 456
sparse_predicates(Kokkos::ViewAllocateWithoutInitializing(
"ArborX::dbscan::sparse_predicates"),
Contributor

"Sparse" predicates?

Contributor Author

Predicates corresponding only to points that are in non-dense cells.
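As a rough illustration of which points would get a "sparse" predicate, the sketch below collects the positions of points that fall in non-dense cells. The helper name `sparsePointPositionsSketch` is hypothetical and the code is serial, so it only shows the selection rule, not how ArborX builds the predicates. It assumes a cell is dense once it holds at least `core_min_size` points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ArborX: returns the positions (in sorted
// order) of the points lying in non-dense cells. Under the reading above,
// only these points would need a predicate of their own.
std::vector<std::size_t>
sparsePointPositionsSketch(std::vector<int> const &sorted_cell_indices,
                           std::size_t core_min_size)
{
  assert(core_min_size > 0);
  std::vector<std::size_t> positions;
  std::size_t const n = sorted_cell_indices.size();
  for (std::size_t begin = 0; begin < n;)
  {
    // Find the end of the run of points sharing this cell index.
    std::size_t end = begin + 1;
    while (end < n && sorted_cell_indices[end] == sorted_cell_indices[begin])
      ++end;

    // Assumption: fewer than core_min_size points means the cell is not dense.
    if (end - begin < core_min_size)
      for (std::size_t i = begin; i < end; ++i)
        positions.push_back(i);

    begin = end;
  }
  return positions;
}
```

For cell indices {0, 1, 1, 1, 2} and a threshold of 3, only the points at positions 0 and 4 are selected; the three points of cell 1 are covered by the dense cell's box instead.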

src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
@aprokop aprokop force-pushed the dbscan_densebox branch 2 times, most recently from 67b593c to 1567d9e May 7, 2021 02:13
Contributor Author

aprokop commented May 7, 2021

@dalg24 I think I addressed all your comments. Please give it another look.

Contributor Author

aprokop commented May 7, 2021

I separated the first three commits into #519 for ease of review. Will rebase once that PR is merged.

Contributor Author

aprokop commented May 7, 2021

Putting operator>> inside the std namespace to see if it resolves the build failures, which I cannot reproduce on my side.

Update: yep, that was it.

Contributor

@dalg24 dalg24 left a comment

> Putting operator>> inside the std namespace to see if it resolves the build failures, which I cannot reproduce on my side.
>
> Update: yep, that was it.

No, define it in the namespace of the enum; it will be found by ADL.
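A minimal sketch of that suggestion follows. The namespaces `demo` and `parser`, the enumerator names, and the accepted strings are all hypothetical, chosen only to show the lookup. The generic code in `parser` never mentions `demo`, yet the unqualified `is >> out` finds the overload because argument-dependent lookup searches the namespace of the enum type.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

namespace demo // hypothetical namespace standing in for the enum's own namespace
{
enum class Implementation
{
  FDBSCAN,
  FDBSCAN_DenseBox
};

// Declared next to the enum, so an unqualified `is >> value` finds it via ADL.
inline std::istream &operator>>(std::istream &is, Implementation &value)
{
  std::string token;
  is >> token;
  if (token == "fdbscan")
    value = Implementation::FDBSCAN;
  else if (token == "fdbscan-densebox")
    value = Implementation::FDBSCAN_DenseBox;
  else
    is.setstate(std::ios_base::failbit); // unknown name: report a parse failure
  return is;
}
} // namespace demo

namespace parser // hypothetical generic code that knows nothing about demo::
{
template <typename T>
bool parse(std::string const &text, T &out)
{
  std::istringstream is(text);
  is >> out; // for T = demo::Implementation, resolved through ADL
  return !is.fail();
}
} // namespace parser

// Usage check: the generic parser picks up demo::operator>> without naming it.
inline void usageExample()
{
  demo::Implementation impl{};
  bool const ok = parser::parse("fdbscan-densebox", impl);
  assert(ok && impl == demo::Implementation::FDBSCAN_DenseBox);
  (void)ok; // silence unused-variable warnings when NDEBUG disables assert
}
```

Keeping the overload beside the enum also avoids adding declarations to namespace `std`.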

examples/dbscan/dbscan.cpp (outdated, resolved)
examples/dbscan/dbscan.cpp (outdated, resolved)
Contributor Author

aprokop commented May 24, 2021

OK, I updated the PR with what I think is close to the final version. I would appreciate it if you reviewed it again.

Contributor

@dalg24 dalg24 left a comment

Partial review

examples/dbscan/dbscan.cpp (resolved)
src/ArborX_DBSCAN.hpp (outdated, resolved)
src/ArborX_DBSCAN.hpp (outdated, resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (outdated, resolved)
src/details/ArborX_DetailsFDBSCANDenseBox.hpp (resolved)
src/ArborX_DBSCAN.hpp (outdated, resolved)
src/ArborX_DBSCAN.hpp (resolved)
aprokop and others added 2 commits May 28, 2021 12:08
Co-authored-by: Damien L-G <dalg24@gmail.com>
Contributor Author

aprokop commented May 28, 2021

Rebased on master, and added a commit addressing review comments.

Contributor Author

aprokop commented May 28, 2021

The reported build failure is not an actual failure, just warnings from a container build.

@aprokop aprokop merged commit 240f30a into arborx:master May 28, 2021
@aprokop aprokop deleted the dbscan_densebox branch October 6, 2021 15:19
Labels: clustering (Anything to do with clustering algorithms), performance (Something is slower than it should be), refactoring (Code reorganization)

3 participants