Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create new UnionFind struct for DBSCAN #506

Merged
merged 4 commits into from
May 3, 2021

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented Apr 22, 2021

This makes the code more readable, and prepares a stage for the dense
box implementation of the algorithm.

@aprokop aprokop added refactoring Code reorganization clustering Anything to do with clustering algorithms labels Apr 22, 2021
src/ArborX_DBSCAN.hpp Outdated Show resolved Hide resolved
void union_1way(int i, int j) const { labels_(i) = representative(j); }

KOKKOS_FUNCTION
void union_2way(int i, int j) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why union "one" and "two" way? I don't get it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union-find algorithm essentially merges two (implicit) trees. In general, it does not matter which one is a root, and which one is merged into, and is usually based on which representative is smaller. However, for the boundary points, it matters, as otherwise it may lead to the bridging effect. So the boundary point tree has to be merged into the core point tree, which is what I call union_1way, which simply assigns a label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get the name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what I described, how would you propose to call them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be more explicit via union_boundary_core or such?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just not include union_1way in UnionFind then (and have the behavior in-line as before) and don't change the name of union_2way.

Copy link
Contributor Author

@aprokop aprokop Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masterleinad I need union 1 way in UnionFind, as I do not want to expose labels.

@masterleinad @dalg24 Would you guys feel better if I change the signature to
union(int i, int j, bool one_way = false); or union(int i, int j, bool any_order = true)? It may be slightly less performant, as it will introduce an extra unnecessary if check, but should not be a biggie.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to have two member functions in that case but they should be self-describing. What you essentially want is a version that assigns that updates the left tree ( or the right tree), isn't it?
If so, union_merge_left and union_merge_any (or just union) would be fine with me. Alternatively, introducing a template argument force_merge_left should not have any runtime overhead and might be nicer by providing only one function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you essentially want is a version that assigns that updates the left tree ( or the right tree), isn't it?

Well, similar, but there is no left tree and right tree. There are two trees, one consisting of a single vertex (corresponding to the boundary point), and one other tree. To me, left is completely confusing, and does not make sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then calling this function has special requisites, i.e. one of them having only a single vertex, in that case I would certainly use two different names, maybe union_with_single_vertex_tree or so.

void union_1way(int i, int j) const { labels_(i) = representative(j); }

KOKKOS_FUNCTION
void union_2way(int i, int j) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get the name

@@ -149,15 +149,16 @@ dbscan(ExecutionSpace const &exec_space, Primitives const &primitives,
Kokkos::View<int *, MemorySpace> labels(
Kokkos::ViewAllocateWithoutInitializing("ArborX::DBSCAN::labels"), n);
ArborX::iota(exec_space, labels);
Details::UnionFind<MemorySpace> union_find{labels};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to expose this? This is an implementation of the callback. Take the labels like you did before and create your union find in the callback constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I could do that in this PR, but I would have to move it out in the dense box PR, as I will also need to update this outside of the callback. So, rather than passing the labels around I like it better to pass union-find struct around. This also makes sense logically, as we could change the way union-find struct implemented (if desired) without changing the callback.

This makes the code more readable, and prepares a stage for the dense
box implementation of the algorithm.
@aprokop
Copy link
Contributor Author

aprokop commented Apr 23, 2021

@dalg24 Do my comments address your concerns?

@aprokop
Copy link
Contributor Author

aprokop commented Apr 23, 2021

Two SYCL timeouts, but otherwise tests pass.

@aprokop
Copy link
Contributor Author

aprokop commented May 3, 2021

@dalg24 Should have addressed your comments.

I renamed union_2way and union_1way to merge and merge_into. Could not use union because it's a reserved word. Union-find is also called merge-find sometimes, so these names make some sense. I could have renamed UnionFind as a MergeFind, but I like UnionFind better.

Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me.

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if you squash merged or rewrote history

@aprokop aprokop merged commit 40eecb6 into arborx:master May 3, 2021
@aprokop aprokop deleted the dbscan_union_find branch May 3, 2021 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clustering Anything to do with clustering algorithms refactoring Code reorganization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants