Friends-of-Friends Query #161

Closed
sslattery opened this issue Oct 24, 2019 · 10 comments
Labels
application, enhancement (New feature or request)

Comments

@sslattery
Contributor

sslattery commented Oct 24, 2019

In many cosmology applications a Friends-of-Friends (FOF) query is used to identify clustering in point clouds. In general, the algorithm is as follows:

  1. Build a tree from a set of input points
  2. Establish a fixed neighborhood radius r
  3. For every point, locate the other points in the tree that are within distance r
  4. For every neighboring point within distance r, find its neighboring points that are within distance r excluding any neighbors already found previously in the query
  5. For each neighbor-of-neighbor, repeat step 4 until no new points are found within distance r

The end result of each query should be the list of points that are within distance r of the query point, or are connected to it through a chain of such neighbors (neighbors-of-neighbors, and so on).
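The steps above can be sketched as follows. This is a minimal illustration, not ArborX code: a brute-force distance scan stands in for the tree-based radius query of steps 1 and 3, and a breadth-first search over the frontier of newly found neighbors replaces the recursion of steps 4-5.

```python
# FOF cluster query sketch: brute force stands in for the tree search.
import math
from collections import deque

def fof_cluster(points, query_idx, r):
    """Return indices of all points reachable from points[query_idx]
    through chains of neighbors within distance r."""
    visited = {query_idx}          # neighbors already found (step 4's exclusion)
    frontier = deque([query_idx])
    while frontier:
        i = frontier.popleft()
        for j, q in enumerate(points):
            # In practice this loop is a radius query against the tree.
            if j not in visited and math.dist(points[i], q) <= r:
                visited.add(j)
                frontier.append(j)
    return sorted(visited)

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(fof_cluster(points, 0, 1.0))  # [0, 1, 2]: (5, 5) is unreachable
```

The breadth-first frontier guarantees each point is visited at most once, so the traversal terminates even for dense clusters.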

Some questions:

  1. It was mentioned that we could possibly cap the recursion in the algorithm at a fixed neighbor depth. Does this provide a benefit? If so, what are reasonable values?
  2. The output of the query could use our standard CSR-like format, where each query returns the set of object ids that satisfied the query predicate. However, many particles will belong to the same cluster, and that cluster would be repeated in the results for each of its points, potentially requiring a large amount of memory depending on the cluster structure. What is the most useful output format for this type of query? Should we return clusters rather than per-point results? Or return the clusters as well as, for each point, the cluster in which it is located?
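For reference, the CSR-like format mentioned in question 2 can be illustrated as two flat arrays (the values below are made up for the example):

```python
# CSR-like query output: offsets[i]..offsets[i+1] delimits the ids
# returned for query i. All values here are hypothetical.
offsets = [0, 3, 5, 6]          # n_queries + 1 entries
ids     = [4, 7, 9, 1, 2, 8]    # concatenated result ids

def results_for(i):
    return ids[offsets[i]:offsets[i+1]]

print(results_for(0))  # [4, 7, 9]
```

The duplication concern is visible here: if queries 0 and 1 belong to one cluster, the same cluster members appear in both slices.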
@aprokop
Contributor

aprokop commented Oct 24, 2019

If I understand the problem right, it should be equivalent to finding connected components in an undirected (in this case) graph, a well-studied problem in the graph community with several parallel methods.

I don't see, however, an easy way to avoid constructing the full graph of neighbors within the given distance at the moment.

@sslattery
Contributor Author

Yes, I think that is a good generalization if one considers the results of a preliminary radius query within r to define the graph.

@steverangel

steverangel commented Oct 24, 2019

Agreed. In discussing this with Damian, we also concluded that after steps 1-3 the remaining task is finding the connected components of a graph. Each component is then the desired output.

To add, a matrix representing the graph would be very sparse, so a representation that can take advantage of that property is needed. There are roughly 100 million points (vertices) in the graph.

@aprokop
Contributor

aprokop commented Oct 24, 2019

> To add, a matrix representing the graph would be very sparse, so a representation that can take advantage of that property is needed

If every query returns m results, our overhead is 1 + 1/m (assuming int for both results and offsets). I think this is OK for now. 100 million int is around 400MB, which is tolerable for GPUs.
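The overhead estimate can be checked with back-of-the-envelope arithmetic (the value of m below is a made-up example; the point counts are from the discussion above):

```python
# Check of the CSR overhead estimate: offsets add ~1/m on top of results.
n = 100_000_000                    # points / queries (from the thread)
m = 10                             # assumed average results per query
bytes_per_int = 4

values_bytes  = n * m * bytes_per_int    # result ids
offsets_bytes = (n + 1) * bytes_per_int  # CSR offsets array
overhead = (values_bytes + offsets_bytes) / values_bytes  # ~ 1 + 1/m

print(overhead)                   # ~1.1 for m = 10
print(offsets_bytes / 1e6, "MB")  # ~400 MB for 100 million ints
```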

@aprokop aprokop added the enhancement New feature or request label Oct 25, 2019
@aprokop
Contributor

aprokop commented Oct 27, 2019

Just to note: the original multistep method Kokkos implementation is here. It is unknown whether it still works.

@aprokop
Contributor

aprokop commented Dec 20, 2019

I think our plan of attack for the problem should be the following:

  1. Version 1
    • Construct the tree using ArborX for the test problem
    • Run the ECL-CC algorithm on the resulting graph to produce connected components
  2. Version 2
    • Implement Union-Find algorithm as part of the ArborX callback
    • Compare with version 1

Version 1 seems easy to implement; we just need to integrate ECL-CC (which already takes a CSR graph as input). Version 2 should be possible but needs some further thinking. I think the main question is whether we can combine the init and compute1 stages of the ECL-CC algorithm.
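The union-find idea behind version 2 can be sketched as follows. This is an illustration, not the ArborX callback itself: the pair list is hypothetical and stands in for the neighbor pairs the radius query would emit, and components are merged inline instead of building the CSR graph first.

```python
# Union-find with path halving: merge components as neighbor pairs arrive.
def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps trees shallow
        i = parent[i]
    return i

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)  # smaller root wins

n = 5
parent = list(range(n))                # init: each point its own component
for a, b in [(0, 1), (1, 2), (3, 4)]:  # hypothetical pairs from radius queries
    union(parent, a, b)

labels = [find(parent, i) for i in range(n)]
print(labels)  # [0, 0, 0, 3, 3]: two clusters
```

A GPU version would need atomic updates to `parent`, which is essentially what ECL-CC's compute stages do.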

There are also further optimizations that could be considered. For example, if the diameter of a bounding box is less than the specified radius, we can guarantee that all of its leaves will end up in the same connected component. We could probably do some pre-processing to partition the domain into boxes prior to the construction. It remains to be seen.
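The bounding-box observation reduces to a simple diameter test (box coordinates below are hypothetical): any two points inside a box are at most the box diameter apart, so a diameter below r makes them mutual neighbors.

```python
# If a box's diameter is below r, all points inside it are pairwise
# within r of each other and share one component with no queries needed.
import math

def box_diameter(lo, hi):
    """Diameter of an axis-aligned box given its min/max corners."""
    return math.dist(lo, hi)

r = 1.0
box = ((0.0, 0.0, 0.0), (0.5, 0.5, 0.5))  # hypothetical box
if box_diameter(*box) < r:
    print("all leaves in this box share one component")
```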

@aprokop aprokop added this to In progress in Developer: aprokop Feb 21, 2020
@aprokop aprokop mentioned this issue Mar 5, 2020
@aprokop aprokop moved this from In progress to To do in Developer: aprokop Mar 25, 2020
@aprokop
Contributor

aprokop commented Apr 27, 2020

Another issue to consider is whether we want to use something like density clustering for HACC, as the data is very unevenly distributed. For example, this paper.

@sslattery
Contributor Author

Good idea - this seems like it would be a good solution for at least some cases of unbalanced trees. We will also likely have a use case for something like this in the raytracing problem on facet geometries where the triangle distributions are irregular.

@aprokop
Contributor

aprokop commented May 13, 2020

The driver was implemented and merged into the examples. The algorithm uses version 2 from the previous comment. I'm closing this issue until there is an external request to improve it or to address any of its limitations.

@aprokop aprokop closed this as completed May 13, 2020
Developer: aprokop automation moved this from To do to Done May 13, 2020
@aprokop aprokop mentioned this issue Jun 2, 2020
@aprokop
Contributor

aprokop commented Oct 1, 2020

Timeline of the algorithm improvements:

| date | time (s) | hash | description |
| --- | --- | --- | --- |
| 2020/03/09 | 8.33 | 147f3c3 (part of #237) | Version 1 (using ECL-CC for post-processing) |
| 2020/05/03 | 3.52 | 281b77d (#237) | Version 2 (inline processing using union-find) |
| 2020/05/20 | 2.74 | c3893a6 (#306) | Traversal optimization (Karras-style stack) |
| 2020/08/07 | 2.58 | 779f43d (#329) | New query overload (without offset or indices) |
| 2020/08/18 | 0.86 | 2754cf1 (#364) | Stackless traversal using escape index (rope) |
| 2020/09/26 | 0.68 | 2754cf1 with 1fcd0d7 merged in | Kokkos occupancy control (50%) |

  • "date" is the date the feature was originally implemented, not when it was merged
  • "hash" is the hash of the merge commit of the feature branch (unless stated otherwise)

(Figure 1: timing plot; image not included)
The baseline is 76 sec on a single Power9 core on Summit.
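The speedups implied by the table and the quoted baseline can be computed directly (the short names below are paraphrases of the table's descriptions):

```python
# Speedups relative to the first GPU version (8.33 s) and to the
# 76 s single-core Power9 baseline, using the times from the table.
times = {
    "v1 (ECL-CC)": 8.33,
    "v2 (union-find)": 3.52,
    "stack traversal": 2.74,
    "new query overload": 2.58,
    "stackless (ropes)": 0.86,
    "occupancy control": 0.68,
}
baseline = 76.0
v1 = times["v1 (ECL-CC)"]
for name, t in times.items():
    print(f"{name}: {v1 / t:.1f}x vs v1, {baseline / t:.0f}x vs baseline")
```

The final version is therefore over 100x faster than the serial baseline and roughly 12x faster than the first GPU version.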
