Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Small Feature] Parallelize KD Tree build #4559

Merged

Conversation

sgiraudot
Copy link
Contributor

@sgiraudot sgiraudot commented Mar 5, 2020

Rationale

Many algorithms (notably in the Point Set Processing package) rely on the Spatial Searching package for fast neighbor queries. Although many PSP algorithms take advantage of trivial parallelisation to speed-up computation by querying the kd tree in parallel for each point, a substantial time can be spend building the kd tree, especially for very large point clouds. Up to a third of the computation time of algorithms like CGAL::jet_estimate_normals() can be spent building the kd tree when queries are done in parallel.

This feature adds a parallel variant of the KD_tree::build() member function. On a quad-core processor, this parallel algorithm is experimentally 2 to 3 times faster than the sequential one, depending on the point cloud.

Summary of API changes

A template parameter ConcurrencyTag is added to the KD_tree:build() member function, with possible values being CGAL::Sequential_tag, CGAL::Parallel_tag and CGAL::Parallel_if_available_tag.

For backward compatibility, an overload of KD_tree::build() without template parameter is provided, falling back to the sequential version. The build() called internally if the tree is not built at first query is also the sequential one. The rational is that parallelism should always be explicitly specified by users (also to avoid hidden changes of behavior in existing codes).

License and copyright ownership

(No change)

CHANGES.md

(todo)

Submission

Version 1 (outdated)

Version 2

Status

  • Developed and locally tested (GNU/Linux)
  • Small Feature Pre-approved -- Andreas Fabri 2 April 2020 (CEST)

@afabri
Copy link
Member

afabri commented Mar 11, 2020

I would suggest to remove the parallel tag in the very first example. There should be a section at the end on parallelism, where you build with the parallel tag and at the same time show a parallel for loop for performing parallel queries.

@afabri
Copy link
Member

afabri commented Mar 11, 2020

Why is build() called before a removal function? I know this is also written in master.

@sgiraudot
Copy link
Contributor Author

sgiraudot commented Mar 16, 2020

I updated the doc with a Version 2 following @afabri's review.

I'm currently investigating the performances behavior with respect to the bucket size: we present results with the default value (10) and with 20, and the performances with 20 are always better in our performance table (it was already the case before this small feature), which probably makes users wonder why 20 is not the default value.

@sgiraudot
Copy link
Contributor Author

So I did some more experiment regarding the computation time VS bucket size, and it turns out that for the 800k points case used in the manual, the bucket size that gives the shortest computation time is more around 100:

time_vs_bucket_size_800k_pts

(Violet is sequential, green is parallel.)

The manual does explain that the bucket size should increase with the size of the point cloud, which is confirmed by the following experiment: if I randomly simplify the point cloud to keep only 10% of points (80k), then the optimal bucket size drops to around 50.

time_vs_bucket_size_80k_pts

I'm not sure what we should do about this. It looks like the bucket size could be roughly estimated as a factor of the logarithm of the number of points, but do we want to include such an automatic selection in the KD Tree? Or do we keep it as is, meaning it's the responsibility of users to choose the best bucket size according to their needs?

@MaelRL MaelRL changed the base branch from master to releases/CGAL-4.14-branch March 26, 2020 19:59
@MaelRL MaelRL changed the base branch from releases/CGAL-4.14-branch to master March 26, 2020 19:59
…ree_build-GF' into Spatial_searching-Parallelize_kd_tree_build-GF
@sgiraudot sgiraudot changed the base branch from master to releases/CGAL-5.0-branch April 16, 2020 15:04
@sgiraudot sgiraudot changed the base branch from releases/CGAL-5.0-branch to master April 16, 2020 15:04
@mglisse
Copy link
Member

mglisse commented Apr 16, 2020

The optimal bucket size may also depend on the kernel (and in particular the dimension).

@mglisse
Copy link
Member

mglisse commented Apr 16, 2020

I've checked the code, and remove() does not actually call build(),

remove calls root which calls build if necessary.

maxGimeno added a commit to maxGimeno/cgal that referenced this pull request Apr 21, 2020
…lize_kd_tree_build-GF

[Small Feature] Parallelize KD Tree build
@maxGimeno
Copy link
Contributor

Copy link
Member

@sloriot sloriot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change.md must be updated + I agree that the commented code in remove should be enabled.

@@ -51,15 +51,12 @@ namespace CGAL {
typedef typename Kdt::iterator iterator;
typedef typename Kdt::D D;

bool leaf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that change required in this PR ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not clear to me that this will not negatively impact the performance so except if you show that it is faster you should undo that change.

sgiraudot and others added 3 commits April 27, 2020 10:15
Co-Authored-By: Sebastien Loriot <sloriot.ml@gmail.com>
…ree_build-GF' into Spatial_searching-Parallelize_kd_tree_build-GF
@mglisse
Copy link
Member

mglisse commented Apr 27, 2020

I agree that the commented code in remove should be enabled.

Did you read my comment in this PR? Do you disagree with it?

@sloriot sloriot changed the base branch from master to releases/CGAL-4.14-branch April 27, 2020 09:34
@sloriot sloriot changed the base branch from releases/CGAL-4.14-branch to master April 27, 2020 09:34
@sloriot
Copy link
Member

sloriot commented Apr 27, 2020

I agree that the commented code in remove should be enabled.

Did you read my comment in this PR? Do you disagree with it?

Sorry I missed it.

@sloriot sloriot removed the Tested label Apr 27, 2020
@maxGimeno
Copy link
Contributor

@sloriot sloriot self-assigned this May 5, 2020
@sloriot sloriot merged commit c602038 into CGAL:master May 5, 2020
@sloriot sloriot deleted the Spatial_searching-Parallelize_kd_tree_build-GF branch May 5, 2020 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants