[Small Feature] Parallelize KD Tree build #4559

Merged
24 commits
6e629a9
First version of parallel Kd_tree:build()
sgiraudot Mar 4, 2020
fac53dc
Remove useless boolean
sgiraudot Mar 4, 2020
0661542
Some notes about parallelism in KD Tree
sgiraudot Mar 4, 2020
2818986
Fix kd tree node
sgiraudot Mar 5, 2020
3f28ea9
Example with parallel build
sgiraudot Mar 5, 2020
ab3f714
Document parallel build
sgiraudot Mar 5, 2020
e17378e
Update doc
sgiraudot Mar 5, 2020
bd1c509
Clean garbage
sgiraudot Mar 5, 2020
e716d90
Remove now useless workaround
sgiraudot Mar 5, 2020
fe90d1c
Include parallel KD tree build in classification
sgiraudot Mar 5, 2020
35c838d
Improve doc from review with new example for parallel KD tree
sgiraudot Mar 12, 2020
9ab9081
Use emplace_back()
sgiraudot Mar 16, 2020
4bc2e46
Update performance section
sgiraudot Mar 16, 2020
857eb65
Fix markdown
sgiraudot Mar 16, 2020
f1e5569
Update branch from master after trailing whitespaces and tabs removal
sloriot Mar 26, 2020
d42113b
extra run of the script to remove tabs and trailing whitespaces
sloriot Mar 26, 2020
74f1cad
Use default construction in emplace_back
sgiraudot Apr 14, 2020
c00aeff
Fix uninitalized variables
sgiraudot Apr 14, 2020
f862503
Fix trailing whitespaces
sgiraudot Apr 14, 2020
95b9f05
Merge remote-tracking branch 'mine/Spatial_searching-Parallelize_kd_t…
sgiraudot Apr 16, 2020
a6dc66f
Fix version
sgiraudot Apr 27, 2020
7702f57
Merge remote-tracking branch 'mine/Spatial_searching-Parallelize_kd_t…
sgiraudot Apr 27, 2020
e0936d2
Update CHANGES.md
sgiraudot Apr 27, 2020
bd08ba8
Reintroduce bool leaf in Kd_tree_node
sgiraudot Apr 27, 2020
18 changes: 14 additions & 4 deletions Spatial_searching/doc/Spatial_searching/CGAL/Kd_tree.h
@@ -23,7 +23,7 @@ in a dynamically allocated array (e.g., `Epick_d` with dynamic
dimension) — we says "to a lesser extent" because the points
are re-created by the kd-tree in a cache-friendly order after its construction,
so the coordinates are more likely to be stored in a near-optimal order on the
heap. When EnablePointsCache` is set to `Tag_true`, the points
heap. When `EnablePointsCache` is set to `Tag_true`, the points
coordinates will be cached in an optimal way. This will
increase memory consumption but provide better search performance.
See also the `GeneralDistance` and `FuzzyQueryItem` concepts for
@@ -115,7 +115,17 @@ at the first call to a query or removal member function. You can call
`build()` explicitly to ensure that the next call to
query functions will not trigger the reconstruction of the
data structure.

\tparam ConcurrencyTag enables sequential versus parallel
algorithm. Possible values are `Sequential_tag`, `Parallel_tag`, and
`Parallel_if_available_tag`. This template parameter is optional:
calling `build()` without specifying the concurrency tag will result
in `Sequential_tag` being used. If `build()` is not called by the user
but called implicitly at the first call to a query or removal member
function, `Sequential_tag` is also used.

*/
template <typename ConcurrencyTag>
void build();

/*!
@@ -147,14 +157,14 @@ template <class InputIterator> void insert(InputIterator first, InputIterator be
/*!
Removes the point `p` from the `k-d` tree. It uses `equal_to_p` to identify
the point after locating it, which can matter in particular when 2 points are
in the same place. `Identify_point` is a unary functor that takes a `Point_d`
in the same place. `IdentifyPoint` is a unary functor that takes a `Point_d`
and returns a `bool`. This is a limited and naive implementation that does not
rebalance the tree. On the other hand, the tree remains valid and ready for
queries. If the internal data structure is not already built, for instance
because the last operation was an insertion, it first calls `build()`.
*/
template<class Identify_point>
void remove(Point_d p, Identify_point equal_to_p);
template<class IdentifyPoint>
void remove(Point_d p, IdentifyPoint identify_point);

/*!
Removes point `p`, calling the 2-argument function `remove()` with a functor
16 changes: 13 additions & 3 deletions Spatial_searching/doc/Spatial_searching/Spatial_searching.txt
@@ -480,6 +480,13 @@ to the nearest nodes exceeds the distance to the nearest point found
with a factor 1/ (1+\f$ \epsilon\f$). Priority search supports next
neighbor search, standard search does not.

In order to speed up the construction of the `kd` tree, the child
branches of each internal node can be computed in parallel, by calling
`Kd_tree::build<CGAL::Parallel_tag>()`. On a quad-core processor, the
parallel construction is experimentally 2 to 3 times faster than the
sequential version, depending on the point cloud. The parallel version
requires the executable to be linked against the <a href="https://www.threadingbuildingblocks.org">Intel TBB library</a>.

In order to speed-up the internal distance computations in nearest
neighbor searching in high dimensional space, the approximate
searching package supports orthogonal distance computation. Orthogonal distance
@@ -520,9 +527,12 @@ additional requirements when using such a cache.

\section Spatial_searchingImplementationHistory Implementation History

The initial implementation of this package was done by Hans Tangelder and
Andreas Fabri. It was optimized in speed and memory consumption by Markus
Overtheil during an internship at GeometryFactory in 2014.
The initial implementation of this package was done by Hans Tangelder
and Andreas Fabri. It was optimized in speed and memory consumption by
Markus Overtheil during an internship at GeometryFactory in 2014. The
`EnablePointsCache` feature was introduced by Clément Jamin in
2019. The parallel `kd` tree build function was introduced by Simon
Giraudot in 2020.

*/
} /* namespace CGAL */
5 changes: 5 additions & 0 deletions Spatial_searching/examples/Spatial_searching/CMakeLists.txt
@@ -43,6 +43,11 @@ create_single_source_cgal_program( "iso_rectangle_2_query.cpp" )

create_single_source_cgal_program( "nearest_neighbor_searching.cpp" )

find_package( TBB QUIET )
if(TBB_FOUND)
cgal_target_use_TBB(nearest_neighbor_searching)
endif()

create_single_source_cgal_program( "searching_with_circular_query.cpp" )

create_single_source_cgal_program( "searching_with_point_with_info.cpp" )
@@ -20,6 +20,9 @@ int main() {

Tree tree(points.begin(), points.end());

// The tree can be built in parallel
tree.build<CGAL::Parallel_if_available_tag>();

Point_d query(0,0);

// Initialize the search structure, and search all N points
216 changes: 143 additions & 73 deletions Spatial_searching/include/CGAL/Kd_tree.h
@@ -35,6 +35,33 @@
#include <CGAL/mutex.h>
#endif

#ifdef CGAL_LINKED_WITH_TBB
#endif

/*
For building the KD Tree in parallel, TBB is needed. If TBB is
linked, the internal structures `deque` will be replaced by
`tbb::concurrent_vector`, even if the KD Tree is built in sequential
mode (this is to avoid changing the type of the KD Tree when
changing the concurrency mode of `build()`).

Experimentally, using the `tbb::concurrent_vector` in sequential
mode does not trigger any loss of performance, so from a user's
point of view, it should be transparent.

However, in case one wants to compile the KD Tree *without using TBB
structure even though CGAL is linked with TBB*, the macro
`CGAL_DISABLE_TBB_STRUCTURE_IN_KD_TREE` can be defined. In that
case, even if TBB is linked, the standard `deque` will be used
internally. Note that of course, in that case, parallel build will
be disabled.
*/
#if defined(CGAL_LINKED_WITH_TBB) && !defined(CGAL_DISABLE_TBB_STRUCTURE_IN_KD_TREE)
# include <tbb/parallel_invoke.h>
# include <tbb/concurrent_vector.h>
# define CGAL_TBB_STRUCTURE_IN_KD_TREE
#endif

namespace CGAL {

//template <class SearchTraits, class Splitter_=Median_of_rectangle<SearchTraits>, class UseExtendedNode = Tag_true >
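Note on node storage: the node-creation helpers in this file hand out raw addresses of elements stored in `internal_nodes` and `leaf_nodes`, so the containers must not relocate existing elements when they grow. The following is a minimal sketch of that property using `std::deque` only; `Node`, `append_node` and `handles_stay_valid` are hypothetical names chosen for illustration and are not part of CGAL.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for a kd-tree node; only the payload matters here.
struct Node { int id; };

// Appends a node and returns its address, following the
// "push_back, then take the address of back()" pattern.
Node* append_node(std::deque<Node>& nodes, int id)
{
  nodes.push_back(Node{id});
  return &nodes.back();
}

// Creates `count` nodes, keeps every returned handle, and checks that each
// handle still designates the node it was created for once all insertions
// are done.
bool handles_stay_valid(int count)
{
  std::deque<Node> nodes;
  std::vector<Node*> handles;
  for (int i = 0; i < count; ++i)
    handles.push_back(append_node(nodes, i));

  for (int i = 0; i < count; ++i)
    if (handles[i] != &nodes[i] || handles[i]->id != i)
      return false;
  return true;
}
```

With `std::vector` as the node container the same pattern would be unsafe, because a reallocation during `push_back` invalidates pointers obtained earlier.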
@@ -77,12 +104,15 @@ class Kd_tree {
typedef EnablePointsCache Enable_points_cache;

private:

SearchTraits traits_;
Splitter split;


// wokaround for https://svn.boost.org/trac/boost/ticket/9332
#if (_MSC_VER == 1800) && (BOOST_VERSION == 105500)
#if defined(CGAL_TBB_STRUCTURE_IN_KD_TREE)
tbb::concurrent_vector<Internal_node> internal_nodes;
tbb::concurrent_vector<Leaf_node> leaf_nodes;
#elif (_MSC_VER == 1800) && (BOOST_VERSION == 105500)
// workaround for https://svn.boost.org/trac/boost/ticket/9332
std::deque<Internal_node> internal_nodes;
std::deque<Leaf_node> leaf_nodes;
#else
@@ -119,7 +149,6 @@ class Kd_tree {
: traits_(tree.traits_),built_(tree.built_),dim_(-1)
{};


// Instead of the recursive construction of the tree in the class Kd_tree_node
// we do this in the tree class. The advantage is that we then can optimize
// the allocation of the nodes.
@@ -128,50 +157,69 @@ class Kd_tree {
Node_handle
create_leaf_node(Point_container& c)
{
Leaf_node node(true , static_cast<unsigned int>(c.size()));
Leaf_node node(static_cast<unsigned int>(c.size()));
std::ptrdiff_t tmp = c.begin() - data.begin();
node.data = pts.begin() + tmp;

leaf_nodes.push_back(node);
Leaf_node_handle nh = &leaf_nodes.back();


return nh;
#ifdef CGAL_TBB_STRUCTURE_IN_KD_TREE
return &*(leaf_nodes.push_back(node));
#else
leaf_nodes.push_back (node);
return &(leaf_nodes.back());
#endif
}


// The internal node

Node_handle
create_internal_node(Point_container& c, const Tag_true&)
Node_handle new_internal_node()
{
return create_internal_node_use_extension(c);
}

Node_handle
create_internal_node(Point_container& c, const Tag_false&)
{
return create_internal_node(c);
#ifdef CGAL_TBB_STRUCTURE_IN_KD_TREE
return &*(internal_nodes.push_back(Internal_node()));
#else
internal_nodes.push_back (Internal_node());
return &(internal_nodes.back());
#endif
}



// TODO: Similiar to the leaf_init function above, a part of the code should be
// moved to a the class Kd_tree_node.
// It is not proper yet, but the goal was to see if there is
// a potential performance gain through the Compact_container
Node_handle
create_internal_node_use_extension(Point_container& c)
template <typename ConcurrencyTag>
void
create_internal_node(Node_handle n, Point_container& c, const ConcurrencyTag& tag)
{
Internal_node node(false);
internal_nodes.push_back(node);
Internal_node_handle nh = &internal_nodes.back();
Internal_node_handle nh = static_cast<Internal_node_handle>(n);
CGAL_assertion (nh != nullptr);

Separator sep;
Point_container c_low(c.dimension(),traits_);
split(sep, c, c_low);
nh->set_separator(sep);

handle_extended_node (nh, c, c_low, UseExtendedNode());

if (try_parallel_internal_node_creation (nh, c, c_low, tag))
return;

if (c_low.size() > split.bucket_size())
{
nh->lower_ch = new_internal_node();
create_internal_node (nh->lower_ch, c_low, tag);
}
else
nh->lower_ch = create_leaf_node(c_low);

if (c.size() > split.bucket_size())
{
nh->upper_ch = new_internal_node();
create_internal_node (nh->upper_ch, c, tag);
}
else
nh->upper_ch = create_leaf_node(c);
}

void handle_extended_node (Internal_node_handle nh, Point_container& c, Point_container& c_low, const Tag_true&)
{
int cd = nh->cutting_dimension();
if(!c_low.empty()){
nh->lower_low_val = c_low.tight_bounding_box().min_coord(cd);
@@ -192,56 +240,45 @@ class Kd_tree {

CGAL_assertion(nh->cutting_value() >= nh->lower_low_val);
CGAL_assertion(nh->cutting_value() <= nh->upper_high_val);

if (c_low.size() > split.bucket_size()){
nh->lower_ch = create_internal_node_use_extension(c_low);
}else{
nh->lower_ch = create_leaf_node(c_low);
}
if (c.size() > split.bucket_size()){
nh->upper_ch = create_internal_node_use_extension(c);
}else{
nh->upper_ch = create_leaf_node(c);
}




return nh;
}

inline void handle_extended_node (Internal_node_handle, Point_container&, Point_container&, const Tag_false&) { }

// Note also that I duplicated the code to get rid if the if's for
// the boolean use_extension which was constant over the construction
Node_handle
create_internal_node(Point_container& c)
inline bool try_parallel_internal_node_creation (Internal_node_handle, Point_container&,
Point_container&, const Sequential_tag&)
{
Internal_node node(false);
internal_nodes.push_back(node);
Internal_node_handle nh = &internal_nodes.back();
Separator sep;

Point_container c_low(c.dimension(),traits_);
split(sep, c, c_low);
nh->set_separator(sep);
return false;
}

#ifdef CGAL_TBB_STRUCTURE_IN_KD_TREE

if (c_low.size() > split.bucket_size()){
nh->lower_ch = create_internal_node(c_low);
}else{
nh->lower_ch = create_leaf_node(c_low);
}
if (c.size() > split.bucket_size()){
nh->upper_ch = create_internal_node(c);
}else{
nh->upper_ch = create_leaf_node(c);
inline bool try_parallel_internal_node_creation (Internal_node_handle nh, Point_container& c,
Point_container& c_low, const Parallel_tag& tag)
{
/*
The two child branches are computed in parallel if and only if:

* both branches lead to internal nodes (if at least one branch
is a leaf, it's useless)

* the current number of points is sufficiently high to be worth
the cost of launching new threads. Experimentally, using 10
times the bucket size as a limit gives the best timings.
*/
if (c_low.size() > split.bucket_size() && c.size() > split.bucket_size()
&& (c_low.size() + c.size() > 10 * split.bucket_size()))
{
nh->lower_ch = new_internal_node();
nh->upper_ch = new_internal_node();
tbb::parallel_invoke (std::bind (&Self::create_internal_node<Parallel_tag>, this, nh->lower_ch, std::ref(c_low), std::cref(tag)),
std::bind (&Self::create_internal_node<Parallel_tag>, this, nh->upper_ch, std::ref(c), std::cref(tag)));
return true;
}



return nh;

return false;
}


#endif

public:
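The dispatch between the two `try_parallel_internal_node_creation` overloads can be illustrated outside of CGAL. The sketch below is a toy under stated assumptions: it tracks only range sizes instead of points, it uses `std::async` in place of `tbb::parallel_invoke`, and `Seq_tag`, `Par_tag`, `try_parallel`, `count_leaves` and the fixed `bucket_size` are hypothetical names. It keeps the rule from the code above: both halves must exceed the bucket size and the total must exceed 10 times the bucket size before a parallel split is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

// Hypothetical tags standing in for the sequential and parallel tags.
struct Seq_tag {};
struct Par_tag {};

// Hypothetical fixed bucket size (maximum number of points in a leaf).
const std::size_t bucket_size = 10;

template <typename Tag>
std::size_t count_leaves(std::size_t n, Tag tag);

// Sequential overload: never takes the parallel path.
inline bool try_parallel(std::size_t, std::size_t, std::size_t&, Seq_tag)
{
  return false;
}

// Parallel overload: splits the work only if both halves become internal
// nodes and the total size is above 10 times the bucket size.
inline bool try_parallel(std::size_t low, std::size_t up,
                         std::size_t& leaves, Par_tag tag)
{
  if (low > bucket_size && up > bucket_size && low + up > 10 * bucket_size)
  {
    // The lower half runs in another thread, the upper half in this one.
    std::future<std::size_t> lower =
      std::async(std::launch::async, [low, tag] { return count_leaves(low, tag); });
    const std::size_t upper = count_leaves(up, tag);
    leaves = lower.get() + upper;
    return true;
  }
  return false;
}

// Recursively halves a range of n points and returns the number of leaves.
template <typename Tag>
std::size_t count_leaves(std::size_t n, Tag tag)
{
  if (n <= bucket_size)
    return 1;

  const std::size_t low = n / 2;
  const std::size_t up = n - low;

  std::size_t leaves = 0;
  if (try_parallel(low, up, leaves, tag))
    return leaves;

  // Fallback shared by both tags: plain sequential recursion.
  return count_leaves(low, tag) + count_leaves(up, tag);
}
```

Because the choice is made by overload resolution on the tag type, the sequential path contains no threading code at all, and both tags produce the same tree shape.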

@@ -261,6 +298,32 @@ class Kd_tree {
return pts.empty();
}

void build()
{
build<Sequential_tag>();
}

/*
Note about parallel `build()`. Several different strategies have
been tried, among which:

* keeping the `deque` and using mutex structures to secure the
insertions in them
* using free stand-alone pointers generated with `new` instead of
pushing elements in a container
* using a global `tbb::task_group` to handle the internal node
computations
* using one `tbb::task_group` per internal node to handle the
internal node computations

Experimentally, the option giving the best timings is the one
kept, namely:

* nodes are stored in `tbb::concurrent_vector` structures
* the parallel computations are launched using
`tbb::parallel_invoke`
*/
template <typename ConcurrencyTag>
void
build()
{
@@ -277,14 +340,21 @@ class Kd_tree {
for(unsigned int i = 0; i < pts.size(); i++){
data.push_back(&pts[i]);
}

#ifndef CGAL_TBB_STRUCTURE_IN_KD_TREE
CGAL_static_assertion_msg (!(boost::is_convertible<ConcurrencyTag, Parallel_tag>::value),
"Parallel_tag is enabled but TBB is unavailable.");
#endif

Point_container c(dim_, data.begin(), data.end(),traits_);
bbox = new Kd_tree_rectangle<FT,D>(c.bounding_box());
if (c.size() <= split.bucket_size()){
tree_root = create_leaf_node(c);
}else {
tree_root = create_internal_node(c, UseExtendedNode());
tree_root = new_internal_node();
create_internal_node (tree_root, c, ConcurrencyTag());
}

//Reorder vector for spatial locality
std::vector<Point_d> ptstmp;
ptstmp.resize(pts.size());
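The compile-time guard in `build()` can be mimicked in isolation. This is a hedged sketch, not the CGAL implementation: `Seq_tag`, `Par_tag`, `Par_if_available_tag`, `requests_parallel` and the macro `SKETCH_HAS_PARALLEL_BACKEND` are hypothetical, and the standard `static_assert` with `std::is_convertible` replaces the CGAL and Boost facilities used in the diff. It shows that an "if available" tag compiles in both configurations, while an explicit parallel tag is rejected when no parallel backend is compiled in.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tags for illustration.
struct Seq_tag {};
struct Par_tag {};

// Hypothetical switch playing the role of the "TBB structure" macro.
#ifdef SKETCH_HAS_PARALLEL_BACKEND
typedef Par_tag Par_if_available_tag;
#else
typedef Seq_tag Par_if_available_tag;
#endif

// True if the given tag asks for the parallel code path.
template <typename Tag>
constexpr bool requests_parallel()
{
  return std::is_convertible<Tag, Par_tag>::value;
}

// Returns 2 when the parallel path is selected and 1 otherwise.
template <typename Tag>
int build()
{
#ifndef SKETCH_HAS_PARALLEL_BACKEND
  // Fails to compile only when this template is instantiated with Par_tag.
  static_assert(!requests_parallel<Tag>(),
                "Parallel tag requested but no parallel backend is available.");
#endif
  return requests_parallel<Tag>() ? 2 : 1;
}
```

Without the macro defined, `build<Seq_tag>()` and `build<Par_if_available_tag>()` both compile and select the sequential path, whereas `build<Par_tag>()` stops compilation with the message above.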