[registration] Add multi-threaded NDT #4135

koide3 · 2020-05-22T06:58:55Z

This PR follows the suggestion in koide3/ndt_omp#19. It adds a multi-threaded, SSE-optimized NDT implementation that is derived from pcl::NormalDistributionsTransform and is considerably faster than the original. Benchmark results are available at the bottom of this post.

[Added] NormalDistributionsTransformOMP

The following two methods are added to the original NDT:
setNumThreads() specifies the number of threads to use
setNeighborhoodSearchMethod() offers a trade-off between registration stability and speed (see here for a brief explanation)
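
As a rough illustration of what a thread-count setting such as setNumThreads() typically controls, here is a minimal sketch, assuming an OpenMP-style parallel reduction over per-point scores. The function sumScores and its parameters are hypothetical and are not taken from this PR. If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical helper, not part of PCL: sums per-point scores using up to
// `threads` OpenMP threads (`threads` is assumed to be positive).
// Without OpenMP support the pragma is ignored and the loop runs serially.
double
sumScores(const std::vector<double>& scores, int threads)
{
  double total = 0.0;
  const long n = static_cast<long>(scores.size());
#pragma omp parallel for num_threads(threads) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += scores[i];
  }
  return total;
}
```

Each thread accumulates into its own private copy of total, and the copies are combined once at the end, so no locking is needed inside the loop.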

[Updated] VoxelGridCovariance

The const qualifier is added to some methods to clarify that they are thread-safe
getNeighborhoodAtPoint() is generalized for different neighboring voxel search patterns

[Added] Unit test for NormalDistributionsTransformOMP

Benchmark result from https://github.com/koide3/ndt_omp

--- original NDT ---
single : 282.222[msec]
10times: 2921.92[msec]
fitness: 0.213937

--- multi-threaded NDT (KDTREE, 1 threads) ---
single : 207.697[msec]
10times: 2059.19[msec]
fitness: 0.213937

--- multi-threaded NDT (DIRECT7, 1 threads) ---
single : 139.433[msec]
10times: 1356.79[msec]
fitness: 0.214205

--- multi-threaded NDT (DIRECT1, 1 threads) ---
single : 34.6418[msec]
10times: 317.03[msec]
fitness: 0.208511

--- multi-threaded NDT (KDTREE, 8 threads) ---
single : 54.9903[msec]
10times: 500.51[msec]
fitness: 0.213937

--- multi-threaded NDT (DIRECT7, 8 threads) ---
single : 63.1442[msec]
10times: 343.336[msec]
fitness: 0.214205

--- multi-threaded NDT (DIRECT1, 8 threads) ---
single : 17.2353[msec]
10times: 100.025[msec]
fitness: 0.208511
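
To read the numbers above as speedups, one can divide the baseline timing by the timing of each variant. The helper below is only an illustrative sketch; the name speedup is hypothetical and does not come from the benchmark code.

```cpp
// Speedup of a variant relative to a baseline, given both timings in msec.
// Hypothetical helper for reading the benchmark table.
double
speedup(double baseline_msec, double variant_msec)
{
  return baseline_msec / variant_msec;
}
```

For the "single" timings, speedup(282.222, 63.1442) is roughly 4.5 for DIRECT7 with 8 threads, and speedup(282.222, 17.2353) is roughly 16.4 for DIRECT1 with 8 threads.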

kunaltyagi · 2020-05-22T07:02:39Z

Looking at the time taken, any reason why DIRECT1 should not be the default case for NDT implementation?

koide3 · 2020-05-22T07:35:41Z

I observed that registration results with DIRECT1 can be unreliable when the displacement is larger than the voxel resolution, while DIRECT7 can handle such cases by smoothing gradients over neighboring voxels.

As DIRECT7 is reasonably fast and reliable, I personally think it should be the default. However, that depends on PCL's policy for choosing defaults (speed vs. reliability).
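
The difference between the two direct modes can be sketched as a set of voxel index offsets. This is a minimal sketch under an assumption that the thread itself does not spell out: that DIRECT1 looks up only the voxel containing the point, while DIRECT7 also looks up the six face-adjacent voxels. The names NeighborSearch, Offset, and neighborOffsets are hypothetical.

```cpp
#include <array>
#include <vector>

// Hypothetical stand-ins for the search modes discussed above.
enum class NeighborSearch { Direct1, Direct7 };

using Offset = std::array<int, 3>;

// Returns the voxel index offsets visited around a query point.
// Assumption: Direct1 = containing voxel only; Direct7 = containing voxel
// plus its six face-adjacent neighbors.
std::vector<Offset>
neighborOffsets(NeighborSearch method)
{
  std::vector<Offset> offsets;
  offsets.push_back(Offset{{0, 0, 0}}); // the voxel containing the point
  if (method == NeighborSearch::Direct7) {
    for (int axis = 0; axis < 3; ++axis) {
      for (int step : {-1, 1}) {
        Offset o{{0, 0, 0}};
        o[axis] = step; // one step along a single axis
        offsets.push_back(o);
      }
    }
  }
  return offsets;
}
```

Under this assumption, a point that lands just outside a target voxel still receives gradient contributions from that voxel in the seven-voxel mode, which matches the smoothing effect described above.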

kunaltyagi

Drive-through review!! Haven't seen any source material.

kunaltyagi · 2020-05-22T07:06:57Z

registration/src/ndt_omp.cpp

+#include <pcl/registration/ndt_omp.h>
+#include <pcl/registration/impl/ndt_omp.hpp>
+
+template class PCL_EXPORTS pcl::NormalDistributionsTransformOMP<pcl::PointXYZ, pcl::PointXYZ>;


Explicit instantiations should only be done when PCL_NO_PRECOMPILE is not defined.
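
The guard being asked for can be sketched with a toy class template. This is an illustration only: Scaler is a hypothetical stand-in for the PCL class, and the actual fix in the PR may use PCL's own instantiation macros instead.

```cpp
// Toy stand-in for a PCL class template; Scaler is a hypothetical name.
template <typename T>
class Scaler {
public:
  T
  twice(T value) const
  {
    return value + value;
  }
};

// Emit the explicit instantiation only when precompilation is enabled,
// i.e. when PCL_NO_PRECOMPILE is not defined.
#ifndef PCL_NO_PRECOMPILE
template class Scaler<float>;
#endif
```

When PCL_NO_PRECOMPILE is defined, users are expected to instantiate the templates themselves from the headers, so the library source file should not force an instantiation.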

kunaltyagi · 2020-05-22T07:07:14Z

registration/include/pcl/registration/ndt_omp.h

+  /** \brief The voxel grid generated from target cloud containing point means and covariances. */
+  TargetGrid target_cells_;
+
+  // double fitness_epsilon_;


Unneeded code/comment?

kunaltyagi · 2020-05-22T07:09:04Z

registration/include/pcl/registration/ndt_omp.h

+                      Eigen::Matrix<double, 6, 6>& hessian,
+                      PointCloudSource& trans_cloud);
+
+  /** \brief Update interval of possible step lengths for More-Thuente method, \f$ I \f$ in More-Thuente (1994)


While I appreciate the effort in documentation, it'd be better to create a struct, and provide comments in one place, not multiple places.

kunaltyagi · 2020-05-22T07:11:35Z

registration/include/pcl/registration/ndt_omp.h

+  int num_threads_;
+
+  /** \brief Neighboring voxel search method */
+  NeighborSearchMethod search_method;


search_method_ instead

kunaltyagi · 2020-05-22T07:12:13Z

registration/include/pcl/registration/ndt_omp.h

+      h_ang_f2_, h_ang_f3_;
+
+  /** \brief Matrix composed of h_ang_*_ (for SSE optimization) */
+  Eigen::Matrix<float, 16, 4> h_ang;


There seems to be no reason to keep h_ang_*_ and h_ang separate.

kunaltyagi · 2020-05-22T07:29:00Z

registration/include/pcl/registration/ndt_omp.h

+  inline float
+  getResolution() const
+  {
+    return (resolution_);


In new code, there is no need to wrap the return value in parentheses.

kunaltyagi · 2020-05-22T07:32:27Z

registration/include/pcl/registration/impl/ndt_omp.hpp

+template <typename PointSource, typename PointTarget>
+void
+pcl::NormalDistributionsTransformOMP<PointSource, PointTarget>::computePointDerivatives(Eigen::Vector3d& x, Eigen::Matrix<double, 3, 6>& point_gradient, Eigen::Matrix<double, 18, 6>& point_hessian, bool compute_hessian) const
+{


You need to zero point_{gradient,hessian} (though point_jacobian would be a better name, for symmetry with hessian).
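
The concern can be illustrated with a simplified sketch: when a function writes only some entries of an output matrix passed in by reference, the remaining entries keep whatever the caller left there unless they are cleared first. The code below uses plain std::array instead of Eigen so it stands alone; fillPointJacobian, Matrix3x6, and the placeholder values are hypothetical and do not reproduce the actual NDT derivatives.

```cpp
#include <array>

using Matrix3x6 = std::array<std::array<double, 6>, 3>;

// Hypothetical, simplified derivative routine: it only writes a few
// entries, so it clears the output first to avoid stale values.
void
fillPointJacobian(double a, double b, Matrix3x6& jacobian)
{
  for (auto& row : jacobian) {
    row.fill(0.0); // zero everything before the partial fill
  }
  for (int i = 0; i < 3; ++i) {
    jacobian[i][i] = 1.0; // translation block is the identity
  }
  jacobian[1][3] = a; // placeholder angular terms
  jacobian[2][3] = b;
}
```

With Eigen matrices, the equivalent of the first loop is a call to setZero() on the output before filling it.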

kunaltyagi · 2020-05-22T07:33:53Z

registration/include/pcl/registration/ndt_omp.h

+
+  /** \brief Precompute anglular components of derivatives.
+   * \note Equation 6.19 and 6.21 [Magnusson 2009].
+   * \param[in] p the current transform vector


p isn't a nice name for a transform vector in an implementation, although it may be true to its roots in the literature.

kunaltyagi · 2020-05-22T07:38:56Z

registration/include/pcl/registration/impl/ndt_omp.hpp

+    sz = sin(p(5));
+  }
+
+  // Precomputed angular gradiant components. Letters correspond to Equation 6.19 [Magnusson 2009]


gradient, or better yet, jacobian

kunaltyagi · 2020-05-22T07:40:20Z

registration/include/pcl/registration/impl/ndt_omp.hpp

+  j_ang_h_ << (sx * cz + cx * sy * sz), (cx * sy * cz - sx * sz), 0;
+
+  j_ang.setZero();
+  j_ang.row(0).noalias() = Eigen::Vector4f((-sx * sz + cx * sy * cz), (-sx * cz - cx * sy * sz), (-cx * cy), 0.0f);


It feels like you're duplicating the calculation for some unknown reason.

kunaltyagi · 2020-05-22T07:41:58Z

> I observed that registration results with DIRECT1 can be unreliable when the displacement is larger than the voxel resolution while DIRECT7 can deal with such cases by smoothing gradients over neighboring voxels

Adding this observation to the detailed docstring for the enum would be helpful 😄

koide3 · 2020-05-22T08:29:22Z

Thanks for reviewing. I'll fix them soon. Please note that some of the issues came from the original NDT. If you like, I can refactor the original one as well. What do you think?

kunaltyagi · 2020-05-22T08:39:21Z

> I can refactor the original one as well. What do you think?

Is it possible to have 3 PRs instead?

VoxelGridCovariance
Original NDT
NDT OMP

That'd make the changes easier to digest, especially since this is already pushing 1700 lines.

koide3 · 2020-05-22T08:59:02Z

OK, I'll open 3 separate PRs.

stale · 2020-06-21T09:10:54Z

Marking this as stale due to 30 days of inactivity. Commenting or adding a new commit to the pull request will revert this.

kunaltyagi added the changelog: new feature, needs: code review, and module: registration labels May 22, 2020

kunaltyagi requested changes May 22, 2020

kunaltyagi added the needs: more work label and removed the needs: code review label May 22, 2020

kunaltyagi mentioned this pull request May 23, 2020

[Registration] There is likely a bug in the implementation of line search in the NDT #3832

Closed

koide3 mentioned this pull request Jun 12, 2020

Refactoring and Bugfix of NDT #4180

Merged

stale bot added the status: stale label Jun 21, 2020

koide3 closed this Jul 3, 2020

koide3 force-pushed the master branch from db75ce5 to a4ab750 July 3, 2020 11:29

koide3 mentioned this pull request Jul 7, 2020

[filters] Refatoring VoxelGridCovariance to make it multi-thread safe (and more) #4251

Merged

koide3 mentioned this pull request Jul 17, 2020

[registration] Add OMP based Multi-threading to NDT #4277

Open
