Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[registration] Add OMP based Multi-threading to NDT #4277

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

koide3
Copy link
Contributor

@koide3 koide3 commented Jul 17, 2020

This PR adds a multi-threaded and SSE-optimized NDT (for reference: #4180 #4135). Taking a diff with ndt.(h|hpp) would be helpful to see the added/updated codes.

[Added] NormalDistributionsTransformOMP

  • The following two methods are added to the original NDT
  • setNumThreads() specifies the number of threads to be used
  • setNeighborSearchMethod() offers a trade-off between registration stability and speed.

Benchmark result on KITTI00 (ApproxVoxelGrid: 0.25m, voxel resolution: 2.0m, transformation epsilon 1e-3):
(Average processing time per scan and relative trajectory errors are reported)

--- NDT ---
Time per scan    : 496.053 +- 162.218 [msec]
RTE(10, 50, 100m): 0.320342 1.564389 3.363665


--- NDT_OMP (KDTREE, 1 thread) ---
Time per scan    : 367.684 +- 126.874 [msec]
RTE(10, 50, 100m): 0.320346, 1.564428, 3.363695

--- NDT_OMP (KDTREE, 16 threads) ---
Time per scan    : 76.216 +- 39.373 [msec]
RTE(10, 50, 100m): 0.320346, 1.564428, 3.363695


--- NDT_OMP (DIRECT1, 1 thread) ---
Time per scan    : 73.426 +- 21.062 [msec]
RTE(10, 50, 100m): 0.240691, 0.931008, 2.039546

--- NDT_OMP (DIRECT1, 16 threads) ---
Time per scan    : 23.819 +- 7.633 [msec]
RTE(10, 50, 100m): 0.240691, 0.931008, 2.039546


--- NDT_OMP (DIRECT7, 1 thread) ---
Time per scan    : 287.151 +- 95.607 [msec]
RTE(10, 50, 100m): 0.216932, 0.926056, 1.977688

--- NDT_OMP (DIRECT7, 16 threads) ---
Time per scan    : 60.003 +- 32.637 [msec]
RTE(10, 50, 100m): 0.216932, 0.926056, 1.977688


--- NDT_OMP (DIRECT26, 16 threads) ---
Time per scan    : 241.036 +- 133.761 [msec]
RTE(10, 50, 100m): 13.434766, 52.471817, 81.097012

--- NDT_OMP (DIRECT27, 16 threads) ---
Time per scan    : 138.623 +- 84.487 [msec]
RTE(10, 50, 100m): 0.293359, 1.378685, 3.139081

It's not a very comprehensive comparison, but we can see the trend of processing speed and accuracy.

Note:

  • It's faster than the original NDT even when the number of threads is set to 1 thanks to the SSE optimization.
  • KDTREE shows the same result as the original implementation (very small difference is caused by SSE single-float calculation).
  • DIRECT1 is very fast, but can be unreliable when the voxel resolution is small
  • DIRECT7 is faster and more accurate than KDTREE. I think this should be the default method.
  • DIRECT27 is worse than DIRECT7 in terms of both processing speed and accuracy in this case. But, it may have a larger convergence basin than the other methods.
  • I didn't include DIRECT26 in NDT_OMP because it looks not worthy.

@kunaltyagi
Copy link
Member

Tagging @shrijitsingh99 for relevance with GSoC

@shrijitsingh99
Copy link
Contributor

shrijitsingh99 commented Jul 17, 2020

Tagging @shrijitsingh99 for relevance with GSoC

Looking into this seems as if this will be similar to how integration with filters is going to take place. The only point which would need to be taken care of is the setNeighborSearchMethod which doesn't seem to be present the original ndt implementation.

Is the implementation set to replace the original CPU implementation too?

@kunaltyagi kunaltyagi added changelog: new feature Meta-information for changelog generation module: registration needs: code review Specify why not closed/merged yet labels Jul 17, 2020
@kunaltyagi
Copy link
Member

Should we create a separate issue to note that the non-special NDT code needs update (till the level of NDT_OMP implementation)? If (or rather when) the executors get integrated, a large update would not be needed.

@shrijitsingh99
Copy link
Contributor

shrijitsingh99 commented Jul 17, 2020

Is an update to non-special needed? Is there any reason we are not just removing original NDT and renaming ndt_omp to ndt besides maybe potential bugs?

@kunaltyagi kunaltyagi added the needs: feedback Specify why not closed/merged yet label Jul 17, 2020
@koide3
Copy link
Contributor Author

koide3 commented Jul 20, 2020

Is an update to non-special needed? Is there any reason we are not just removing original NDT and renaming ndt_omp to ndt besides maybe potential bugs?

Some concerns about replacing the original ndt with the new one are:

  • The new code requires more memory during registration, because it allocates score, gradient, and hessian for each input point for thread safety
  • The readability of updateDerivatives gets a bit worse in exchange for SSE optimization
  • Does the policy of PCL allow to use OpenMP in a method whose name doesn't explicitly indicate its a multi-threaded algorithm? I thought PCL intentionally separate sinle-thread and multi-threaded algorithms like pcl::NormalEstimation and pcl::NormalEstimationOMP (I don't really know...)

@kunaltyagi
Copy link
Member

Regarding the last point:

OMP and normal implementations started diverging and it was decided to bring both under one umbrella and to use #pragma omp if to achieve that.

We'll be seeing some larger unification of API across PCL 🤞 Hopefully, all goes well

If possible something along the lines of PRs started by @shrijitsingh99 would be better 😄

@koide3
Copy link
Contributor Author

koide3 commented Jul 20, 2020

OK, the other concerns don't matter IMO. I think we can use less memory consuming routines when num_threads = 1. Should we append the new additions to this PR #4180 (or open another PR)?

@kunaltyagi kunaltyagi removed the needs: code review Specify why not closed/merged yet label Jul 20, 2020
@kunaltyagi
Copy link
Member

Please make the modifications here since we've merged 1.11.1 and next target release is API/ABI breaking 1.12

@kunaltyagi kunaltyagi added needs: more work Specify why not closed/merged yet and removed needs: feedback Specify why not closed/merged yet labels Aug 22, 2020
@koide3
Copy link
Contributor Author

koide3 commented Aug 24, 2020

OK, thanks! I'll push updates here.

@koide3
Copy link
Contributor Author

koide3 commented Sep 5, 2020

I pushed multi-threaded NDT code. In addition to #4277 (comment), I made two modifications:

I'm testing the correctness of the updated NDT on KITTI. I'll report the test result once it's finished.

@koide3 koide3 force-pushed the multi-threaded-ndt branch 2 times, most recently from 8eeab76 to 3e261cf Compare September 7, 2020 11:25
@kunaltyagi
Copy link
Member

kunaltyagi commented Sep 7, 2020

Try not to force-push. Reviewers lose all context

@koide3
Copy link
Contributor Author

koide3 commented Sep 7, 2020

Oh, sorry, I'll avoid force pushing

Copy link
Member

@kunaltyagi kunaltyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned an unrolled loop, but I can't find it

registration/include/pcl/registration/ndt.h Outdated Show resolved Hide resolved
registration/include/pcl/registration/ndt.h Outdated Show resolved Hide resolved
registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
registration/include/pcl/registration/ndt.h Outdated Show resolved Hide resolved
registration/include/pcl/registration/ndt.h Outdated Show resolved Hide resolved
registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
@kunaltyagi kunaltyagi added changelog: ABI break Meta-information for changelog generation changelog: API break Meta-information for changelog generation changelog: deprecation Meta-information for changelog generation and removed changelog: API break Meta-information for changelog generation labels Sep 8, 2020
Copy link
Member

@kunaltyagi kunaltyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please verify that the deprecated functions aren't used anywhere inside PCL (except any code explicitly testing them)

registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
test/registration/test_ndt.cpp Outdated Show resolved Hide resolved
registration/include/pcl/registration/impl/ndt.hpp Outdated Show resolved Hide resolved
@kunaltyagi kunaltyagi changed the title [registration] Multi-threaded NDT [registration] Add OMP based Multi-threading to NDT Sep 18, 2020
@koide3
Copy link
Contributor Author

koide3 commented Sep 24, 2020

All the deprecated ones are all protected members, and they are not used in anywhere other than NDT.

Copy link
Member

@kunaltyagi kunaltyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor comment only

test/registration/test_ndt.cpp Outdated Show resolved Hide resolved
@kunaltyagi kunaltyagi added needs: pr merge Specify why not closed/merged yet needs: code review Specify why not closed/merged yet and removed needs: more work Specify why not closed/merged yet labels Sep 24, 2020
kunaltyagi
kunaltyagi previously approved these changes Oct 12, 2020
@kunaltyagi kunaltyagi removed the needs: pr merge Specify why not closed/merged yet label Oct 16, 2020
@koide3
Copy link
Contributor Author

koide3 commented Oct 23, 2020

Can I do rebase, squash, and force-push to resolve the conflict?

@kunaltyagi
Copy link
Member

There's a conflict due to the merge of formatting in registration module

@koide3
Copy link
Contributor Author

koide3 commented Nov 3, 2020

I resolved the conflict.

@kunaltyagi
Copy link
Member

GCC 16.04 is failing because the it doesn't accept the fourth lambda

@koide3
Copy link
Contributor Author

koide3 commented Nov 5, 2020

I remove the lambda in INSTANTIATE_TEST_SUITE_P

@kunaltyagi
Copy link
Member

LGTM

@koide3
Copy link
Contributor Author

koide3 commented Nov 24, 2020

I resolved conflicts.

@larshg
Copy link
Contributor

larshg commented Apr 23, 2021

Should this be merged 🚀 ?

@kunaltyagi
Copy link
Member

Needs 1 more review. Currently, requested from mvieth, but please feel free to review instead Lars :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: ABI break Meta-information for changelog generation changelog: deprecation Meta-information for changelog generation changelog: new feature Meta-information for changelog generation module: registration needs: code review Specify why not closed/merged yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants