Parallelize sensor model update #32

Merged
merged 2 commits into from
Dec 30, 2022

@nahueespinosa nahueespinosa commented Dec 30, 2022

This patch parallelizes the computation of particle weights using standard C++ execution policies (libstdc++ uses libtbb to implement them). It also adds logs for the update steps.
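As a rough illustration of the approach, the sketch below computes one weight per particle with `std::transform` and the `std::execution::par` policy. The `Pose` type, the `importance_weight` function, and the Gaussian range model are hypothetical stand-ins chosen for this example, not Beluga's actual API. With libstdc++, building this typically requires linking against TBB (for example with `-ltbb`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <execution>
#include <vector>

// Hypothetical particle state; not Beluga's actual type.
struct Pose {
  double x;
  double y;
};

// Hypothetical stand-in for a sensor model: Gaussian likelihood of a single
// range reading taken from `pose`, with the observed landmark at the origin.
double importance_weight(const Pose& pose, double measured_range) {
  const double expected_range = std::hypot(pose.x, pose.y);
  const double error = measured_range - expected_range;
  constexpr double kSigma = 0.2;  // assumed measurement noise [m]
  return std::exp(-0.5 * error * error / (kSigma * kSigma));
}

// Computes one weight per particle. Each output element depends only on its
// own input element, so the iterations can run concurrently without locks.
std::vector<double> update_weights(const std::vector<Pose>& particles, double measured_range) {
  std::vector<double> weights(particles.size());
  std::transform(
      std::execution::par, particles.begin(), particles.end(), weights.begin(),
      [measured_range](const Pose& pose) { return importance_weight(pose, measured_range); });
  assert(weights.size() == particles.size());
  return weights;
}
```

Because the per-particle work is independent, swapping the policy for `std::execution::seq` gives the sequential baseline without touching the rest of the code.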

Particles: 2000
Laser points: 1080

| Step | Sequential duration [seconds] | Parallel duration [seconds] | Speed-up |
|---|---|---|---|
| Motion update | 0.0055 | 0.0055 | 0.9928128766 |
| Sensor update | 0.1423 | 0.0172 | 8.268438989 |
| Resampling | 0.0092 | 0.0094 | 0.9754982861 |
| Total | 0.1570 | 0.0321 | 4.883631913 |
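Durations like the ones above could be collected with a small timing helper around each update step. The sketch below is only an assumption about how such measurements might be taken (`time_seconds` and `speed_up` are hypothetical names, not part of this patch). The speed-up column is the sequential duration divided by the parallel one. The reported ratios differ slightly from what the rounded durations give, presumably because they were computed from unrounded measurements.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `step` once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it is monotonic, unlike system_clock.
template <class Function>
double time_seconds(Function&& step) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Function>(step)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}

// Speed-up as used in the table: sequential duration over parallel duration.
double speed_up(double sequential_seconds, double parallel_seconds) {
  assert(parallel_seconds > 0.0);
  return sequential_seconds / parallel_seconds;
}
```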

Benchmarked on a system with the following CPU specs:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           80
Model name:                      AMD Ryzen 9 5900HX with Radeon Graphics
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         3148.815
CPU max MHz:                     4680,0000
CPU min MHz:                     400,0000
BogoMIPS:                        6587.55
Virtualization:                  AMD-V
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        4 MiB
L3 cache:                        16 MiB

Related to #3.

@nahueespinosa nahueespinosa force-pushed the nahuel/parallel branch 2 times, most recently from 0c42547 to 9db4a0a Compare December 30, 2022 15:13
@nahueespinosa nahueespinosa self-assigned this Dec 30, 2022
glpuga previously approved these changes Dec 30, 2022
@glpuga glpuga left a comment

Only one probing question, but LGTM. Great speedup!

beluga/include/beluga/algorithm/particle_filter.hpp
There is no observable difference in performance, and the unsequenced version imposes additional restrictions on user mixins (users cannot perform any vectorization-unsafe operations when using these policies).
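To make the restriction concrete, here is a minimal sketch with a hypothetical user callback (`count_above` is an illustrative name, not code from this patch) that guards shared state with a mutex. Locking is fine under `std::execution::par`, but the C++ standard classifies synchronizing calls such as `std::mutex::lock` as vectorization-unsafe, so the same lambda must not be used with `std::execution::par_unseq`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <mutex>
#include <vector>

// Counts the weights above `threshold`, updating a shared counter from
// several threads. The mutex makes this race-free under std::execution::par.
// Under std::execution::par_unseq the lock would be a vectorization-unsafe
// operation: invocations may be interleaved on one thread and could deadlock.
std::size_t count_above(const std::vector<double>& weights, double threshold) {
  std::mutex mutex;
  std::size_t count = 0;
  std::for_each(std::execution::par, weights.begin(), weights.end(), [&](double weight) {
    if (weight > threshold) {
      std::lock_guard<std::mutex> lock(mutex);
      ++count;
    }
  });
  return count;
}
```

In real code a reduction such as `std::count_if` or `std::transform_reduce` would avoid the lock altogether; the mutex is used here only to show an operation that the unsequenced policies forbid.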
@nahueespinosa

@glpuga Thanks for the review! Going in!

@nahueespinosa nahueespinosa merged commit 1dddc9d into master Dec 30, 2022
@nahueespinosa nahueespinosa deleted the nahuel/parallel branch December 30, 2022 17:59
@nahueespinosa nahueespinosa added enhancement New feature or request cpp Related to C++ code labels Jan 6, 2023