Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend raycast microbenchmarks #201

Merged
merged 4 commits into from
May 31, 2023

Conversation

nahueespinosa
Copy link
Member

@nahueespinosa nahueespinosa commented May 29, 2023

Proposed changes

Related to #167.

This patch extends existing microbenchmarks to use as reference to compare the performance of the raycast and occupancy grid implementation.

Type of change

  • 馃悰 Bugfix (change which fixes an issue)
  • 馃殌 Feature (change which adds functionality)
  • 馃摎 Documentation (change which fixes or extends documentation)

Checklist

  • Lint and unit tests (if any) pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • All commits have been signed for DCO

Additional comments

The baseline raycast and occupancy grid implementations were inspired by nav2_amcl.

Command to run:

colcon build --packages-up-to beluga && ./build/beluga/test/benchmark/benchmark_beluga --benchmark_filter=Ray

First results:

Running ./build/beluga/test/benchmark/benchmark_beluga
Run on (16 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.92, 0.81, 0.76
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
----------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations
----------------------------------------------------------------------------------------------------
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/128               50.1 ns         50.1 ns     13820496
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/256                112 ns          112 ns      6260214
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/512                238 ns          238 ns      3116621
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1024               480 ns          480 ns      1427829
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_BigO              0.47 N          0.47 N    
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_RMS                  3 %             3 %    
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/128         154 ns          154 ns      4331608
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/256         322 ns          322 ns      2081307
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/512         661 ns          661 ns      1064237
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1024       1405 ns         1405 ns       495128
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_BigO       1.35 N          1.35 N    
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_RMS           4 %             4 %    
BM_RayCasting2d<StaticOccupancyGrid>/128                         175 ns          175 ns      3977466
BM_RayCasting2d<StaticOccupancyGrid>/256                         364 ns          364 ns      1927759
BM_RayCasting2d<StaticOccupancyGrid>/512                         752 ns          752 ns       927877
BM_RayCasting2d<StaticOccupancyGrid>/1024                       1610 ns         1609 ns       437840
BM_RayCasting2d<StaticOccupancyGrid>_BigO                       0.16 NlgN       0.16 NlgN 
BM_RayCasting2d<StaticOccupancyGrid>_RMS                           4 %             4 % 

The baseline grid with the baseline raycast algorithm implementation is three times faster than those provided by the library.
Performance drop occurs when replacing BaselineGrid with StaticOccupancyGrid. More research is required, but I wanted to share the first results.

@nahueespinosa nahueespinosa self-assigned this May 29, 2023
@nahueespinosa nahueespinosa added the cpp Related to C++ code label May 29, 2023
@nahueespinosa nahueespinosa changed the title Add baseline to raycast benchmark Add baseline implementation to raycast benchmark May 29, 2023
@nahueespinosa nahueespinosa added the enhancement New feature or request label May 29, 2023
@nahueespinosa nahueespinosa changed the title Add baseline implementation to raycast benchmark Add baseline raycast microbenchmark May 29, 2023
@nahueespinosa nahueespinosa force-pushed the nahuel/add-raycast-benchmark-baseline branch 2 times, most recently from 347e958 to e0c0a9f Compare May 29, 2023 15:32
@hidmic
Copy link
Collaborator

hidmic commented May 29, 2023

Hmm, the extra grid cell validity checks perhaps? Or maybe it's just the extra indirections, pushing and popping the call stack takes time. Though that doesn't explain the performance drop when using StaticOccupancyGrid with baseline raycasting 馃

@glpuga
Copy link
Collaborator

glpuga commented May 29, 2023

Hmm, the extra grid cell validity checks perhaps?

From what I saw, contains() is an expensive enough function that you notice it in perf very clearly, but not 3x expensive, by far.

Or maybe it's just the extra indirections, pushing and popping the call stack takes time. Though that doesn't explain the performance drop when using StaticOccupancyGrid with baseline raycasting thinking

I noticed we use this->self().<function> indirection to call functions located within the same mixin (see BaseDenseGrid2 contains(), data_at(), data_near(), etc.). Assuming we don't want to override function in other mixins (which I think is opening a can of worms because makes mixin order important), we can probably get some speed up by not using the indirection in these time critical functions.

I'm almost sure, even for O3, the compiler won't be smart enough to inline those calls.

@nahueespinosa
Copy link
Member Author

I'm almost sure, even for O3, the compiler won't be smart enough to inline those calls.

Actually, I think the compiler can inline those calls.

@nahueespinosa
Copy link
Member Author

New results:

Running ./build/beluga/test/benchmark/benchmark_beluga
Run on (16 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 6.28, 2.79, 1.56
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
----------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations
----------------------------------------------------------------------------------------------------
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/128               30.7 ns         30.7 ns     22695166
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/256               70.3 ns         70.2 ns      9867182
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/512                137 ns          137 ns      5082187
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1024               470 ns          469 ns      1490708
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_BigO              0.00 N^2        0.00 N^2  
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_RMS                 14 %            14 %    
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/128         175 ns          175 ns      3996580
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/256         371 ns          371 ns      1876697
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/512         764 ns          764 ns       904452
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1024       1625 ns         1624 ns       427804
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_BigO       1.56 N          1.56 N    
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_RMS           4 %             4 %    
BM_RayCasting2d<BaselineGrid>/128                                120 ns          120 ns      5776296
BM_RayCasting2d<BaselineGrid>/256                                252 ns          252 ns      2766790
BM_RayCasting2d<BaselineGrid>/512                                512 ns          512 ns      1353398
BM_RayCasting2d<BaselineGrid>/1024                              1211 ns         1210 ns       612107
BM_RayCasting2d<BaselineGrid>_BigO                              0.12 NlgN       0.12 NlgN 
BM_RayCasting2d<BaselineGrid>_RMS                                  3 %             3 %    
BM_RayCasting2d<StaticOccupancyGrid>/128                         177 ns          177 ns      3895466
BM_RayCasting2d<StaticOccupancyGrid>/256                         355 ns          355 ns      1964765
BM_RayCasting2d<StaticOccupancyGrid>/512                         740 ns          739 ns       939946
BM_RayCasting2d<StaticOccupancyGrid>/1024                       1612 ns         1612 ns       435167
BM_RayCasting2d<StaticOccupancyGrid>_BigO                       0.16 NlgN       0.16 NlgN 
BM_RayCasting2d<StaticOccupancyGrid>_RMS                           3 %             3 %  

@nahueespinosa nahueespinosa marked this pull request as ready for review May 29, 2023 16:54
@hidmic
Copy link
Collaborator

hidmic commented May 29, 2023

I'm almost sure, even for O3, the compiler won't be smart enough to inline those calls.

It's fully typed and everything is within the same translation unit, it should be able to do the inline (pero a seguro se lo llevaron preso 馃憖).

@glpuga
Copy link
Collaborator

glpuga commented May 29, 2023

It's fully typed and everything is within the same translation unit, it should be able to do the inline (pero a seguro se lo llevaron preso eyes).

Never trust a compiler to detect two things are the same when a pointer/far-reference is involved. However a quick measurement seems to make no difference in the benchmarks, so probably these are getting detected and inlined.

@nahueespinosa nahueespinosa changed the title Add baseline raycast microbenchmark Add raycast microbenchmark May 29, 2023
Copy link
Collaborator

@glpuga glpuga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glpuga glpuga self-requested a review May 30, 2023 00:03
@nahueespinosa nahueespinosa force-pushed the nahuel/add-raycast-benchmark-baseline branch 3 times, most recently from a531151 to afeaace Compare May 31, 2023 04:35
@nahueespinosa
Copy link
Member Author

Latest results after rebasing:

Running ./build/beluga/test/benchmark/benchmark_beluga
Run on (16 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.99, 1.87, 1.71
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark                                                            Time             CPU   Iterations
------------------------------------------------------------------------------------------------------
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/0/128                122 ns          122 ns      5805647 Horizontal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1/128                103 ns          103 ns      6939474 Vertical
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/2/128                123 ns          123 ns      5659506 Diagonal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/0/256                240 ns          240 ns      2987280 Horizontal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1/256                215 ns          215 ns      3268580 Vertical
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/2/256                269 ns          269 ns      2511502 Diagonal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/0/512                462 ns          461 ns      1528456 Horizontal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1/512                417 ns          417 ns      1678822 Vertical
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/2/512                579 ns          578 ns      1252690 Diagonal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/0/1024               913 ns          913 ns       749492 Horizontal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/1/1024               951 ns          951 ns       623338 Vertical
BM_RayCasting2d_BaselineRaycast<BaselineGrid>/2/1024              1219 ns         1219 ns       547418 Diagonal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_BigO                0.99 N          0.99 N     Horizontal
BM_RayCasting2d_BaselineRaycast<BaselineGrid>_RMS                   17 %            17 %     Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/0/128         106 ns          106 ns      6519438 Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1/128         100 ns          100 ns      6983497 Vertical
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/2/128         120 ns          120 ns      5835728 Diagonal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/0/256         190 ns          190 ns      3685862 Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1/256         203 ns          203 ns      3433917 Vertical
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/2/256         230 ns          230 ns      3046282 Diagonal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/0/512         358 ns          358 ns      1948903 Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1/512         401 ns          401 ns      1727412 Vertical
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/2/512         471 ns          471 ns      1472981 Diagonal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/0/1024        697 ns          697 ns      1003000 Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/1/1024        912 ns          912 ns       760673 Vertical
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>/2/1024       1042 ns         1042 ns       629106 Diagonal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_BigO         0.85 N          0.85 N     Horizontal
BM_RayCasting2d_BaselineRaycast<StaticOccupancyGrid>_RMS            19 %            19 %     Horizontal
BM_RayCasting2d<BaselineGrid>/0/128                                168 ns          168 ns      4422941 Horizontal
BM_RayCasting2d<BaselineGrid>/1/128                                164 ns          164 ns      4427442 Vertical
BM_RayCasting2d<BaselineGrid>/2/128                                187 ns          187 ns      3736949 Diagonal
BM_RayCasting2d<BaselineGrid>/0/256                                299 ns          299 ns      2330043 Horizontal
BM_RayCasting2d<BaselineGrid>/1/256                                302 ns          302 ns      2340612 Vertical
BM_RayCasting2d<BaselineGrid>/2/256                                359 ns          359 ns      1950424 Diagonal
BM_RayCasting2d<BaselineGrid>/0/512                                584 ns          584 ns      1186736 Horizontal
BM_RayCasting2d<BaselineGrid>/1/512                                584 ns          584 ns      1199741 Vertical
BM_RayCasting2d<BaselineGrid>/2/512                                704 ns          703 ns       995023 Diagonal
BM_RayCasting2d<BaselineGrid>/0/1024                              1154 ns         1152 ns       611503 Horizontal
BM_RayCasting2d<BaselineGrid>/1/1024                              1219 ns         1213 ns       573371 Vertical
BM_RayCasting2d<BaselineGrid>/2/1024                              1472 ns         1469 ns       474615 Diagonal
BM_RayCasting2d<BaselineGrid>_BigO                                1.25 N          1.24 N     Horizontal
BM_RayCasting2d<BaselineGrid>_RMS                                   13 %            13 %     Horizontal
BM_RayCasting2d<StaticOccupancyGrid>/0/128                         160 ns          160 ns      4367932 Horizontal
BM_RayCasting2d<StaticOccupancyGrid>/1/128                         156 ns          156 ns      4391840 Vertical
BM_RayCasting2d<StaticOccupancyGrid>/2/128                         173 ns          172 ns      4083535 Diagonal
BM_RayCasting2d<StaticOccupancyGrid>/0/256                         294 ns          294 ns      2386787 Horizontal
BM_RayCasting2d<StaticOccupancyGrid>/1/256                         296 ns          296 ns      2362053 Vertical
BM_RayCasting2d<StaticOccupancyGrid>/2/256                         328 ns          328 ns      2125891 Diagonal
BM_RayCasting2d<StaticOccupancyGrid>/0/512                         594 ns          594 ns      1211068 Horizontal
BM_RayCasting2d<StaticOccupancyGrid>/1/512                         579 ns          579 ns      1190939 Vertical
BM_RayCasting2d<StaticOccupancyGrid>/2/512                         640 ns          640 ns      1085082 Diagonal
BM_RayCasting2d<StaticOccupancyGrid>/0/1024                       1136 ns         1136 ns       615735 Horizontal
BM_RayCasting2d<StaticOccupancyGrid>/1/1024                       1182 ns         1181 ns       588810 Vertical
BM_RayCasting2d<StaticOccupancyGrid>/2/1024                       1274 ns         1274 ns       547307 Diagonal
BM_RayCasting2d<StaticOccupancyGrid>_BigO                         1.17 N          1.17 N     Horizontal
BM_RayCasting2d<StaticOccupancyGrid>_RMS                             6 %             6 %     Horizontal

The different variants are more similar now, and BM_RayCasting2d_BaselineRaycast doesn't seem as good.

@nahueespinosa nahueespinosa changed the title Add raycast microbenchmark Extend raycast microbenchmarks May 31, 2023
Signed-off-by: Nahuel Espinosa <nespinosa@ekumenlabs.com>
Signed-off-by: Nahuel Espinosa <nespinosa@ekumenlabs.com>
Signed-off-by: Nahuel Espinosa <nespinosa@ekumenlabs.com>
Signed-off-by: Nahuel Espinosa <nespinosa@ekumenlabs.com>
@nahueespinosa nahueespinosa force-pushed the nahuel/add-raycast-benchmark-baseline branch from afeaace to aa84148 Compare May 31, 2023 20:20
@nahueespinosa nahueespinosa merged commit 9436030 into main May 31, 2023
3 checks passed
@nahueespinosa nahueespinosa deleted the nahuel/add-raycast-benchmark-baseline branch May 31, 2023 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp Related to C++ code enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants