
optimization slowdown since 4.1.0 with python #1165

Closed
shteren1 opened this issue Apr 10, 2022 · 13 comments · Fixed by #1168

Comments

@shteren1

shteren1 commented Apr 10, 2022

Description

Since gtsam 4.1.0 I have noticed a 2-3x slowdown when running LM optimization on a standard visual SLAM problem.

Steps to reproduce

  1. Create two new Python virtual environments: one with the gtsam==4.0.3 pip package, and one with any of the packages >=4.1.0.
  2. Copy the following file somewhere locally: https://github.com/shteren1/gtsam/blob/python_timing/python/gtsam_unstable/tests/python_timing.py
  3. Run the script and compare timings; I got ~9 seconds using gtsam==4.0.3 and ~22 seconds using 4.2a5.

This script generates 20 poses spaced uniformly on a circle of radius 20 m, all facing the center. It then generates 1000 landmarks around the center of the circle and creates measurements for them in each of the cameras represented by the poses.
It uses the generated landmark positions plus random noise, and the pose positions plus random noise, as the initial guess, and then optimizes.
The script is compatible with gtsam versions both before and after 4.1.
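The geometry described above (20 cameras on a 20 m circle, each looking at the origin, with landmarks scattered near the center) can be sketched with plain numpy. This is an illustrative reconstruction, not the original script; the function names here are made up for the example:

```python
import numpy as np


def camera_poses_on_circle(n_poses=20, radius=20.0):
    """Generate camera centers uniformly on a circle, each facing the origin."""
    poses = []
    for k in range(n_poses):
        theta = 2.0 * np.pi * k / n_poses
        center = np.array([radius * np.cos(theta), radius * np.sin(theta), 0.0])
        # Optical axis (third column, z) points from the camera toward the origin.
        z = -center / np.linalg.norm(center)
        x = np.array([-np.sin(theta), np.cos(theta), 0.0])  # tangent to the circle
        y = np.cross(z, x)
        R = np.column_stack([x, y, z])  # world-from-camera rotation
        poses.append((R, center))
    return poses


def landmarks_near_center(n_landmarks=1000, spread=5.0, seed=0):
    """Scatter landmarks in a box around the circle's center."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-spread, spread, size=(n_landmarks, 3))
```

Adding noise to these ground-truth poses and points, projecting the landmarks into each camera, and feeding everything to a LevenbergMarquardtOptimizer would then complete a benchmark along the lines the author describes.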

Expected behavior

Runtime should be nearly the same, or ideally even better, with newer versions.

Environment

Tested on Ubuntu 18.04 using clean Python virtual environments.
EDIT:
Added priors on the first pose and landmark that were missing; the timing difference is now about 4x instead of the 2.5x it was before. With the priors it is also possible to use the GN and DL optimizers; without them they were throwing an indeterminate linear system error.

@shteren1 shteren1 changed the title optimization slowdown since 4.1.0 optimization slowdown since 4.1.0 with python Apr 10, 2022
@dellaert
Member

Could you share the timings for just the optimizer call on line 76? That will disambiguate whether this is a graph-construction issue in the Python wrapper or a true slowdown in optimization.

@shteren1
Author

@dellaert here are the results from those lines of code at the bottom; the timer samples only the optimizer.optimize() call:

from time import time

t1 = time()
result = optimizer.optimize()
print(f"optimizer ran: {optimizer.iterations()} iterations, time took: {time() - t1} seconds, initial error: {error0}, final error: {graph.error(result)}")

gtsam=4.2a5:
optimizer ran: 63 iterations, time took: 18.369078636169434 seconds, initial error: 7505.995184644437, final error: 5742.6712427961265

gtsam=4.0.3:
optimizer ran: 63 iterations, time took: 4.867237329483032 seconds, initial error: 7505.995184644437, final error: 5742.671242796135

@dellaert
Member

Wow! Well, that's certainly annoying. @ProfFan ?

@ProfFan
Collaborator

ProfFan commented Apr 12, 2022

Could be a problem with the switch away from the boost pool allocator. I already have some machinery for micro-profiling, but that will need to wait until after end-of-semester stuff....
I fixed it!

@ProfFan
Collaborator

ProfFan commented Apr 12, 2022

OK, most time is wasted throwing bad_casts....
[profiler screenshot: time spent in bad_cast throws]

@shteren1 Changing L69 to values.insert_point3 immediately gives me a 4-second execution time.
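The mechanism behind this finding can be illustrated in pure Python: dispatching by trying each concrete type in turn and throwing on failure (as the C++ bad_cast path does) is far costlier than a direct typed access. This is a hypothetical analogue for illustration only, not GTSAM code:

```python
import time


class PoseValue:
    """Wrapper holding a pose-like payload."""
    def __init__(self, v):
        self._pose = v


class PointValue:
    """Wrapper holding a point-like payload."""
    def __init__(self, v):
        self._point = v


def get_generic(value):
    """Try each concrete type in turn; a failed 'cast' raises, like bad_cast."""
    try:
        return value._pose
    except AttributeError:
        pass
    try:
        return value._point
    except AttributeError:
        raise TypeError("unknown value type")


def get_typed(value):
    """The insert_point3-style path: type known up front, no exceptions."""
    return value._point


values = [PointValue(i) for i in range(100_000)]

t0 = time.perf_counter()
for v in values:
    get_generic(v)
generic_s = time.perf_counter() - t0

t0 = time.perf_counter()
for v in values:
    get_typed(v)
typed_s = time.perf_counter() - t0
```

On a typical run the exception-raising path is noticeably slower than the typed path, which mirrors why avoiding the trial-and-error casts sped up the benchmark.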

@shteren1
Author

@ProfFan Thanks! I wasn't aware of the values.insert_point3 option at all; it didn't exist in 4.0.3 and I didn't see it in any example.
After the pull request this won't be necessary, right?
I also tried using insert_pose3 for the poses, but it doesn't seem to make any difference to the runtime. Why is that different from point3? Is it because the poses are still a gtsam class and not an Eigen vector like point3?

@ProfFan
Collaborator

ProfFan commented Apr 12, 2022

@shteren1 You don't need to change any code, the PR will fix the slowdown :)

@ProfFan
Collaborator

ProfFan commented Apr 12, 2022

Yeah, the reason is that we need to handle arbitrarily sized matrices and vectors, but not arbitrarily sized poses, so the pose code is all fixed-size and fine....

@ProfFan
Collaborator

ProfFan commented Apr 12, 2022

So the final verdict is: if you use insert_point3 (which is guaranteed to be a static 3x1 vector) you get 4.50 s of runtime.
If you use insert after the fix PR you get 4.80 s of runtime. The difference is caused by 2 extra dynamic_casts.

So if your code is not too performance-sensitive, just using insert for everything is fine.

@dellaert
Member

I think we should not point people to insert_point3, which is an accidental name generated by pybind11. I would support adding explicit insertPoint2 and insertPoint3 methods and properly documenting their reason for existence. @ProfFan, would you be willing to add that in the same PR?

@ProfFan
Collaborator

ProfFan commented Apr 14, 2022

@shteren1 I think you can now use insertPoint3 if you build from develop

@shteren1
Author

shteren1 commented Apr 14, 2022

@ProfFan thanks! I'll try it out ASAP.

BTW, how did you run that profiling? It looks useful.

Edit:
Compiled the new source and all seems to work: using the regular insert still results in reasonable runtime, and using insertPoint3 works and shaves off 4-5% of the runtime.

@ProfFan
Collaborator

ProfFan commented Apr 15, 2022

@shteren1 That one is done using Apple's Instruments app. Just compile, then use Instruments to run the resulting binary.
