-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support thread-safe const marginalCovariance #460
Conversation
…added copy/copy assignment constructors to allow for full previous functionality while adding thread safety to marginalCovariance.
We are playing around with this change at work, and thought we ought to submit it upstream since it might benefit others. We still need to benchmark optimization to make sure there's no impact, but I think it should be fine. Ptal @dellaert? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome! Would be nice to have the profiling to back up the claim it does not slow down single-thread. I don't expect it.
Also, for multi-threading we have used TBB, and this is std::mutex. No interference? Again, just a question to ponder and document in the PR
@@ -75,10 +76,28 @@ namespace gtsam { | |||
/** Construct from a conditional, leaving parent and child pointers uninitialized */ | |||
BayesTreeCliqueBase(const sharedConditional& conditional) : conditional_(conditional), problemSize_(1) {} | |||
|
|||
/** Shallow copy constructor */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rationale for these in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::mutex has a deleted copy constructor, but BayesTreeCliqueBase implements the copy/copy assignment constructors (by default) as a shallow copy of the node in tree. To maintain the same interface as the previous BayesTreeCliqueBase, we have to implement a copy and copy assignment constructor which doesn't try and copy the cache mutex. This can still happen async because we also don't attempt to copy the cachedSeperatorMarginal, so we don't mess up the mutual exclusion of it.
Looks like the TBB-specific code was also migrated to |
Ok so we ran a threaded optimization benchmark over a factor graph with 141471 nodes and 1013966 factors and the execution times were: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! BTW that’s slow even for a large graph. Did you initialize with InitializePose3?
Minor nitpick. @Cyberdr8gon can you please ensure your commits are being mapped to your github username? I guess you need to update the This can help us down the road if we ever have to rework this (non-trivial) code in the future. |
@varunagrawal Whoops, sorry about that. If I make a PR in the future, I will ensure this is the case. Rest assured that each commit has my work email attached to it (see https://github.com/borglab/gtsam/commit/9564c4873b62ce7a4b6e3a467da10f1ad1138b5d.patch) which will be active for the foreseeable future. Also, @chrisbeall should be able to answer any questions I can about this change. Hopefully this PR serves to document me as a contact related to these commits. |
@dellaert With regards to initializing with InitializePose3, I believe we did not, I will have to look into that. Thanks for the tip! |
Looks like this is ready to go. I'll merge. |
Currently in gtsam::Marginals, the function marginalCovariance is const, which implies to most readers that the function is likely thread safe. However, internally, the BayesTreeCliqueBase uses a mutable cachedSeperatorMarginal which makes the function not thread-safe.
As a result, one needs to either create a new Marginals object for each thread (cannot exploit caching at all resulting in very slow operation) or control access to the Marginals object using a external mutex (also very slow when calculating many marginalCovariances). Instead, adding a mutex to the cachedSeperatorMarginal variable allows for a very large speed up in parallel computation of marginalCovariances on a single graph as the remaining computation other then the cachedSeperatorMarginal is non-trivial by itself and further computations can reuse each others work.
In summary, applying the proposed patch: