Switched to in-place update of the diagonal Hessian #337
Hmmm, the speedup is not as large as we expected: 3-4% rather than 10%? Although, especially in JacobianFactor, there is room to be even smarter and reduce mallocs to n (the number of variables) rather than m (the number of factors).
gtsam/linear/JacobianFactor.cpp
Outdated
@@ -554,9 +560,12 @@ VectorValues JacobianFactor::hessianDiagonal() const {
model_->whitenInPlace(column_k);
dj(k) = dot(column_k, column_k);
}
d.emplace(j, dj);
if(d.exists(j)) {
d.at(j) += dj;
I think we could be a bit smarter still and avoid the malloc on line 556. Since we know we're going to add, either the entry is already allocated or we'll have to emplace, which does a new malloc anyway.
It’s 3%-4% of the SEQUENTIAL_CHOLESKY time, should be more with EIGEN_CHOLESKY
gtsam/linear/JacobianFactor.cpp
Outdated
@@ -560,8 +560,9 @@ void JacobianFactor::hessianDiagonalAdd(VectorValues& d) const {
model_->whitenInPlace(column_k);
dj(k) = dot(column_k, column_k);
}
if(d.exists(j)) {
d.at(j) += dj;
auto item = d.find(j);
So, I think we can do even better! Currently, we do a find (one traversal) and another if it's not there (emplace). By always doing emplace and inspecting its return value we only do one traversal per key:
If the function successfully inserts the element (because no equivalent element existed already in the map), the function returns a pair of an iterator to the newly inserted element and a value of true.
Otherwise, it returns an iterator to the equivalent element within the container and a value of false.
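To make the quoted return-value semantics concrete, here is a minimal sketch on a plain std::map rather than on VectorValues. The helper name addOrInsert and the int/double types are assumptions for illustration only; the point is that a single emplace call replaces the find-then-emplace pair.

```cpp
#include <map>
#include <utility>

// Hypothetical helper (not part of GTSAM): add `value` to the entry for `key`,
// creating the entry if it does not exist yet. Returns true if it was inserted.
bool addOrInsert(std::map<int, double>& m, int key, double value) {
  // One traversal: emplace either inserts (key, value) or locates the existing entry.
  std::pair<std::map<int, double>::iterator, bool> result = m.emplace(key, value);
  if (!result.second) {
    // Key was already present, so emplace left the stored value untouched: accumulate.
    result.first->second += value;
  }
  return result.second;
}
```

Calling it twice with the same key inserts on the first call and accumulates on the second, without a separate lookup.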
/* ************************************************************************* */
VectorValues::iterator VectorValues::emplace(Key j, const Vector& value) {
#ifdef TBB_GREATER_EQUAL_2020
  std::pair<iterator, bool> result = values_.emplace(j, value);
#else
  std::pair<iterator, bool> result = values_.insert(std::make_pair(j, value));
#endif
  if(!result.second)
    throw std::invalid_argument(
        "Requested to emplace variable '" + DefaultKeyFormatter(j)
            + "' already in this VectorValues.");
  return result.first;
}
This will throw if the value exists. Should I make a function to access the inner values_?
Oh, dang! I had no idea the semantics of our emplace were different. That’s actually terrible :-) I think you should change this to an in-line straight call to emplace in the header, and just return the result. Might have to check current uses of emplace but I suspect none of them actually use the return value.
I think the aim here is to avoid the allocation of Vector dj(nj)? If so, then the emplace will need a new Vector, thus reintroducing the allocation?
Yes - either use the memory already found by emplace in the tree, or use the newly allocated memory it just created (if the key was not in the tree yet). Either way, emplace will give you a reference to the memory, so no allocation should be necessary in your code, and the tree is traversed only once.
There is try_emplace, but only with C++17.
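I don't think you need it. emplace will forward arguments, right? So, for normal maps this should work:
size_t nj = ...
auto item = emplace(j, nj);      // forwards nj to the vector's size constructor
auto& dj = item.first->second;   // map iterators point to (key, value) pairs
if (item.second) dj.setZero();   // newly inserted entry: zero it before accumulating
for (...) {
  dj(k) += ...
}
Try and work it out entirely before sending next comment?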
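The pattern sketched in that comment can be written out in a self-contained form. This is only an illustration under stated assumptions: std::vector<double> stands in for an Eigen vector, Key, Vec, Columns and addSquaredColumnNorms are hypothetical names, and std::piecewise_construct is used so that the size argument is unambiguously forwarded to the vector constructor. Unlike an Eigen vector, a std::vector built from a size is already zero-filled, so no setZero step appears here.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

using Key = int;                   // stand-in for a variable key
using Vec = std::vector<double>;   // stand-in for a dense vector type
using Columns = std::vector<Vec>;  // one block, stored as a list of columns

// Add the squared norm of each column of `block` to d[j], in place.
// Assumes any existing entry d[j] already has block.size() elements.
void addSquaredColumnNorms(std::map<Key, Vec>& d, Key j, const Columns& block) {
  const std::size_t nj = block.size();
  // Single traversal: a zero-filled Vec of size nj is constructed only if j is new.
  auto item = d.emplace(std::piecewise_construct,
                        std::forward_as_tuple(j),
                        std::forward_as_tuple(nj));
  Vec& dj = item.first->second;  // existing storage or the freshly created entry
  for (std::size_t k = 0; k < nj; ++k) {
    double sum = 0.0;
    for (double x : block[k]) sum += x * x;
    dj[k] += sum;  // accumulate directly into the map's memory, no temporary vector
  }
}
```

Two calls with the same key accumulate into one entry, which is the behavior an in-place hessianDiagonalAdd needs when several factors touch the same variable.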
Turns out the way we are using emplace is probably wrong. The argument to emplace is the argument to the object constructor, so basically we are calling the copy constructor all the time when we have that VectorValues::emplace indirection :\
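A small counting type makes this observation visible. The sketch below is an assumption-laden illustration on std::map, not GTSAM code: Tracked, emplaceByValue and emplaceBySize are hypothetical names. Passing a finished object through a wrapper forces a copy into the map node, while forwarding the constructor argument builds the value in place.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical value type that counts how its instances are constructed.
struct Tracked {
  static int copies;  // number of copy constructions
  static int sized;   // number of constructions from a size
  std::size_t n;
  explicit Tracked(std::size_t size) : n(size) { ++sized; }
  Tracked(const Tracked& other) : n(other.n) { ++copies; }
};
int Tracked::copies = 0;
int Tracked::sized = 0;

// Wrapper in the style discussed above: it receives a finished object,
// so the map has to copy-construct its own element from it.
void emplaceByValue(std::map<int, Tracked>& m, int key, const Tracked& value) {
  m.emplace(key, value);
}

// Forwarding the constructor argument builds the element inside the map node.
void emplaceBySize(std::map<int, Tracked>& m, int key, std::size_t size) {
  m.emplace(std::piecewise_construct,
            std::forward_as_tuple(key),
            std::forward_as_tuple(size));
}
```

With this type, emplaceByValue increments the copy counter while emplaceBySize does not, which is the difference the comment is pointing at.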
Exactly
done
There are some strange errors:
Unit test emplace
On Tue, Jun 2, 2020 at 21:44 Fan Jiang ***@***.***> wrote:
There are some strange errors:
94/181 Test #94: testJacobianFactor .................***Failed 0.09 sec
Not equal:
expected:
: 3 elements
5: 1 1 1
10: 4 4 4
15: 9 9 9
actual:
: 3 elements
5: 0.999998 0.999998 0.999998
10: 4 4 4
15: 9 9 9
98/181 Test #98: testRegularJacobianFactor ..........***Failed 0.08 sec
Not equal:
expected:
: 3 elements
0: 4 4 4
1: 16 16 16
2: 36 36 36
actual:
: 3 elements
0: 4 4 4
1: 16 16 16
2: 36 36 36
--
Best !
Frank Dellaert
http://frank.dellaert.com
@dellaert Fixed. Now all unit tests should pass.
New timing?
Nice! I think we see eye to eye now ;-) Curious to see whether all this skullduggery made any difference in timing :-)
Some more things to clean up, but feel free to merge after CI of those fixes pans out.
gtsam/linear/linearAlgorithms-inst.h
Outdated
auto result = collectedResult.emplace(*frontal, solution.segment(vectorPosition, c.getDim(frontal)));
if(!result.second)
throw std::invalid_argument(
"Requested to emplace variable '" + DefaultKeyFormatter(*frontal)
I think this error message is unhelpful in this context. Maybe: std::runtime_error("Internal error while optimizing clique.")
done
return d;
}

/// Return the diagonal of the Hessian for this factor
Fix comment
done
@dellaert 1:14 with
Wait, I don't understand. It's 73s now, compared to 82s when you started the PR (yes!!!! > 10% saving !?), but with what solver? Note Eigen is not in develop yet, so the description of this PR should state the savings for using an existing solver, e.g. SEQUENTIAL_CHOLESKY. Could you update the PR description with a before and after for SEQUENTIAL_CHOLESKY?
@dellaert I updated the benchmark with proper warm-up (for filling CPU cache). Can see about 1s of improvement in runtime.
Hmm. That's disappointing. But, on the other hand, why is "warm-started?" the right benchmark? People typically run optimizations once.
@dellaert Because that will eliminate the disturbance caused by other processes filling the CPU cache, etc. https://engineering.appfolio.com/appfolio-engineering/2017/5/2/what-about-warmup
I understand warm-up :-) But my comment is that it does not apply, and we should cold-start.
I did another optimization to further reduce heap allocation. Now the benchmark:
@ProfFan awesome! But (a) could you update the description with cold timing? (b) the build seems to be failing :-/
updated and fixed the GCC issues :)
@dellaert Should I merge this in?
Please update description with cold timing. I'll then do a last review
@dellaert Already done, see top
BTW, in profile the amount of total execution time consumed by
OK, I noticed something in this last review:
virtual VectorValues hessianDiagonal() const = 0;
No longer needs to be abstract. It's the same in all derived classes :-) Just make it concrete and remove all copies in derived?
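One way to read this suggestion is sketched below, under the assumption that every derived hessianDiagonal() simply creates an empty result and calls its own hessianDiagonalAdd(). The classes Factor and ScalarFactor and the Diagonal alias are hypothetical stand-ins, not the GTSAM types.

```cpp
#include <map>

using Diagonal = std::map<int, double>;  // stand-in for VectorValues

// Hypothetical base class: only the in-place accumulation stays abstract.
class Factor {
 public:
  virtual ~Factor() = default;

  // Each derived class adds its own contribution to d in place.
  virtual void hessianDiagonalAdd(Diagonal& d) const = 0;

  // Concrete in the base: the body is identical for every derived class.
  Diagonal hessianDiagonal() const {
    Diagonal d;
    hessianDiagonalAdd(d);
    return d;
  }
};

// Hypothetical derived class with a single scalar entry a on variable key.
class ScalarFactor : public Factor {
 public:
  ScalarFactor(int key, double a) : key_(key), a_(a) {}
  void hessianDiagonalAdd(Diagonal& d) const override { d[key_] += a_ * a_; }

 private:
  int key_;
  double a_;
};
```

Derived classes then implement only hessianDiagonalAdd, and the duplicated hessianDiagonal copies can be dropped.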
@dellaert It's possible but not trivial -
Can you not add a .cpp file ?
@dellaert Done.
Yay !
Awesome! Let's do some SwiftFusion now, and pick Eigen back up after we meet with those folks :-)
Benchmark with SEQUENTIAL_CHOLESKY
Warmed-up timing
Before:
After:
Cold timing
Before:
After: