
Make C++ RigidBodyManipulator methods capable of handling double, TrigPoly, AutoDiffScalar inputs #1199

Closed
tkoolen opened this issue Jul 22, 2015 · 21 comments

Comments

@tkoolen
Contributor

tkoolen commented Jul 22, 2015

Opening an official issue for this.

Having this capability would allow us to get rid of the RigidBodyManipulator code duplication between Matlab and C++ and make things faster.

@tkoolen
Contributor Author

tkoolen commented Jul 23, 2015

I started investigating this a bit. I think there are currently two roadblocks:

  1. Kinematics cache variables are not templated. See #1167 (Separate out doKinematics cache variables into a class) for a proposed solution.
  2. DrakeJoint methods such as qdot2v, jointTransform, ... are not templated.

I'm now of the opinion that Number 2 should be tackled first. What makes it tricky is that the methods are currently virtual, and it's not possible to have a templated virtual method.

I've mocked up two possible solutions for this. See https://www.dropbox.com/s/hf8vd8i54l5bubm/DrakeJointMockup.zip?dl=0.

Option 1 comes from a StackOverflow answer. I really like Option 2, which uses the curiously recurring template pattern (CRTP). Don't be afraid, it's really not all that hard to understand. Eigen uses it too. Basically, the inheritance diagram is:
DrakeJoint <-- DrakeJointImpl<FixedAxisOneDoFJoint> <-- FixedAxisOneDoFJoint

  • DrakeJoint defines the interface: it has non-templated, pure virtual methods that specify exactly which matrix types are accepted.
  • FixedAxisOneDoFJoint has a templated, non-virtual method that works for any matrix type
  • DrakeJointImpl<FixedAxisOneDoFJoint> implements all of the pure virtual methods of DrakeJoint by calling the templated function in FixedAxisOneDoFJoint on a field called derived.

Advantages of this approach (a rough code sketch of the pattern is included at the end of this comment):

  • both the interface and the concrete implementation are not templated, only the intermediate layer is, which is nice for the user
  • there is no code duplication in the implementation of the methods (one templated method in the subclass vs. reimplementing the same method for each input type)
  • the interface is clearly defined in DrakeJoint; it is easy to see which matrix types can be passed into the methods
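
For concreteness, here is a minimal self-contained sketch of Option 2 (assuming C++11 and Eigen; the class names follow the description above, but the jointTransform signatures and the jointTransformImpl name are simplified placeholders rather than the actual mockup):

  #include <Eigen/Dense>
  #include <unsupported/Eigen/AutoDiff>

  typedef Eigen::AutoDiffScalar<Eigen::VectorXd> ADScalar;

  // Interface: non-templated, pure virtual methods, one overload per accepted matrix type.
  class DrakeJoint {
  public:
    virtual ~DrakeJoint() {}
    virtual Eigen::Matrix4d jointTransform(const Eigen::VectorXd& q) const = 0;
    virtual Eigen::Matrix<ADScalar, 4, 4> jointTransform(const Eigen::Matrix<ADScalar, Eigen::Dynamic, 1>& q) const = 0;
  };

  // Intermediate CRTP layer: implements every virtual method by forwarding to the
  // derived class's single templated method, via a reference to the derived object.
  template <typename Derived>
  class DrakeJointImpl : public DrakeJoint {
  public:
    explicit DrakeJointImpl(Derived& derived) : derived(derived) {}
    virtual Eigen::Matrix4d jointTransform(const Eigen::VectorXd& q) const { return derived.jointTransformImpl(q); }
    virtual Eigen::Matrix<ADScalar, 4, 4> jointTransform(const Eigen::Matrix<ADScalar, Eigen::Dynamic, 1>& q) const { return derived.jointTransformImpl(q); }
  private:
    Derived& derived;
  };

  // Concrete joint: one templated, non-virtual implementation that works for any scalar type.
  class FixedAxisOneDoFJoint : public DrakeJointImpl<FixedAxisOneDoFJoint> {
  public:
    FixedAxisOneDoFJoint() : DrakeJointImpl<FixedAxisOneDoFJoint>(*this) {}

    template <typename DerivedQ>
    Eigen::Matrix<typename DerivedQ::Scalar, 4, 4> jointTransformImpl(const Eigen::MatrixBase<DerivedQ>& q) const {
      typedef typename DerivedQ::Scalar Scalar;
      Eigen::Matrix<Scalar, 4, 4> T;
      T.setIdentity();
      T(2, 3) = q(0);  // placeholder: prismatic motion along z
      return T;
    }
  };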

@RussTedrake
Contributor

Awesome.

I can't look at the zip easily (on my phone), but the design pattern you've described seems perfectly reasonable to me. The Impl class feels pretty analogous to an explicit instantiation.

- Russ

@psiorx
Contributor

psiorx commented Jul 23, 2015

I like option 2 more.

P.S. @RussTedrake if you want to take a look at the stack overflow answer that outlines option 1 without opening the zip file, here's the link http://stackoverflow.com/a/5872633/2228557

@RussTedrake
Contributor

Thanks John. I also like option 2 more.

@tkoolen
Contributor Author

tkoolen commented Jul 23, 2015

I just put it on github here: https://github.com/tkoolen/DrakeJointMockup

Good. Unless there are any more comments today, I'll start working on it.

@tkoolen
Contributor Author

tkoolen commented Aug 6, 2015

Some thoughts on this issue after experimenting a bit.

We should use fixed size vectors for derivatives wherever possible

Gradient computations using AutoDiffScalar<Some fixed size vector type> can be extremely efficient, as fast as user-supplied gradients or even faster. However, gradient computations using AutoDiffScalar<VectorXd> are much slower due to lots and lots of expensive heap allocations (https://forum.kde.org/viewtopic.php?f=74&t=126383). For example, any time you create a local variable m in a function that is computed based on an AutoDiffScalar, m.size() heap allocations need to happen to create the derivative vectors. I experimented with a simple method, RollPitchYawJoint::motionSubspace. Results:

Method                                                                 | Time (microseconds)
User gradients (current implementation, optimized a little)            | 0.31
Gradient computation using AutoDiffScalar<VectorXd> input              | 7.14
Gradient computation using AutoDiffScalar<Matrix<double, 6, 1>> input  | 0.21

So incredibly, AutoDiffScalar gradient computation is actually faster than user gradients in this case, as long as you use fixed size vectors!
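
To make the fixed-size vs. dynamic-size point concrete, here is a small standalone example (not Drake code; the function and its contents are made up) where only the derivative-vector typedef decides whether the heap is touched:

  #include <cmath>
  #include <Eigen/Dense>
  #include <unsupported/Eigen/AutoDiff>

  // Dynamic-size derivative vector (heap-allocated) vs. fixed size 6 (stack-allocated).
  typedef Eigen::AutoDiffScalar<Eigen::VectorXd> ADDynamic;
  typedef Eigen::AutoDiffScalar<Eigen::Matrix<double, 6, 1>> ADFixed6;

  // Any function templated on the scalar type can be evaluated with either type.
  template <typename Scalar>
  Eigen::Matrix<Scalar, 3, 1> someJointQuantity(const Eigen::Matrix<Scalar, 6, 1>& q) {
    using std::sin;
    using std::cos;
    Eigen::Matrix<Scalar, 3, 1> m;
    // With ADDynamic, each assignment below heap-allocates the derivative vector of the
    // corresponding coefficient of m; with ADFixed6 everything stays on the stack.
    m(0) = sin(q(0)) * q(1);
    m(1) = cos(q(2)) + q(3);
    m(2) = q(4) * q(5);
    return m;
  }

  int main() {
    Eigen::Matrix<ADFixed6, 6, 1> q;
    for (int i = 0; i < 6; ++i) {
      q(i).value() = 0.1 * i;
      q(i).derivatives() = Eigen::Matrix<double, 6, 1>::Unit(i);  // seed dq(i)/dq
    }
    Eigen::Matrix<ADFixed6, 3, 1> y = someJointQuantity(q);
    // y(j).value() holds the value, y(j).derivatives() the gradient of element j w.r.t. q.
    return 0;
  }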

The question is how to handle higher level methods, where gradients w.r.t. dynamic-size vectors such as q and v are desired.

We could consolidate Matrix<AutoDiffScalar<...>> and GradientVar<...>

Our GradientVar is kind of the storage-only version of a matrix of AutoDiffScalars. I was thinking it might be good to consolidate by changing our GradientVars to matrices of AutoDiffScalars. This would make it easier to write gradients for new code, as you can seamlessly switch between user-provided gradients and TaylorVar-style gradients. This would clean up gradient computations a lot, possibly even obviating the need for a lot of the user gradient computations we have.

If there is a lot of sparsity to exploit, one could still write a template specialization for a function with AutoDiffScalar arguments.

Comparing to Matlab, this would be like replacing the multiple outputs of various functions by a single TaylorVar output and using the TaylorVar field df to store all gradients, be they computed by combining TaylorVars or set directly by the user.
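
As a rough illustration of what that consolidation could look like (the packUserGradient helper below is hypothetical, not existing Drake code, and assumes the gradient matrix stores d vec(value)/dq with value vectorized column-major): user-computed gradients would simply be written into the derivatives of a matrix of AutoDiffScalars, so downstream code cannot tell whether a gradient was autodiffed or supplied by hand.

  #include <Eigen/Dense>
  #include <unsupported/Eigen/AutoDiff>

  typedef Eigen::AutoDiffScalar<Eigen::VectorXd> ADScalar;

  // Hypothetical helper: pack a value matrix and a user-computed gradient into a matrix of
  // AutoDiffScalars. Row k of 'gradient' is assumed to be the gradient of the k-th entry of
  // the column-major vectorization of 'value' with respect to q.
  Eigen::Matrix<ADScalar, Eigen::Dynamic, Eigen::Dynamic> packUserGradient(
      const Eigen::MatrixXd& value, const Eigen::MatrixXd& gradient) {
    Eigen::Matrix<ADScalar, Eigen::Dynamic, Eigen::Dynamic> ret(value.rows(), value.cols());
    for (int col = 0; col < value.cols(); ++col) {
      for (int row = 0; row < value.rows(); ++row) {
        int k = row + col * static_cast<int>(value.rows());  // column-major linear index
        ret(row, col).value() = value(row, col);
        ret(row, col).derivatives() = gradient.row(k).transpose();
      }
    }
    return ret;
  }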

@tkoolen
Contributor Author

tkoolen commented Aug 14, 2015

@psiorx and I made some good progress on this issue this week. See https://github.com/tkoolen/drake/tree/tk-kinematics-cache

We pulled the kinematics cache out of RigidBody and RigidBodyManipulator and put it into a new class template, KinematicsCache<Scalar>.

As an intermediate step, we had a KinematicsCache<double> member of RigidBodyManipulator. This is not viable long-term, however, since you still can't call doKinematics with q and v scalar types other than double. So we changed doKinematics so that it returns a KinematicsCache<typename DerivedQ::Scalar>, and we changed the methods in RBM that rely on the kinematics cache so that they explicitly require a KinematicsCache<Scalar> input. doKinematicsmex now returns a DrakeMexPointer to a KinematicsCache. I still have a few tests to fix, but most are already passing!

I'll need to do some profiling to see whether the continuous KinematicsCache allocation and deallocation is too expensive. There are workarounds if that is the case.

Note that this also resolves #1167 and allows multiple 'active' mex kinsols at the same time, which is great for trajectory optimization.
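
For reference, the shape of the interface after this change is roughly the following (a simplified sketch based on the description above, not the actual headers on the branch):

  #include <Eigen/Dense>

  template <typename Scalar>
  class KinematicsCache {
    // per-body transforms, motion subspaces, velocity terms, ... all stored with scalar type Scalar
  };

  class RigidBodyManipulator {
  public:
    // The cache scalar type is deduced from the q/v arguments.
    template <typename DerivedQ, typename DerivedV>
    KinematicsCache<typename DerivedQ::Scalar> doKinematics(const Eigen::MatrixBase<DerivedQ>& q,
                                                            const Eigen::MatrixBase<DerivedV>& v);

    // Methods that used to read cached kinematics from the manipulator now take the cache explicitly.
    template <typename Scalar>
    Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic> massMatrix(KinematicsCache<Scalar>& cache) const;
  };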

@RussTedrake
Contributor

it would be amazing to zap all of the gradient code and use autodiff scalar everywhere. it would be really good to understand how that 7.14 number could change if you were to really scrutinize the motionSubspace code to avoid any internal dynamic memory allocation. (and it will be really good to test it on the kinematics/dynamics methods!)

@tkoolen
Contributor Author

tkoolen commented Aug 27, 2015

I just got the first TaylorVar mex call to work!
See https://github.com/tkoolen/drake/tree/tk-taylorvar-support

I can compute the gradient of a mass matrix by calling a mex function with TaylorVar inputs, and the gradient matches the numerical gradient!

@avalenzu
Contributor

Awesome! What's the speed like?

@tkoolen
Contributor Author

tkoolen commented Aug 27, 2015

Haven't tested yet, I'll let you know.

@tkoolen
Contributor Author

tkoolen commented Aug 27, 2015

Here's a comparison between the various C++ methods for computing the mass matrix and its gradient for Atlas.

Method                                                           | Total time (ms) | doKinematicsmex time (ms) | massMatrixmex time (ms)
User gradients                                                   | 3.79            | 1.25                      | 1.95
Gradient computation using AutoDiffScalar<VectorXd>              | 11.64           | 6.95                      | 3.10
Gradient computation using AutoDiffScalar<Matrix<double, 36, 1>> | 9.99            | 7.09                      | 1.49

I included gradient computations with a fixed size derivative vector because Russ came up with the idea of fixing the number of states etc. at compile time if you know which URDF you're going to load. I'm not 100% happy with the last row of the table. It's weird that the time spent in doKinematicsmex doesn't decrease. I'll have to investigate this a little more. If we can get the doKinematicsmex time to decrease as much as the massMatrixmex time, this could be pretty cool.

I was also able to compute TaylorVar gradients of the user gradient output (i.e. second order gradients of the mass matrix) by the way.

@RussTedrake
Contributor

Super super exciting to be at the point where you can get these numbers.

I'd be looking really hard for hidden dynamic allocations in the doKinematics case. Agreed that it would be amazing if we can get those times down.

Another option, perhaps, is to have template specialization and effectively user gradients, but using the autodiff data structure, for just the few methods where it makes a really big difference. If everything is in the autodiff data structure, then we could still use the templated autodiff inside the loop for e.g. the geometry transformations, and only have our user gradients at the level where we really know something special, e.g. about the sparsity. That actually feels like the most likely and optimal conclusion of all this.

@tkoolen
Contributor Author

tkoolen commented Aug 27, 2015

I just did some profiling. Caveat: I used Instruments, which apparently comes standard on OSX and seems awesome, but I haven't used it before, so I'm not sure how trustworthy the results are yet.

Results:

  • we're spending most of our time copying the entire KinematicsCache object from the stack-allocated output of doKinematics to the heap-allocated version needed as persistent memory in a DrakeMexPointer. This was the quickest way to get it to work, but can obviously be improved by only creating the heap allocated version and passing it by reference to doKinematics.
  • another big chunk is constructing the KinematicsCache object in doKinematics itself. This could be mitigated by reusing an existing KinematicsCache from Matlab, e.g. by having the option of passing an old kinsol into Matlab doKinematics to be reused. This would also get rid of DrakeMexPointer creation/deletion overhead, although that appears to be minor.

@tkoolen
Contributor Author

tkoolen commented Aug 28, 2015

I think I spoke too soon in my previous comment. I misread; it's not copying the KinematicsCache object that takes a lot of time, it's just constructing a new one. See these profiling results (note that operator= only takes 3.7 percent of the time):
[profiling screenshot]

I stupidly construct a completely new KinematicsCache on the heap (so all of its members get constructed first) and then overwrite it with operator=:

  auto cache = new KinematicsCache<typename DrakeJoint::AutoDiffScalarFixedSize>(model->bodies, compute_gradients ? 1 : 0);
  *cache = model->doKinematics(q, v, compute_gradients, compute_Jdotv);

Calling the copy constructor should already be a lot better:

  auto cache = new KinematicsCache<typename DrakeJoint::AutoDiffScalarFixedSize>(model->doKinematics(q, v, compute_gradients, compute_Jdotv));

but reusing an already constructed KinematicsCache object would be even better. I'll try all of this.
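
For comparison, the reuse option would look roughly like this (only the constructor call is taken from the snippet above; the doKinematics overload that fills an existing cache by reference is hypothetical):

  // Keep one persistent cache behind the DrakeMexPointer and let doKinematics fill it in place.
  auto cache = new KinematicsCache<typename DrakeJoint::AutoDiffScalarFixedSize>(model->bodies, compute_gradients ? 1 : 0);  // created once
  model->doKinematics(*cache, q, v, compute_gradients, compute_Jdotv);  // reused on every subsequent call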

@tkoolen
Contributor Author

tkoolen commented Sep 1, 2015

New results. I did this just in C++, so there's no longer any Matlab overhead. Still calling doKinematics and massMatrix. No longer creating a KinematicsCache object in the loop (reusing an existing one).

Method                                                      | Total time (ms)
User gradients                                              | 3.63
Using AutoDiffScalar<VectorXd>                              | 9.21
Using AutoDiffScalar<Matrix<double, 36, 1>>                 | 1.22
Using AutoDiffScalar<Matrix<double, Dynamic, 1, 0, 100, 1>> | 1.91

So using fixed size derivative types with AutoDiffScalar is in this case three times faster than our user gradients! I think it's likely that these results would be significantly different if we were to use fixed size types in our user gradients as well, but I'm pretty happy about this.

@RussTedrake
Contributor

fantastic!

also super exciting is that the dynamic sizes are only 3x worse than the user gradients. if we keep seeing numbers like this, then i’d say it’s close enough that we should make the switch and never look back (especially since we could potentially help with the sparsity later).

@tkoolen
Contributor Author

tkoolen commented Sep 1, 2015

I have a feeling that it would be hard to get the AutoDiffScalar<VectorXd> gradients down to the user gradient time by just exploiting sparsity... And getting down to the level of AutoDiffScalar with fixed size vectors seems impossible.

It poses a bit of a tough problem of weighing runtime against compile time if you have a bunch of robots with different size state vectors.

One other option we could explore is having fixed maximum size vectors for the derivative type, e.g. AutoDiffScalar<Matrix<double, Dynamic, 1, 0, 100, 1>>. These would still be stack-allocated, but would allow handling different robots with a single set of instantiations. I don't know what the overhead would be compared to using fixed size vectors, but I would imagine it's a lot better than using VectorXd, even if you grossly oversize the derivative vector. There are currently some compiler errors in our codebase when I try to use them, but I think I might be able to get them to work.
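
Concretely, the fixed maximum size derivative type is just an Eigen vector with Dynamic rows but a compile-time MaxRowsAtCompileTime bound (100 here, matching the example above), for instance:

  #include <Eigen/Dense>
  #include <unsupported/Eigen/AutoDiff>

  // Derivative vector with dynamic size but a compile-time maximum of 100 entries:
  // sized at runtime to the actual number of variables, yet never heap-allocated.
  typedef Eigen::Matrix<double, Eigen::Dynamic, 1, 0, 100, 1> DerTypeUpTo100;
  typedef Eigen::AutoDiffScalar<DerTypeUpTo100> ADUpTo100;

  // Seeding works as usual, e.g. x.derivatives() = DerTypeUpTo100::Unit(num_positions, i);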

@RussTedrake
Contributor

I think it’s possible we could save loads of time if we get the big part of the sparsity down. Or play some tricks to pre-allocate (dynamically, just not inside the loop). Almost certainly not down to the fixed size, but maybe to the user grads.

I still also like the idea of (eventually) templating the RBM class with num_positions / velocities, so that we can get the performance we want on a few special robots.

@tkoolen
Contributor Author

tkoolen commented Sep 4, 2015

I worked around some issues with fixed max size AutoDiffScalars and added a new row to the table above. Fixed max size AutoDiff lands between fixed size AutoDiff and user gradients.

So I think that using fixed maximum size matrices is a very viable option, as it's still pretty fast while limiting the number of template method instantiations required. For most stuff, you could use the instantiation for fixed max size AutoDiffScalar up to maybe size 100, and for larger derivative vector sizes you can fall back to using AutoDiffScalar<VectorXd>.
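
That fallback could be a simple runtime dispatch between two pre-compiled instantiations, roughly like the sketch below (massMatrixGradient is a hypothetical helper that seeds q with the given AutoDiffScalar type, calls doKinematics and massMatrix, and extracts d vec(H)/dq from the derivatives):

  #include <Eigen/Dense>
  #include <unsupported/Eigen/AutoDiff>

  class RigidBodyManipulator;  // the real class comes from Drake

  typedef Eigen::AutoDiffScalar<Eigen::Matrix<double, Eigen::Dynamic, 1, 0, 100, 1>> ADUpTo100;
  typedef Eigen::AutoDiffScalar<Eigen::VectorXd> ADDynamic;

  // Hypothetical helper, explicitly instantiated for both scalar types above.
  template <typename ADScalar>
  Eigen::MatrixXd massMatrixGradient(RigidBodyManipulator& model,
                                     const Eigen::VectorXd& q, const Eigen::VectorXd& v);

  Eigen::MatrixXd massMatrixGradientDispatched(RigidBodyManipulator& model,
                                               const Eigen::VectorXd& q, const Eigen::VectorXd& v) {
    if (q.size() <= 100)
      return massMatrixGradient<ADUpTo100>(model, q, v);  // fast path: stack-allocated derivatives
    return massMatrixGradient<ADDynamic>(model, q, v);    // fallback: heap-allocated derivatives
  }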

I still need to investigate the issues I had to work around more. Hopefully I can track them down and propose a fix to the Eigen devs.

@RussTedrake
Contributor

i believe we can consider this one resolved. (which is awesome!)
