
Cassie bench gbench #13782

Merged

Conversation

rpoyner-tri
Contributor

@rpoyner-tri rpoyner-tri commented Jul 31, 2020

This PR subsumes a snapshot of #13770. I'm publishing it so we can compare the two benchmark programs and discuss how to go forward.



Contributor Author

@rpoyner-tri rpoyner-tri left a comment

+@sherm1 for discussion and possible feature review.
@mposa FYI

Reviewable status: LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @sherm1)

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):
Interesting. It appears that the number of malloc/free calls is greater when cassie_bench is built -c dbg than for release builds.


@sherm1
Member

sherm1 commented Jul 31, 2020

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Interesting. It appears that the number of malloc/free calls is greater when cassie_bench is built -c dbg than for release builds.

Yes, that's true. The reason is a debug-only check in the caching system. I had to comment that out to get meaningful heap results in Debug mode. The offending method is CacheEntry::CheckValidAbstractValue(), invoked from CacheEntry::Calc() as DRAKE_ASSERT_VOID(CheckValidAbstractValue(*value)); (line 50 in cache_entry.cc).


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

Yes, that's true. The reason is a debug-only check in the caching system. I had to comment that out to get meaningful heap results in Debug mode. The offending method is CacheEntry::CheckValidAbstractValue(), invoked from CacheEntry::Calc() as DRAKE_ASSERT_VOID(CheckValidAbstractValue(*value)); (line 50 in cache_entry.cc).

aha. good to know.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):
I'm seeing another debug-specific artifact locally. (Probably masked in the original program by timeouts in CI and general slowness.) The autodiff forward and reverse dynamics tests are throwing assertions about vector size in debug mode. Still investigating.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

I'm seeing another debug-specific artifact locally. (Probably masked in the original program by timeouts in CI and general slowness.) The autodiff forward and reverse dynamics tests are throwing assertions about vector size in debug mode. Still investigating.

Confirmed the same problems occur in the original program. Pushing some small tweaks to help with debugging.


Member

@sherm1 sherm1 left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Confirmed the same problems occur in the original program. Pushing some small tweaks to help with debugging.

I believe I figured out what's broken here -- not a bug in AutoDiff or MBP but a bug in the way the test cases are using math::initializeAutoDiff(). For forward dynamics it creates x with 32 derivatives and u with 10 derivatives and then mixes them in the same computation. Everything has to have the same number of derivatives, and that number is an optional parameter to initializeAutoDiff(). Doing this works:

    x = VectorXd::Constant(2 * nq, i);
    u = VectorXd::Constant(nu, i);
    multibody_context_autodiff->FixInputPort(
        multibody_plant_autodiff->get_actuation_input_port().get_index(),
        math::initializeAutoDiff(u, 2 * nq + nu, 2 * nq));
    multibody_plant_autodiff->SetPositionsAndVelocities(
        multibody_context_autodiff.get(),
        math::initializeAutoDiff(x, 2 * nq + nu));

Then the subsequent CalcTimeDerivatives() call (which normally returns ẋ of size 32) now returns value=ẋ and derivatives ∂ẋ/∂x,u (size 32 x 42).

A similar fix is needed for the inverse dynamics test:

    x = VectorXd::Constant(2 * nq, i);
    desired_vdot = VectorXd::Constant(nv, i);
    multibody_plant_autodiff->SetPositionsAndVelocities(
        multibody_context_autodiff.get(), math::initializeAutoDiff(x, 2*nq + nv));
    multibody_plant_autodiff->CalcInverseDynamics(
        *multibody_context_autodiff,
        math::initializeAutoDiff(desired_vdot, 2*nq + nv, 2*nq),
        external_forces_autodiff);

These may not be the right fixes for @mposa's purposes since they are now returning some odd-shaped derivatives, but they should allow performance investigation of autodiff.


Member

@sherm1 sherm1 left a comment


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 33 at r3 (raw file):

  using benchmark::Fixture::SetUp;
  void SetUp(const benchmark::State&) override {
    plant_ = builder_.AddSystem<MultibodyPlant>(0);

BTW did you mean to invoke benchmark::Fixture::SetUp() here?

Member

@sherm1 sherm1 left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

aha. good to know.

@rpoyner-tri I made a few hacks locally to the BUILD file and cassie_bench.cc and now it (a) produces a Debug configuration in CLion so I can Debug in the IDE, and (b) runs to completion in Debug. What is the best way to get those to you?


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

@rpoyner-tri I made a few hacks locally to the BUILD file and cassie_bench.cc and now it (a) produces a Debug configuration in CLion so I can Debug in the IDE, and (b) runs to completion in Debug. What is the best way to get those to you?

Couple of possibilities.

You can try adding my fork as a remote and pushing to the branch of this PR. I'm not sure that will work, but it might.

You can emit your changes as a patch file with git format-patch and email it to me.

You could also just paste diffs into a slack snippet. I should be able to download that and figure it out.


a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

I believe I figured out what's broken here -- not a bug in AutoDiff or MBP but a bug in the way the test cases are using math::initializeAutoDiff(). For forward dynamics it creates x with 32 derivatives and u with 10 derivatives and then mixes them in the same computation. Everything has to have the same number of derivatives, and that number is an optional parameter to initializeAutoDiff(). Doing this works:

    x = VectorXd::Constant(2 * nq, i);
    u = VectorXd::Constant(nu, i);
    multibody_context_autodiff->FixInputPort(
        multibody_plant_autodiff->get_actuation_input_port().get_index(),
        math::initializeAutoDiff(u, 2 * nq + nu, 2 * nq));
    multibody_plant_autodiff->SetPositionsAndVelocities(
        multibody_context_autodiff.get(),
        math::initializeAutoDiff(x, 2 * nq + nu));

Then the subsequent CalcTimeDerivatives() call (which normally returns ẋ of size 32) now returns value=ẋ and derivatives ∂ẋ/∂x,u (size 32 x 42).

A similar fix is needed for the inverse dynamics test:

    x = VectorXd::Constant(2 * nq, i);
    desired_vdot = VectorXd::Constant(nv, i);
    multibody_plant_autodiff->SetPositionsAndVelocities(
        multibody_context_autodiff.get(), math::initializeAutoDiff(x, 2*nq + nv));
    multibody_plant_autodiff->CalcInverseDynamics(
        *multibody_context_autodiff,
        math::initializeAutoDiff(desired_vdot, 2*nq + nv, 2*nq),
        external_forces_autodiff);

These may not be the right fixes for @mposa's purposes since they are now returning some odd-shaped derivatives, but they should allow performance investigation of autodiff.

Nice. I have followup questions, but they will have to wait until Monday.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 33 at r3 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW did you mean to invoke benchmark::Fixture::SetUp() here?

I don't think it's strictly necessary; its implementation is empty. The reason for the using statement is the unpleasant way in which gbench deprecated the const-parameter versions of the method. Without it, compilation founders on some obscure error. I can recover the details and write a comment if you like. It will read like the sadness of the world in miniature, but such is the life of a programmer.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Nice. I have followup questions, but they will have to wait until Monday.

Followup #1: a bunch of this code is using 2*nq, when (for example) the documented contents of the parameter to SetPositionsAndVelocities() is [q; v]. Do we just habitually assume that nq == nv? Should I be considering that sloppy, or a well-understood shorthand?


Contributor

@amcastro-tri amcastro-tri left a comment

Reviewable status: 1 unresolved discussion, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @rpoyner-tri and @sherm1)

a discussion (no related file):
Nice! On my laptop cassie_benchmark_test prints:
100000x mass matrix took 2198 ms. 21 us per.

while cassie_bench prints:
DoubleFixture/DoubleMassMatrix 21486 ns 21485 ns 32337

Trying to reconcile those numbers, scaling by 100000/32337 doesn't make them similar. Anything else?
I also get a warning: "WARNING CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead." Not sure if that's related.


Member

@sherm1 sherm1 left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Followup #1: a bunch of this code is using 2*nq, when (for example) the documented contents of the parameter to SetPositionsAndVelocities() is [q; v]. Do we just habitually assume that nq == nv? Should I be considering that sloppy, or a well-understood shorthand?

Sloppy -- should be nq + nv and in general we have nq > nv (because of quaternions). We should actually have nq > nv here -- the only reason we don't is that the original benchmark adds a weld to the Cassie model. Might be a slightly better benchmark with a free-floating base so that quaternion code gets brought in.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 1 unresolved discussion, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @amcastro-tri and @sherm1)

a discussion (no related file):

Previously, amcastro-tri (Alejandro Castro) wrote…

Nice! On my laptop cassie_benchmark_test prints:
100000x mass matrix took 2198 ms. 21 us per.

while cassie_bench prints:
DoubleFixture/DoubleMassMatrix 21486 ns 21485 ns 32337

Trying to reconcile those numbers, scaling by 100000/32337 doesn't make them similar. Anything else?
I also get a warning: "WARNING CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead." Not sure if that's related.

Note that Sherm's program is reporting variously milliseconds (ms) and microseconds (us); cassie_bench (via google benchmark) is reporting nanoseconds.

The "Time" column of cassie_bench is wall-time. If you want to stabilize that, you have to temporarily disable cpu scaling. The CPU column should be pretty stable in at least the first 2 sig-figs. Modern computers have a variety of sources of execution time noise. I suspect the timings reported by cassie_bench (with their observable variance) are close to the best we can do without real effort.


Member

@sherm1 sherm1 left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Note that Sherm's program is reporting variously milliseconds (ms) and microseconds (us); cassie_bench (via google benchmark) is reporting nanoseconds.

The "Time" column of cassie_bench is wall-time. If you want to stabilize that, you have to temporarily disable cpu scaling. The CPU column should be pretty stable in at least the first 2 sig-figs. Modern computers have a variety of sources of execution time noise. I suspect the timings reported by cassie_bench (with their observable variance) are close to the best we can do without real effort.

Is there a way to get Gbench to add a "cost per iteration" column to the table? Right now I'm getting:

Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
DoubleFixture/DoubleMassMatrix               24816 ns        24814 ns        22047
DoubleFixture/DoubleInverseDynamics          20655 ns        20649 ns        25176
DoubleFixture/DoubleForwardDynamics          50198 ns        50197 ns        10263
AutodiffFixture/AutodiffMassMatrix         3103131 ns      3102771 ns          232
AutodiffFixture/AutodiffInverseDynamics    3386391 ns      3386215 ns          194
AutodiffFixture/AutodiffForwardDynamics    5346707 ns      5346164 ns           95

with different numbers of iterations from run to run. That leaves me with arithmetic to do to get comparable timings for different runs.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

Sloppy -- should be nq + nv and in general we have nq > nv (because of quaternions). We should actually have nq > nv here -- the only reason we don't is that the original benchmark adds a weld to the Cassie model. Might be a slightly better benchmark with a free-floating base so that quaternion code gets brought in.

Cool. I will look more closely at the dimensions and attempt to clean them up.


a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

Is there a way to get Gbench to add a "cost per iteration" column to the table? Right now I'm getting:

Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
DoubleFixture/DoubleMassMatrix               24816 ns        24814 ns        22047
DoubleFixture/DoubleInverseDynamics          20655 ns        20649 ns        25176
DoubleFixture/DoubleForwardDynamics          50198 ns        50197 ns        10263
AutodiffFixture/AutodiffMassMatrix         3103131 ns      3102771 ns          232
AutodiffFixture/AutodiffInverseDynamics    3386391 ns      3386215 ns          194
AutodiffFixture/AutodiffForwardDynamics    5346707 ns      5346164 ns           95

with different numbers of iterations from run to run. That leaves me with arithmetic to do to get comparable timings for different runs.

You are getting the cost per iteration. Note that both Time and CPU are in nanoseconds.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

You are getting the cost per iteration. Note that both Time and CPU are in nanoseconds.

Try running cassie_bench --v=3 or so. That should make it a bit more clear what is happening.


Member

@sherm1 sherm1 left a comment

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Try running cassie_bench --v=3 or so. That should make it a bit more clear what is happening.

Oh, got it -- thanks! Looks great then.


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, sherm1 (Michael Sherman) wrote…

Oh, got it -- thanks! Looks great then.

Done.


Contributor

@amcastro-tri amcastro-tri left a comment

Reviewable status: 1 unresolved discussion, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @sherm1)

a discussion (no related file):

Is there a way to get Gbench to add a "cost per iteration"

I had the same question until @rpoyner-tri told us yesterday that the cost actually is per iteration. I wonder, would there be a way to reformat the table and/or titles to show "Time per iter" and/or "ns/iter"?


Contributor Author

@rpoyner-tri rpoyner-tri left a comment

a discussion (no related file):

Previously, amcastro-tri (Alejandro Castro) wrote…

Is there a way to get Gbench to add a "cost per iteration"

I had the same question until @rpoyner-tri told us yesterday that the cost actually is per iteration. I wonder, would there be a way to reformat the table and/or titles to show "Time per iter" and/or "ns/iter"?

@amcastro-tri having not actually looked, I suspect that would require either reaching into the library code, or grabbing the data and formatting our own table. You can get the data "directly" as a JSON structure, so it's possible.

The question is: what is it worth? This is a pretty widely used framework, and the relative sizes of quantities, iterations, and actual runtime seem pretty clear to me.


Contributor

@amcastro-tri amcastro-tri left a comment

Reviewable status: 6 unresolved discussions, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @rpoyner-tri and @sherm1)

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

@amcastro-tri having not actually looked, I suspect that would require either reaching into the library code, or grabbing the data and formatting our own table. You can get the data "directly" as a JSON structure, so it's possible.

The question is: what is it worth? This is a pretty widely used framework, and the relative sizes of quantities, iterations, and actual runtime seem pretty clear to me.

Fair. Mostly for my own education. I love how the benchmarks look, super clean.



examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 47 at r5 (raw file):

    nu_ = plant_->num_actuators();

    x_ = VectorXd::Zero(nq_ + nv_);

bad idea when the model has quaternions per f2f.

A solution is to use the plant's APIs:

const Joint<double>& joint = plant_->GetJointByName("hip_roll_left");
hip_roll_left_position_dof_index_ = joint.position_start();  // an int.

then in the loops below replace x(0) = i with x(hip_roll_left_position_dof_index_) = i


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 68 at r5 (raw file):

  int i = 0;
  for (auto _ : state) {
    x_(0) = i;

per f2f, most likely this corresponds to trashing the free base quaternion state.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 85 at r5 (raw file):

    desired_vdot = VectorXd::Constant(nv_, i);
    plant_->SetPositionsAndVelocities(context_.get(), x);
    plant_->CalcInverseDynamics(*context_, desired_vdot, external_forces);

maybe add TODO to add LimitMalloc calls here and elsewhere? per @sherm1's recent work.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 100 at r5 (raw file):

    x = VectorXd::Constant(nq_ + nv_, i + 1);
    u = VectorXd::Constant(nu_, i);
    context_->FixInputPort(plant_->get_actuation_input_port().get_index(), u);

how costly is this call?


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 125 at r5 (raw file):

// ceilings for allocations until that is addressed.
#ifdef NDEBUG
#define RELEASE_LIMIT_MALLOC(x) \

love it! could we add these to the "double" versions?

Contributor

@amcastro-tri amcastro-tri left a comment

Reviewable status: 7 unresolved discussions, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @rpoyner-tri and @sherm1)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 139 at r5 (raw file):

    x_(0) = i;
    plant_autodiff_->SetPositionsAndVelocities(
        context_autodiff_.get(), math::initializeAutoDiff(x_));

maybe initializeAutoDiff (which already allocates) can be outside and then within the loop do x(hip_roll_left_position_dof_index_).value() = i

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape, @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: How about some guidance on this magic number? Same for every invocation this guard with various hard-coded limits.

Not sure what to say here. These were the exact numbers of allocations used by the guarded scopes at the time the test was written. Is that not clear?


tools/cc_toolchain/BUILD.bazel, line 60 at r15 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

BTW Someone in the build domain should probably feature review this.

+@jamiesnape for review of changes to this file only.

@rpoyner-tri
Contributor Author

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

@drake-jenkins-bot mac-catalina-clang-bazel-experimental-release please.

@drake-jenkins-bot mac-catalina-clang-bazel-experimental-release please.


Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Reviewed 2 of 2 files at r17.
Reviewable status: 4 unresolved discussions, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape, @rpoyner-tri, and @SeanCurtis-TRI)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Not sure what to say here. These were the exact numbers of allocations used by the guarded scopes at the time the test was written. Is that not clear?

Actually, the fact that it's not clear is exactly the point. For example, the numbers could've been computed based on known strict analysis of the code under test (ok, not the case here). Describing it as "empirical" invites further questions; a reader doesn't know if this is the "correct" number or merely the observed number. In the case where it's 3, there seems to be a reasonable guess that this is a known quantity that is exactly correct and can't be smaller. Should the number 175 inherit the same apparent credibility? I don't know. Furthermore, does this depend on the complexity of the robot? Again, I don't know.

You've posited that one benefit of this running as a test is so that it can catch regressions in the underlying code (in terms of number of allocations). It should be clear that these numbers are tight bounds (presumably). If this fails in CI, it might be a regression and it might be a legitimate increase due to algorithmic changes. And if actions are taken to reduce allocations, how should these numbers change? Should they be kept tight? Should they be ignored? If we choose to change the numbers, what means should we use? Reasoning? Measurement? Something else?

These are magic numbers and, as is, defy reasoning or modification by future developers. The only thing I can currently know by looking at the code is that the actual numbers of allocations were somewhere below these numbers at at least one point in the past.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 75 at r17 (raw file):

  // those would get computed once and re-used (like in real applications) but
  // with caching off they would get recalculated repeatedly, affecting the
  // timing results.

BTW Marvelous refactor and comment. Thanks.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r17 (raw file):

  }

  std::unique_ptr<MultibodyPlant<double>> plant_{};

nit: Against the styleguide to have these public members. At the very least protected.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 139 at r17 (raw file):

  // autodiff version of state.
  void InvalidateState() {
    context_autodiff_->NoteContinuousStateChange();

BTW I'd strongly recommend making the double-valued context (and its InvalidateState() methods private in the parent class.

My original concern is that in calling this shadowing method, only one of the two contexts available to this class would be invalidated. That seems to invite confusion. It seems simplest and most reasonable to make sure that neither method nor context from the parent are available to the derived class.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 142 at r17 (raw file):

  }

  std::unique_ptr<MultibodyPlant<AutoDiffXd>> plant_autodiff_;

nit: Same note here; style guide doesn't want public members on classes.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape, @rpoyner-tri, and @SeanCurtis-TRI)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

Actually, the fact that it's not clear is exactly the point. For example, the numbers could've been computed based on a known strict analysis of the code under test (ok, not the case here). Describing it as "empirical" invites further questions; a reader doesn't know whether this is the "correct" number or merely the observed number. In the case where it's 3, it seems reasonable to guess that this is a known quantity that is exactly correct and can't be smaller. Should the number 175 inherit the same apparent credibility? I don't know. Furthermore, does this depend on the complexity of the robot? Again, I don't know.

You've posited that one benefit of this running as a test is that it can catch regressions in the underlying code (in terms of number of allocations). It should be clear that these numbers are tight bounds (presumably). If this fails in CI, it might be a regression or it might be a legitimate increase due to algorithmic changes. And if actions are taken to reduce allocations, how should these numbers change? Should they be kept tight? Should they be ignored? If we choose to change the numbers, what means should we use? Reasoning? Measurement? Something else?

These are magic numbers and, as is, defy reasoning or modification by future developers. The only thing I can currently know by looking at the code is that the actual numbers of allocations were somewhere below these numbers at at least one point in the past.

Good points. A different path I have considered is to simply count allocations as a benchmark measurement, rather than trying to enforce a cap. Benefits: no CI breakage when allocations increase for some legitimate reason; better detailed reporting (fun fact: even in the current code, the number of mallocs invoked is not constant over iterations of the benchmark loop; this phenomenon is completely uninvestigated). Possible downsides: no automated detection of regressions.

So I have several ways I could go:

  • just comment the present values as empirical caps
  • withdraw the limits altogether, and introduce the statistical approach in a later PR (requires a fair amount of work)
  • something else?

In any case, I would posit that all of these measured values, like the rent, are too damn high.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r17 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: Against the styleguide to have these public members. At the very least protected.

The class is a fixture, very much analogous to those used in googletest/boost test/any xUnit framework. Making those data members private won't work without also introducing accessor methods for the benchmark cases to use. I can certainly mark them protected, which is what the style guide recommends for test fixture classes.

FWIW, the point of access control is to facilitate local reasoning. Given that the entire class is inside an anonymous namespace, and that the compilation unit publishes main, there is nowhere else we need to look other than this file to analyze usage of those data.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 139 at r17 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

BTW I'd strongly recommend making the double-valued context (and its InvalidateState() methods private in the parent class.

My original concern is that in calling this shadowing method, only one of the two contexts available to this class would be invalidated. That seems to invite confusion. It seems simplest and most reasonable to make sure that neither method nor context from the parent are available to the derived class.

Given that protected is the best we can do (above), there's no real avenue for using access control to hide anything from a derived class. We could refactor the class hierarchy or use composition somehow, but I'm not sure I see the value, given that the entire universe of discourse is the anonymous namespace within this file.

Contributor

@amcastro-tri amcastro-tri left a comment

I resolved all my comments, which pretty much were just questions to learn from this. Great work @rpoyner-tri, this is awesome!

Reviewable status: 2 unresolved discussions, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape, @rpoyner-tri, and @SeanCurtis-TRI)

Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Let's go ahead and add the documentation on the magic numbers and I'm satisfied. Also, need @jamiesnape's eyes on the build file.

Reviewed 1 of 1 files at r18.
Reviewable status: 2 unresolved discussions, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape, @rpoyner-tri, and @SeanCurtis-TRI)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Good points. A different path I have considered is to simply count allocations as a benchmark measurement, rather than trying to enforce a cap. Benefits: no CI breakage when allocations increase for some legitimate reason; better detailed reporting (fun fact: even in the current code, the number of mallocs invoked is not constant over iterations of the benchmark loop; this phenomenon is completely uninvestigated). Possible downsides: no automated detection of regressions.

So I have several ways I could go:

  • just comment the present values as empirical caps
  • withdraw the limits altogether, and introduce the statistical approach in a later PR (requires a fair amount of work)
  • something else?

In any case, I would posit that all of these measured values, like the rent, are too damn high.

I'd accept documentation for now, in the absence of anything more disciplined. Plus, this is a sufficiently informal tool that the cost of discipline (particularly at this point in time) is probably not worth it. A note indicating the origin and limitations of these values would satisfy my concern.


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r17 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

The class is a fixture, very much analogous to those used in googletest/boost test/any xUnit framework. Making those data members private won't work without also introducing accessor methods for the benchmark cases to use. I can certainly mark them protected, which is what the style guide recommends for test fixture classes.

FWIW, the point of access control is to facilitate local reasoning. Given that the entire class is inside an anonymous namespace, and that the compilation unit publishes main, there is nowhere else we need to look other than this file to analyze usage of those data.

Protected is necessary for test fixtures, as each TEST_F creates a sub-class of the Test type. I presume that gbench takes the same tack.

FWIW, the other point of consistent access control is so future developers can walk into the code and not necessarily read the whole thing to realize it's special. :)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 139 at r17 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Given that protected is the best we can do (above), there's no real avenue for using access control to hide anything from a derived class. We could refactor the class hierarchy or use composition somehow, but I'm not sure I see the value, given that the entire universe of discourse is the anonymous namespace within this file.

I like this. It's a happy compromise.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 1 unresolved discussion, LGTM missing from assignees SeanCurtis-TRI(platform),jamiesnape (waiting on @jamiesnape and @SeanCurtis-TRI)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

I'd accept documentation for now, in the absence of anything more disciplined. Plus, this is a sufficiently informal tool that the cost of discipline (particularly at this point in time) is probably not worth it. A note indicating the origin and limitations of these values would satisfy my concern.

Commented. PTAL

Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

:LGTM: pending @jamiesnape's review.

Reviewed 1 of 1 files at r19.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee jamiesnape (waiting on @jamiesnape and @SeanCurtis-TRI)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 80 at r15 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Commented. PTAL

Great!

Contributor

@jamiesnape jamiesnape left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @rpoyner-tri and @SeanCurtis-TRI)


tools/cc_toolchain/BUILD.bazel, line 60 at r15 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

+@jamiesnape for review of changes to this file only.

This is causing output to stderr, which shows on the build console even when successful. We do not allow that.


tools/cc_toolchain/BUILD.bazel, line 63 at r19 (raw file):

genrule(
    name = "host_settings",
    srcs = [],

Remove srcs completely.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @SeanCurtis-TRI, and @sherm1)


tools/cc_toolchain/BUILD.bazel, line 60 at r15 (raw file):

Previously, jamiesnape (Jamie Snape) wrote…

This is causing output to stderr, which shows on the build console even when successful. We do not allow that.

Done.


tools/cc_toolchain/BUILD.bazel, line 63 at r19 (raw file):

Previously, jamiesnape (Jamie Snape) wrote…

Remove srcs completely.

Done.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @SeanCurtis-TRI, and @sherm1)

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

@drake-jenkins-bot mac-catalina-clang-bazel-experimental-release please.

@drake-jenkins-bot mac-catalina-clang-bazel-experimental-release please.



Contributor

@jamiesnape jamiesnape left a comment

Reviewable status: 3 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


tools/cc_toolchain/BUILD.bazel, line 61 at r19 (raw file):

# Utility rule to save the compiler settings to a file.
genrule(

When I change my compiler on Mac, //:print_host_settings will print the changed compiler, but the genrule will not be re-run, so host_settings.txt has old information. On Linux, switching between GCC and Clang should be picked up because CC changes, but minor version updates due to an apt upgrade, for instance, may not be picked up. In general, there is also no guarantee that the compiler recorded is the same one that was used to compile the benchmark. For now, you probably just need to note this in the documentation, a comment, or the output.

Contributor

@jamiesnape jamiesnape left a comment

Reviewable status: 3 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


examples/multibody/cassie_benchmark/record_results.sh, line 35 at r19 (raw file):

echo Full results are in:
echo ${TEST_UNDECLARED_OUTPUTS_DIR}/outputs.zip

BTW Not sure this is true if you run this with bazel test.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 3 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @SeanCurtis-TRI, and @sherm1)


tools/cc_toolchain/BUILD.bazel, line 61 at r19 (raw file):

Previously, jamiesnape (Jamie Snape) wrote…

When I change my compiler on Mac, //:print_host_settings will print the changed compiler, but the genrule will not be re-run, so host_settings.txt has old information. On Linux, switching between GCC and Clang should be picked up because CC changes, but minor version updates due to an apt upgrade, for instance, may not be picked up. In general, there is also no guarantee that the compiler recorded is the same one that was used to compile the benchmark. For now, you probably just need to note this in the documentation, a comment, or the output.

Yuck. I found this while searching for alternatives: bazelbuild/bazel#3041 . So (and with pretty good reasons) there won't be a way to fix this within (say) a genrule.

Given that this implementation of compiler reporting can lie, I'm tempted to just remove all of it, and replace it with TODOs. Thoughts?

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 4 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


examples/multibody/cassie_benchmark/record_results.sh, line 35 at r19 (raw file):

Previously, jamiesnape (Jamie Snape) wrote…

BTW Not sure this is true if you run this with bazel test.

I think it's not exactly correct, but need to do more experiments. AFAICT so far, looks like the test invocation does the zipping, but the run invocation does not.

Contributor

@jamiesnape jamiesnape left a comment

Reviewable status: 3 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


tools/cc_toolchain/BUILD.bazel, line 61 at r19 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

Yuck. I found this while searching for alternatives: bazelbuild/bazel#3041 . So (and with pretty good reasons) there won't be a way to fix this within (say) a genrule.

Given that this implementation of compiler reporting can lie, I'm tempted to just remove all of it, and replace it with TODOs. Thoughts?

Yes, I think removal is preferable.

Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @rpoyner-tri, @SeanCurtis-TRI, and @sherm1)


examples/multibody/cassie_benchmark/record_results.sh, line 35 at r19 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

I think it's not exactly correct, but need to do more experiments. AFAICT so far, looks like the test invocation does the zipping, but the run invocation does not.

The run verb produces the zip file and the unzipped text files; the test verb produces only the zip file. I'm ok with the text as-is. Thoughts?

Resolves RobotLocomotion#10322.

h/t Michael Posa and Michael Sherman for earlier draft benchmarks.
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jamiesnape (waiting on @jamiesnape, @SeanCurtis-TRI, and @sherm1)


tools/cc_toolchain/BUILD.bazel, line 61 at r19 (raw file):

Previously, jamiesnape (Jamie Snape) wrote…

Yes, I think removal is preferable.

Done.

@rpoyner-tri
Contributor Author

a discussion (no related file):

@drake-jenkins-bot mac-catalina-clang-bazel-experimental-release please.


Contributor

@jamiesnape jamiesnape left a comment

:lgtm:

Reviewed 1 of 2 files at r14.
Reviewable status: 1 unresolved discussion (waiting on @SeanCurtis-TRI and @sherm1)


examples/multibody/cassie_benchmark/record_results.sh, line 35 at r19 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

The run verb produces the zip file and the unzipped text files; the test verb produces only the zip file. I'm ok with the text as-is. Thoughts?

Fine to leave it as it is, then.

Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Reviewed 4 of 4 files at r20.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),jamiesnape,sherm1(platform)

@SeanCurtis-TRI SeanCurtis-TRI merged commit 4a7d267 into RobotLocomotion:master Aug 11, 2020
@jamiesnape jamiesnape removed their assignment Jun 22, 2021