-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LimitMalloc: instrumentation to dump all disallowed stacks #14899
LimitMalloc: instrumentation to dump all disallowed stacks #14899
Conversation
1936876
to
30f0679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @jwnimmer-tri @ggould-tri @sherm1 @SeanCurtis-TRI useful? too crude?
+@EricCousineau-TRI for feature review (yep, glutton for punishment ;) )
+(priority: low)
Reviewable status: LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @EricCousineau-TRI)
// Non-fatal breakpoint action; use with helper script: | ||
// tools/dynamic_analysis/dump_limit_malloc_stacks | ||
if (std::getenv("DRAKE_LIMIT_MALLOC_NONFATAL_BREAKPOINT")) { | ||
::raise(SIGTRAP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Be sure this (and the new include) builds on macOS.
I think the general idea is fine. If someone else finds this useful, I think its fine for master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @EricCousineau-TRI and @rpoyner-tri)
common/test_utilities/limit_malloc.cc, line 176 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Be sure this (and the new include) builds on macOS.
Basic smoke test looks OK:
rico@macsim:~/checkout/drake$ git checkout dump-limit-malloc-stacks
Branch 'dump-limit-malloc-stacks' set up to track remote branch 'dump-limit-malloc-stacks' from 'origin'.
Switched to a new branch 'dump-limit-malloc-stacks'
rico@macsim:~/checkout/drake$ bazel test //common/test_utilities/...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analyzed 75 targets (45 packages loaded, 1352 targets configured).
INFO: Found 32 targets and 43 test targets...
INFO: Elapsed time: 55.126s, Critical Path: 22.42s
INFO: 312 processes: 164 internal, 148 darwin-sandbox.
INFO: Build completed successfully, 312 total actions
Executed 43 out of 43 tests: 43 tests pass.
INFO: Build completed successfully, 312 total actions
rico@macsim:~/checkout/drake$
Of course this code can't be made to do anything on mac, since LimitMalloc doesn't instrument at all on mac.
a discussion (no related file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass; LGTM for general idea + usefulness!
Requesting concrete reproduction case to limit need for reviewer creativity 😬
Reviewed 2 of 2 files at r1.
Reviewable status: 6 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @rpoyner-tri)
common/test_utilities/limit_malloc.cc, line 175 at r1 (raw file):
// Non-fatal breakpoint action; use with helper script: // tools/dynamic_analysis/dump_limit_malloc_stacks if (std::getenv("DRAKE_LIMIT_MALLOC_NONFATAL_BREAKPOINT")) {
nit User contract may not be sufficiently constrained, may admit confusion; concretely, this may trigger if the env var is set, but empty.
Suggestions (assuming appropriate prior null check)
- Explicitly check for
string(env) == "1"
(fully constrained) - Just check for
!string(env).empty()
tools/dynamic_analysis/dump_limit_malloc_stacks, line 1 at r1 (raw file):
#!/bin/bash
Can you ensure that there's a concrete way to show this?
(It may easy enough (5-10min?) to write a Python unittest for this script, if you have concrete example?)
tools/dynamic_analysis/dump_limit_malloc_stacks, line 2 at r1 (raw file):
#!/bin/bash #
Should have set -eu -o pipefail
.
If you think it'll need debugging, please also add -x
.
tools/dynamic_analysis/dump_limit_malloc_stacks, line 11 at r1 (raw file):
# Example: # # $ bazel build -c dbg //my_thing:my_test
nit Consider making this a concrete example (even if it doesn't explicitly fail).
tools/dynamic_analysis/dump_limit_malloc_stacks, line 13 at r1 (raw file):
# $ bazel build -c dbg //my_thing:my_test # $ tools/dynamic_analysis/dump_limit_malloc_stacks \ # bazel-bin/my_thing/my_test |& tee stacks.out
BTW TIL about |&
- woot! I've just been using the more kludgy 2>&1 |
(or whatever the right ordering is)
tools/dynamic_analysis/dump_limit_malloc_stacks, line 38 at r1 (raw file):
eof DRAKE_LIMIT_MALLOC_NONFATAL_BREAKPOINT=1 \
nit Consider using exec
if there's nothing else after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @rpoyner-tri)
common/test_utilities/limit_malloc.cc, line 175 at r1 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
nit User contract may not be sufficiently constrained, may admit confusion; concretely, this may trigger if the env var is set, but empty.
Suggestions (assuming appropriate prior null check)
- Explicitly check for
string(env) == "1"
(fully constrained)- Just check for
!string(env).empty()
I don't think I can trust std::string
not to allocate, so things will be a bit more primitive...
30f0679
to
43ef7a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @EricCousineau-TRI and @rpoyner-tri)
tools/dynamic_analysis/dump_limit_malloc_stacks, line 1 at r1 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
Can you ensure that there's a concrete way to show this?
(It may easy enough (5-10min?) to write a Python unittest for this script, if you have concrete example?)
Food for thought: limit-malloc is a test-only diagnostic tool. How much recursive testing is enough?
Also, will gdb even work under CI? ptrace (2)
in the cloud seems fraught at best.
tools/dynamic_analysis/dump_limit_malloc_stacks, line 2 at r1 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
Should have
set -eu -o pipefail
.If you think it'll need debugging, please also add
-x
.
FWIW, you can always externally kick on debugging of bash scripts. For script foo
, you can invoke it via bash -vx foo
and get plenty of diagnostics. If it were invoked by CI, then I'd be in favor of internal -x
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @EricCousineau-TRI and @rpoyner-tri)
tools/dynamic_analysis/dump_limit_malloc_stacks, line 1 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
Food for thought: limit-malloc is a test-only diagnostic tool. How much recursive testing is enough?
Also, will gdb even work under CI?
ptrace (2)
in the cloud seems fraught at best.
I'm inclined to punt on another level of automated test at this point.
tools/dynamic_analysis/dump_limit_malloc_stacks, line 2 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
FWIW, you can always externally kick on debugging of bash scripts. For script
foo
, you can invoke it viabash -vx foo
and get plenty of diagnostics. If it were invoked by CI, then I'd be in favor of internal-x
.
Done.
a discussion (no related file): Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
@drake-jenkins-bot mac-catalina-clang-bazel-experimental-debug please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform), needs at least two assigned reviewers (waiting on @EricCousineau-TRI and @rpoyner-tri)
common/test_utilities/limit_malloc.cc, line 175 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
I don't think I can trust
std::string
not to allocate, so things will be a bit more primitive...
OK Ah, good point!
common/test_utilities/test/limit_malloc_manual_test.cc, line 5 at r2 (raw file):
int main(void) { drake::test::LimitMalloc guard; new int; // Deliberately cause LimitMalloc to report failure.
nit Unclear why you introduce an extra defect, leaked memory. (Naively, I could think that's part of the purpose of this test.)
Can you instead use something like std::make_unique<int>()
, or just comment that leaking memory is not the point of this test?
tools/dynamic_analysis/dump_limit_malloc_stacks, line 1 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
I'm inclined to punt on another level of automated test at this point.
OK IMO not that hard to try, but def. won't force the issue!
tools/dynamic_analysis/dump_limit_malloc_stacks, line 2 at r1 (raw file):
FWIW, you can always externally [...]
OK Noted - thanks for the info!
tools/dynamic_analysis/dump_limit_malloc_stacks, line 11 at r2 (raw file):
# Example: # # $ bazel build -c dbg //common/test_utilities:limit_malloc_manual_test
Working: Running locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 unresolved discussions, needs at least two assigned reviewers (waiting on @rpoyner-tri)
tools/dynamic_analysis/dump_limit_malloc_stacks, line 11 at r2 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
Working: Running locally.
OK Sweet! w/ fixed typo, confirmed works as intended.
tools/dynamic_analysis/dump_limit_malloc_stacks, line 13 at r2 (raw file):
# $ bazel build -c dbg //common/test_utilities:limit_malloc_manual_test # $ tools/dynamic_analysis/dump_limit_malloc_stacks \ # bazel-bin/common_test_utilities/limit_malloc_manual_test \
Typo: Should be common/test_utilities
, not common_test_utilities
I've found this useful to capture all the stack traces at once from unwanted allocations. The resulting data is a bit raw, but can be easily post-processed to count/sort events, provide IDE-compatible error messages indicating source lines, etc. To demonstrate the tooling, this patch adds a trivial limit_malloc_manual_test that just deliberately triggers a failure.
43ef7a5
to
f4fe3a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, needs at least two assigned reviewers (waiting on @EricCousineau-TRI and @rpoyner-tri)
a discussion (no related file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
@drake-jenkins-bot mac-catalina-clang-bazel-experimental-debug please.
@drake-jenkins-bot mac-catalina-clang-bazel-experimental-debug please.
tools/dynamic_analysis/dump_limit_malloc_stacks, line 13 at r2 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
Typo: Should be
common/test_utilities
, notcommon_test_utilities
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3.
Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers (waiting on @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@soonho-tri for platform review; Wednesday schedule. Hopefully the mac build will be done by then. :)
Reviewable status: 1 unresolved discussion, LGTM missing from assignee soonho-tri(platform) (waiting on @rpoyner-tri and @soonho-tri)
a discussion (no related file): Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
@drake-jenkins-bot mac-catalina-clang-bazel-experimental-debug please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r2, 2 of 2 files at r3.
Reviewable status: 1 unresolved discussion (waiting on @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all discussions resolved, LGTM from assignees soonho-tri(platform),EricCousineau-TRI(platform) (waiting on @rpoyner-tri)
a discussion (no related file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
@drake-jenkins-bot mac-catalina-clang-bazel-experimental-debug please.
passed.
I've found this useful to capture all the stack traces at once from
unwanted allocations. The resulting data is a bit raw, but can be easily
post-processed to count/sort events, provide IDE-compatible error
messages indicating source lines, etc.
This change is