improve callstack walk perf further #711

Open
derekbruening opened this issue Nov 28, 2014 · 4 comments

@derekbruening

From bruen...@google.com on December 07, 2011 22:14:44

This issue extends issue #460 to malloc interception for leak detection, where callstacks are gathered on every malloc. There is far less low-hanging fruit here because this path has already been profiled and optimized in the past.

I did a bunch of performance improvements on callstack walking for
Dr. Heapstat (xref PR 473640), resulting in today's optimized in-module
checks, lowest-frame checks, DRi#228, DRi#226, and packed_callstack_hash().

** TODO cfrac on Windows built /Ox /Oy-

now we avoid fp scans (xref issue #460 for the earlier numbers):

app mallocs: 10890330, frees: 10890127, large mallocs: 0
unique malloc stacks: 7050289
callstack fp scans: 0
callstack is_retaddr: 10890130, backdecode: 10890130, unreadable: 0
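
For readers unfamiliar with the counters above, here is a minimal Python sketch of a frame-pointer walk with a scan fallback. It runs over a simulated stack (a dict of address to word) and is not Dr. Memory's implementation: `walk_callstack`, `scan_for_frame`, the `in_code` predicate and the addresses are all hypothetical names and values chosen for illustration. The point is only that an intact frame-pointer chain needs no scans, while a broken one forces a slower search.

```python
WORD = 4  # pointer size for a 32-bit app (an assumption for this sketch)


def scan_for_frame(mem, start, is_retaddr):
    """Slow path: scan upward for a slot followed by a plausible return address."""
    addr = start
    while addr in mem and addr + WORD in mem:
        if is_retaddr(mem[addr + WORD]):
            return addr
        addr += WORD
    return None


def walk_callstack(mem, fp, is_retaddr, max_frames=20):
    """Follow the frame-pointer chain; return (return addresses, number of scans)."""
    frames, scans = [], 0
    while len(frames) < max_frames and fp in mem and fp + WORD in mem:
        if is_retaddr(mem[fp + WORD]):
            frames.append(mem[fp + WORD])
            next_fp = mem[fp]  # saved frame pointer of the caller
        else:
            scans += 1  # chain is broken here, so fall back to scanning
            next_fp = scan_for_frame(mem, fp + WORD, is_retaddr)
        # The stack grows down, so a caller's frame must be at a higher address.
        if next_fp is None or next_fp <= fp:
            break
        fp = next_fp
    return frames, scans


def in_code(addr):
    """Hypothetical stand-in for a real return-address check."""
    return 0x400000 <= addr < 0x500000


# Intact chain: each frame stores (saved fp, return address).
intact = {0x1000: 0x1010, 0x1004: 0x401000,
          0x1010: 0x1020, 0x1014: 0x402000,
          0x1020: 0, 0x1024: 0x403000}

# Broken chain: the innermost frame has no valid frame pointer or return address.
broken = {0x1000: 0xdead, 0x1004: 7, 0x1008: 0, 0x100c: 0,
          0x1010: 0x1020, 0x1014: 0x402000, 0x1018: 0, 0x101c: 0,
          0x1020: 0, 0x1024: 0x403000}

intact_frames, intact_scans = walk_callstack(intact, 0x1000, in_code)
broken_frames, broken_scans = walk_callstack(broken, 0x1000, in_code)
```

In this toy model the intact stack is walked with zero scans, which is the situation the "callstack fp scans: 0" line reports for the /Oy- build.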

*** INFO times for different modes

this is after issue #460 A through L

script:
echo native
for ((i=0; i<3; i++)); do
    /usr/bin/time ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
done
echo DR
for ((i=0; i<3; i++)); do
    /usr/bin/time ~/dr/git/exports/bin32/drrun.exe -quiet ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
done
for j in "" "-no_count_leaks" "-no_check_uninitialized" "-no_check_uninitialized -no_count_leaks" "-leaks_only" "-leaks_only -no_zero_stack" "-leaks_only -no_count_leaks" "-leaks_only -no_count_leaks -no_track_allocs"; do
    echo $j
    for ((i=0; i<3; i++)); do
        /usr/bin/time ~/drmemory/git/build_drmem_rel/bin/drmemory.exe $j -quiet -dr c:/src/dr/git/exports -batch -- ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
    done
done

native
0.00user 0.01system 0:01.74elapsed 0%CPU (0avgtext+0avgdata 234752maxresident)k
0.00user 0.01system 0:01.69elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:01.72elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
DR
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:02.39elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
(drmemory defaults)
0.01user 0.00system 1:18.12elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 1:15.48elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 1:15.84elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_count_leaks
0.00user 0.00system 0:57.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.03elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.42elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-no_check_uninitialized
0.00user 0.00system 0:48.50elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:45.48elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:45.59elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_check_uninitialized -no_count_leaks
0.00user 0.00system 0:27.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.06elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.78elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only
0.00user 0.00system 0:34.41elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:34.38elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
0.00user 0.00system 0:34.66elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-leaks_only -no_zero_stack
0.00user 0.00system 0:33.57elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:33.54elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:33.58elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
-leaks_only -no_count_leaks
0.00user 0.00system 0:19.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:17.76elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:17.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only -no_count_leaks -no_track_allocs
0.00user 0.00system 0:03.22elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:03.27elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:03.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k

=> rough split (seconds: added, cumulative):
  1.7   1.7 app
+ 0.7   2.4 DR
+ 0.9   3.3 drmem base client init, -code_api, etc.
+15.0  18.3 malloc interception
+15.3  33.6 callstack on every malloc
+12.0  45.6 addronly instru
+30.0  75.6 full instru

prior to my issue #460 improvements, malloc interception was 30s instead of 15s

so this issue tries to shrink the 15s from callstack walking
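
The rough split above is just the successive differences of the cumulative elapsed times. A minimal Python sketch of that arithmetic follows. The stage labels and rounded totals are copied from the split, while the function and variable names are illustrative only.

```python
# Cumulative elapsed seconds, copied from the rough split above.
cumulative = [
    ("app", 1.7),
    ("DR", 2.4),
    ("drmem base client init, -code_api, etc.", 3.3),
    ("malloc interception", 18.3),
    ("callstack on every malloc", 33.6),
    ("addronly instru", 45.6),
    ("full instru", 75.6),
]


def incremental_costs(stages):
    """Return (label, seconds added by this stage) for each stage."""
    costs = []
    previous = 0.0
    for label, total in stages:
        costs.append((label, round(total - previous, 1)))
        previous = total
    return costs


costs = incremental_costs(cumulative)
full_run = cumulative[-1][1]
for label, added in costs:
    share = 100.0 * added / full_run
    print(f"{added:5.1f}s  {share:5.1f}%  {label}")
```

On these numbers the callstack stage adds roughly 15 s, which is the portion this issue targets.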

one thing that showed up led to DRi#635: provide a faster dr_try_setup() that doesn't allocate memory

Original issue: http://code.google.com/p/drmemory/issues/detail?id=711

@derekbruening

From bruen...@google.com on December 08, 2011 08:30:16

xref issue #75

@derekbruening

From bruen...@google.com on December 15, 2011 08:12:44

xref issue #703: dynamically swap between scan-every-frame and shadow stack based on malloc frequency
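
One way to picture such a swap is a threshold policy with hysteresis, sketched below in Python. Everything in it is hypothetical: the class name, the mode names, the sampling window and the threshold values are placeholders, and nothing here is taken from issue #703 or from Dr. Memory's code. The sketch also assumes that a shadow stack pays off only when mallocs are frequent, which is one reading of the idea rather than a stated fact.

```python
class WalkModeSelector:
    """Hypothetical policy: choose a callstack strategy from recent malloc frequency."""

    def __init__(self, to_shadow=10_000, to_scan=2_000):
        # Thresholds are mallocs per sampling window; the values are placeholders.
        assert to_scan < to_shadow
        self.to_shadow = to_shadow
        self.to_scan = to_scan
        self.mode = "scan"

    def update(self, mallocs_in_window):
        """Feed the malloc count for the last window; return the mode to use next."""
        if self.mode == "scan" and mallocs_in_window >= self.to_shadow:
            self.mode = "shadow"
        elif self.mode == "shadow" and mallocs_in_window <= self.to_scan:
            self.mode = "scan"
        return self.mode


selector = WalkModeSelector()
modes = [selector.update(count) for count in (500, 12_000, 5_000, 1_000)]
```

Using two thresholds rather than one keeps the selector from flipping back and forth when the malloc rate hovers near a single cutoff.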

@derekbruening

From bruen...@google.com on December 20, 2011 07:59:16

shadow stack is issue #724

@derekbruening

From bruen...@google.com on January 10, 2012 19:23:14

for ui_tests the leak scan dominates (not surprisingly, this tends to happen on apps that use a lot of memory):

on laptop:
% ./batch.sh
native
[----------] 1 test from NPAPITesterBase (2677 ms total)
[----------] 1 test from NPAPITesterBase (992 ms total)
[----------] 1 test from NPAPITesterBase (799 ms total)
DR
[----------] 1 test from NPAPITesterBase (8522 ms total)
[----------] 1 test from NPAPITesterBase (8390 ms total)
[----------] 1 test from NPAPITesterBase (8185 ms total)
DR -code_api -disable_traces -bb_single_restore_prefix -max_bb_instrs 256
[----------] 1 test from NPAPITesterBase (5167 ms total)
[----------] 1 test from NPAPITesterBase (5052 ms total)
[----------] 1 test from NPAPITesterBase (5823 ms total)
-no_check_uninitialized
[----------] 1 test from NPAPITesterBase (222740 ms total)
[----------] 1 test from NPAPITesterBase (48060 ms total)
[----------] 1 test from NPAPITesterBase (45426 ms total)
-no_check_uninitialized -no_leak_scan
[----------] 1 test from NPAPITesterBase (36710 ms total)
[----------] 1 test from NPAPITesterBase (37639 ms total)
[----------] 1 test from NPAPITesterBase (34794 ms total)
-no_check_uninitialized -no_count_leaks
[----------] 1 test from NPAPITesterBase (33162 ms total)
[----------] 1 test from NPAPITesterBase (33325 ms total)
[----------] 1 test from NPAPITesterBase (33522 ms total)

xref issue #151 (improve leak scan perf)
xref issue #568 (parallelize leak scan)
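
The first run in some groups above is far slower than the other two (for example 222740 ms against 48060 ms and 45426 ms), so a per-configuration median is a steadier summary than a mean. The Python sketch below parses a subset of the log lines copied from this comment and takes that median. The function name and the regular expression are illustrative, and the issue does not say why the first runs are slower.

```python
import re
from statistics import median

# Subset of the batch output above: one label line, then its three runs.
LOG = """\
native
[----------] 1 test from NPAPITesterBase (2677 ms total)
[----------] 1 test from NPAPITesterBase (992 ms total)
[----------] 1 test from NPAPITesterBase (799 ms total)
-no_check_uninitialized
[----------] 1 test from NPAPITesterBase (222740 ms total)
[----------] 1 test from NPAPITesterBase (48060 ms total)
[----------] 1 test from NPAPITesterBase (45426 ms total)
-no_check_uninitialized -no_leak_scan
[----------] 1 test from NPAPITesterBase (36710 ms total)
[----------] 1 test from NPAPITesterBase (37639 ms total)
[----------] 1 test from NPAPITesterBase (34794 ms total)
"""

TOTAL_RE = re.compile(r"\((\d+) ms total\)")


def median_ms_by_config(log):
    """Group '(N ms total)' lines under the preceding label line; return medians."""
    runs, label = {}, None
    for line in log.splitlines():
        match = TOTAL_RE.search(line)
        if match:
            runs.setdefault(label, []).append(int(match.group(1)))
        elif line.strip():
            label = line.strip()
    return {name: median(times) for name, times in runs.items()}


medians = median_ms_by_config(LOG)
# Difference between runs with and without the leak scan, in milliseconds.
scan_ms = (medians["-no_check_uninitialized"]
           - medians["-no_check_uninitialized -no_leak_scan"])
```

The `scan_ms` value is only the gap between two medians of three runs each, so it is a rough indicator rather than a measurement.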
