improve callstack walk perf further #711
From bruen...@google.com on December 08, 2011 08:30:16 xref issue #75
From bruen...@google.com on December 15, 2011 08:12:44 xref issue #703 : dynamically swap between scan-every-frame and shadow stack
From bruen...@google.com on December 20, 2011 07:59:16 shadow stack is issue #724
From bruen...@google.com on January 10, 2012 19:23:14 for ui_tests the scan dominates (tends to happen, not surprisingly, on apps that use a lot of memory): on laptop: xref issue #151 (improve leak scan perf)
From bruen...@google.com on December 07, 2011 22:14:44
this issue extends issue #460, but for malloc interception for leak detection, where a callstack is gathered on every malloc. there is far less low-hanging fruit here b/c this path has been profiled and optimized in the past.
I did a bunch of performance improvements on callstack walking for
Dr. Heapstat (xref PR 473640), resulting in today's optimized in-module
checks, lowest-frame checks, DRi#228, DRi#226, and packed_callstack_hash().
** TODO cfrac on Windows built /Ox /Oy-
now we avoid fp scans (xref issue #460 #s):
app mallocs: 10890330, frees: 10890127, large mallocs: 0
unique malloc stacks: 7050289
callstack fp scans: 0
callstack is_retaddr: 10890130, backdecode: 10890130, unreadable: 0
*** INFO times for different modes
this is after issue #460 A through L
script:
echo native
for ((i=0; i<3; i++)); do
/usr/bin/time ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
done
echo DR
for ((i=0; i<3; i++)); do
/usr/bin/time ~/dr/git/exports/bin32/drrun.exe -quiet ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
done
for j in "" "-no_count_leaks" "-no_check_uninitialized" "-no_check_uninitialized -no_count_leaks" "-leaks_only" "-leaks_only -no_zero_stack" "-leaks_only -no_count_leaks" "-leaks_only -no_count_leaks -no_track_allocs"; do
echo $j
for ((i=0; i<3; i++)); do
/usr/bin/time ~/drmemory/git/build_drmem_rel/bin/drmemory.exe $j -quiet -dr c:/src/dr/git/exports -batch -- ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
done
done
native
0.00user 0.01system 0:01.74elapsed 0%CPU (0avgtext+0avgdata 234752maxresident)k
0.00user 0.01system 0:01.69elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:01.72elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
DR
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:02.39elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
(drmemory defaults)
0.01user 0.00system 1:18.12elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 1:15.48elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 1:15.84elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_count_leaks
0.00user 0.00system 0:57.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.03elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.42elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-no_check_uninitialized
0.00user 0.00system 0:48.50elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:45.48elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:45.59elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_check_uninitialized -no_count_leaks
0.00user 0.00system 0:27.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.06elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.78elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only
0.00user 0.00system 0:34.41elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:34.38elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
0.00user 0.00system 0:34.66elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-leaks_only -no_zero_stack
0.00user 0.00system 0:33.57elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:33.54elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:33.58elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
-leaks_only -no_count_leaks
0.00user 0.00system 0:19.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:17.76elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:17.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only -no_count_leaks -no_track_allocs
0.00user 0.00system 0:03.22elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:03.27elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:03.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
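To compare the modes it helps to reduce each group of three runs to a single number. The following is a minimal sketch, not part of the original script: `elapsed_secs` and `mean` are hypothetical helper names, and the parsing assumes the `M:SS.ss` elapsed field printed by GNU `/usr/bin/time` as in the output above.

```shell
# elapsed_secs: hypothetical helper. Reads /usr/bin/time output lines on stdin
# and prints the elapsed wall-clock time of each line in seconds.
elapsed_secs() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /elapsed$/) {
                t = $i
                sub(/elapsed$/, "", t)          # "1:18.12elapsed" -> "1:18.12"
                n = split(t, p, ":")            # also handles H:MM:SS
                secs = 0
                for (k = 1; k <= n; k++) secs = secs * 60 + p[k]
                printf "%.2f\n", secs
            }
        }
    }'
}

# mean: hypothetical helper. Averages one number per line from stdin.
mean() {
    awk '{ s += $1; n++ } END { if (n) printf "%.2f\n", s / n }'
}
```

For example, piping the three lines of one mode through `elapsed_secs | mean` yields the average elapsed time for that mode in seconds.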
=> rough split:
1.7 1.7 app
prior to my issue #460 improvements, malloc interception was 30s instead of 15s
so this issue tries to shrink the 15s from callstack walking
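As a rough cross-check of the 15s figure, one can subtract the fastest `-leaks_only -no_count_leaks -no_track_allocs` run from the fastest `-leaks_only -no_count_leaks` run listed above. This is only a sketch: `delta` is a hypothetical helper, and treating the difference between those two modes as the malloc interception cost is an assumption, not something the timings state directly.

```shell
# delta A B: hypothetical helper that prints A - B in seconds.
delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# Fastest runs copied from the timings above.
with_track_allocs=17.76    # -leaks_only -no_count_leaks
without_track_allocs=3.22  # -leaks_only -no_count_leaks -no_track_allocs

# Assumed interpretation: the gap approximates the cost of tracking allocations.
delta "$with_track_allocs" "$without_track_allocs"
```

The result is a bit under 15 seconds, consistent with the figure quoted above.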
one thing that showed up led to DRi#635: provide a faster dr_try_setup() that doesn't allocate memory
Original issue: http://code.google.com/p/drmemory/issues/detail?id=711