reduce overhead of malloc tracking #460
Comments
From bruen...@google.com on December 07, 2011 08:43:30
studying cfrac on windows:
symcache has no effect as 2nd run is no faster than 1st:
% for i in "-no_count_leaks" "-no_check_uninitialized" "-no_check_uninitialized -no_count_leaks" "-leaks_only -no_count_leaks -no_zero_stack" "-perturb_only" "-leaks_only -no_count_leaks -no_zero_stack -no_track_allocs"; do echo $i; /usr/bin/time ~/drmemory/git/build_drmem_rel/bin/drmemory.exe $i -quiet -dr c:/src/dr/git/exports -batch -- ./cfrac.exe 41757646344123832613190542166099121; done
-leaks_only -no_count_leaks -no_zero_stack + empty hook functions
% /usr/bin/time ~/dr/git/exports/bin32/drrun.exe -quiet ./cfrac.exe 41757646344123832613190542166099121
% /usr/bin/time ./cfrac.exe 41757646344123832613190542166099121
I have a series of 10 or so optimizations including DRi#632 that get the 35 seconds down to about 24 seconds.
From bruen...@google.com on December 07, 2011 13:12:48
I have it down at 20 seconds now, 14s more than just clean calls. but note that leak scan/callstack time dwarfs that.
clean calls seem to incur 2x here. malloc interception was 5x on top of that: now 3x.
From bruen...@google.com on December 07, 2011 13:21:19
-no_check_uninitialized w/ leak callstacks but no leak scan:
From bruen...@google.com on December 07, 2011 13:46:25
strike comment 3: that was measured w/ a dozen different new optimizations, so it needs a new base to compare to. the leak scan is not an issue.
this is with A through L (my internal names for the opts in my notes)
so strange, getting slower times w/o the scan.
but the point here is that -leaks_only -no_zero_stack (so malloc intercept + callstacks):
From bruen...@google.com on December 07, 2011 14:18:34
-leaks_only -no_zero_stack -callstack_max_frames 6
-leaks_only -no_zero_stack -callstack_max_frames 3
-leaks_only -no_zero_stack -delay_frees 0
From bruen...@google.com on December 07, 2011 17:53:55
note that the above callstack walking is on cfrac built /Ox so not really fair: haven't looked in detail but it's probably having to scan frequently b/c of FPO. I will re-analyze w/ /Oy-.
I have CodeAnalyst data for a series of optimizations I labeled A through L. I will paste here at least the two endpoints.
Wall-clock-time-wise:
Prior to optimizations A through L:
Process Name 64-bit Timer samples
dynamorio:
drmemorylib:
after: keep in mind that these are percentages (would be nice to have a
Process Name 64-bit Timer samples
dynamorio:
From bruen...@google.com on December 07, 2011 17:53:55
... 2.44
drmemorylib:
From bruen...@google.com on December 07, 2011 17:56:51 List of optimizations and other changes:
From bruen...@google.com on December 07, 2011 19:15:36
indeed /Ox is scanning a lot:
/Ox /Oy-:
(unfortunately on linux there is only 1 scan, and adding -fno-omit-frame-pointer makes no difference)
splitting off any work on callstack walking perf to issue #711. see also in that issue a new breakdown of times for cfrac built /Oy- (huge difference).
From bruen...@google.com on December 08, 2011 07:21:00
vs old 1.66 => 9.3% improvement
direct comparison shows 26% win on tonto, 20% on perlbench:
From bruen...@google.com on December 08, 2011 12:37:19
unfortunately the optimizations done here so far (A through L) don't have results on windesk for HEAD (i.e., no issue #460 opts):
[----------] 1 test from NPAPITesterBase (544662 ms total)
results for optimized (i.e., drmem has A through L):
[----------] 1 test from NPAPITesterBase (481645 ms total)
splitting investigating why -no_track_allocs is so slow to issue #714
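For reference, the two NPAPITesterBase totals quoted above (544662 ms for HEAD, 481645 ms with A through L) work out to roughly an 11.6% reduction in wall-clock time. The helper below is only an illustration of that arithmetic; the name `pct_reduction` is made up here and does not come from Dr. Memory.

```c
/* Percent reduction in runtime when going from base_ms to opt_ms.
 * Hypothetical helper for illustration; not part of Dr. Memory. */
double pct_reduction(double base_ms, double opt_ms)
{
    return 100.0 * (base_ms - opt_ms) / base_ms;
}
```

Calling `pct_reduction(544662.0, 481645.0)` gives a value between 11.5 and 11.6.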
From bruen...@google.com on April 12, 2012 09:22:51
From bruen...@google.com on June 16, 2011 10:24:33
below are perf numbers for -leaks_only -no_count_leaks vs native.
-no_count_leaks disables stack zeroing, so this is the cost of malloc
tracking (w/o recording callstacks): this is way too high!
this is one barrier to higher performance with -no_check_uninitialized.
back when we first created Dr. Heapstat I spent some time optimizing malloc
tracking, including recording callstacks, and I sure thought I had
performance better than this (I was looking at spec2000 and some
malloc-intensive benchmarks like roboop and cfrac). something could have
regressed.
from profiling, some of this cost is from maintaining the hashtables and
delay-free queue, much of which could go away w/ malloc replacement b/c the
header could be used more freely: or perhaps a larger redzone could be used
with more data than just the size stored directly in it. early injection
would make that much easier, since today no-redzone allocs must be handled:
one reason for the current design.
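To make the redzone idea above concrete, here is a minimal sketch of an allocator wrapper that keeps the requested size in the leading redzone, so a size lookup needs no hashtable. Everything in it is an assumption for illustration: the `rz_*` names, the 16-byte redzone, the fill byte, and the header layout are hypothetical and are not Dr. Memory's actual design. It also assumes every pointer it sees came from `rz_malloc`, which is the situation early injection would provide; it does not handle no-redzone allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout (an assumption, not Dr. Memory's):
 *   [ leading redzone: header + fill | user data | trailing redzone ] */
#define RZ_SIZE  16u          /* bytes per redzone; assumed value */
#define RZ_FILL  0xcd         /* fill pattern used to spot overwrites */
#define RZ_MAGIC 0x5a4d454du  /* marks a chunk as allocated by rz_malloc */

typedef struct {
    size_t   request_size;    /* size the application asked for */
    uint32_t magic;
} rz_header_t;

void *rz_malloc(size_t size)
{
    unsigned char *base;
    rz_header_t hdr;

    assert(sizeof(rz_header_t) <= RZ_SIZE);
    if (size > SIZE_MAX - 2 * RZ_SIZE)
        return NULL;                      /* size + redzones would overflow */
    base = malloc(size + 2 * RZ_SIZE);
    if (base == NULL)
        return NULL;
    memset(base, RZ_FILL, RZ_SIZE);                   /* leading redzone */
    memset(base + RZ_SIZE + size, RZ_FILL, RZ_SIZE);  /* trailing redzone */
    memset(&hdr, 0, sizeof(hdr));
    hdr.request_size = size;
    hdr.magic = RZ_MAGIC;
    memcpy(base, &hdr, sizeof(hdr));      /* header lives inside the redzone */
    return base + RZ_SIZE;
}

/* Requested size, read straight from the header: no table lookup.
 * Only valid for pointers returned by rz_malloc. */
size_t rz_size(const void *ptr)
{
    rz_header_t hdr;
    memcpy(&hdr, (const unsigned char *)ptr - RZ_SIZE, sizeof(hdr));
    assert(hdr.magic == RZ_MAGIC);
    return hdr.request_size;
}

/* Returns 1 if the trailing redzone still holds the fill pattern. */
int rz_trailing_intact(const void *ptr)
{
    const unsigned char *tail = (const unsigned char *)ptr + rz_size(ptr);
    size_t i;
    for (i = 0; i < RZ_SIZE; i++) {
        if (tail[i] != RZ_FILL)
            return 0;
    }
    return 1;
}

void rz_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - RZ_SIZE);
}
```

The trade-off is the one the description points at: the header makes per-allocation metadata free to find, but only if the tool owns every allocation. Any chunk allocated before the tool took over has no header, so a real implementation would still need a fallback path for those.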
Original issue: http://code.google.com/p/drmemory/issues/detail?id=460