APP CRASH drmem full on Chromium net_unitests #1723
Although I never got as far as finding precisely what was triggering the
Some client code, while the TEB stack fields point to dstack, touched an
Somehow I can't reproduce this in DR debug build, nor in other that DrMem
I'm dumping my notes below:

*** TODO soln #1: always keep dstack higher than app stack and don't swap StackLimit?

if esp is lower than TEB.StackLimit, kernel updates it:
0:000> !teb
xref DynamoRIO/dynamorio#1102: so this bug requires an app stack at a higher address than its dstack, an
-stack_shares_gencode isn't active anyway due to the large client stacks:
and should_swap_peb_pointer() is only if there's a client: so kind of mutually exclusive.
impl: call os_heap_reserve_in_region() with lower bound of app stack and
could have a 2nd -vm_stack_reserve up high.
could make this best-effort and combine w/ #2 w/ various checks for
Looks like many instances of low dstacks are once beyond vm reservation, for win7:
Though remember that on win8 app xsp is much higher.
Implementing this soln, here's where we go off vm_reserve:
T3344 dstack=0x27bcf000, app esp=0x1181f87c, TEB lim=0x1181f000, TEB base=0x11820000

**** DONE don't swap StackLimit

A high-dstack soln doesn't work by itself: sure, the kernel updates
An alternative could be to swap StackLimit and have the priv-to-app cxt sw
Seems better to essentially combine w/ soln#3, if we have property that
For thread exit: really we need to swap after app stack is deallocated.
A priv lib _chkstk will check if esp is >= TEB.StackLimit and if so, it's
i#1676 should still be fine unless it has some max stack size or sthg
So the only remaining problem is the client touching the guard page of a

*** TODO soln #2: we check and update TEB.StackLimit ourselves prior to any harm

many things could trigger guard: leak scan, safe read is_retaddr, reading
if consequence was always a raised guard page fault (0x80000001), could
what about _chkstk? can it hit AV?
it might fail to trigger the right
but haven't we seen more than 2 exit codes (0x80000001 and 0xc0000028)?
I put in checks on NtRaiseException and KiUserExceptionDispatcher and they
[ RUN ] DiskCacheBackendTest.CreateBackend_MissingFile
Requires a little mem query syscall loop to find the guard page, but
What about i#1676 where kernel now checks precise esp bounds? That
What about asking client, for any read of app stack, to call some DR

*** CANCELED soln #3: can we store a fake range in TEB field instead of dstack range?

=> impossible for DynamoRIO/dynamorio#921: need StackLimit page mapped, or have no _chkstk in DR+tool exit
We have another related bug hit while testing:

**** TODO ASSERT

[ RUN ] DiskCacheBackendTest.ShaderCacheOnlyDoomAll
another w/ more info (took 29 iters to hit it):
[ RUN ] DiskCacheBackendTest.SimpleCacheLoad
DrMem is calling:
and swap_peb_pointer_ex() does nothing on to_priv for stack since
The adopted solution is a combination of the 3 proposed solutions above.
*** TODO what about fibers or other stack swaps? need dstack above those too!
This re-lands fixes for problems handling HeapWalk(). This also fixes
several other bugs: DynamoRIO/dynamorio#1690, DynamoRIO/drmemory#1718,
DynamoRIO/drmemory#1722, and DynamoRIO/drmemory#1723.

BUG=481231
TBR=thakis@chromium.org
NOTRY=true

Review URL: https://codereview.chromium.org/1128923003
Cr-Commit-Position: refs/heads/master@{#328548}
I ran full net_ via:
and:
[ RUN ] DiskCacheBackendTest.InvalidRankingsFailure
Dr.M WARNING: application exited with abnormal code 0x40010006

Strange:
#define DBG_PRINTEXCEPTION_C ((NTSTATUS)0x40010006L) // winnt
Light mode seems to work.
Excluding that one test, we hit:
[ RUN ] DiskCacheBackendTest.Enumerations
Dr.M WARNING: application exited with abnormal code 0x80000001

Hmm, xref #1690.
Back to no exclusions, but after fixing i#1685 assert (see below) which is
the only problem I can hit if I run just one of these DiskCacheBackendTest
subtests, I still see problems:
http://build.chromium.org/p/client.drmemory/builds/drmemory-windows-r2075-sfx.exe
=> seems to work fine, so it looks like a regression.
Excluding all of the DiskCacheBackendTest.* and later we hit:
-no_share_xl8: still crashes.
-stack_size 128K: the crash w/ OutputDebugStringA
I reproduced w/:
But haven't found any smaller subset of tests that reproduces. The
0xc0000028 is the most common, often with no backtrace printed, and
-pause_at_exit won't pause (so an uncaught-by-DR crash?), so I'm having a
hard time getting information about the crash.
With -dr_debug we see early on:
<Out of vmheap reservation - reserving 256KB. Falling back onto OS allocation>
but then the tests all pass. So something limited to DR release build?