Skip to content

Loading…

Issue 4809: Help the stack walker to find the location of a throw statement #80

Merged
merged 1 commit into from

5 participants

@rainers
D Programming Language member

This patch creates a standard stack frame for the _d_thowc function under Windowsn, even if druntime is built in release mode. This allows the stack dump and a debugger to display a correct call stack when an exception is thrown.

In addition, a workaround is added to allow the stack walker to synchronize without symbols for kernel32/kernelbase being available.

http://d.puremagic.com/issues/show_bug.cgi?id=4809

@complexmath
D Programming Language member

This is a lot of stuff to push on the stack. Is it really necessary? Can it be smaller? As it stands, this patch would probably make throwing an exception within a Fiber crash the app.

D Programming Language member
D Programming Language member

It was 4k but I've recently bumped it to 16k.

@rainers
D Programming Language member

Just thought about this again. Maybe adding the additional stack entries could be limited to debug builds of the executable by checking the image header for a debug directory entry. (I think this should also be done for generating stack traces in exceptions and loading dbghelp.dll).

@rainers
D Programming Language member

I also noticed that the stack trace generation allocates an even larger buffer on the stack (char[2048] demangleBuf;) in addition to some rather large structs (e.g. CONTEXT), so it should not hurt to use the same stack space a little later.

@MartinNowak MartinNowak commented on the diff
src/rt/deh.d
((11 lines not shown))
// @@@ TODO @@@ Signature should change: h will always be a Throwable.
//printf("_d_throw(h = %p, &h = %p)\n", h, &h);
//printf("\tvptr = %p\n", *(void **)h);
_d_createTrace(h);
+
+ // add some space on the stack to allow the stack walker to resynchronize
+ // even without symbols for kernel32/kernelbase.dll
+ asm
+ {
+ mov EAX, 1000;
+ clear_stack:
+ push 0;
+ dec EAX;
+ jnz clear_stack;
+ }
+
@MartinNowak D Programming Language member

What's the reasoning for this?

@rainers D Programming Language member
@MartinNowak D Programming Language member

Why scanning? StackWalk64 could do normal return address/frame pointer unwinding.
You can try to instrument the ReadMemoryRoutine parameter of StackWalk64 to trace
why this fails.

@rainers D Programming Language member
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@CyberShadow
D Programming Language member

BTW, personally I think we should just have debug and release Phobos+Druntime libs, and make use of the -debuglib DMD option. The release libs can get -inline, and the debug ones -g -gs.

@andralex andralex commented on the diff
src/rt/deh.d
@@ -593,15 +593,40 @@ int _d_exception_filter(EXCEPTION_POINTERS *eptrs,
extern(C)
void _d_throwc(Object h)
{
+ // create a standard stack frame, so walking the stack is possible
+ asm
+ {
+ naked;
+ push EBP;
+ mov EBP,ESP;
@andralex D Programming Language member
andralex added a note

space after comma :o)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@andralex andralex commented on the diff
src/rt/deh.d
((28 lines not shown))
//_d_setUnhandled(h);
RaiseException(STATUS_DIGITAL_MARS_D_EXCEPTION,
EXCEPTION_NONCONTINUABLE,
1, cast(void *)&h);
+
+ asm
+ {
+ mov ESP,EBP;
@andralex D Programming Language member
andralex added a note

ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@andralex andralex merged commit 80625a2 into D-Programming-Language:master
@MartinNowak
D Programming Language member

I think we should revert this.

  • there is no reproducible test case
  • we need a rational for fixes
  • this might cause a stack overflow
  • the push loop could be much faster

    enum SIZE = 4096;
    
    ubyte* p;
    asm { mov p[RBP], RSP; }
    p[-SIZE .. 0] = 0;
    asm
    {
        sub RSP, SIZE;
        // ...
        add RSP, SIZE;
    }
    
  • I recently changed a lot of stack tracing code
@MartinNowak MartinNowak added a commit to MartinNowak/druntime that referenced this pull request
@MartinNowak MartinNowak Revert "Merge pull request #80 from rainers/issue4809"
This reverts commit 80625a2, reversing
changes made to ce783ff.
dcc5a08
@MartinNowak MartinNowak referenced this pull request
Merged

Fix4809 #268

@rainers
D Programming Language member

The forced stack frame helps, I guess you do not object to this, but the strange push-loop. I agree that pushing a lot of zeroes on the stack to hope for some better behaviour by the OS feels like a dirty hack, but if it helps... I'll keep an eye on it if it does actually have the expected benefit.

there is no reproducible test case

What about the issue in the bug report? The stack trace is already improved by the forced stack frame. Looking at the call stack inside the debugger is a bit harder to automate for a test case.

this might cause a stack overflow

I just rechecked: the traceNoSync call uses about 6kB of stack. As mentioned in the discussion, 400 words instead of 1000 seem fine aswell.

the push loop could be much faster

Taking a snapshot of the stack with resolving debug symbols indicates that performance is not an issue at all ;-)

I recently changed a lot of stack tracing code

Yeah, it looks much better now. Still I want a simple way to disable it (included its shared static this()) because it does not help when running in a debugger.

@MartinNowak
D Programming Language member

What about the issue in the bug report?

Right, I missed the bug report when I wrote this.
I used the posted code and it works as of #268.

Still I want a simple way to disable it (included its shared static this()) because it does not help when running in a debugger.

So Runtime.traceHandler won't help you?
I also switch to lazy symbol loading which significantly reduced the startup time.
Without a static initializer we'd need to load dbghelp.dll and calling SymInitialize in a possibly corrupted state.
It would be really cool if we could create traces lazily when an exception is catched and printed not when it's thrown.

@rainers
D Programming Language member
@MartinNowak
D Programming Language member

How about just recording addresses and resolving them when
printing?

Sounds reasonable.

@rainers
D Programming Language member

It would be really cool if we could create traces lazily when an exception is catched and printed not when it's thrown.

"_d_exception_filter" does not seem to be used at all, but I have tried something similar that seems to work: create a new class StackTraceThrowable deriving from Throwable, use it in the catch handler that wants the stack trace, and check for it in _d_framehandler. Then, generate the stack trace only If it is the searched exception, but continue as if a Throwable has been passed.
I can only test this on windows but the posix version might work similar inside _d_throwc.
If there is interest I can put together a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Showing with 25 additions and 0 deletions.
  1. +25 −0 src/rt/deh.d
View
25 src/rt/deh.d
@@ -593,15 +593,40 @@ int _d_exception_filter(EXCEPTION_POINTERS *eptrs,
extern(C)
void _d_throwc(Object h)
{
+ // create a standard stack frame, so walking the stack is possible
+ asm
+ {
+ naked;
+ push EBP;
+ mov EBP,ESP;
@andralex D Programming Language member
andralex added a note

space after comma :o)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ }
// @@@ TODO @@@ Signature should change: h will always be a Throwable.
//printf("_d_throw(h = %p, &h = %p)\n", h, &h);
//printf("\tvptr = %p\n", *(void **)h);
_d_createTrace(h);
+
+ // add some space on the stack to allow the stack walker to resynchronize
+ // even without symbols for kernel32/kernelbase.dll
+ asm
+ {
+ mov EAX, 1000;
+ clear_stack:
+ push 0;
+ dec EAX;
+ jnz clear_stack;
+ }
+
@MartinNowak D Programming Language member

What's the reasoning for this?

@rainers D Programming Language member
@MartinNowak D Programming Language member

Why scanning? StackWalk64 could do normal return address/frame pointer unwinding.
You can try to instrument the ReadMemoryRoutine parameter of StackWalk64 to trace
why this fails.

@rainers D Programming Language member
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
//_d_setUnhandled(h);
RaiseException(STATUS_DIGITAL_MARS_D_EXCEPTION,
EXCEPTION_NONCONTINUABLE,
1, cast(void *)&h);
+
+ asm
+ {
+ mov ESP,EBP;
@andralex D Programming Language member
andralex added a note

ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+ pop EBP;
+ }
}
/***********************************
Something went wrong with that request. Please try again.