Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled Win32 Exception on primitiveQuit #128

Closed
fxgallego opened this issue Apr 30, 2017 · 6 comments
Closed

Unhandled Win32 Exception on primitiveQuit #128

fxgallego opened this issue Apr 30, 2017 · 6 comments
Labels

Comments

@fxgallego
Copy link

fxgallego commented Apr 30, 2017

I'm copying here the excelent description to this problem by @blairmcg in this comment

This sounds like the known VM issue where if an attempt is made to shutdown inside a callback that this fails and leaves the process running in a zombie state. The quit primitive works by throwing a Win32 exception that is trapped down in the main entry point function of the VM. This is done to unwind the stack cleanly running any clean-up blocks on the way. Or at least that is the theory. When Dolphin was originally written this worked fine because Windows allowed exceptions to be thrown over user->kernel->user mode transitions and be caught in the user mode code on the other side. Since Windows 7 (I think) it hasn't allowed this, so if Dolphin is (for example) inside a callback to process a windows message, then the exit attempt fails.
One sometimes encountered this when exiting a normal development image. I don't know of any good solution at present.

The simplest way to reproduce this is evaluating SessionManager current quit: 0

At the VM level it's easy to see what's happening (inspired by the Blair comment).
If you halt the primitiveQuit and step trough code you can follow this path:

1. primitiveQuit

void __fastcall Interpreter::primitiveQuit(CompiledMethod&,unsigned)
{
	Oop argPointer = stackTop();
	exitSmalltalk(ObjectMemoryIntegerValueOf(argPointer));
}

2. exitSmalltalk

void Interpreter::exitSmalltalk(int exitCode)
{
#ifdef _DEBUG
	{
		extern void DumpPrimitiveCounts(bool);
		extern void DumpBytecodeCounts(bool);

		DumpPrimitiveCounts(false);
		DumpBytecodeCounts(false);
	}
#endif

	DolphinExit(exitCode);
}

3. DolphinExit

void __stdcall DolphinExit(int exitCode)
{
	RaiseFatalError(IDP_EXIT, 1, exitCode);
}

4. RaiseFatalError

void __cdecl RaiseFatalError(int nCode, int nArgs, ...)
{
	va_list args;
	va_start(args, nArgs);
	::RaiseException(MAKE_CUST_SCODE(SEVERITY_ERROR, FACILITY_NULL, nCode), EXCEPTION_NONCONTINUABLE, nArgs, (CONST ULONG_PTR*)args);
}

When a normal image exits occurs (for example when user clicks the main shell close button) the primitiveQuit is executed and signals an exception as you can see in the code above.
The exception is managed in the handler block of the VMRun function in dolphin.cpp and initiates the Dolphin's vm clean shutdown process as you can see below:

int APIENTRY VMRun(DWORD dwArg)
{
	int exitCode = 0;
	EXCEPTION_RECORD exRec = { 0 };

	lpTopFilter = SetUnhandledExceptionFilter(unhandledExceptionFilter);
	_invalid_parameter_handler outerInvalidParamHandler = _set_invalid_parameter_handler(invalidParameterHandler);

	__try
	{
		DolphinRun(dwArg);
	}
	__except (vmmainFilter(GetExceptionInformation(), exRec))
	{
		SetUnhandledExceptionFilter(lpTopFilter);
		lpTopFilter = NULL;
		_set_invalid_parameter_handler(outerInvalidParamHandler);

		if (exRec.ExceptionCode == SE_VMEXIT)
			exitCode = exRec.ExceptionInformation[0];
		else
			FatalException(exRec);
	}

	DolphinExitInstance();

	return exitCode;
}

But when the primitiveQuit is executed trough a callback (or during a callback?) the quit exception is handled in other function, an exception filter in the function Interpreter::callback in file Interfac.cpp.
The exception filter (hanlder?) code is implemented in the function Interpreter::callbackTerminationFilter, see below:

int Interpreter::callbackTerminationFilter(LPEXCEPTION_POINTERS info, Process* callbackProcess, Oop prevCallbackContext)
{
	EXCEPTION_RECORD* pExRec = info->ExceptionRecord;

	switch (pExRec->ExceptionCode)
	{
	case SE_VMCALLBACKEXIT:

		{
			// Its a callback exception, now lets see if its in sync.
			if (callbackProcess == actualActiveProcess())
			{
				// Allow any pending callback exits to retry - we only signal the semaphore if necessary to avoid
				// building up a huge number of excess signals, and to avoid the overhead.
				wakePendingCallbacks();
				return EXCEPTION_EXECUTE_HANDLER;	// In sync., exit this callback returning value atop the stack
			}
			else
			{
				// Note: The primitive which raised the callback termination exception will
				// increment the count of pending (failed) callback exits so that on the next
				// successful callback, the appropriate number of signals will be sent to the pending
				// callbacks semaphore.
				return EXCEPTION_CONTINUE_EXECUTION;	// Out of sync. continue and fail primitiveCallbackReturn
			}
			break;
		}
		case SE_VMCALLBACKUNWIND:			// N.B. Similar, but subtly different (we continue the search)
		{
			// Its a callback unwind, now lets see if its in sync.
			if (callbackProcess == actualActiveProcess())
			{
				// Pop the top of the callback context stack
				currentCallbackContext = prevCallbackContext;

				// Allow any pending callback exits to retry
				wakePendingCallbacks();
				ASSERT(reinterpret_cast<SchedulerOTE*>(stackTop()) == schedulerPointer());
				return EXCEPTION_CONTINUE_SEARCH;
			}
			else
			{
				// This unwind must wait its turn as there are other pending callback exits
				// above it.

				// Note: The primitive which raised the callback termination exception will
				// increment the count of pending (failed) callback exits so that on the next
				// successful callback, the appropriate number of signals will be sent to the pending
				// callbacks semaphore.
				return EXCEPTION_CONTINUE_EXECUTION;	// Out of sync. continue and fail primitiveCallbackReturn
			}
			break;
		}
	}

	return EXCEPTION_CONTINUE_SEARCH;		// Not a callback exit (successful unwind or other)
}

There is no code to handle an exception case like the one generated by the primitiveQuit.

What if we can add an extra case to the above code in order to handle the quit exception doing the same stuff as in the VMRun handler? After all it's going to shutdown. Are there other cleanup blocks others than the ones in DolphinExitInstance()?

As a proof of concept I've added the extra case to this last handler as:
(The case value number is the quit exceptionCode and should be wrote in a more elegant way)

		case 2684355076:
		{ DolphinExitInstance();
			exit(0);
		}
		break;

And it appears to work well. I don't know what other implications could have this hack.
Sorry but I'm not a C++ expert nor a VM engineer probably I'm wrong naming things in some parts of this report.

@fxgallego
Copy link
Author

Another thing that I don´t understand and annoys me is why using a deferred evaluation works sometimes and others not? We have changed the quit: method as below:

!SessionManager methodsFor!
quit: anInteger
    "Private - Force a close down of the session with the specified exit code."

    self onExit == true ifTrue: [[self primQuit: anInteger] deferredValue].
    ^false    "cancelled" ! ! 

You could think that it's enough to solve the problem because the quit will be routed trough the VM's main interpret loop but sometimes fails in the same way.
How is this happening? Any idea on how we can reproduce this easily? e.g. generating lot of callbacks at the same time the main shell is being closed?

@blairmcg
Copy link
Contributor

blairmcg commented May 2, 2017

I'm not surprised you get unpredictable effects with DeferredValues, these are a kind of promise and fork a process that will execute the operation fairly non-deterministically. As there is only a single OS thread in use, it might be inside a callback when a context switch is made to the DeferredValue process, so the the quit primitive will fail in the same way.
It should work if you push the operation into the main process to be executed when the message loop is idle, i.e.

!SessionManager methodsFor!

quit: anInteger
	"Private - Force a close down of the session with the specified exit code."

	self onExit == true ifTrue: [[self primQuit: anInteger] postToInputQueue].
	^false	"cancelled"! !

@blairmcg
Copy link
Contributor

blairmcg commented May 2, 2017

Unfortunately, handling the Win32 exception in the callbackTerminationFilter as proposed above would not be the right fix. It will "work", but would really defeat the object of quitting this way in the first place. It would be similar to just calling the exit function directly. The objective is to unwind the stack cleanly, although that only really matters when running Dolphin in-proc.
The workaround of posting the primQuit: call to the input queue ought to work - I'd try that first.

@fxgallego
Copy link
Author

About the callbackTerminationFilter: I suspected you would would tell me that :)

And for the workaround for the primQuit: I'm feeling very dumb, because I thought deferredValue were doing the same as postToInputQueue. I forgotten the postToInputQueue message at all. I'm not doing Smalltalk these days so my head is throwing things out.

I've tested with a debug vm and works ok so I'm pretty confident it'll work as well in the runtime environment. I let you know.

@blairmcg blairmcg added the bug label May 11, 2017
@fxgallego
Copy link
Author

Using the same code as the recently added to the base image for dolphinsmalltalk/Dolphin#355 we can't no longer reproduce this error.

@blairmcg
Copy link
Contributor

Closing based on comment from @fxgallego that the workaround solves the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants