Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent deadlocks from hanging service - closes #20 #21

Merged
merged 5 commits into from Aug 14, 2014

Conversation

Projects
None yet
2 participants
@Iristyle
Copy link

commented Aug 14, 2014

These are the issues I've found with the service code, that help resolve hangs in our environment. Wanted to get this up for review new - will follow up tomorrow.

Iristyle added some commits Aug 14, 2014

Allocate correct number of pointers for events
 - The FFI memory pointer initializer takes a count of the type it's
   allocating, not the number of total bytes
Prevent Daemon exceptions from deadlocking service
 - If any methods that respond to the SCM throw an exception, the
   @@hStopCompletedEvent is never signaled and the service can hang.
   This includes service_pause, service_resume, service_interrogate,
   service_shutdown, service_paramchange, service_netbindadd,
   service_netbindremove, service_netbindenable, service_netbinddisable,
   or service_stop.  Effectively, guard against user-defined classes
   derived from Win32::Daemon misbheaving and hanging the service.
Use CreateThread user param for StartServiceCtrlDispatcher
 - The 4th parameter to the CreateThread function is an LPVOID that
   allows a user parameter to be passed to the newly created thread.
   Instead of using a Ruby closure over Service_Main within the
   ThreadProc (which could be considered a bit dangerous with respect
   go GC), perform a safer operation by passing a native pointer to
   the function.
Prevent Service_Main exceptions from deadlocking service
 - If the code within Service_Main throws an exception or otherwise
   exits prematurely, then the call will never be made to
   SetTheServiceStatus to tell the SCM that the service has successfully
   completed.  Without this notification sent to the SCM,
   StartServiceCtrlDispatcher will never return and the service will
   hang with a status of STOP_PENDING.
 - This exact behavior has been observed within the Ruby 2.0 x64
   environment, where due to unknown reasons, the Service_Main thread
   is prematurely terminated.  In this particular case, as soon as
   Service_Ctrl_ex is called by the SCM, Service_Main appears to die
   immediately.  In observed cases, there is some race condition
   present as the situation is not always reproducible.
Return correct codes from Service_Ctrl_Ex
 - According to the MSDN documentation for the HandlerEx function, it
   should geneally return NO_ERROR, or ERROR_CALL_NOT_IMPLEMENTED:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683241(v=vs.85).aspx
   Presently the return value from the HandlerEx function is whatever
   the return value happens to be from the SetTheServiceStatus function.
   This is technically not correct, and thus the code has been
   updated to reflect this.

@Iristyle Iristyle referenced this pull request Aug 14, 2014

Closed

Deadlock #20

djberg96 added a commit that referenced this pull request Aug 14, 2014

Merge pull request #21 from Iristyle/fix-deadlock
Prevent deadlocks from hanging service - closes #20

@djberg96 djberg96 merged commit 4f9982b into chef:ffi Aug 14, 2014

@djberg96

This comment has been minimized.

Copy link
Collaborator

commented Aug 14, 2014

Looks good.

@Iristyle

This comment has been minimized.

Copy link
Author

commented Aug 14, 2014

Thanks! Hopefully soon I can track down what is causing the Service_Main to prematurely exit in the first place. These fixes allow the service to shutdown as requested but I don't believe there are any guarantees that all cleanup code will get the opportunity to run, so it would be nice to track down what the root cause is.

Iristyle added a commit to Iristyle/puppet-win32-ruby that referenced this pull request Aug 14, 2014

(PUP-2987) Patch win32-service for service hang
 - Previously there were a number of conditions in the win32-service
   gem that could create a deadlock situation by not signaling a manual
   reset event.  Without these events signaled, the call to Win32
   StartServiceCtrlDispatcher will block forever, and the service will
   never enter the STOPPED state from STOP_PENDING.
 - This issue can be particularly problematic when attempting to
   upgrade Puppet using the MSI, as the MSI attempts to stop the
   puppet service prior to installing a new version.  If it fails to
   do so, the upgrade will prompt for a retry in interactive mode, or
   will simply fail in unattended.
 - This issue had been sporadically reported / observed in x86 code,
   but appears to be more prevalent / reproducible in x64 code.

 - This code has been merged into the win32-servie gem, but not yet
   released:
   chef/win32-service#21
   See included patch file for individual commit details.

Iristyle added a commit to Iristyle/puppet-win32-ruby that referenced this pull request Aug 14, 2014

(PUP-2987) Patch win32-service for service hang
 - Previously there were a number of conditions in the win32-service
   gem that could create a deadlock situation by not signaling a manual
   reset event.  Without these events signaled, the call to Win32
   StartServiceCtrlDispatcher will block forever, and the service will
   never enter the STOPPED state from STOP_PENDING.
 - This issue can be particularly problematic when attempting to
   upgrade Puppet using the MSI, as the MSI attempts to stop the
   puppet service prior to installing a new version.  If it fails to
   do so, the upgrade will prompt for a retry in interactive mode, or
   will simply fail in unattended.
 - This issue had been sporadically reported / observed in x86 code,
   but appears to be more prevalent / reproducible in x64 code.

 - This code has been merged into the win32-servie gem, but not yet
   released:
   chef/win32-service#21
   See included patch file for individual commit details.

Iristyle added a commit to Iristyle/puppet that referenced this pull request Aug 14, 2014

(PUP-2987) Restructure Windows service code
 - Move runinterval parsing and execution of the puppet agent to a new
   thread that maintains a busyloop until either:
    - the service is no longer running
    - an exception has occurred with parsing the runinterval
 - Rather than performing a wakeup against the main thread, simply kill
   this new thread on service_stop
 - Main service code will wait until the busyloop thread dies, then will
   exit
 - Note that this change is ultimately not responsible for the service
   hangs experienced in the service.  Additional changes have been made
   to the win32-service gem to ensure events being waited on with
   WaitForSingleObject are always signaled properly, even in failure
   cases.  Please see merged PR for more info:
   chef/win32-service#21
   puppetlabs/puppet-win32-ruby#51

Iristyle added a commit to Iristyle/puppet-win32-ruby that referenced this pull request Aug 15, 2014

(PUP-2987) Patch win32-service for service hang
 - Previously there were a number of conditions in the win32-service
   gem that could create a deadlock situation by not signaling a manual
   reset event.  Without these events signaled, the call to Win32
   StartServiceCtrlDispatcher will block forever, and the service will
   never enter the STOPPED state from STOP_PENDING.
 - This issue can be particularly problematic when attempting to
   upgrade Puppet using the MSI, as the MSI attempts to stop the
   puppet service prior to installing a new version.  If it fails to
   do so, the upgrade will prompt for a retry in interactive mode, or
   will simply fail in unattended.
 - This issue had been sporadically reported / observed in x86 code,
   but appears to be more prevalent / reproducible in x64 code.

 - This code has been merged into the win32-servie gem, but not yet
   released:
   chef/win32-service#21
   See included patch file for individual commit details.

Iristyle added a commit to Iristyle/puppet-win32-ruby that referenced this pull request Aug 15, 2014

(PUP-2987) Patch win32-service for service hang
 - Previously there were a number of conditions in the win32-service
   gem that could create a deadlock situation by not signaling a manual
   reset event.  Without these events signaled, the call to Win32
   StartServiceCtrlDispatcher will block forever, and the service will
   never enter the STOPPED state from STOP_PENDING.
 - This issue can be particularly problematic when attempting to
   upgrade Puppet using the MSI, as the MSI attempts to stop the
   puppet service prior to installing a new version.  If it fails to
   do so, the upgrade will prompt for a retry in interactive mode, or
   will simply fail in unattended.
 - This issue had been sporadically reported / observed in x86 code,
   but appears to be more prevalent / reproducible in x64 code.

 - This code has been merged into the win32-servie gem, but not yet
   released:
   chef/win32-service#21
   See included patch file for individual commit details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.