Race Condition in Babun Startup #332

Closed
leycec opened this Issue May 8, 2015 · 7 comments

6 participants
@leycec
Contributor

leycec commented May 8, 2015

Under Windows XP Pro x64 (Service Pack 2), babun's desktop shortcut fails with the following depressingly familiar fatal error:

Failed to fork child process: Resource temporarily unavailable.
DLL rebasing may be required. See 'rebaseall --help'.

Superficially, this would appear to be the result of BLODA. Various Cygwin mailing list threads corroborate this. It must be the case! In this case, appearances would be deceiving.

We now demonstrate this issue to be the result of a subtle race condition between Windows Explorer, Windows batch files, and Cygwin itself. We have not identified the exact underlying cause of this issue. ("Time! What is time? Reality, it hurts me so.")

Given fundamental incompatibilities between the Windows and POSIX process models, there may very well exist no genuine solution. We have, however, identified several babun-specific solutions applicable by both end users and babun developers – each resolving this issue with varying degrees of success.

Let's get to it, shall we?

The Problem

Or, How We Know It's a Race Condition and not BLODA

We first installed babun onto a spare laptop recently formatted with Windows XP Pro x64 (Service Pack 2) with all available updates. Babun's desktop shortcut repeatably yields the above error. Suspecting BLODA, we grep for verboten services, processes, and dynamic libraries (especially third-party antivirus software). We fail. Thanks to the unicorn-like magic of formatting, the system appears to be clean. No bloody BLODA.

In abject desperation, we run the .babun\rebase.bat script from the stock cmd.exe shell. It runs without error, which seems good! Babun's desktop shortcut continues to yield the same error, which seems bad. We press onward.

As we know, BLODA lurks in the darkest corners of every registry. A requisite device driver (say, any of the various Nvidia nForce motherboard drivers installed for this laptop) could conceivably be performing DLL hook injection. Incidentally, what sort of OS permits a process to inject arbitrary code into the address space of other processes? We digress.

To eliminate the unlikely (but conceivable) possibility of driver-based BLODA, we next installed babun onto an Ubuntu Linux VirtualBox-hosted Windows XP Pro x64 guest (Service Pack 2) with all available updates. No other software – either first- or third-party – is installed. Babun's it, baby. Unless VirtualBox itself or VirtualBox's Guest Additions perform DLL hook injection, BLODA is right out. Neither of them appears to, so BLODA is right out. So Babun just works!

Just kidding. Babun's desktop shortcut repeatably yields the above error. Here's where things get smelly. Whereas this shortcut always fails on the laptop, it only fails approximately half of the time on the VirtualBox guest. The other half of the time, babun starts as expected into a Cygwin-based MinTTY terminal. All startup checks succeed. We begin to get that lunatic gleam in our eyes. ("We will fix this. Oh, yesss.") Sanity, you have met your match.

We inspect the .babun\babun.bat script. Everything checks out. Nothing blatantly insane. Of course, we idly note that:

# The following lines...
:BEGIN
set CYGWIN_HOME=%BABUN_HOME%\cygwin 
if exist "%CYGWIN_HOME%\bin\mintty.exe" goto RUN
if not exist "%CYGWIN_HOME%\bin\mintty.exe" goto NOTFOUND 

:RUN
ECHO [babun] Starting babun
start "" "%CYGWIN_HOME%\bin\mintty.exe" - || goto :ERROR
GOTO END

:NOTFOUND 
ECHO [babun] MinTTY not found. Babun installation seems to be corrupted.
EXIT /b 255

# ...can be reduced to just this.
set CYGWIN_HOME=%BABUN_HOME%\cygwin

if exist "%CYGWIN_HOME%\bin\mintty.exe" goto RUN
ECHO [babun] MinTTY not found. Babun installation seems to be corrupted.
EXIT /b 255

:RUN
ECHO [babun] Starting babun
start "" "%CYGWIN_HOME%\bin\mintty.exe" - || goto :ERROR
GOTO END

Keep it simple, right? But that doesn't actually help us with anything. We try echoing out the values of %BABUN_HOME% and %CYGWIN_HOME% by adding the following two commands above the :BEGIN label (which, incidentally, is unreferenced and hence removable):

ECHO [babun] Babun home: %BABUN_HOME%
ECHO [babun] Cygwin home: %CYGWIN_HOME%

The error goes away. Wat? Yes, insanity walks among us. Adding two echo statements to the top of .babun\babun.bat makes Babun a happy boy. Not entirely, of course: the error still occurs approximately 7% of the time on the VirtualBox guest. But 7% beats 50%. We ran the numbers. (We didn't, actually.)

Under sane operating systems and scripting languages, output has no side effect other than output. But this is Windows. If it's terrible, it's possible. We delve deeper into madness by prefixing each of the prior ECHO statements with REM as follows:

REM ECHO [babun] Babun home: %BABUN_HOME%
REM ECHO [babun] Cygwin home: %CYGWIN_HOME%

This should effectively revert our changes and re-expose the error! It doesn't. The error continues to occur 7% of the time. We remove the REM-prefixed ECHO statements entirely. The error now occurs 50% of the time.

We tear out a fistful of hair, screaming to the penitent sky: "WHY, WHY, WHY." Then we... remember. REM is a command. It's not stripped away during lexing, as comments are in all sane languages. Batch being insane, REM is actually run in the same parse phase as every other command.

Given that, what do REM and ECHO have in common? There's only one answer: they both consume a non-negligible number of CPU cycles. In batch scripting, even comments are slow. (Microsoft, how did you break comments? But you did.)

Given that, why does slowing down a batch script when executing a previously failing program cause that program to succeed with greater frequency? Again, there's only one answer: a fatal race condition.
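To make that inference concrete, here is a toy simulation of a launch race. It is only a sketch under loud assumptions: we never identified the real mechanism, so the "resource" that must be ready, the uniform timing, and the delay values are all hypothetical, picked merely so the output resembles the 50% and 7% rates observed above.

```python
import random


def failure_rate(delay, trials=10_000, seed=0):
    """Fraction of simulated launches that lose the race.

    Toy model (an assumption, not the diagnosed cause): some resource
    becomes ready at a random time in [0, 1), and the batch script
    launches the child at time `delay`. The launch fails if it happens
    before the resource is ready.
    """
    rng = random.Random(seed)  # seeded, so runs are reproducible
    failures = 0
    for _ in range(trials):
        ready_at = rng.random()  # when the hypothetical resource is ready
        launch_at = delay        # extra ECHO/REM lines push this later
        if launch_at < ready_at:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    # Hypothetical delays: no padding, two REM lines, lots of REM lines.
    for delay in (0.5, 0.93, 1.0):
        print(f"delay={delay:.2f} -> failure rate {failure_rate(delay):.2%}")
```

The only point is the shape of the result: pushing the launch later shrinks the window in which it can lose, which is what a race condition predicts and what BLODA does not.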

Or, How We Know It's a Windows Explorer <-> Batch Script Race Condition

"Oh, it must be a Cygwin-wide race condition!", you bleat. Right? Wrong.

We revert all of our changes and try:

  • Running the original .babun\babun.bat directly from the Windows Explorer. It fails 50% of the time, which rules out Babun's desktop shortcut as a possible cause. It's not the shortcut itself.
  • Running the original .babun\babun.bat directly from a cmd.exe shell. It succeeds 100% of the time. 100%. No observed errors, ever. Which implicates the Windows Explorer itself.
  • Installing ConEmu and running the original .babun\babun.bat directly from a ConEmu shell. It also succeeds 100% of the time, thereby confirming that terminals are good and Explorer is bad.
  • Running .babun\cygwin\bin\mintty.exe directly from the Windows Explorer. It also succeeds 100% of the time. Which implicates the batch script.

Did you ever play logic puzzles when you were a kid? I did. Lots. (That probably explains a few things.) This is like those, only fun-less. We were tempted to draw up a Markdown-based table synopsizing the logic puzzle defined by the above findings. But we're tired... and so are you.

So the syllogistic conclusion is clear: there exists a race condition in babun's startup batch script when run (either directly or via a shortcut) from the Windows Explorer under Windows XP Pro x64.
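For the table we were too tired to draw, the findings above can be written down as data and the culprit deduced mechanically. This is a sketch: the field names are ours, and filing the desktop shortcut under "launched by Explorer" is an assumption (the desktop is part of the Explorer shell).

```python
from typing import NamedTuple


class Observation(NamedTuple):
    launcher: str       # what started the program
    via_explorer: bool  # True if Windows Explorer did the launching
    target: str         # what was started
    fails: bool         # True if the fork error was ever observed


# One row per experiment described above.
OBSERVATIONS = [
    Observation("desktop shortcut", True, "babun.bat", True),
    Observation("Explorer double-click", True, "babun.bat", True),
    Observation("cmd.exe", False, "babun.bat", False),
    Observation("ConEmu", False, "babun.bat", False),
    Observation("Explorer double-click", True, "mintty.exe", False),
]


def culprits(observations):
    """Return the (via_explorer, target) pairs seen only in failing runs."""
    failing = {(o.via_explorer, o.target) for o in observations if o.fails}
    passing = {(o.via_explorer, o.target) for o in observations if not o.fails}
    return failing - passing


if __name__ == "__main__":
    for o in OBSERVATIONS:
        print(f"{o.launcher:<22} {o.target:<11} {'FAILS' if o.fails else 'ok'}")
    print("culprit:", culprits(OBSERVATIONS))
```

Only one combination survives: Explorer launching the batch script. Neither Explorer alone nor the batch script alone is enough.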

Let's talk "solutions."

The Solutions

For End User Eyes Only

If you're an end user and just want babun to work fergodsakes, the simplest and most reliable solution is to:

  • Delete the current babun desktop shortcut.
  • Make a new shortcut pointing to the .babun\cygwin\bin\mintty.exe executable.
  • Rename it babun.
  • Move it to the desktop.
  • Happiness ensues. (It better.)

A less simple and less reliable solution is to add an absurd number of REM statements to the .babun\babun.bat script. About ten or so did it in our case, but your mileage may vary. Oddly, it does not appear to matter where in that script the REM statements are added. We added them to the very end. We added them to the very beginning. The result was the same.

For Developer Eyes Only

If you're a developer and graciously want to try fixing this, the simplest and most reliable solution is to edit babun's installation batch script to also install a shortcut to .babun\cygwin\bin\mintty.exe – either in addition to or replacing the existing shortcut to .babun\babun.bat.

For systems on which Cygwin has not been previously installed, babun's startup batch script seems to serve no purpose. MinTTY seems to give the same results, with the added bonus of no race conditions.

For systems on which Cygwin has been previously installed, babun's startup batch script seems to serve the purpose of redefining %CYGWIN_HOME% to point to babun's version of Cygwin. That seems important. Or is it? Specifically, is babun's MinTTY directly runnable under such systems without error?

If so, our recommendation is to remove .babun\babun.bat entirely. Else, our recommendation is to both install an additional shortcut to babun's MinTTY and append an excessive number of REM statements to .babun\babun.bat. (Perhaps add the entirety of your license there? No one will know that it has an important undocumented side effect! Muhahaha.)

In Synopsis

We're tired. This was an absurd issue, and the fixes are equally absurd. The underlying issue is probably infeasible to debug and certainly infeasible to fix. Installing a desktop shortcut to babun's MinTTY appears to be the least bad of only bad options.

Welcome to Windows.

@jlupi

Contributor

jlupi commented May 15, 2015

wow, this is the most scary yet most enjoyable bug description I've ever read.
I know how much effort it cost to investigate such issues. Thanks for doing that.

In the early releases we had some good reasons to start babun via the bat script but right now I do not see any. We can make the desktop shortcut point straight to mintty and squash this change into the next release.

This would work for everybody who installs babun from scratch, but not for those who update it via babun update.
Do you mind if we take the bug description and put it onto our blog?

@jlupi jlupi closed this in 6d3fa67 May 15, 2015

@jlupi

Contributor

jlupi commented May 15, 2015

Fixed in 1.2.0
See release status: #304

@leycec

Contributor

leycec commented May 20, 2015

> wow, this is the most scary yet most enjoyable bug description I've ever read.
> I know how much effort it cost to investigate such issues. Thanks for doing that.

Wow! Thanks for the earnest acknowledgement. I did try to inject a bit of levity into the whole affair. It was pretty frustrating to debug. I was keen to see that no one hit that same threshold of unmitigated suffering.

> In the early releases we had some good reasons to start babun via the bat script but right now I do not see any.

Right. I figured you had sensible reasons, but that this might be vestigial now.

> We can make the desktop shortcut point straight to mintty and squash this change into the next release.

Yay! You are champions among GitHub men. Thanks for attending to this so swiftly.

> Do you mind if we take the bug description and put it onto our blog?

Not at all! That'd be awesome, in fact. Spread the good cheer, I say.

@ArthurZ

ArthurZ commented Jan 12, 2017

I still get this in 2017:
Failed to fork child process: Resource temporarily unavailable.
DLL rebasing may be required. See 'rebaseall --help'.

@stayupthetree

stayupthetree commented Feb 1, 2017

Just started getting it today

@tom-sherman

tom-sherman commented Feb 5, 2017

Restarting my PC fixed the issue for me.

@Jeklah

Jeklah commented Aug 21, 2017

Nice! This appears to have fixed the slow prompt! I would often get the results of a command back quickish, but the prompt was taking ages to reappear. Very frustrating! Changing babun.bat and making a new desktop shortcut appears to have fixed it for me. Thank you very much, this has bothered me for a while now over multiple machines!
