Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nginx 1.9.14 (devel) #88

Closed
wants to merge 1 commit into from
Closed

Conversation

matteocng
Copy link
Contributor

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same formula update/change?
  • Does your submission pass brew audit --strict --online <formula> (where <formula> is the name of the formula you're submitting)?
  • Have you built your formula locally prior to submission with brew install <formula>?

@apjanke apjanke added the CI-requeued PR has been re-added to the queue label Apr 6, 2016
@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

Build timed out after 3 hours.

Build timed out (after 180 minutes). Marking the build as aborted.
Build was aborted
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Archiving artifacts

Maybe it'll make it under the time limit now that we're removed --recursive from the brew uses check. Requeueing again.

@MikeMcQuaid
Copy link
Member

The timeout for this tap should be 60 minutes and not 3 hours 😞. I saw this with the last nginx devel version and figured it was a one-time bug. I think it may be related to the new Xcode version as it's not actually failing on devel.

@DomT4
Copy link
Member

DomT4 commented Apr 7, 2016

The timeout for this tap should be 60 minutes and not 3 hours 😞.

Spotted this last night and reset it. Should be 60 again now, not sure why it was at 180 initially.

Not sure what the problem is here, even stable builds fine locally using 7.3:

==> Summary
🍺  /usr/local/Cellar/nginx/1.8.1: 7 files, 935.0K, built in 22 seconds

As does devel:

==> Summary
🍺  /usr/local/Cellar/nginx/1.9.13: 7 files, 1M, built in 17 seconds

Tempted to just merge this as-is again.

@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

Re-queued just in case this was a transient network error. Here's the console output from the aborted build, for future reference.

I'm going to watch it and abort it manually after ten minutes or so if it's still going, to avoid hogging the test-bot queue.

@apjanke apjanke removed the CI-requeued PR has been re-added to the queue label Apr 7, 2016
@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

TL;DR

I reproduced interactively on an el cap test bot VM. Looks like the ENV/scm/git wrapper is stuck (probably recursively exec-ing itself instead of calling a real git) under some conditions. Possibly related to the env.userpaths in the formula in combination with no brewed git being installed on the test boxes. But the specific behavior is rather wonky WRT exec and debug logging.

Details

I logged in to the spare El Cap VM and ran brew test-bot --ci-pr --keep-logs --verbose nginx interactively.

It hung here during the build for me.

cp conf/nginx.conf '/usr/local/etc/nginx/nginx.conf.default'
test -d '/usr/local/var/run'        || mkdir -p '/usr/local/var/run'
test -d '/usr/local/var/log/nginx' ||       mkdir -p '/usr/local/var/log/nginx'
test -d '/usr/local/Cellar/nginx/1.8.1/html'        || cp -R html '/usr/local/Cellar/nginx/1.8.1'
test -d '/usr/local/var/log/nginx' ||       mkdir -p '/usr/local/var/log/nginx'

Based on the scrollback, this is when it was doing the 1.8.1 stable install, not the devel install.

When I build it locally, this looks like right before it does "cleaning" and "linking" steps of the install. So it looks like it's hanging inside brew install.

test -f '/usr/local/etc/nginx/nginx.conf'       || cp conf/nginx.conf '/usr/local/etc/nginx/nginx.conf'
cp conf/nginx.conf '/usr/local/etc/nginx/nginx.conf.default'
test -d '/usr/local/var/run'        || mkdir -p '/usr/local/var/run'
test -d '/usr/local/var/log/nginx' ||       mkdir -p '/usr/local/var/log/nginx'
test -d '/usr/local/Cellar/nginx/1.8.1/html'        || cp -R html '/usr/local/Cellar/nginx/1.8.1'
test -d '/usr/local/var/log/nginx' ||       mkdir -p '/usr/local/var/log/nginx'
==> Cleaning
==> Finishing up
ln -s ../Cellar/nginx/1.8.1/bin/nginx nginx
ln -s ../../../Cellar/nginx/1.8.1/share/man/man8/nginx.8 nginx.8

ps from another ssh session shows this.

elcapitanvm-2:1.8.1 brew$ ps -f
  UID   PID  PPID   C STIME   TTY           TIME CMD
  502   296   285   0  8:46AM ttys000    0:02.36 -bash
  502   403   402   0  8:48AM ttys001    0:00.31 -bash
  502  1324   403   0  9:28AM ttys001    0:00.55 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -W0 /usr/local/Library/brew.rb test-bot --ci-pr nginx --keep-logs --verbose
  502  4872  1324   0  9:31AM ttys001    0:00.50 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -W0 /usr/local/Library/brew.rb install --verbose --build-bottle nginx
  502  4900  4872   0  9:31AM ttys001    0:00.01 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -W0 /usr/local/Library/brew.rb install --verbose --build-bottle nginx
  502  4902  4900   0  9:31AM ttys001    0:00.33 /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby -W0 -I /usr/local/Library/Homebrew -- /usr/local/Library/Homebrew/build.rb /usr/local/Library/Taps/homebrew/homebrew-core/Formula/nginx.rb --build-bottle --verbose
  502  6985  4902   0  9:31AM ttys001   24:41.22 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version

top shows ruby is still running, using almost a full core, and has racked up 15 minutes of CPU time. Sounds like it's stuck in a loop or internal processing somewhere, and not blocking on absent I/O or an external hung command.

6985   ruby             83.6 15:14.06 2/1

And that PID 6985 is our git wrapper inside ENV/scm.

6985 ttys001   21:15.97 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version

There are no loops in that code. But since it calls exec, it might just be stuck recursively exec-ing itself, and keeping the same PID.

I confirmed most of this by sticking an abort "DEBUG" call in ENV/scm/git. That kept the installation from hanging. (I think because it caused git to return with bogus output, which didn't matter since just git --version was being called.)

Perhaps this is related to the combination of :env userpaths in the formula (which is uncommon) and the brewed git not being installed on the test bot machines (where it would be on the user machines we're testing on).

I added some debugging output to get the SCM wrapper to log what it was doing. That didn't capture self-execing, but I think that's because the exec might just be throwing the process's output buffers in the bit bucket. If I keep hitting ps while it's hung, every so often something like this will pop up, with a sh invocation briefly appearing. That sure looks like self-execing.

45941 ttys001    0:00.32 /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby -W0 -I /usr/local/Library/Homebrew -- /usr/local/Library/Homebrew/
48011 ttys001    1:40.49 /bin/sh /usr/local/Library/ENV/scm/git --version
53767 ttys001    0:00.00 (sh)

I tried to track it down further by adding some logging to ENV/scm/git, as below. But when I did that, it stopped hanging. This behavior was reproducible: if I restored the original ENV/scm/git wrapper with no debug output, it would go back to hanging in the first place. Heisenbug!

def logit(str)
  open("/Users/brew/Library/Logs/Homebrew/scm_git.log", "a") { |f|
    f.puts str
    f.flush
  }
end

logit "F: #{F} D: #{D}"

def exec(*args)
  # prevent fork-bombs
  arg0 = args.first
  logit "exec-ing #{arg0}: #{args}"
  logit "aborting due to self-exec" if arg0 =~ /^#{F}/i || File.expand_path(arg0) == File.expand_path(__FILE__)
  return if arg0 =~ /^#{F}/i || File.expand_path(arg0) == File.expand_path(__FILE__)
  super
end

That doesn't make any sense to me. Perhaps there's some weird interaction between flushing that log file handle and other file handles or output buffers or who knows what.

@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

A clue! Changing the ENV/scm/git exec wrapper to use system/exit instead of exec still hangs; it's probably not something specific to exec.

def exec2(*args)
  # prevent fork-bombs
  arg0 = args.first
  return if arg0 =~ /^#{F}/i || File.expand_path(arg0) == File.expand_path(__FILE__)
  system *args
  exit $?
end

...

exec2 brew_version, *ARGV if File.executable? brew_version

But then you can see this in ps.

56798 ttys001    0:00.27 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56801 ttys001    0:00.27 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56804 ttys001    0:00.28 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56807 ttys001    0:00.39 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56810 ttys001    0:00.27 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56813 ttys001    0:00.35 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56816 ttys001    0:00.37 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56819 ttys001    0:00.27 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56822 ttys001    0:00.28 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56825 ttys001    0:00.26 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56828 ttys001    0:00.40 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56831 ttys001    0:00.30 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56835 ttys001    0:00.28 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56838 ttys001    0:00.31 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56841 ttys001    0:00.28 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version
56844 ttys001    0:00.24 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/4.3/git --version
56847 ttys001    0:00.29 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -x /usr/local/Library/ENV/scm/git --version

It looks like under this configuration, the git wrappers in ENV/scm and ENV/x.y are mutually recursively calling each other.

(Why would logging output affect that? I have no idea. I payed pretty close attention when I was doing it and switched the file in and out a few times, but maybe there was something extraneous going on.)

@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

Strangely, adding logging to the system/exit version fixes the hang, too. Here's my whole git file.

Leave this logit call commented out, and it'll hang. Uncomment the call, no hang. I flipped it back and forth several times, doing one or more runs each way and cleaning up in between, and the behavior is consistent; I don't think I'm hallucinating this.

def logit(str)
  open("/Users/brew/Library/Logs/Homebrew/scm_git.log", "a") { |f|
    f.puts str
    f.flush
  }
end

# logit "F: #{F} D: #{D} expand_path: #{File.expand_path(__FILE__)}"

def exec2(*args)
  # prevent fork-bombs
  arg0 = args.first
  return if arg0 =~ /^#{F}/i || File.expand_path(arg0) == File.expand_path(__FILE__)
  system *args
  exit $?
end

Note: if you're using the system/exit version of this, eventually the hang will resolve itself because you'll have spawned so many processes that the next git call finally fails because it can't fork, and the stack unwinds and things proceed. Takes a couple minutes to happen.

@apjanke apjanke added the bug Reproducible Homebrew/homebrew-core bug label Apr 7, 2016
@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

Anyway, let's go ahead and merge this PR as-is and take the discussion about the hang over to a new issue. It's not specific to this PR, so no reason to hold up the new version and keep spamming @matteocng with notifications on this, right?

@DomT4
Copy link
Member

DomT4 commented Apr 7, 2016

Great sleuthing! Seems reasonable to move your discovered issue over to the brew repo for further probing and merge this. 🚢 as desired.

@apjanke apjanke closed this in c48f928 Apr 7, 2016
@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

Shipped.

Thanks for your contribution to Homebrew, @matteocng!

@apjanke
Copy link
Contributor

apjanke commented Apr 7, 2016

FYI, the hang was diagnosed and fixed in Homebrew/brew#46. Long story short: ENV/scm/git and ENV/4.3/git were both on the path and thus mutually recursively exec-ing each other in an infinite loop. The wonkiness I was seeing was due to logging attempts running in to the sandbox boundaries in ways that didn't produce obvious errors and happened to fix the hang. Moral for future readers: make sure when you do diagnostic logging it works in the sandbox, either to a known-safe path or by using logger.

@Homebrew Homebrew locked and limited conversation to collaborators May 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Reproducible Homebrew/homebrew-core bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants