This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Reel fails on first 2-3 requests, and after cooling down. #33

Closed
digitalextremist opened this issue Mar 13, 2013 · 36 comments

Comments

@digitalextremist
Member

My first 2-3 pageloads always fail when using the Rack adapter for Reel. The failure is not always while returning a response from my application; more often it happens while serving an essential public file, like a JS or CSS file.

Example stack trace:

Reel::Server crashed!
Celluloid::DeadTaskError: cannot resume a dead task (dead fiber called)
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/tasks/task_fiber.rb:51:in `resume'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/tasks/task_fiber.rb:47:in `resume'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/responses.rb:11:in `dispatch'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/actor.rb:329:in `handle_message'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/actor.rb:196:in `run'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/actor.rb:184:in `initialize'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/thread_handle.rb:17:in `initialize'
org/jruby/RubyProc.java:249:in `call'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48:in `create'

10.126.130.1 - - [13/Mar/2013 15:10:56] "GET /01E.min.css 1.1" 304 - 0.0380

If I reload the URL in the browser 2-3 times, the request works.

After an undetermined cool-down period, 2-3 requests fail again, just as they do when Reel first starts returning Rack responses.
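For context, the "(dead fiber called)" part of that error looks like the message of Ruby's own FiberError, which is raised when a Fiber that has already run to completion is resumed again. The following is a minimal stdlib-only sketch of that situation; it is not Celluloid's code, and `resume_finished_fiber` is a hypothetical name used only for illustration. Newer Rubies may word the message differently, so the sketch reports the exception class rather than the text.

```ruby
# Stdlib-only illustration; this is NOT Celluloid's implementation.
# Resume a Fiber once so it runs to completion, then resume it again
# and report which exception class Ruby raises for the dead fiber.
def resume_finished_fiber
  fiber = Fiber.new { :finished }
  first_result = fiber.resume   # the block returns, so the fiber is now dead

  begin
    fiber.resume                # second resume on a dead fiber
    [first_result, nil]
  rescue FiberError => e
    [first_result, e.class]
  end
end
```

Calling `resume_finished_fiber` returns `[:finished, FiberError]`: the first resume succeeds, the second one raises.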

@tarcieri
Member

Are you seeing any other traces to help debug this problem? That's not a lot to go on.

@digitalextremist
Member Author

Trying to think of how to produce more diagnostic data.

Just returned from 90+ minutes away from my development environment, and my first pageload died with:

Celluloid::DeadTaskError: cannot resume a dead task (dead fiber called)

Will troubleshoot further //

@digitalextremist
Member Author

Ok @tarcieri, I isolated this. I took my application out of the way, stopped using Rack, and invoked Reel directly.

Here's a gist showing my sample server ( which is your sample server in the Reel readme ):

https://gist.github.com/digitalextremist/5316190

On pageload 1 and 2, I see "Hello, world!" Then, on pageload 3, this always happens:

E, [2013-04-04T19:39:25.098000 #15443] ERROR -- : MyServer crashed!
Celluloid::DeadTaskError: cannot resume a dead task (dead fiber called)
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/tasks/task_fiber.rb:55:in `resume'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/responses.rb:11:in `dispatch'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/actor.rb:328:in `handle_message'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/actor.rb:183:in `run'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/actor.rb:171:in `initialize'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/thread_handle.rb:12:in `initialize'
org/jruby/RubyProc.java:249:in `call'
/usr/local/rvm/gems/jruby-1.7.3/gems/celluloid-0.13.0/lib/celluloid/internal_pool.rb:55:in `create'

Then the server exits.

I noticed this before in my large application, based on Rack. When I used Reel, it seemed every 3rd pageload died, which was sometimes an image, sometimes critical JavaScript, sometimes a stylesheet. This example is much simpler though. I can't imagine this is something going on in the wild for people. Death by a 3rd pageload seems a bit limited in uptime :)

@tarcieri
Member

tarcieri commented Apr 5, 2013

@digitalextremist I recently found a rather severe bug in Celluloid::IO. I'm going to do another release (0.13.1). If you can try to reproduce the same problem on that, that'd be great.

@digitalextremist
Member Author

Then, with this gist: https://gist.github.com/digitalextremist/5316264

Same thing, except once supervised: pageload 4 and 5 work, then 6 dies; 7 and 8 work, then 9 dies... etc.

@digitalextremist
Member Author

Ah sorry, missed your comment while I was gisting a second repro.
Any clue on when I should retry with 0.13.1?

@digitalextremist
Member Author

Or anywhere I can dig in 0.13.0 to help?

@tarcieri
Member

tarcieri commented Apr 5, 2013

I'll be rolling the gem shortly, hold tight...

@tarcieri
Member

tarcieri commented Apr 5, 2013

I just released 0.13.1. This resolved a number of problems that @aniero had with one of his projects. Let me know if it fixes yours.

@digitalextremist
Member Author

I re-ran my dummy application, and it is affected...
Now, on the third pageload, it just hangs.
Maybe it's tied to what you were resolving for 0.13.1?

When I cancel the pageload in the browser and refresh, the hang continues.
No new client request shows up in the console, but also no crash directly happens.

@digitalextremist
Member Author

Thanks for your work on this branch of thinking in systems theory by the way. I appreciate your work. I feel a need out of just plain honesty in taking true advantage of what we're capable of to go down this route you've helped pave.

@digitalextremist
Member Author

Enabled invokedynamic in jRuby and the same issue continues. I saw you mentioning something about invokedynamic being encouraged. Not sure how to troubleshoot. If you can let me into your thinking on this I will help all I can.

@tarcieri
Member

tarcieri commented Apr 5, 2013

Let me take a look. If the gist you posted is still crashing, something is definitely wrong.


@digitalextremist
Member Author

Yeah, the Hello, World test seems so simple. If I can help somewhere, clue me into where the cheese moved in the last update in your mind and I'll dig around also in my fork.

@digitalextremist
Member Author

With the code I have (Reel 0.3.0, Celluloid 0.13.0, and Celluloid::IO 0.13.1), running Reel as a Rack handler seems to allow pageloads to get past that hang, but the hang still occurs. Everything seems generally slower too after the Celluloid::IO change, and the application cannot fully function. Before, when pageloads died, you knew something had not completed for the browser. Obviously pageloads not dying is better, but silent hangs are a new issue.

It will usually take more than three pageloads for the application to render one instance. One for the HTML, one for the CSS, one for the JS, one for a JS template and one for an AJAX call, with another AJAX call I am trying to make into a websocket sync connection. There is a pageload for each image also. So depending on the order of those requests, something is always left out, sometimes vital, sometimes aesthetic.

I think this issue you are looking at is the crux of one of two issues keeping me from being able to shift to Reel, so thanks for digging into it.

@halorgium
Contributor

A lot of the problems I found were because of the socket reuse occurring in the browsers.
I would imagine the issue relates to unclosed and pending sockets.

If you try and reproduce with curl, what happens?
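One way to test the socket-reuse theory without a browser is to send several requests over a single kept-alive connection and see whether the later ones fail. This is a rough stdlib-only sketch under stated assumptions: `requests_on_one_socket` is a hypothetical helper, and the host, port, and path are placeholders for wherever the test server is listening.

```ruby
require 'net/http'

# Hypothetical helper: send `count` GET / requests over ONE persistent
# connection (similar to a browser reusing a socket) and return the
# HTTP status code of each response as a string.
def requests_on_one_socket(host, port, count, timeout = 5)
  Net::HTTP.start(host, port, read_timeout: timeout) do |http|
    (1..count).map { http.get('/').code }
  end
end

# Example call (assumes a server on localhost:3000, which is a guess):
#   requests_on_one_socket('127.0.0.1', 3000, 3)
```

Running it several times with a count of 1 approximates separate curl invocations, since each call opens a fresh socket.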

@zerowidth

If you reproduce with curl, you'll GET a baby with a <head> and a <body>.

Aside from not being able to get httperf working on my machine (wtf), I was able to hammer the Reel portion of ringleader pretty hard without any problems using ab, and I haven't seen any errors from the browser either.

@zerowidth

Er, to clarify: since ringleader doesn't use the rack handler, I'd look there first.

@digitalextremist
Member Author

@halorgium Same issue with curl... except there is a short warning:

root@two:/mu/zero# jruby reel.rb
Starting test server //
Client requested: GET /
Client requested: GET /
W, [2013-04-06T17:04:54.058000 #2664] WARN -- : reactor attempted to resume a dead task

So to clarify, that warning appears once, and it marks the start of an infinite hang. curl sits at the command line waiting for a response, same as Chrome and Firefox did, until I stop the attempt (Ctrl-C on curl). Then when I run curl again against the Reel endpoint, the hang continues, but with no log line for GET / and no warning. It's a zombie.

This is happening with this gist. No Rack whatsoever, and not subclassed:

https://gist.github.com/digitalextremist/5316264

@digitalextremist
Member Author

This does seem to work, for some strange reason:
https://gist.github.com/digitalextremist/5328506

That is a test using Reel::App and Octarine.

@digitalextremist
Member Author

By "work" I mean in Chrome, I can refresh that endpoint over and over and it is always instantaneous and never hangs.

@digitalextremist
Member Author

Strangely, upon checking the previous dummy test again, that also works. I updated my bundle, which is the only thing I can think of that would affect anything. I had already been using 0.13.0 of Celluloid, 0.13.1 of Celluloid::IO and 0.3.0 of Reel. I brought in Octarine. Also, before that I had restarted my Vagrant Ubuntu virtual machine.

It appears nothing is stopping me from fleshing out a Reel+Octarine test application to try and pull in my existing code without Rack+Sinatra. I'd like to keep this open until I get some more time to break this again. As a matter of fact I will try reusing the Rack handler version and see if that's reliable, in which case I'll try that in production first.

@digitalextremist
Member Author

Wait! On the standard demo @tarcieri posted on the Reel readme (no subclass) there is still the cool-down issue, but it doesn't seem to be nearly as bad now. After X time away, when you return and refresh an endpoint, there is:

W, [2013-04-06T19:00:22.543000 #2889] WARN -- : reactor attempted to resume a dead task

@digitalextremist
Member Author

The hang issue returned, after a cool down period. Only way to get the Rack::Reel server back is Ctrl-C, then I re-ran the ultra simple examples and saw they still work beyond 3 pageloads. Certain pageloads error out intermittently, so an image here and there is dead. Will build lite version of my application without Rack and see if it's Rack causing it.

@digitalextremist
Member Author

Running my 01E.rb gist [ https://gist.github.com/digitalextremist/5328506 ] I am able to repeat the hang issue that started this thread, but unsure where the hangup is coming from. Once the hangup happens, the server never comes back. It's usually after this error:

W, [2013-04-06T20:02:53.487000 #4103] WARN -- : reactor attempted to resume a dead task

In this case, the hang occurred on the second pageload.

Tested with Chrome, Safari, Firefox and then curl. All are hung. No console activity announcing a request either. Reel just goes off into a netherworld, unresponsive.

@digitalextremist
Member Author

Yeah, even just using Curl, after 2 or 3 pageloads, the 01E.rb example hangs.

@digitalextremist
Member Author

Thinking restarting the server reopened the ability for Reel to provide pageloads, I went to shutdown -h now and something hung up the shutdown, so I had to hard-powerdown the virtual.

After returning with a fresh virtual, Reel can do many pageloads without hanging.

Going to give the test server time to cool down and see if I can trigger the hanging error again that way.

@digitalextremist
Member Author

Caused hang again.

Without giving it time to cool down, after leaving Chrome (and closing it), I used curl. The first pageload succeeded; on the second pageload it gave the same WARN line, and the server is now hung.

@digitalextremist
Member Author

After Ctrl-C on the hung server and rerunning it, using curl to hit the test endpoint, the 3rd-pageload hang returns, presumably unless I recycle the virtual.

Recycled virtual, and this time it did not hang on shutdown. In fresh virtual, reran with Curl, and it hangs again on the third pageload. Ctrl-C on 01E.rb, then using Chrome, does over 20 pageloads no problem. Then ran with Firefox, reloaded 10 or so times no problem, then hit the endpoint with curl once, and no pageload, just the WARN line. So in this case there wasn't even a first success, it just immediately crashed for curl, then Chrome and Firefox were also blocked out.

@digitalextremist
Member Author

Not sure of the relationship to the issue linked. Read up on that and don't see a connection.

The hang can be triggered by using Chrome/Firefox for anywhere from 0 to 50+ pageloads (which seems like it could keep going forever with no problem), then making one curl request. Point being, you can pageload forever until you use curl. That first curl request will succeed, then all requests, no matter which client they come from, will hang.

Interesting nuance: If you do that Curl request, but only once, then use Chrome again, it will fire off a WARN for attempted to resume a dead task, then it will work for the following pageload, and then hang.

@halorgium
Contributor

@digitalextremist are you able to jump on IRC? I'd love to get to the bottom of this. freenode#celluloid

@digitalextremist
Member Author

So far the ways to hang this are:

  1. 3 curl pageloads with no other browser: the 3rd will fail.
  2. 0-X pageloads with another browser, then curl: the 2nd curl request will fail.

Both leave the server in a zombie state: unresponsive, but without reporting "server unreachable".
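The first scenario can be scripted so the failing request number is recorded rather than counted by hand. This is a hedged stdlib-only sketch: `probe` is a hypothetical helper, the host and port are assumptions, and each request uses a fresh connection with a timeout so that a hung server shows up as `:timeout` instead of blocking forever.

```ruby
require 'net/http'

# Hypothetical helper: send `count` sequential GET / requests, each on a
# fresh connection (as separate curl invocations would), and record how
# each one ends: a status code string, :timeout, or an error class name.
def probe(host, port, count, timeout = 5)
  (1..count).map do
    begin
      http = Net::HTTP.new(host, port)
      http.open_timeout = timeout
      http.read_timeout = timeout
      # Avoid Net::HTTP silently retrying a GET, which would skew the count.
      http.max_retries = 0 if http.respond_to?(:max_retries=)
      http.get('/').code
    rescue Timeout::Error
      :timeout
    rescue SystemCallError, EOFError, IOError => e
      e.class.name
    end
  end
end

# Example call (assumes the test server listens on localhost:3000):
#   p probe('127.0.0.1', 3000, 3)
```

Against a server showing the behaviour described above, the expectation would be two status codes followed by `:timeout`, though that is an inference from the reports in this thread rather than a verified result.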

I'll see you there. Nickname: decentrality

@digitalextremist
Member Author

At the prompting of @halorgium, added this after require 'reel':

Celluloid.task_class = Celluloid::TaskThread

Per Celluloid #169 regarding jRuby 1.7.3

That seems to fix it. All previous causes of hang are gone.
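For readers wondering why the task class matters: the stack traces above go through task_fiber.rb, and this setting switches Celluloid to thread-backed tasks instead. The sketch below is a hypothetical, stdlib-only illustration of the general idea (suspend/resume built on a Thread and two Queues). `ThreadBackedTask` and its methods are invented names for this example and are not Celluloid's TaskThread implementation.

```ruby
require 'thread'

# Hypothetical illustration only; NOT Celluloid's TaskThread.
# A task that can be suspended and resumed, backed by a Thread
# and blocking queue handoffs instead of a Fiber.
class ThreadBackedTask
  def initialize(&block)
    @inbox  = Queue.new   # values handed in by #resume
    @outbox = Queue.new   # values handed out by #suspend or on completion
    @done   = false
    @thread = Thread.new do
      @inbox.pop                  # wait for the first resume (its value is ignored)
      result = block.call(self)
      @done = true
      @outbox << result
    end
  end

  # Called from inside the task: hand a value out, then block until resumed.
  def suspend(value = nil)
    @outbox << value
    @inbox.pop
  end

  # Called from outside: wake the task, then wait for it to suspend or finish.
  def resume(value = nil)
    raise "cannot resume a dead task" if @done
    @inbox << value
    @outbox.pop
  end

  def alive?
    !@done
  end
end

task = ThreadBackedTask.new do |t|
  reply = t.suspend(:waiting)
  "resumed with #{reply}"
end

task.resume          # => :waiting
task.resume(:reply)  # => "resumed with reply"
task.alive?          # => false
```

A further `task.resume` at this point raises, which mirrors the "cannot resume a dead task" condition in spirit only.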

@digitalextremist
Member Author

So far, what @halorgium suggested (from the @tarcieri post referenced) is so effective that it has also stopped the hangs and failed fibers when using Reel as a Rack handler.

@digitalextremist
Member Author

I am going to go ahead and close this for now, since it seems like a duplicate of, or at least a cousin issue to, Celluloid #169.

@digitalextremist
Member Author

Without any workaround, reverting to jRuby 1.7.2 solves the issue (again).
