Therubyracer isn't thread-safe in production? #270

Open
ghost opened this Issue Aug 21, 2013 · 8 comments

ghost commented Aug 21, 2013

Hi,

I use therubyracer to evaluate some JS code (rendering templates with doT.js, in fact). In the development environment it works without any problems, tested with multithreaded Thin, multithreaded Puma and WEBrick. No deadlocks/hangups.

But in production mode, therubyracer starts to deadlock all server threads. On an 8-core CPU with 32GB of RAM and 16 server threads, Puma and Thin manage to serve 16 concurrent connections for about 5 seconds, and after that all threads start to hang. The server stops responding to any requests or signals.

I tried to put all the therubyracer code into a separate thread, guarded by a mutex, but without luck. All server threads stop responding at random moments, before or after executing the therubyracer code; sometimes they can even lock up in the middle of a puts :)

Ruby - 2.0.0-p195 x64
Rails - 4.0.0
therubyracer - 0.12.0
libv8 - 3.16.14.3 x86_64-linux

The problem occurs in any multithreaded server. Turning off multithreading and switching to multiprocessing resolves the problem completely.
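
A minimal sketch of the kind of concurrent eval that triggers the hangs (the thread/request counts and the template snippet are illustrative, not taken from the app above):

require 'v8'

# 16 threads, each repeatedly evaluating a doT.js-style template render.
threads = 16.times.map do
  Thread.new do
    100.times do
      cxt = V8::Context.new
      cxt.eval('(function(it){ return "<p>" + it.name + "</p>"; })({name: "x"})')
    end
  end
end
threads.each(&:join)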

ignisf (Collaborator) commented Aug 21, 2013

Hi @Galmido, just to clarify whether this is a regression in the recently released version: did you experience this issue with therubyracer 0.11, too?

ghost commented Aug 21, 2013

Yes, the problem occurs with 0.11 too. But with 0.11 it seems that every server worker can handle many more requests before hanging. With 0.12, a single worker with 16 threads can respond to 100-500 requests before locking up; it seems the number of threads per worker doesn't matter. 0.11 can respond to around 1000-2000 requests before locking up.

Actually, I found a way to work around the problem, but I still think this is a bug. V8 doesn't hang when it runs directly in the worker thread and all of the code runs under a separate mutex. Additionally, the context object must be disposed of directly after the JS eval, otherwise a thread can still hang, though this is very rare. I ran a 1-hour stress test with a complex view evaluating 5 different doT.js templates, and it seems this resolves the problem completely; I got 100% successful responses.
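
A minimal sketch of the workaround described above, assuming therubyracer's V8::Context#dispose (the mutex and method names are illustrative):

require 'v8'

JS_MUTEX = Mutex.new  # a single mutex guarding all V8 work

def render_js(source)
  JS_MUTEX.synchronize do
    context = V8::Context.new
    begin
      context.eval(source)
    ensure
      # disposing directly after eval is what avoided the rare residual hangs
      context.dispose
    end
  end
end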

cowboyd (Owner) commented Aug 22, 2013

The Ruby Racer is thread-safe, but it is not guaranteed deadlock-free.

Like Ruby, V8 also has a global interpreter lock, and so only one thread can be executing code inside the V8 VM at a time. The answer is, I think, to release the GIL where possible when entering the V8 VM, and then reacquire it if and when V8 calls back into Ruby. This may be incompatible with the current strategy of acquiring the V8 lock from within Ruby code.
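
To make the callback path concrete: any Ruby proc exposed to JavaScript means a thread that already holds the V8 lock must reacquire the Ruby GIL mid-eval. A minimal sketch (the log function is an illustrative example, not from this thread):

require 'v8'

cxt = V8::Context.new
# therubyracer passes the JS `this` as the first argument to Ruby procs.
cxt['log'] = lambda { |_this, msg| puts msg }
# Evaluating this enters the V8 VM (V8 lock held); the callback then needs
# the Ruby GIL again to run `puts` -- the re-entrancy described above.
cxt.eval('log("called back into Ruby from inside V8")')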

mhoran referenced this issue in jasmine/jasmine-gem Dec 4, 2013

Closed

Jasmine server spinning. #179

mindreframer commented Mar 30, 2014

I have a similar problem; here is a gist with the source code:
https://gist.github.com/mindreframer/9879200

It freezes after 10-50 requests when benchmarked with wrk.

I have tried executing within the V8 VM lock, like this:

V8::C::Locker() do
  V8::C::HandleScope() do
    @cxt = V8::C::Context::New()
    begin
      @cxt.Enter()
      yield
    ensure
      @cxt.Exit()
    end
  end
end

and also disposing of the context after each use. Still deadlocks; only a hard kill -9 helps. Any advice? This is easily reproducible (on my system, at least).

cowboyd (Owner) commented Apr 3, 2014

I think this could be a locking issue with the order in which the various VM locks are acquired and released. Currently, to call JS code, the sequence is:

Lock(V8) -> Unlock(Ruby) -> Call(JS)

If there is a context switch after V8 is locked, but before Ruby is unlocked, it could result in deadlock.

Instead, it really ought to be:

Unlock(Ruby) -> Lock(V8) -> Call(JS)

The situation is worse for calling back out to Ruby from JavaScript. It currently looks like this:

Lock(V8) -> Unlock(Ruby) -> Call(JS) -> Lock(Ruby) -> Unlock(V8) -> Call(Ruby)

and should be:

Lock(V8) -> Unlock(Ruby) -> Call(JS) -> Unlock(V8) -> Lock(Ruby) -> Call(Ruby)

The reason is that for code like:

V8::C::Locker() do
  V8::C::HandleScope() do
    @cxt = V8::C::Context::New()
    begin
      @cxt.Enter()
      yield
    ensure
      @cxt.Exit()
    end
  end
end

to work, V8 has to be locked from Ruby, which means Ruby must be locked at the same time. Fixing this means moving all of the locking code out of Ruby and into C++.
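
Until that locking moves into C++, one way to sidestep the ordering problem from pure Ruby is to confine all V8 work to a single dedicated thread, so the V8 lock is only ever acquired from one place. A sketch under that assumption (the JsWorker class and its queue protocol are made up for illustration):

require 'thread'
require 'v8'

class JsWorker
  def initialize
    @jobs = Queue.new
    @thread = Thread.new do
      # The context lives and dies on this one thread; no other thread
      # ever touches V8, so the lock ordering can never invert.
      context = V8::Context.new
      loop do
        source, reply = @jobs.pop
        reply << begin
          context.eval(source)
        rescue StandardError => e
          e
        end
      end
    end
  end

  # Blocks the calling thread until the worker thread has run the script.
  def eval(source)
    reply = Queue.new
    @jobs << [source, reply]
    result = reply.pop
    raise result if result.is_a?(Exception)
    result
  end
end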

phstc commented Oct 13, 2014

I was having the same problem. My shoryuken processes were getting stuck when the number of simultaneously active threads was high.

My solution was a singleton class + a mutex to process our JavaScript. I'm no longer getting stuck processes, and I didn't notice any performance difference.

The lack of any performance difference is probably due to what @cowboyd mentioned before:

Like Ruby, V8 also has a global interpreter lock, and so only one thread can be executing code inside the V8 VM at a time

class JavaScript
  include Singleton

  def initialize
    @mutex = Mutex.new
  end

  def transform(source, payload, timeout = 5000)
    @mutex.synchronize do
      transform!(source, payload, timeout)
    end
  end

  private

  def transform!(source, payload, timeout)
    context = V8::Context.new timeout: timeout
    # ...
  end
end

# sample test:
# N.times.map do
#   Thread.new do
#     JavaScript.instance.transform('...', '...')
#   end
# end.each(&:join)

@cowboyd any ideas why that works, given that therubyracer is thread-safe and has a GIL?

phstc referenced this issue in mperham/sidekiq Oct 13, 2014

Closed

sidekiq randomly(?) hangs. #547

mperham referenced this issue in mperham/sidekiq Mar 25, 2015

Closed

Stuck process #2249

flurin added a commit to DigitPaint/roger_babeljs that referenced this issue May 12, 2015

cowboyd added and removed the release-1.0 label Jul 12, 2015

D-side referenced this issue in mperham/sidekiq Feb 1, 2016

Closed

Sidekiq process disappearing from ui #2773

ndushay referenced this issue in sul-dlss/sul-requests Dec 1, 2016

Open

Multiple item approvals overriding each other #681

brauliobo commented Sep 5, 2017

same here when only using V8::Context

brauliobo commented Sep 5, 2017

related to #432?
