Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External Shutdown Does Not Complete #150

Merged
merged 1 commit into from Apr 16, 2013

Conversation

halorgium
Copy link
Contributor

When terminating actors externally, they seem to "stop what they're doing" but they don't return, which can prevent further code from running.

Note: This is on JRuby 1.7.2 and Celluloid 0.12.4.

The following example kills the loop as expected, but it never returns:

class LOL
  include Celluloid

  def run
    loop { puts "run loop"; sleep 0.1 }
  end  
end  

l = LOL.new
l.after(3) { l.terminate }
l.run
puts "…"

Whereas calling #terminate internally functions as expected:

class LOL
  include Celluloid

  def run
    after(3) { terminate }
    loop { puts "run loop"; sleep 0.1 }
  end  
end  

l = LOL.new
l.run
puts "…"

@halorgium
Copy link
Contributor

This will be because the terminate invocation in the first example is doing a SyncCall to the actor.

D, [2013-03-16T22:38:29.485354 #82131] DEBUG -- : Terminating 5 actors...
E, [2013-03-16T22:38:39.486165 #82131] ERROR -- : Couldn't cleanly terminate all actors in 10 seconds!
E, [2013-03-16T22:38:39.486912 #82131] ERROR -- : LOL crashed!
RuntimeError:
    celluloid/lib/celluloid/tasks/task_fiber.rb:50:in `resume'
    celluloid/lib/celluloid/tasks/task_fiber.rb:50:in `resume'
    celluloid/lib/celluloid/actor.rb:419:in `task'
    celluloid/lib/celluloid/actor.rb:291:in `block in after'
    timers-1.0.1/lib/timers.rb:87:in `call'
    timers-1.0.1/lib/timers.rb:87:in `fire'
    timers-1.0.1/lib/timers.rb:44:in `fire'
    celluloid/lib/celluloid/actor.rb:186:in `run'
    celluloid/lib/celluloid/actor.rb:171:in `block in initialize'
    celluloid/lib/celluloid/thread_handle.rb:12:in `block in initialize'
    celluloid/lib/celluloid/internal_pool.rb:55:in `call'
    celluloid/lib/celluloid/internal_pool.rb:55:in `block in create'

@halorgium
Copy link
Contributor

OK, after running this on a more recent version of Celluloid with insane debugging, I found that when you call terminate on the ActorProxy it is joining with the actor.thread.
https://github.com/celluloid/celluloid/blob/master/lib/celluloid/proxies/actor_proxy.rb#L89

This causes major problems because the actor thread is unable to continue to operate.
@tarcieri, does this make sense?

One solution to this is to use the terminate! method as you are waiting for the actor to terminate via the l.run invocation.

@jjyr
Copy link

jjyr commented Apr 13, 2013

And another problem is you call run and terminate both by sync call. this mean run should return value. and the sync-version terminate should block the thread, the run Fiber can't resume correct.

@jjyr
Copy link

jjyr commented Apr 13, 2013

seems no good way to sync terminate

@halorgium
Copy link
Contributor

@jjyr it is possible, the current implementation of the actor stops handling messages when terminate is called.
You are incorrect about the terminate blocking the thread. It only pretends to block the thread.
Remember we use Fibers or Threads to simulate synchronicity.

@jjyr
Copy link

jjyr commented Apr 14, 2013

@halorgium yes, i'm wrong, when terminate timer will never be fired

@halorgium
Copy link
Contributor

@jjyr I am not sure I understand what you mean.
The blocking run call is completely unrelated to this bug. Note the async.run.

require 'celluloid'

class LOL
  include Celluloid

  def run
    loop { puts "run loop"; sleep 0.1 }
  end
end

l = LOL.new
l.after(1) { l.terminate }
l.async.run
puts "..."
sleep

The timer does fire, but then the terminate call blocks the main actor loop.

Here is the backtrace of the actor thread.

2.0.0-p0/lib/ruby/2.0.0/thread.rb:72:in `sleep'
2.0.0-p0/lib/ruby/2.0.0/thread.rb:72:in `block (2 levels) in wait'
2.0.0-p0/lib/ruby/2.0.0/thread.rb:68:in `handle_interrupt'
2.0.0-p0/lib/ruby/2.0.0/thread.rb:68:in `block in wait'
2.0.0-p0/lib/ruby/2.0.0/thread.rb:66:in `handle_interrupt'
2.0.0-p0/lib/ruby/2.0.0/thread.rb:66:in `wait'
celluloid/lib/celluloid/thread_handle.rb:35:in `block in join'
celluloid/lib/celluloid/thread_handle.rb:35:in `synchronize'
celluloid/lib/celluloid/thread_handle.rb:35:in `join'
celluloid/lib/celluloid/actor.rb:133:in `join'
celluloid/lib/celluloid/proxies/actor_proxy.rb:89:in `terminate'
external.rb:15:in `block in <main>'
celluloid/lib/celluloid/tasks.rb:48:in `block in initialize'
celluloid/lib/celluloid/tasks/task_fiber.rb:12:in `block in create'

The actor has 2 tasks running at this stage.

#<Celluloid::TaskSet:0x007f863a60b6d0
 @tasks=
  #<Set: {#<Celluloid::TaskFiber:0x3fc31d090e64 @type=:call, @status=:sleeping>,
  #<Celluloid::TaskFiber:0x3fc31d37b3e0 @type=:timer, @status=:running>}>
>

@jjyr
Copy link

jjyr commented Apr 14, 2013

@halorgium Yes, you're right.
i think the actor thread blocked at this line
and i find to add a condition

def join
  @mutex.synchronize do
    if @thread == Thread.current
      raise 'deadlock'
    elsif @thread
      @join.wait(@mutex)
    end
  end
  self
end

can fix it, but i not really known why the old version crashed and never return

@halorgium
Copy link
Contributor

@jjyr in most situations, the join is useful.
Anyway, we'll have a look at this considering #212 too.

@tarcieri
Copy link
Member

#terminate provides synchronous behavior

#terminate! provides asynchronous behavior

Does #terminate! suit your needs?

On Sun, Apr 14, 2013 at 1:55 AM, Tim Carey-Smith
notifications@github.comwrote:

@jjyr https://github.com/jjyr in most situations, the join is useful.
Anyway, we'll have a look at this considering #212https://github.com/celluloid/celluloid/issues/212too.


Reply to this email directly or view it on GitHubhttps://github.com//issues/150#issuecomment-16348006
.

Tony Arcieri

@halorgium
Copy link
Contributor

@tarcieri are we able to solve the underlying issue here?

@tarcieri
Copy link
Member

@halorgium if I understand correctly, the issue is with self-referential joins? If so, Celluloid should probably raise in that case rather than deadlocking.

@halorgium
Copy link
Contributor

Do you mean the Actor thread is joined to itself?
I think I understand what @jjyr was doing now!

I had thought that ruby did not allow such a behaviour!
Is there a use for this case?

@tarcieri
Copy link
Member

There is not. The Ruby behavior is:

1.9.3p327 :001 > Thread.current.join
fatal: deadlock detected

I am not sure how @jjyr is affecting a self-referential join (or if that is really what's happening here), but it should definitely raise some kind of exception if it happens.

@halorgium
Copy link
Contributor

We do our own join...

tarcieri added a commit that referenced this pull request Apr 16, 2013
External Shutdown Does Not Complete
@tarcieri tarcieri merged commit 1433895 into celluloid:master Apr 16, 2013
@coveralls
Copy link

Coverage remained the same when pulling 0dea952 on halorgium:no-self-joins into b548842 on celluloid:master.

View Details

@jjyr
Copy link

jjyr commented Apr 17, 2013

The strange is the actor thread should be blocked forever(because self-join), but it crashed in @gf3 's code

@halorgium
Copy link
Contributor

@jjyr there was several changes to Celluloid which modified the behaviour.
This is probably the reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants