Skip to content
This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Exceptions when a TCPSocket connection timed out in and out of the Reactor using Kernel.timeout #97

Open
HoneyryderChuck opened this issue Jan 30, 2014 · 19 comments
Milestone

Comments

@HoneyryderChuck
Copy link

I'm fighting against the non-establishment of connections using Celluloid::IO::TCPSocket s. Currently my issue is with connections timing out.

Let's say I have an "timeoutable" hostname.

Celluloid::IO::TCPSocket.new("timeoutable", 23)
(10 seconds later...)
Errno::ETIMEDOUT: Connection timed out - connect(2)

Which is consistent with what happens when using a regular TCPSocket.

Now, when I do that inside an actor...

Celluloid A
  include Celluloid::IO
  def establish
    timeout(5) do
      @tcp = TCPSocket.new("timeoutable", 23)
    end
  end
end
a.establish

It raises a Celluloid::Task::TimeoutError, which makes sense (but it kills the actor).

Now, other case:

Celluloid A
  include Celluloid::IO
  def establish
    Kernel.send(:timeout, 5) do
      @tcp = TCPSocket.new("timeoutable", 23)
    end
  end
end
a.establish

This one raises an "execution expired" exception coming from the reactor. And kills the actor.

So, there are two issues for me. Let's say I have an actor whose task is to perform a task on x remote devices. Being that for some I cannot connect on time, how can I rescue the connection timeout and not kill the actor in the process, but simple handle the logic, mark the remote location as unreachable and move on with my actor life? The way I tested, I can rescue Errno::ETIMEDOUT and Celluloid::Task::TimeoutError on the method and keep the actor alive. But I cannot rescue the third one, because that exception is thrown after the actor has been killed, so no way to keep him alive.

The second is regarding that last actor. As you have seen, I have used Kernel.timeout inside the reactor loop, which clearly doesn't play along very well. I know your first question is "uh, why you used Kernel.timeout instead of Actor#timeout as in the second example?", but I'm using a wrapper whom I pass a proxy. This wrapper doesn't know anything about Celluloid, hence it calls normal. timeout. i don't have any influence. As I've seen, I can only use this timeout inside the actor (an instance method, therefore). How could I access the current actor and call the #timeout method in such a case?

I hope the second question can lead to an answer of the first one :)

@tarcieri
Copy link
Member

tarcieri commented Feb 4, 2014

Let's say I have an actor whose task is to perform a task on x remote devices. Being that for some I cannot connect on time, how can I rescue the connection timeout and not kill the actor in the process, but simple handle the logic, mark the remote location as unreachable and move on with my actor life? The way I tested, I can rescue Errno::ETIMEDOUT and Celluloid::Task::TimeoutError on the method and keep the actor alive. But I cannot rescue the third one, because that exception is thrown after the actor has been killed, so no way to keep him alive.

What do you mean by "I cannot rescue the third one"? You only named two exceptions, and really the only one you need worry about is Celluloid::Task::TimeoutError.

If we can figure out a solution to #56 (which is actually needed for the timeout-based approach to work), let me tell you a great solution to this problem: Celluloid::FSM

With Celluloid::FSM (which is, unfortunately, poorly documented) you can specify a timeout for a given state after which the FSM automatically transitions to another state, which can automatically run some code after the state transition.

This makes it easy to build workflows for attempting something which will either induce a state transition when it completes, or setting a timeout which transitions to a different state which can handle the timeout error.

No exception juggling required, just pure FSMs ;)

@HoneyryderChuck
Copy link
Author

The "third one" was the third example I described, that is the expection caused by Kernel.timeout inside the actor when it actually times out (this actually stops the reactor and kills the actor). I know you probably are against using Kernel.timeout inside the actor, but in such cases I sometimes don't have a choice, the timeout call is already there :)

I will check out the FSM solution and I'll get back at you with more info about the experience. But I still think this is an important use case, concerning tcpsocket injection in an already existing library. Imagine such a library like Net::SSH or Net::Telnet have some logic inside using Kernel. timeout. They might be using the Celluloid::IO::TCPSocket, but apart from that, they're using regular sleep and timeout there, right?

@tarcieri
Copy link
Member

tarcieri commented Feb 4, 2014

Okay, here's a vaporware, strawman solution ;)

https://github.com/celluloid/timeout

@tarcieri
Copy link
Member

tarcieri commented Feb 5, 2014

@TiagoCardoso1983 I'm not sure it makes sense for Celluloid to try to handle Timeout::Error internally. I think the better approach is to try to get some changes into the upstream libraries that can use a more extensible API, which is backed by Kernel#timeout by default, but supports other backends.

That's probably not what you want to hear, but that's my idea for the vaporware timeout gem. I'd like to extend the Timeout module with pluggable backends, one of which can be purpose-made for Celluloid.

@HoneyryderChuck
Copy link
Author

Yup, I am from the same opinion as you. From what I understood the Timeout::timeout call runs the block in a separate thread, and that is not very compatible with the reactor/mailbox/scratchoneofthem.

Timeout::timeout receives as second parameter the exception to raise. My first short term solution proposal was to raise Celluloid::TimeoutError instead of TimeoutError. Execution stop at the libev level :S :

 ruby: ../libev/ev.c:3274: ev_run: Assertion `("libev: ev_loop recursion during release detected", ((loop)->loop_done) â
!= 0x80)' failed.

So yes, that will also not help... I guess the best approach is, like you said, to "overwrite" the timeout primitive, that is, port the Celluloid::Actor#timeout code to the separate timeout gem and the kernel. Basically it has to check whether it is within the actor or not, and then fire the appropriate timeout, the only difference being, instead of calling Kernel.timeout, it calls Timeout.timeout. I'll try that one, but that'd be my proposal for the vaporware.

@tarcieri
Copy link
Member

tarcieri commented Feb 5, 2014

That's actually a decent enough workaround, and the same approach Celluloid uses elsewhere (e.g. Celluloid::IO implicitly hijacks TCPSocket)

@tarcieri
Copy link
Member

tarcieri commented Feb 5, 2014

Oh wait, you're redefining Kernel.timeout. Yeah, I take that back ;)

This is why I was proposing the timeout gem, although I was hoping to avoid core extensions, and see if we can get other gems to adopt a different API that supports pluggable timeout backends.

@HoneyryderChuck
Copy link
Author

Eheh, it seems redefining Kernel.timeout doesn't work as expected, it is included in the global namespace before and doesn't take the desired effect, hence I'm redefining the global namespace timeout, which is used everywhere.

Yes, I'm from your opinion, and an awful lot of gems already use Timeout::timeout. But Net::Telnet doesn't, and it is core ruby...

The concept I'm trying to work on is: on load, define my new timeout primitive, hijack the global namespace timeout, insert it in my primitive namespace and make it work as a fallback. After that, redefine the global namespace timeout as my new timeout primitive. Haven't been quite successful on that yet. Seems that, by unbinding a method and binding it somewhere else, it still suffer from changes you make to the original object. Is there something such as method duplicates?

@HoneyryderChuck
Copy link
Author

Here's my second approach:

require 'timeout'
old_timeout = method(:timeout).unbind

# first step: define new implementation of timeout
module  Celluloid
  module Timeout
    @old_timeout = method(:timeout).to_proc
    # celluloid-aware timeout implementation, falls back to the latest implementation of it
    def self.celluloid_timeout(duration, klass=nil, &blk)
      Celluloid::actor? ? 
      Thread.current[:celluloid_actor].timeout(duration, &blk) : 
       @old_timeout.call(duration, klass, &blk)
    end
  end
end

# second step: redefine Kernel.timeout
private
def timeout(*args, &blk)
  Celluloid::Timeout.celluloid_timeout(*args, &blk)
end

So, I take the global namespace on load and use it as the fallback when not inside celluloid actor. By making it a proc, I assure that by redefinitions it will not affect this instance. And then I (gulp...) redefine global namespace timeout. What do you think? I know, I know, redefining timeout sucks, but how can you get all gem APIs to abide to a new namespace in short notice? Celluloid would need this solution now, right?

@tarcieri
Copy link
Member

tarcieri commented Feb 7, 2014

I don't want Celluloid itself to define Kernel#timeout, but I wouldn't be opposed to the vaporware "timeout" gem doing it in a generic way that things like Celluloid can plug into

@HoneyryderChuck
Copy link
Author

Exactly, my implementation is a possible implementation of the vaporware "timeout" gem. How could a timeout-functionalitz plug in API would look like?

@tarcieri
Copy link
Member

tarcieri commented Mar 4, 2014

I'd probably do something like this:

class Thread
  attr_accessor :timeout_handler
end

def timeout(time, &block)
  if timeout_handler = Thread.current.timeout_handler
    timeout_handler.call(timeout, &block)
  else
    Timeout.timeout(time, &block)
  end
end

@HoneyryderChuck
Copy link
Author

Does indeed look interesting. I assume you would pass the routine:

Celluloid::actor? ? 
Thread.current[:celluloid_actor].timeout(duration, &blk) : 
@old_timeout.call(duration, klass, &blk)

or something similar to the timeout_handler then. When would that make sense? here?

@tarcieri
Copy link
Member

tarcieri commented Mar 5, 2014

Celluloid can integrate with the (optional) gem directly, and initialize a custom timeout handler if the gem is loaded. I don't want to make it an explicit dependency though... more something someone like you who knows what they're doing can optionally include to solve this class of problems.

@HoneyryderChuck
Copy link
Author

I like the sound of it. The gem should definitely be an optional dependency. The gem should therefore "inject" the integration (in the snippet I mentioned, or maybe somewhere else) in Celluloid itself.

@tarcieri
Copy link
Member

tarcieri commented Mar 7, 2014

It could work that way if Celluloid had a hook for this stuff. Right now it doesn't

@HoneyryderChuck
Copy link
Author

Somehow related: #101

I guess, even using Celluloid::Actor#timeout, the descriptors are not being removed from the reactor selector.

@ioquatix
Copy link
Contributor

Is this still a problem?

@HoneyryderChuck
Copy link
Author

yup, at least until the timeout extensions are integrated in celluloid.

@digitalextremist digitalextremist modified the milestone: 0.18.0 Aug 9, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants