Issues with long-lasting RPC calls #37
I got a chance to pull this closer to what I'm experiencing.

Server:

```python
from gevent import monkey; monkey.patch_all()  # patch before other imports

import time
import zerorpc

class Foo(object):
    def wait(self, x):
        print "Start waitin %s" % x
        time.sleep(15)  # cooperative, since time is monkey-patched
        print "End waitin %s" % x
        return "derp %s" % x

s = zerorpc.Server(Foo(), pool_size=2)
s.bind("tcp://0.0.0.0:3333")
s.run()
```

Client:

```python
import zerorpc

c = zerorpc.Client("tcp://0.0.0.0:3333", timeout=3000)
work = ["a", "b", "c", "d"]
futures = [c.wait(x, async=True) for x in work]
print [future.get() for future in futures]
```

Result: the same errors as above, plus an additional "LostRemote: Lost remote after 10s heartbeat" on the client side before the server can complete and send the result, even though time is monkey-patched! Also, the server keeps continuing on to "c" and "d" even though the error comes on the client side after "a" and "b":
So it looks like the tasks are at least getting submitted correctly, but the server is oblivious to the client disconnection.

UPDATE: While messing around, I found that removing `async` fixes the problem; I figured that was because the code then does only one request at a time. That led me to play with the pool size, and it turns out that if I leave `async` in but remove the pool-size limit, the LostRemote doesn't happen! Am I missing some basic assumption about how this is supposed to work?
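A back-of-the-envelope model of the timings above shows why only the last two calls fail. With `pool_size=2` and 15s tasks, calls "c" and "d" sit queued roughly 15s before a worker picks them up, which is longer than the client's 10s heartbeat window. The helper name below is illustrative, not part of zerorpc:

```python
TASK_SECS = 15
POOL_SIZE = 2
HEARTBEAT_WINDOW = 5 * 2  # 5s frequency x 2 missed beats = 10s

def queue_wait(index, pool_size=POOL_SIZE, task_secs=TASK_SECS):
    """Seconds call #index (0-based) waits in the queue before a worker
    picks it up, assuming all calls arrive at once and run task_secs each."""
    return (index // pool_size) * task_secs
```

Calls 0 and 1 ("a", "b") start immediately; calls 2 and 3 ("c", "d") wait 15s, exceeding the 10s window, so the client gives up on them first.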
Hi,

So, as you said: in your first example, using time.sleep() without monkey patching puts the whole process to sleep, freezing all asynchronous IO activity. The client then stops receiving heartbeats from the server and complains after 10s (the default heartbeat frequency is 5s, and it gives up after 2 missed heartbeats). If you really want to do things like that, you can disable the heartbeat on both sides by passing heartbeat=None in both the server's and the client's constructor parameters. But this is probably not what you want; what you want is a monkey-patched, or more specifically gevent-compliant, version of subprocess. I can't help you much with gevent 1.0, since zerorpc was never tested against it (we are still using the version available on PyPI). I believe gevent 1.0 ships a gevent-friendly version of subprocess. Otherwise, for gevent < 1.0, you can try gevent_subprocess (https://github.com/bombela/gevent_subprocess): pip install gevent_subprocess.

In your second comment, you initially limited pool_size to 2 concurrent requests. When you call wait() 4 times in quick succession, only two calls can be processed right away, and both take 15s to complete. Meanwhile, the two pending calls are still not connected, and the client's heartbeat gives up after 10s. Intuitively, it would make more sense for the 2 pending requests to wait until the server can accept a new request (at least until the 30s timeout kicks in). But because for the moment there is one heartbeat per request (!!! backward compatibility !!!), the server will not start heartbeating until a request is being processed. This discrepancy will be fixed at some point; we need to fix it here at dotCloud anyway to go further with load-balancing strategies (and it will be fixed with respect to backward compatibility, so everything will still be able to speak happily to everything else).

For the moment, if it's a really big problem for you, you can disable the heartbeat on both sides (but then streaming can't be used anymore).

Regards,
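The numbers in the explanation above (5s heartbeat frequency, give up after 2 missed beats) can be modeled in a few lines. The names here are illustrative, not zerorpc's internals:

```python
HEARTBEAT_FREQ = 5.0  # seconds between heartbeats (zerorpc default)
MAX_MISSED = 2        # missed beats tolerated before giving up

def remote_is_lost(last_beat_at, now):
    """True once `now` is more than MAX_MISSED beat intervals past the
    last heartbeat seen -- i.e. 10s of silence with the defaults, which
    is what produces "Lost remote after 10s heartbeat"."""
    return (now - last_beat_at) > HEARTBEAT_FREQ * MAX_MISSED
```

Since the server only starts heartbeating a request once it is being processed, a call that sits in the queue for 15s looks like 15s of silence to the client, and this check trips at the 10s mark.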
Hey,

Yeah, I'd expected the server to have a task queue of sorts and the heartbeat to be per-server instead of per-request. As a workaround, I'll just limit how many tasks I send at a time instead of dumping all the tasks in at once and letting the server handle the limiting. Meanwhile, I'll leave the report open, if that sounds fine to you. If you need a hand fixing the way heartbeats work, I'd be glad to have a go at it at some point.

Cheers,
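The workaround described above (limiting how many tasks are sent at a time) can be sketched as a small batching helper: dispatch at most `pool_size` async calls, collect their results, then send the next batch. The `call` argument and its future-like return value are hypothetical stand-ins; with zerorpc you would pass something like `lambda x: c.wait(x, async=True)`:

```python
def run_in_batches(call, work, pool_size):
    """Dispatch async calls in groups no larger than pool_size, waiting
    for each group to finish before sending the next, so no call sits
    queued on the server longer than the heartbeat window allows."""
    results = []
    for i in range(0, len(work), pool_size):
        batch = work[i:i + pool_size]
        futures = [call(x) for x in batch]        # at most pool_size in flight
        results.extend(f.get() for f in futures)  # block before the next batch
    return results
```

This trades some throughput (the next batch only starts after the slowest call in the current one) for never having a request idle past the heartbeat deadline.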
I am also looking forward to this. But zerorpc development seems to be slowing down: no sensible updates for a month already, and the last real updates were 2 months ago. ZeroRPC is a major product and something dotCloud is proud of, right?
If it makes you feel better, I took over maintenance of zerorpc recently. Hopefully I can keep it maintained, and eventually move it forward.
Test case for Reproduction:
and the client:
The result for me on the server is:
while on the client I get "derp" fine. Without the monkey patch, predictably, the connection is lost due to a lack of heartbeat, because sleep blocks.
This is a simplification of a larger bug I've been having when dealing with a bunch of workers that all call subprocess to spawn an external executable and do something that takes a while. (With the newest gevent from trunk, there is a subprocess monkey patch.)
Any ideas? Zerorpc 0.2.1, gevent 1.0.