Server becomes unresponsive #61

Open
dhm116 opened this issue May 15, 2013 · 20 comments

@dhm116
dhm116 commented May 15, 2013

I'm running into some really strange behavior between a Python zerorpc server and a zerorpc-node client. There are no logs or errors to help me diagnose it, but when I use the streaming feature, the server simply stops accepting new requests after the client receives all of the messages. Clients can still connect and send requests, but the requests always time out until I restart the server.

I ended up having to use the streaming feature because some of the processes can run for a (very) long time. However, this isn't a simple timeout issue, although I am also occasionally running into the same issue mentioned in #37.

I don't see any obvious starting point for troubleshooting this other than putting log statements throughout the zerorpc code, but I thought I'd open this issue in case someone has encountered this before and found a solution.

@bombela
Member

bombela commented May 17, 2013

Hello,

Can you try to reproduce the problem with the least amount of code possible? If you manage to, can you share it with us?

When you say that the server stops accepting new requests, it sounds to me like your process is waiting forever in some coroutine (gevent green thread, blabla), and thus never switches to the other coroutines, for example the one zerorpc uses to handle new connections.

So make sure that the tasks you are executing are not blocking the process, but only YIELDING to other coroutines. Do you have any disk I/O, or a database driver that is not gevent-compatible, for example?
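To illustrate the difference between blocking and yielding, here is a minimal sketch that swaps gevent for the standard library's asyncio (an assumption made only so it runs without gevent installed; gevent's greenlets are scheduled cooperatively in the same way). `heartbeat`, `busy_work` and `ticks_during_work` are hypothetical names. A loop that never gives up control starves the other task completely, while an explicit `asyncio.sleep(0)`, the asyncio counterpart of `gevent.sleep(0)`, lets it run between iterations.

```python
import asyncio

# asyncio stands in for gevent here (assumption); the cooperative
# scheduling idea is the same: other tasks run only when the current one yields.


async def heartbeat(ticks):
    # Stand-in for a background task such as heartbeats or accepting connections.
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def busy_work(n, cooperative):
    total = 0
    for i in range(n):
        total += i  # pure computation: never gives up control by itself
        if cooperative:
            await asyncio.sleep(0)  # explicit yield point, like gevent.sleep(0)
    return total


async def ticks_during_work(cooperative, n=1000):
    """Return how many heartbeat ticks happened while busy_work ran."""
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat task start
    before = len(ticks)
    await busy_work(n, cooperative)
    after = len(ticks)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return after - before


if __name__ == "__main__":
    print("blocking:", asyncio.run(ticks_during_work(cooperative=False)))
    print("yielding:", asyncio.run(ticks_during_work(cooperative=True)))
```

Run as a script, the blocking variant reports 0 heartbeat ticks during the work, while the yielding variant reports a non-zero count.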

Best,
fx

@dhm116
Author

dhm116 commented May 22, 2013

I really apologize for opening an issue without being able to provide a reproducible example; I knew it was a long shot.

I ended up writing a simple process monitor that restarts the ZeroRPC server if it becomes unresponsive. It's not an ideal fix, since I'd still like to find out what was going wrong in the first place, but it seems to serve its purpose quite well so far.
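The monitor itself isn't described, so the following is only a minimal sketch of one way to build such a watchdog. `watchdog`, `probe` and `restart` are hypothetical names: `probe` might, for example, make a cheap RPC with a short client-side timeout and return `True` on success, and `restart` might relaunch the server process. None of this is part of zerorpc.

```python
import time
from typing import Callable, Optional


def watchdog(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int = 3,
    interval: float = 5.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() every `interval` seconds and call restart() after
    `max_failures` consecutive failures. A probe that raises counts as a
    failure. Runs forever unless `rounds` is given; returns the restart count.
    """
    failures = 0
    restarts = 0
    done = 0
    while rounds is None or done < rounds:
        done += 1
        try:
            healthy = bool(probe())  # hypothetical health check, e.g. a short RPC
        except Exception:
            healthy = False
        if healthy:
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()  # hypothetical: relaunch the server process
                restarts += 1
                failures = 0  # give the restarted server a fresh budget
        sleep(interval)
    return restarts
```

Requiring several consecutive failures before restarting avoids bouncing the server on a single slow call, which matters when some requests are legitimately long-running.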

dhm116 closed this as completed May 22, 2013

@dschwertfeger
dschwertfeger commented Jan 10, 2018

I realize this is an old issue, but I've come across the same problem and I do have a minimal reproducible example.

```python
# server.py

import zerorpc


class MyServer(object):

    @zerorpc.stream
    def streaming_range(self, fr, to, step=1):
        return range(fr, to, step)


if __name__ == '__main__':
    server = zerorpc.Server(MyServer())
    server.bind('tcp://127.0.0.1:1234')
    server.run()
```

```python
# client.py

import zerorpc

client = zerorpc.Client()
client.connect("tcp://127.0.0.1:1234")

for item in client.streaming_range(0, 200):
    print(item)
```

```javascript
// client.js

const zerorpc = require('zerorpc');

const client = new zerorpc.Client();
client.connect('tcp://127.0.0.1:1234');

client.invoke('streaming_range', 0, 200, (error, res, more) => {
  console.log(res);
});
```

Try this first:

  • python server.py
  • python client.py
  • python client.py
  • ...

You can do this as many times as you want. No problem there.

The problem seems to be specific to the node-client:

  • python server.py
  • node client.js -> works fine
  • node client.js -> no results, HeartbeatError :(

It works exactly once. The server does not respond to any further requests.

Interestingly, requesting a smaller range through the node-client works just fine. Change the range in client.js to, say, (0, 100) and try this:

  • python server.py
  • node client.js -> works
  • node client.js -> works
  • node client.js -> works

But for such small datasets we wouldn't need streaming, would we?

Conversely, I can crank up the range in the python-client without any issues.

Any hints to what's going on and how to fix this would be greatly appreciated!

@dschwertfeger
dschwertfeger commented Jan 16, 2018

Hi @bombela,

Does it make sense to discuss this (see above) in this old and closed issue or should I open a new one?

bombela reopened this Jan 16, 2018

@bombela
Member

bombela commented Jan 16, 2018

Thank you for this reproducible test. It fails exactly as you described. I will set aside some time to look into the problem.

@dschwertfeger
Thanks for the update, @bombela. Did you get a chance to look into this yet?

@dhm116
Author

dhm116 commented Jan 23, 2018

Is it possible that the single-threaded nature of Node is causing heartbeats to be missed, so that the server considers this client disconnected?
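The failure mode suggested here can be sketched as a generic liveness check: each side records when it last saw a heartbeat and declares the peer lost once too much time has passed. If the process that should send or handle heartbeats is stuck in a loop that never yields, `last_seen` stops advancing and the peer is reported lost, much like the HeartbeatError above. `HeartbeatMonitor` is a hypothetical class, and the `misses=2` threshold is an assumption, not necessarily zerorpc's actual rule.

```python
import time


class HeartbeatMonitor:
    """Declares a peer lost when no heartbeat has been seen for
    `interval * misses` seconds. Both values are illustrative assumptions,
    not zerorpc's actual defaults or logic."""

    def __init__(self, interval, misses=2, clock=time.monotonic):
        self.interval = interval
        self.misses = misses
        self.clock = clock
        self.last_seen = clock()

    def beat(self):
        # Called whenever a heartbeat from the peer is processed. If this
        # process is stuck in a loop that never yields, this never runs.
        self.last_seen = self.clock()

    def is_alive(self):
        return self.clock() - self.last_seen <= self.interval * self.misses
```

Injecting the clock keeps the check deterministic in tests; in a real process the default monotonic clock avoids surprises from wall-clock adjustments.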

@bombela

bombela commented Jan 23, 2018 via email

@dschwertfeger
@bombela, thanks for looking into this.

I really need to be able to use the node client in combination with streaming responses, but I don't have the resources to dig deeper into this myself.

  • Is this something that you could easily fix?
  • Does this need more investigation?
  • Is this an issue for other people, too? (I'd imagine I'm not the only one using the node-client + streaming)

How do you think we should proceed from here?

@bombela
Member

bombela commented Feb 1, 2018

It has to be a regression, because in 2012 I was using infinite streams from Python -> Node with no problem. The cross-language integration tests are also not failing. And it fails after a specific number of events, which differs between your machine and mine.

The only thing I can think of is to spend a few evenings on it, debugging carefully until I understand exactly what is happening! I will try to look into it this week, but as usual, no promises.

@dschwertfeger
Hi @bombela, just floating this to the top of your inbox in case you forgot about it. Thanks, David

@bombela
Copy link
Member

bombela commented Mar 11, 2018

I looked at it 2 weeks ago, but I couldn't figure out the problem. This looks like a nasty interaction between many things. It could be the zmq Python layer, zmq itself, or some logic in either zerorpc-python or zerorpc-node...

@dschwertfeger
As the maintainer, what do you suggest we do next?

@bombela
Member

bombela commented Mar 12, 2018

I suggest the maintainer moves his ass and fixes the problem :D

Joke aside, I spent some time this weekend on it, and found out the following:

As soon as a Node.js client consuming a stream terminates, the Python server freezes.

This means I can start a Node.js client that streams for a while, then connect as many streaming Python clients as I want. They can all go their merry way, disconnect, connect back, and so on, until the first Node.js client terminates. Then everything is frozen.

I am going to look into head-of-line blocking in the zmq ROUTER socket on the server.
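For readers unfamiliar with the term, head-of-line blocking can be sketched with plain queues: if all outgoing messages share one FIFO and the message at its head is addressed to a peer that cannot accept it, every message behind it waits, even those for healthy peers. Keeping one queue per peer confines the stall to that peer. This is a generic illustration with hypothetical names (`drain_shared_queue`, `drain_per_peer_queues`); it is not a model of how ZeroMQ's ROUTER socket actually queues messages, and whether anything like this happens there was exactly the open question.

```python
from collections import deque


def drain_shared_queue(queue, writable):
    """Deliver (peer, message) pairs from ONE shared FIFO, stopping at the
    first message whose peer cannot accept it. Everything behind that message
    waits, even if its own peer is healthy: head-of-line blocking."""
    delivered = []
    while queue and writable.get(queue[0][0], False):
        delivered.append(queue.popleft())
    return delivered


def drain_per_peer_queues(queues, writable):
    """Deliver from one FIFO per peer, so a stalled peer only delays its own
    messages."""
    delivered = []
    for peer, q in queues.items():
        while q and writable.get(peer, False):
            delivered.append((peer, q.popleft()))
    return delivered


if __name__ == "__main__":
    writable = {"node": False, "python": True}  # the Node.js peer has stalled
    shared = deque([("node", "a"), ("python", "b"), ("python", "c")])
    per_peer = {"node": deque(["a"]), "python": deque(["b", "c"])}
    print(drain_shared_queue(shared, writable))       # nothing gets through
    print(drain_per_peer_queues(per_peer, writable))  # python messages get through
```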

@umsiw
umsiw commented Apr 20, 2018

Any update on this issue? Is this still a problem?
Thanks in advance.

@bombela
Member

bombela commented Apr 20, 2018 via email

@Prgrmman
Prgrmman commented Jan 29, 2019

I was able to come up with a workaround for this problem, @bombela.

My team hit the problem earlier today. We had a very long streaming request in our python server implemented using an iterator.

I added gevent.sleep(0) after each yield statement, and that seemed to solve the problem.

For Python functions that return especially long iterators, I think the streaming code tries to send every response to the client without giving up control to other gevent code (greenlets).

This blocks other RPCs, and also blocks heartbeat messages.

bombela added the bug label Mar 24, 2023

@KGKnutson
@Prgrmman, Where did you add the gevent.sleep(0)? I think I am encountering this same issue.

@Prgrmman
Gosh haha, that was 4 years ago and I've moved to a different project since.
What I did was define a custom decorator that wrapped the zerorpc.stream decorator.
The function passed in returns a generator, so basically I created my own generator that yields each item from the supplied generator, with a sleep statement between items.
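A minimal sketch of that approach, assuming the wrapped function returns any iterable: `cooperative` is a hypothetical helper (not part of zerorpc), and the pause function is passed in so the helper can be tested without gevent. With gevent the intended pause would be `lambda: gevent.sleep(0)`, and the helper would sit directly under `@zerorpc.stream` on a method such as `streaming_range` from the reproduction above; that combination is untested here.

```python
import functools


def cooperative(pause):
    """Decorator: re-yield every item of the wrapped function's iterable and
    call pause() before fetching the next one, so other greenlets can run.

    With gevent the intended pause is `lambda: gevent.sleep(0)`; `cooperative`
    itself is a hypothetical helper, not part of zerorpc.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for item in func(*args, **kwargs):
                yield item
                pause()  # give up control between streamed items
        return wrapper
    return decorator


if __name__ == "__main__":
    events = []
    stream = cooperative(lambda: events.append("pause"))(lambda: range(3))
    for item in stream():
        events.append(item)
    print(events)  # [0, 'pause', 1, 'pause', 2, 'pause']
```

Because `pause()` runs each time the consumer asks for the next item, the server gets a chance to handle heartbeats and other requests between every streamed message, at the cost of one extra scheduler switch per item.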

@KGKnutson
Thanks! I'll experiment with that!

Projects: none yet · Development: no branches or pull requests · 6 participants