
master hanging on "write" sys call #208

Closed
ninjaahhh opened this issue Nov 21, 2018 · 2 comments · Fixed by #542

when running the cluster (testnet201.bootstrap), after mining for a certain period of time, the jsonrpc server became unresponsive to all requests. strace shows the following results:

$ ps aux | grep py
...
root     15480  0.3  5.0 2049216 1605800 ?     Sl   Nov20   8:17 pypy3 master.py --cluster_config=/code/cluster_config.json
root     15481  0.8  1.5 788468 484300 ?       Sl   Nov20  22:27 pypy3 slave.py --cluster_config=/code/cluster_config.json --node_id=S0
...
$ sudo strace -p 15480 -s 10000
strace: Process 15480 attached
write(2, "I1121 00:26:53.642705 jsonrpc.py:408] {\"jsonrpc\": \"2.0\", \"method\": \"getWork\", \"params\": [\"0x1\"], \"id\": 433}\n", 108
<ctrl+c>
$ sudo strace -p 15481 -s 10000
strace: Process 15481 attached
epoll_wait(3,
<ctrl+c>

so the slave server looks fine, but the master is hanging on writing the jsonrpc log to stderr (file descriptor 2).

it also points to jsonrpc.py line 408:

async def __handle(self, request):
    request = await request.text()
    Logger.info(request)

which in turn calls

@classmethod
def info(cls, msg):
    cls.check_logger_set()
    cls._qkc_logger.info(msg)

need to find out why the write call is blocked.
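
A minimal stand-alone sketch of how this kind of hang can happen (a hypothetical repro, not project code): when stderr is redirected into a pipe and nothing reads from the other end, write(2) blocks as soon as the kernel pipe buffer (64 KiB by default on Linux) fills up.

# Hypothetical repro, not project code: write(2) on a pipe blocks once the
# pipe buffer is full and no reader is draining it. This script hangs on
# purpose after roughly 64 KiB of stderr output.
import os
import sys

def main():
    read_fd, write_fd = os.pipe()           # stand-in for stderr redirected to a pipe
    os.dup2(write_fd, sys.stderr.fileno())   # fd 2 now points at the pipe

    chunk = "x" * 4096 + "\n"
    for i in range(64):                      # ~256 KiB, well past the 64 KiB buffer
        print("writing chunk", i, flush=True)
        sys.stderr.write(chunk)              # blocks here once the buffer is full,
        sys.stderr.flush()                   # because read_fd is never read

if __name__ == "__main__":
    main()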

ninjaahhh added the bug Something isn't working label Nov 21, 2018
ninjaahhh self-assigned this Nov 21, 2018

ninjaahhh commented Nov 21, 2018

PROGRESS:

did some searching online:

  1. https://stackoverflow.com/questions/37385755/freeze-when-writing-to-stderr
  2. Process in container stuck when writing log to stderr moby/moby#31540 (comment)

and running sudo tail -f /proc/15480/fd/2 helps unblock it.
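
For context (my reading of the linked threads, not something verified in this issue): tail -f on the process's fd 2 attaches a reader to the pipe behind stderr, so the kernel buffer drains and the blocked write(2) can return. In terms of the sketch above, it is the in-process equivalent of finally reading from read_fd:

# Hypothetical continuation of the earlier sketch: draining read_fd plays the
# role of `sudo tail -f /proc/<pid>/fd/2` and releases a writer blocked in write(2).
import os
import threading

def drain(read_fd: int) -> None:
    while os.read(read_fd, 65536):   # keep consuming until the write end closes
        pass

# threading.Thread(target=drain, args=(read_fd,), daemon=True).start()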


qcdll commented Nov 21, 2018

https://github.com/QuarkChain/pyquarkchain/blob/master/quarkchain/cluster/cluster.py#L37
stderr also goes to stdout, and both should be handled by cluster.py's print_output and printed to stdout... as we discussed, one suspect may be the docker supervisor failing to tail it
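
A simplified sketch of that output-forwarding setup (the names below, such as print_output and run, are illustrative stand-ins rather than the exact cluster.py code): the parent spawns the master/slave process with stderr merged into stdout, and a coroutine relays each line to its own stdout.

import asyncio

async def print_output(prefix: str, stream: asyncio.StreamReader):
    # Relay the child's output line by line; readline raises ValueError if a
    # single line exceeds the StreamReader limit (64 KiB by default).
    while True:
        line = await stream.readline()
        if not line:
            break
        print(prefix + line.decode(errors="replace").rstrip(), flush=True)

async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # stderr also goes to stdout
    )
    await print_output("MASTER: ", proc.stdout)
    return await proc.wait()

if __name__ == "__main__":
    asyncio.run(run(["python3", "-c", "print('hello')"]))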

ninjaahhh added a commit that referenced this issue Apr 30, 2019
1. When writing logs using `Logger.info` (say, logging JSONRPC
   requests)
2. Master or slave process calls `asyncio.StreamReader.readline` to get
   the log out
3. However, the stream has a 64k buffer limit and throws an exception if
   it is reached, so the printing coroutine crashes
4. As a result, the process's pipe may fill up because nothing is
   consuming it, leading the process to hang

Fixes #208
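
One way to address the mechanism described above (a sketch only; the actual change landed in #542 and may differ): raise the StreamReader limit when spawning the subprocess, and keep the forwarding coroutine alive even if an oversized line slips through, so the pipe is always being drained.

import asyncio

LIMIT = 4 * 1024 * 1024  # assumed value; anything comfortably above 64 KiB

async def forward_output(prefix: str, stream: asyncio.StreamReader):
    while True:
        try:
            line = await stream.readline()
        except ValueError:
            # An oversized line hit the limit; readline drops the buffered data
            # and raises, so keep going instead of letting the coroutine die.
            continue
        if not line:
            break
        print(prefix + line.decode(errors="replace").rstrip(), flush=True)

async def spawn(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        limit=LIMIT,  # larger per-stream buffer used by readline
    )
    asyncio.ensure_future(forward_output("OUT: ", proc.stdout))
    return await proc.wait()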
Belgarion pushed a commit to Belgarion/pyquarkchain_cuda that referenced this issue Sep 8, 2020