v0.0.0.62 network unreliability, node goes unresponsive/offline following socket timeouts #139
Comments
Happened again to 2 nodes overnight. 2 other nodes hosted on the same infrastructure stayed running OK.
Happened to 4 nodes again since yesterday.
This node managed to answer 1 RPC but locked up on the next one.
Second debug.log ends with
Third ends with
and the fourth with
This one accepted a createcontract RPC but neither timed out nor completed the request.
Another one bites the dust
Refuses to respond to CLI RPCs, same gdb as above.
v0.0.0.69 collateral wallet experienced the 1201 timeouts after restarting following a crown-qt crash #168
Probably before the timeouts, the peers were
and chaintips included
CPU use of crownd pegged at 100% of 1 vCPU despite not actually doing anything useful. Asking it to reconsider the last good block didn't help.
similar, but not identical to the earlier example.
Had 2 more instances of the 1201 timeouts since yesterday afternoon. This was after they were moved to different servers in the Crown infra. They had to be
Had another instance overnight, here's the debug.log and a coredump of the hung node.
Had at least 2 more instances overnight. This is not resolved and cannot be closed.
And now?
Let's work on the basis that (most of) the 1201 timeouts are a consequence of getting stuck in NodeMinter, so keep #195 open, and close this one.
Nodes sometimes go unavailable following a bunch of socket timeouts.
Expected behavior
Nodes should stay up
Actual behavior
Nodes sometimes go unavailable following a bunch of socket timeouts.
To reproduce
Start a node and wait and see.
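Since reproduction amounts to waiting, a small probe can at least record when a node stops answering. The following is only a sketch, not part of the Crown tooling: `probe` and `watch` are hypothetical helper names, and the example command assumes `crown-cli` supports the `getblockcount` RPC, as Bitcoin-derived clients usually do. The client-side timeout matters because a hung node may accept a request and never reply.

```python
import subprocess
import time


def probe(cmd, timeout_s=30.0):
    """Run one command and classify the outcome as 'ok', 'error', or 'hung'."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # No reply within timeout_s: the symptom described in this issue.
        return "hung"
    except OSError:
        # The binary could not be started at all (e.g. not on PATH).
        return "error"
    return "ok" if result.returncode == 0 else "error"


def watch(cmd, interval_s=60.0, timeout_s=30.0):
    """Poll until the command hangs, printing a timestamped status line each time."""
    while True:
        status = probe(cmd, timeout_s)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), status, flush=True)
        if status == "hung":
            return
        time.sleep(interval_s)
```

Calling `watch(["crown-cli", "getblockcount"])` would then log one line per minute and stop at the first hang, which gives a timestamp to line up against debug.log before attaching gdb.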
System information
v0.0.0.62
debug.log ends with
The node is still running, just not talking to anything, including RPC requests from crown-cli.
gdb shows