Block sync silently stalls on Schlesi #1740
Comments
Could you please check the |
There is nothing in the log either.
Notably, a restart allowed the node to recover and sync.
I believe I've reproduced this. Teku was running inside docker on my Mac when I disconnected the ethernet cable so networking had to switch to wifi. In this case there were a number of error messages in the log like:
@Nashatyrev any chance you could look into this from the jvm-libp2p side and see whether it fails to detect peers disconnecting because of external network changes?
@ajsutton OK, I will try to check that at the libp2p level. 👍
One potential reason is that in such cases the TCP connection might not be closed at the OS level (the TCP FIN packet is never received) and can hang for a long time. That's generally why on the application level some kind of [...] @ajsutton @mbaxter at first glance I couldn't find any matching code. Is this still to be done?
@Nashatyrev - right, this isn't implemented yet. I can add a ticket to implement this functionality using the new |
We now periodically send PING (I think), so disconnecting the peer when we don't get a response should be all we need (though we may want some leniency and only disconnect if 2 or 3 responses in a row are missed).
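The leniency idea above could be sketched roughly like this. It is only an illustration of counting consecutive missed PING responses per peer. `MissedPingTracker` and its methods are hypothetical names, not existing Teku or jvm-libp2p APIs, and the threshold of 3 is an assumption taken from the "2 or 3" suggestion.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Teku or jvm-libp2p): counts consecutive
 * missed PING responses per peer and reports when a peer should be dropped.
 */
final class MissedPingTracker {
    private final int maxMissedInARow;
    private final Map<String, Integer> missedByPeer = new HashMap<>();

    MissedPingTracker(int maxMissedInARow) {
        if (maxMissedInARow < 1) {
            throw new IllegalArgumentException("maxMissedInARow must be >= 1");
        }
        this.maxMissedInARow = maxMissedInARow;
    }

    /**
     * Call when a PING to the peer timed out.
     * Returns true once the peer has missed maxMissedInARow pings in a row,
     * meaning the caller should disconnect it.
     */
    synchronized boolean onPingTimeout(String peerId) {
        int missed = missedByPeer.merge(peerId, 1, Integer::sum);
        if (missed >= maxMissedInARow) {
            missedByPeer.remove(peerId); // forget state for the dropped peer
            return true;
        }
        return false;
    }

    /** Call when the peer answered a PING; any response resets the streak. */
    synchronized void onPingResponse(String peerId) {
        missedByPeer.remove(peerId);
    }
}
```

A periodic ping task would call `onPingTimeout(peerId)` on each timeout and disconnect the peer when it returns `true`, so a single dropped response on a flaky link does not cost the connection, while a half-open TCP connection is still cleaned up after a few ping intervals.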
Now it's getting more complicated, I keep losing peers on Schlesi:
Check the peer count. Also, it barely receives blocks in real time. Edit: master @ 2c03585
One of my bootnodes can't synchronize anymore.
[...]
Thanks, my nodes recovered after applying #1779!
Description
As a user, I want to sync Schlesi.
Steps to Reproduce (Bug)
I have a hunch that this might be similar to sigp/lighthouse#949, because around 5 am my local IP is changed by my ISP.
Expected behavior: [What you expect to happen]
It should behave. It should not stop receiving new blocks. It should maybe do some checks on the connections, I have no clue what it could be.
Actual behavior: [What actually happens]
Block processing silently stalls:
Frequency: [What percentage of the time does it occur?]
Seen this once
Versions (Add all that apply)