Discovery on Windows is Broken With Clock Drifts #8144

Closed
nisdas opened this issue Dec 17, 2020 · 12 comments · Fixed by #14487
Labels
Bug Something isn't working Networking P2P related items Priority: High High priority item Windows Anything related only to Windows OS

Comments

@nisdas
Contributor

nisdas commented Dec 17, 2020

🐞 Bug Report

Description

Currently a non-trivial number of Windows users have been complaining about their peer count
slowly dropping to zero. v1.0.5 had some important changes to stream management and subnet search.
However, none of these components should have had any material impact on discovery. Windows machines, being more
susceptible to clock drift, suddenly stop being able to find new peers for some reason. Being minutes ahead of or behind network time should by rights not have any effect on finding new peers.

Has this worked before in a previous version?

This seems to have been introduced in v1.0.5; previous versions were perfectly fine with regard to discovery.

🔬 Minimal Reproduction

Run a Windows node for a few hours on v1.0.5.

🔥 Error

Peer count slowly starts decreasing, and eventually bottoms out at zero.

🌍 Your Environment

Operating System:

Windows

What version of Prysm are you running? (Which release)

v1.0.5

@nisdas nisdas added Bug Something isn't working Priority: High High priority item Networking P2P related items Windows Anything related only to Windows OS labels Dec 17, 2020
@yv989c

yv989c commented Jan 4, 2021

I haven't experienced this yet as a Windows user running a beacon node for over 15 days now. I turned on the option that keeps the system clock in sync using NTP. I'm limiting my peers to 5 via p2p-max-peers in case this is relevant.
Beacon node: v1.0.5
OS: Windows 10 Pro (2004)

@nisdas
Contributor Author

nisdas commented Jan 4, 2021

Ah OK, thanks for the report @yv989c. I am assuming you are not running any validators here? I haven't been able to reproduce this on my Windows machine yet, but it does seem to be an issue for a non-trivial number of Windows users. I will be investigating it this week.

@yv989c

yv989c commented Jan 7, 2021

Hi @nisdas. My validator became active yesterday. Everything was working fine for about 9 hours, then efficiency started dropping (I was getting fewer rewards). After inspecting the validator console, I can see that in the Previous epoch voting summary the correctlyVotedHead=false message becomes more frequent over time, which I believe is the reason for the lower rewards (+0.00002 ETH vs +0.00005 ETH per attestation). If that's the case, do you know what condition causes correctlyVotedHead=false?

Thanks.

@nisdas
Contributor Author

nisdas commented Jan 7, 2021

@yv989c Are you running with 5 peers as a max?

@yv989c

yv989c commented Jan 7, 2021

Hi @nisdas, it was, but I increased it to 25 peers max after I wrote the above message just to see if that makes any difference (I also noticed that my clock was 1 second behind, so I fixed it). About one hour after the change I started getting subsequent +0.00005 ETH attestations in beaconcha.in and my effectiveness went from 75% to 92% as of this morning (running for about 8 hours). It is now at 84%, getting subsequent +0.00002 ETH per attestation.

Any guidance on troubleshooting this will be appreciated. Thanks.

@yv989c

yv989c commented Jan 8, 2021

This may be related to my system's clock. Now I found it 2 seconds behind, so it's drifting fairly quickly in a day and what I thought was supposed to keep it in sync is not working.

I wrote a script to ensure it syncs against an NTP authority every hour. Hopefully this solves my issue. I'll keep you posted.
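
For anyone who wants to check their own drift before scripting a fix, here is a minimal sketch of measuring the local clock's offset with a plain SNTP query. It is not the script mentioned above (that one was not posted); the server name `pool.ntp.org` and all function names are assumptions chosen for illustration. It only measures the offset and does not change the system clock.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def build_request() -> bytes:
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def _ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (two 32-bit words) to Unix seconds."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def clock_offset(response: bytes, t_send: float, t_recv: float) -> float:
    """Standard NTP offset estimate: ((T2 - T1) + (T3 - T4)) / 2.

    T1/T4 are the local send/receive times; T2/T3 are the server's
    receive/transmit timestamps (bytes 32..47 of the reply).
    A positive result means the local clock is behind the server.
    """
    if len(response) < 48:
        raise ValueError("short NTP response")
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", response[32:48])
    t2 = _ntp_to_unix(rx_s, rx_f)
    t3 = _ntp_to_unix(tx_s, tx_f)
    return ((t2 - t_send) + (t3 - t_recv)) / 2


def query_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Send one SNTP query over UDP port 123 and return the offset in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_send = time.time()
        sock.sendto(build_request(), (server, 123))
        response, _ = sock.recvfrom(512)
        t_recv = time.time()
    return clock_offset(response, t_send, t_recv)
```

Calling `query_offset()` once an hour (for example from Task Scheduler) and logging the result would show how fast the clock drifts. Actually correcting the clock is usually left to the operating system's time service, for example `w32tm /resync` on Windows.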

@yv989c

yv989c commented Jan 8, 2021

So far so good! It's doing perfect now. It seems the problem was the drifting clock. I need to read more about the protocol, it's interesting to see how just a couple of seconds off had such a dramatic effect on attestation.
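
As a rough illustration of why a couple of seconds is a large error here, the sketch below works out where a drifted clock places a node inside a slot. The constants are mainnet values taken from the consensus spec, not from this thread (12-second slots, genesis at Unix time 1606824023, attestations due one third of the way into the slot), and the function names are hypothetical. It only shows the arithmetic and does not claim to explain the peer-loss bug itself.

```python
from typing import Tuple

# Assumed mainnet parameters (from the consensus spec, not from this thread).
GENESIS_TIME = 1606824023            # beacon chain genesis, Unix seconds
SECONDS_PER_SLOT = 12
ATTESTATION_DUE = SECONDS_PER_SLOT / 3   # 4 s into each slot


def slot_position(unix_time: float) -> Tuple[int, float]:
    """Return (slot number, seconds elapsed inside that slot) for a timestamp."""
    elapsed = unix_time - GENESIS_TIME
    slot = int(elapsed // SECONDS_PER_SLOT)
    return slot, elapsed - slot * SECONDS_PER_SLOT


def real_attestation_offset(clock_error: float) -> float:
    """Real seconds into the slot at which a drifted node believes the
    one-third mark has arrived.

    clock_error = local clock minus true time, so a clock that is
    2 s behind has clock_error = -2.0.
    """
    return ATTESTATION_DUE - clock_error
```

With these assumptions, a clock that is 2 seconds behind makes the node believe the 4-second mark has arrived when 6 real seconds of the 12-second slot have already passed, and a clock that is 2 seconds ahead makes it believe so after only 2 real seconds.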

@nisdas
Contributor Author

nisdas commented Mar 23, 2021

Closing this as this issue hasn't been reported in a while in our last few releases.

@defeedme

defeedme commented Jan 12, 2024

Yup, this is exactly what's happening to me on Windows 10. After about 2 hours, it slowly loses all peers. Version 4.2 hasn't fixed anything for Windows. I'm using a time sync program; maybe that could be the issue?

@defeedme

defeedme commented Jan 15, 2024

It appears the time sync was the issue. You need to use the Google time servers with an interval of 10 min.
Update: I spoke too soon. It was perfect for 2 days, then the same issue started happening again.

@defeedme

> So far so good! It's doing perfect now. It seems the problem was the drifting clock. I need to read more about the protocol, it's interesting to see how just a couple of seconds off had such a dramatic effect on attestation.

Way less than a couple of seconds. Mine was off by milliseconds and it had a dramatic effect!

@defeedme

defeedme commented May 4, 2024

I spoke too soon, worked perfect for 2 days then failed.. back to the drawing board..


3 participants