ceph.conf: mon_clock_drift_allowed .5 -> 1.0 #1146
All of the errors I see seem to be between .5 and .9s. Signed-off-by: Sage Weil <sage@redhat.com>
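For reference, the change described in the title amounts to a one-line edit in the test cluster's ceph.conf. A minimal sketch follows; the option name and both values come from the PR title, while the `[global]` section placement is an assumption and may differ from the actual file:

```ini
# Sketch only: the [global] placement is an assumption, not taken from the diff.
[global]
        # Previously .5. The observed skews were between .5 and .9 s,
        # so 1.0 keeps them under the warning threshold.
        mon_clock_drift_allowed = 1.0
```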
@liewegas I actually (finally) set up our own physical NTP servers. We just need to cut the testnodes over to them. I can do that today.
All the testnodes are using our own internal NTP servers now. Let's maybe keep an eye on http://pulpito.ceph.com/sage-2018-01-31_18:27:33-rados-wip-sage-testing-2018-01-31-1051-distro-basic-smithi/ and see if any clock skew errors pop up before merging this.
Still seeing failures (although fewer of them!). For example, http://pulpito.ceph.com/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143176
The clock skew warnings that are causing the job to fail are at the very beginning of the cluster being spun up. I've always seen MONs disagree on time right after the daemons start.
I'm fine with this PR if we're okay with acknowledging it takes a bit (1min 14sec in this instance) for the MONs to get in sync. I suppose it's sort of like whitelisting certain warnings like we do with SELinux.
Yeah, I don't think it is necessary, since the cluster reached HEALTH_OK and did not fail due to a time sync issue. As @djgalloway pointed out, it's just the initial bootstrap that takes time, so could we whitelist this without much impact?
2018-02-02T22:54:38.279 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_OK
@liewegas What do you want to do?
If the first, I'll just merge and we can be done with these errors.
Let's just merge this. If we think the issues are fixed/better later, we can revert and see how things shake out.
jenkins test this please