ceph.conf: mon_clock_drift_allowed .5 -> 1.0 #1146

Merged
merged 1 commit into from Feb 9, 2018

Conversation

liewegas
Member

All of the errors I see seem to be between .5 and .9s.

Signed-off-by: Sage Weil sage@redhat.com
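To make the effect of the change concrete, here is a minimal sketch of how raising the limit from 0.5 s to 1.0 s changes which skews would be flagged. It assumes the monitor warns when the absolute skew exceeds `mon_clock_drift_allowed`; the function name and the sample values are hypothetical (chosen inside the 0.5 to 0.9 s range reported above, not measured data).

```python
# Illustrative only: compare the old and new mon_clock_drift_allowed limits.
OLD_LIMIT_S = 0.5  # value before this PR
NEW_LIMIT_S = 1.0  # value after this PR


def skew_warns(skew_s: float, limit_s: float) -> bool:
    """Return True if the absolute skew exceeds the allowed drift (assumed rule)."""
    return abs(skew_s) > limit_s


# Made-up samples within the 0.5-0.9 s range mentioned in the PR description.
sample_skews_s = [0.55, 0.70, 0.90]

warned_old = [s for s in sample_skews_s if skew_warns(s, OLD_LIMIT_S)]
warned_new = [s for s in sample_skews_s if skew_warns(s, NEW_LIMIT_S)]

print(f"warn at {OLD_LIMIT_S} s: {warned_old}")
print(f"warn at {NEW_LIMIT_S} s: {warned_new}")
```

Under that assumption, every sample trips the 0.5 s limit and none trips the 1.0 s limit.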

@djgalloway

@liewegas I actually (finally) set up our own physical NTP servers. We just need to cut the testnodes over to them. I can do that today.

@djgalloway

All the testnodes are using our own internal NTP servers now. Let's maybe keep an eye on http://pulpito.ceph.com/sage-2018-01-31_18:27:33-rados-wip-sage-testing-2018-01-31-1051-distro-basic-smithi/ and see if any clock skew errors pop up before merging this.

@liewegas
Member Author

liewegas commented Feb 6, 2018

Still seeing failures (although fewer of them!). For example, http://pulpito.ceph.com/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143176

@djgalloway

djgalloway commented Feb 6, 2018

The clock skew warnings that are causing the job to fail are at the very beginning of the cluster being spun up. I've always seen MONs disagree on time right after the daemons start.

2018-02-02T22:53:24.972 INFO:tasks.ceph:Waiting until ceph cluster ceph is healthy...
2018-02-02T22:53:24.972 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:53:25.071 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:24.660 7f60351f8700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:25.094 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:24.684 7f60351f8700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:25.219 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:25.219 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:32.220 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:53:32.349 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:31.940 7f959a527700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:32.364 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:31.956 7f959a527700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:32.483 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:32.483 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:39.485 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:53:39.727 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:39.328 7fb8d671f700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:39.746 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:39.348 7fb8d671f700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:39.903 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:39.903 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:46.916 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:53:47.043 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:46.647 7f9a094b0700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:47.061 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:46.667 7f9a094b0700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:47.201 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:47.202 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:54.204 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:53:54.332 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:53.939 7fbc8d878700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:54.347 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:53:53.955 7fbc8d878700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:53:54.516 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:54.517 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:01.519 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:01.646 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:01.259 7f7e48aa4700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:01.660 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:01.267 7f7e48aa4700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:01.823 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:01.824 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:08.826 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:08.955 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:08.570 7fc1bf9e5700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:08.974 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:08.590 7fc1bf9e5700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:09.105 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:09.106 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:16.108 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:16.242 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:15.858 7f104d0ef700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:16.257 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:15.874 7f104d0ef700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:16.402 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:16.403 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:23.404 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:23.526 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:23.514 7f29295b7700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:23.550 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:23.538 7f29295b7700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:23.713 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:23.713 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:30.718 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:30.848 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:30.838 7f8e36bb5700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:30.863 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:30.850 7f8e36bb5700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:31.026 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:31.026 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:38.028 INFO:teuthology.orchestra.run.smithi077:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --cluster ceph health'
2018-02-02T22:54:38.154 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:38.142 7f6e9308e700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:38.173 INFO:teuthology.misc.health.smithi077.stderr:2018-02-02 22:54:38.162 7f6e9308e700 -1 WARNING: all dangerous and experimental features are enabled.
2018-02-02T22:54:38.279 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_OK
2018-02-02T22:54:38.279 DEBUG:teuthology.misc:Ceph health: HEALTH_OK

I'm fine with this PR if we're okay with acknowledging it takes a bit (1min 14sec in this instance) for the MONs to get in sync. I suppose it's sort of like whitelisting certain warnings like we do with SELinux.
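As a rough way to measure that settling time from a teuthology log, the sketch below parses the `Ceph health:` DEBUG lines and reports the time from the first health report to the first `HEALTH_OK`. The helper names are hypothetical, and the embedded log is a shortened copy of the excerpt above (first two, second-to-last, and last health reports).

```python
from datetime import datetime

# Shortened copy of the DEBUG lines from the log excerpt above.
LOG = """\
2018-02-02T22:53:25.219 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:53:32.483 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:31.026 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN clock skew detected on mon.b
2018-02-02T22:54:38.279 DEBUG:teuthology.misc:Ceph health: HEALTH_OK
"""

MARKER = "Ceph health: "


def parse_health(log_text):
    """Yield (timestamp, status) for each line containing a health report."""
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        stamp, _, rest = line.partition(" ")
        status = rest.split(MARKER, 1)[1]
        yield datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f"), status


def seconds_until_ok(log_text):
    """Seconds from the first health report to the first HEALTH_OK, or None."""
    events = list(parse_health(log_text))
    if not events:
        return None
    start = events[0][0]
    for when, status in events:
        if status.startswith("HEALTH_OK"):
            return (when - start).total_seconds()
    return None


elapsed = seconds_until_ok(LOG)
print(f"{elapsed:.2f} s until HEALTH_OK")
```

For this excerpt the first report is at 22:53:25.219 and `HEALTH_OK` arrives at 22:54:38.279, so the script reports about 73 seconds, in line with the estimate above.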

@vasukulkarni
Contributor

Yeah, I don't think it is necessary, since the cluster has come to HEALTH_OK state and did not fail due to a time sync issue. As @djgalloway pointed out, it's just the initial bootstrap that is taking time. Could we actually whitelist this without much impact?

2018-02-02T22:54:38.279 INFO:teuthology.misc.health.smithi077.stdout:HEALTH_OK
2018-02-02T22:54:38.279 DEBUG:teuthology.misc:Ceph health: HEALTH_OK

@djgalloway

@liewegas What do you want to do?

  • Increase mon_clock_drift_allowed
  • Capture new OS images and see if it helps
  • Whitelist clock skew warnings?

If the first, I'll just merge and we can be done with these errors.
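For comparison, the whitelisting option could look something like the sketch below: tolerate a `HEALTH_WARN` whose only reason is clock skew, but only during a startup grace window. This is not teuthology's API; the function name, the 120 s default, and the assumption that multiple warning reasons are separated by `;` are all hypothetical.

```python
def is_acceptable(health: str, seconds_since_start: float, grace_s: float = 120.0) -> bool:
    """Accept HEALTH_OK, or a clock-skew-only HEALTH_WARN inside the grace window."""
    status, _, detail = health.partition(" ")
    if status == "HEALTH_OK":
        return True
    if status != "HEALTH_WARN" or seconds_since_start > grace_s:
        return False
    # Assumption: multiple warning reasons are joined with ';'.
    reasons = [r.strip() for r in detail.split(";") if r.strip()]
    return bool(reasons) and all(r.startswith("clock skew detected") for r in reasons)


# The warning seen in the log above, early in startup and much later.
print(is_acceptable("HEALTH_WARN clock skew detected on mon.b", 73.0))
print(is_acceptable("HEALTH_WARN clock skew detected on mon.b", 300.0))
```

The trade-off relative to raising `mon_clock_drift_allowed` is that a skew that persists after the window would still fail the job, whereas a higher limit hides any skew below it for the whole run.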

@liewegas
Member Author

liewegas commented Feb 9, 2018

Let's just merge this. If we think the issues are fixed/better later, we can revert and see how things shake out.

@djgalloway

jenkins test this please

@djgalloway djgalloway merged commit f9de24e into master Feb 9, 2018
@djgalloway djgalloway deleted the wip-allow-more-drift branch February 9, 2018 19:02