New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hammer: clock skew report is incorrect by ceph health detail command #8051
Conversation
@jecluis Please review |
@xiaoxichen The commits are missing the "(cherry-picked from . . .)" line - this is added automatically which you provide the |
cherry-pick seems not cleanly, will fix it. |
bc16a2a
to
c8d7761
Compare
@xiaoxichen: Could you add to the first commit message a description of the conflicts and how you resolved them? This if debugging/forensics are necessary later, such descriptions can save a lot of time. |
When in the presence of a clock skew, adjust the checking interval according to how many rounds have gone by since the last clean check. If a skew is detected, instead of waiting an additional 300 seconds we will perform the check more frequently, gradually backing off the frequency if the skew is still in place (up to a maximum of 'mon_timecheck_interval', default: 300s). This will help with transient skews. Signed-off-by: Joao Eduardo Luis <joao@suse.de> (cherry pick from commit 45e16d0) Conflicts: src/common/config_opts.h Merge the change line. src/mon/Monitor.h handle_timecheck_leader(MonOpRequestRef op) was replaced with handle_timecheck_leader(MTimeCheck *m) also for handle_timecheck_peon and handle_timecheck.
By weighting the reports we were making it really hard to get rid of a clock skew warning once the cause had been fixed. Instead, as soon as we get a clean bill of health, let's run a new round and soon as possible and ascertain whether that was a transient fix or for realsies. That should be better than the alternative of waiting for an hour or something (for a large enough skew) for the warning to go away - and with it, the admin's sanity ("WHAT AM I DOING WRONG???"). Fixes: ceph#14175 Signed-off-by: Joao Eduardo Luis <joao@suse.de> (cherry pick from commit 17d8ff4)
c8d7761
to
01672b4
Compare
@smithfarm see if it works for you. |
…eph health detail command Reviewed-by: Loic Dachary <ldachary@redhat.com>
@liewegas does this backport look good to merge ? It passed the hammer upgrade suite ( http://tracker.ceph.com/issues/14692#note-52 ) and the rados suite http://tracker.ceph.com/issues/14692#note-51). |
http://tracker.ceph.com/issues/15024