Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hammer: clock skew report is incorrect by ceph health detail command #8051

Merged
merged 2 commits into from Apr 6, 2016

Conversation

xiaoxichen
Copy link
Contributor

@xiaoxichen xiaoxichen self-assigned this Mar 11, 2016
@xiaoxichen xiaoxichen added this to the hammer milestone Mar 11, 2016
@smithfarm
Copy link
Contributor

@jecluis Please review

@smithfarm
Copy link
Contributor

@xiaoxichen The commits are missing the "(cherry-picked from . . .)" line - this is added automatically which you provide the -x option to git cherry-pick. Sorry for the nit - can you please fix?

@xiaoxichen xiaoxichen changed the title hammer: clock skew report is incorrect by ceph health detail command DMN:hammer: clock skew report is incorrect by ceph health detail command Mar 12, 2016
@xiaoxichen
Copy link
Contributor Author

cherry-pick seems not cleanly, will fix it.

@xiaoxichen xiaoxichen changed the title DMN:hammer: clock skew report is incorrect by ceph health detail command DNM:hammer: clock skew report is incorrect by ceph health detail command Mar 12, 2016
@xiaoxichen xiaoxichen force-pushed the wip-15024-hammer branch 2 times, most recently from bc16a2a to c8d7761 Compare March 12, 2016 02:18
@xiaoxichen xiaoxichen changed the title DNM:hammer: clock skew report is incorrect by ceph health detail command hammer: clock skew report is incorrect by ceph health detail command Mar 12, 2016
@smithfarm
Copy link
Contributor

@xiaoxichen: Could you add to the first commit message a description of the conflicts and how you resolved them? This if debugging/forensics are necessary later, such descriptions can save a lot of time.

When in the presence of a clock skew, adjust the checking interval
according to how many rounds have gone by since the last clean check.

If a skew is detected, instead of waiting an additional 300 seconds we
will perform the check more frequently, gradually backing off the
frequency if the skew is still in place (up to a maximum of
'mon_timecheck_interval', default: 300s). This will help with transient
skews.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>

(cherry pick from commit 45e16d0)

Conflicts:
	src/common/config_opts.h
            Merge the change line.
	src/mon/Monitor.h
            handle_timecheck_leader(MonOpRequestRef op) was replaced with handle_timecheck_leader(MTimeCheck *m)
            also for handle_timecheck_peon and handle_timecheck.
By weighting the reports we were making it really hard to get rid of a
clock skew warning once the cause had been fixed.

Instead, as soon as we get a clean bill of health, let's run a new round
and soon as possible and ascertain whether that was a transient fix or
for realsies. That should be better than the alternative of waiting for
an hour or something (for a large enough skew) for the warning to go
away - and with it, the admin's sanity ("WHAT AM I DOING WRONG???").

Fixes: ceph#14175

Signed-off-by: Joao Eduardo Luis <joao@suse.de>

(cherry pick from commit 17d8ff4)
@xiaoxichen
Copy link
Contributor Author

@smithfarm see if it works for you.

ghost pushed a commit that referenced this pull request Apr 5, 2016
…eph health detail command

Reviewed-by: Loic Dachary <ldachary@redhat.com>
@ghost
Copy link

ghost commented Apr 6, 2016

@liewegas does this backport look good to merge ? It passed the hammer upgrade suite ( http://tracker.ceph.com/issues/14692#note-52 ) and the rados suite http://tracker.ceph.com/issues/14692#note-51).

@ghost ghost assigned liewegas and unassigned xiaoxichen Apr 6, 2016
@liewegas liewegas merged commit 815760d into ceph:hammer Apr 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants