nautilus: mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold. #35798

Merged
merged 1 commit into from Jul 14, 2020


sseshasa

backport tracker: https://tracker.ceph.com/issues/46228


backport of #35490
parent tracker: https://tracker.ceph.com/issues/45943

this backport was staged using ceph-backport.sh version 15.1.1.389
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

…shold.

Reset the heartbeat grace period if an OSD has had no failures for
longer than a set threshold (48 hours). The threshold is calculated
from the mon_osd_laggy_halflife value.

A couple of helper functions do the following:
 - get_grace_interval_threshold():
    Calculates and returns the grace interval threshold value.
 - grace_interval_threshold_exceeded(int):
    Checks if grace interval threshold is exceeded based on the last
    down stamp.
 - set_default_laggy_params(int):
    Resets the laggy_probability and laggy_interval in the
    new_xinfo structure maintained within pending_inc, to be applied
    eventually as part of the update from paxos.

The threshold value is checked and the laggy parameters are reset at the
following point:
 - encode_pending(): when an existing OSD fails after an interval
   longer than the failure threshold period.
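
A minimal, standalone sketch of how the three helpers could fit together is
shown below. It is not the code from this commit: the XInfo and GraceTracker
types, the integer timestamps, and the threshold_factor field are hypothetical
stand-ins, and the factor of 48 assumes mon_osd_laggy_halflife is left at a
one-hour value so that the product comes to the 48 hours mentioned above.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the per-OSD extended info; not the Ceph type.
struct XInfo {
  std::int64_t down_stamp = 0;       // seconds; when the OSD was last marked down
  double laggy_probability = 0.0;
  std::uint32_t laggy_interval = 0;  // seconds
};

// Hypothetical container for illustration only.
struct GraceTracker {
  std::int64_t laggy_halflife = 3600;  // assumed value of mon_osd_laggy_halflife (s)
  std::int64_t threshold_factor = 48;  // assumed scale factor: 48 x 1 h = 48 h
  std::unordered_map<int, XInfo> xinfo;

  // Threshold, in seconds, derived from the halflife setting.
  std::int64_t get_grace_interval_threshold() const {
    return laggy_halflife * threshold_factor;
  }

  // True if the time since the last down stamp is longer than the threshold.
  // Unknown OSDs are treated as not exceeding it.
  bool grace_interval_threshold_exceeded(int osd, std::int64_t now) const {
    auto it = xinfo.find(osd);
    if (it == xinfo.end()) {
      return false;
    }
    return (now - it->second.down_stamp) > get_grace_interval_threshold();
  }

  // Reset the laggy parameters so the grace period returns to its default.
  void set_default_laggy_params(int osd) {
    XInfo& xi = xinfo[osd];
    xi.laggy_probability = 0.0;
    xi.laggy_interval = 0;
  }
};
```

In this sketch a caller would check grace_interval_threshold_exceeded() when a
failure is recorded for an existing OSD and, if it returns true, call
set_default_laggy_params() before the new failure is accounted for.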

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 9f1d4c1)
@sseshasa sseshasa added this to the nautilus milestone Jun 26, 2020
@sseshasa sseshasa added the core label Jun 26, 2020
@smithfarm smithfarm added the mon label Jun 26, 2020

yuriw commented Jul 13, 2020

@yuriw yuriw merged commit ab2f235 into ceph:nautilus Jul 14, 2020