mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold. #35490
Conversation
@jdurgin @neha-ojha PTAL. Please check if the approach to reset the grace interval is acceptable. I will take some more time to test this. Please suggest/add reviewers that you think are necessary. Thanks!
the approach looks good to me, will take a closer look later this week
Force-pushed from c961a71 to f51e08d
src/mon/OSDMonitor.cc
Outdated
  if (grace_interval_threshold_exceeded(last_failure)) {
    set_default_laggy_params(target_osd);
  }
}
double halflife = (double)g_conf()->mon_osd_laggy_halflife;
decay_k = ::log(.5) / halflife;
double-checked the math and the half-life decay should be effective, assuming the timestamps are accurate -- decay is 0.5 after one halflife, 0.25 after 2, etc. If the failure interval is inaccurate here, that would explain the extra grace period not going away. Are you still seeing more MOSDFailures sent after the osd is brought back in your testing?
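For reference, a minimal standalone sketch of that decay math (the one-hour half-life below is illustrative, not the cluster's actual config):

#include <cmath>
#include <cstdio>

int main() {
  const double halflife = 3600.0;  // illustrative: 1 hr half-life
  const double decay_k = std::log(0.5) / halflife;
  for (int n = 0; n <= 3; ++n) {
    const double t = n * halflife;  // elapsed seconds
    std::printf("after %d half-life(s): %.3f\n", n, std::exp(decay_k * t));
  }
  // prints 1.000, 0.500, 0.250, 0.125 -- matching the decay described above
  return 0;
}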
Similarly, in the code setting the interval on boot, if the issue you're seeing with down_stamp not updating is present, the interval would always be set to mon_osd_max_laggy_interval, which would explain why rebooting was not helping the user who ran into this.
Your approach here of resetting things entirely is a nice fallback to avoid any further bugs like this.
@jdurgin Apparently my previous fix was introduced in the wrong place. From my investigation and testing, the correct place to check for the threshold breach is when an osd is marked down and its state is being updated/encoded within the pending incremental map.
When the osd is marked down, its state is set to CEPH_OSD_UP in OSDMonitor::check_failure(). Eventually, when the incremental map is encoded before being applied, and if the new state is set to CEPH_OSD_UP, a check verifies whether the grace period has exceeded the set threshold prior to resetting the laggy params.
The down_stamp for the osd is also updated as part of the reset.
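To make the placement concrete, here is a simplified sketch of the check described above, assuming it runs while encode_pending() walks the pending state map (the loop shape and the way last_failure is derived from down_stamp are illustrative, not the exact diff):

// Inside OSDMonitor::encode_pending() -- sketch only.
for (const auto& [osd, state_xor] : pending_inc.new_state) {
  // new_state holds XOR masks, so a set CEPH_OSD_UP bit means the
  // osd's UP flag is being flipped off, i.e. it is being marked down.
  if (state_xor & CEPH_OSD_UP) {
    // Assumed: interval since the osd's last failure, from its down_stamp.
    const osd_xinfo_t& xi = osdmap.osd_xinfo[osd];
    int last_failure = ceph_clock_now().sec() - xi.down_stamp.sec();
    if (grace_interval_threshold_exceeded(last_failure)) {
      set_default_laggy_params(osd);  // also refreshes the osd's down_stamp
    }
  }
}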
Force-pushed from f51e08d to 49ad6c6
A few nits, otherwise looks good.
Force-pushed from 49ad6c6 to 0339eef
Force-pushed from 0339eef to dfb24f1
retest this please
Force-pushed from dfb24f1 to ca39db7
mon/OSDMonitor: Reset grace period if failure interval exceeds a threshold.

Reset the grace heartbeat period if there have been no failures since the set threshold value (48 hrs). The mon_osd_laggy_halflife value is leveraged to calculate the threshold.

A couple of helper functions do the following:

- get_grace_interval_threshold(): Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int): Checks if the grace interval threshold is exceeded based on the last down stamp.
- set_default_laggy_params(int): Resets the laggy_probability and laggy_interval in the new_xinfo structure maintained within pending_inc, to be applied eventually as part of the update from paxos.

The threshold value is checked and the laggy parameters are reset at the following point:

- encode_pending() - If an existing osd is experiencing failure after an interval exceeding the failure threshold period.

Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Force-pushed from ca39db7 to 9f1d4c1
Pulpito Run: There were 14 failures out of 295 tests. Checked all the failed ones and confirmed that none of them are related to this change.
Reset the grace heartbeat timer if there have been no failures since the
set threshold value (48 hrs). The mon_osd_laggy_halflife value is
leveraged to calculate the threshold.
A couple of helper functions (sketched after this description) do the following:
- get_grace_interval_threshold(): Calculates and returns the grace interval threshold value.
- grace_interval_threshold_exceeded(int): Checks if the grace interval threshold is exceeded based on the last down stamp.
- set_default_laggy_params(int): Resets the laggy_probability and laggy_interval in the new_xinfo structure maintained within pending_inc, to be applied eventually as part of the update from paxos.
The threshold value is checked and the laggy parameters are reset at the following point:
- encode_pending() - If an existing osd is experiencing failure after an interval exceeding the grace threshold period.
Fixes: https://tracker.ceph.com/issues/45943
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
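As a rough illustration (not necessarily the exact upstream code) of how the helpers described above could hang together -- the 48x scaling factor and the exact field updates are assumptions:

int OSDMonitor::get_grace_interval_threshold() const {
  // Scale the laggy half-life (default 1 hr) to derive the threshold;
  // a factor of 48 yields the 48-hr value mentioned above (assumed).
  constexpr int grace_threshold_factor = 48;
  return g_conf()->mon_osd_laggy_halflife * grace_threshold_factor;
}

bool OSDMonitor::grace_interval_threshold_exceeded(int last_failed_interval) const {
  return last_failed_interval > get_grace_interval_threshold();
}

void OSDMonitor::set_default_laggy_params(int target_osd) {
  // Stage the reset in pending_inc so it is applied via paxos.
  osd_xinfo_t& xi = pending_inc.new_xinfo[target_osd];
  xi.down_stamp = pending_inc.modified;  // refresh the down stamp
  xi.laggy_probability = 0.0;            // back to "not laggy" defaults
  xi.laggy_interval = 0;
}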
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard backend
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox