Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: support osd_scrub_extended_sleep #29342

Merged
merged 1 commit into from
Aug 13, 2019

Conversation

Jeegn-Chen
Copy link
Contributor

@Jeegn-Chen Jeegn-Chen commented Jul 26, 2019

  1. always take osd_scrub_sleep for manually initiated
    scrubs
  2. when scrub_time_permit() return true for scheduled
    ones, the existing osd_scrub_sleep is used
  3. when scrub_time_permit() return false for scheduled
    ones, there may be 2 scenarios
    3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
    let's take osd_scrub_sleep
    3.2 otherwise, let's take osd_scrub_extended_sleep

Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen jeegnchen@tencent.com

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Copy link
Contributor

@dzafman dzafman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scrub sleep code in PG::scrub() happens after scrub_time_permit() has already found to be true. The sleep code there is just a way to throttle scrubbing as it is happening since it happens before each chunk.

I happen to have #29360 written last month. It wasn't merged because I was still working out some issues.

@dzafman
Copy link
Contributor

dzafman commented Jul 26, 2019

Looking at the tracker I see what you are trying to accomplish. Couldn't the administrator just make sure that the settings for osd_scrub_begin_week_day, osd_scrub_end_week_day, osd_scrub_begin_hour and osd_scrub_end_hour include enough of a cushion of time that all scrubbing will finish before critical time periods?

Copy link
Contributor

@dzafman dzafman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting ramification of this change is that if an administrator asks for a scrub, it could run slowly if not being initiated during permitted hours. I sort of understand the behavior if a scrub starting while permitted happens to overrun the allowed times. But if an administrator initiates a scrub manually it isn't clear that extended sleep is what he/she would want.

@Jeegn-Chen
Copy link
Contributor Author

Jeegn-Chen commented Jul 29, 2019

Looking at the tracker I see what you are trying to accomplish. Couldn't the administrator just make sure that the settings for osd_scrub_begin_week_day, osd_scrub_end_week_day, osd_scrub_begin_hour and osd_scrub_end_hour include enough of a cushion of time that all scrubbing will finish before critical time periods?

When OSDs are on HDD and PGs have a huge number of objects, it make take hours to complete a scrub. And is some scenarios, the busy period per day may be far over 10 hours. If we further give too large cushion of time, in the long run OSD may not have enough time for timely scrub of all PGs. So I think giving as short period as possible may be preferred in those scenarios.

@Jeegn-Chen
Copy link
Contributor Author

An interesting ramification of this change is that if an administrator asks for a scrub, it could run slowly if not being initiated during permitted hours. I sort of understand the behavior if a scrub starting while permitted happens to overrun the allowed times. But if an administrator initiates a scrub manually it isn't clear that extended sleep is what he/she would want.

What about additional check on scrubber.must_scrub and scrubber.must_repair?

@dzafman
Copy link
Contributor

dzafman commented Jul 29, 2019

What about additional check on scrubber.must_scrub and scrubber.must_repair?

Yes, that is what I was thinking. You only need to check must_scrub, since it is always set when a scrub, deep scrub or repair is requested. The must_deep_scrub and must_repair are added when appropriate.

@dzafman
Copy link
Contributor

dzafman commented Jul 30, 2019

@Jeegn-Chen Another way a scrub could happen even with scrub_time_permit() returns false would be when the scrub deadline is reached (scrub_max_interval). So a scrub could even start when it is not normally permitted if there hasn't been enough time to scrub even once during the scrub_min_interval due to load or time restrictions.

@Jeegn-Chen
Copy link
Contributor Author

retest this please

@Jeegn-Chen
Copy link
Contributor Author

@dzafman I've updated the implementation and the test. Could you take a look?

src/osd/OSD.cc Outdated Show resolved Hide resolved
src/osd/OSD.cc Outdated Show resolved Hide resolved
1. always take osd_scrub_sleep for manually initiated
   scrubs
2. when scrub_time_permit() return true for scheduled
   ones, the existing osd_scrub_sleep is used
3. when scrub_time_permit() return false for scheduled
   ones, there may be 2 scenarios
   3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
       let's take osd_scrub_sleep
   3.2 otherwise, let's take osd_scrub_extended_sleep

Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
@Jeegn-Chen
Copy link
Contributor Author

@tchaikov Thanks for the comments on the coding style. I've updated the PR.

@tchaikov tchaikov merged commit f13c7c8 into ceph:master Aug 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants