-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: support osd_scrub_extended_sleep #29342
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scrub sleep code in PG::scrub() happens after scrub_time_permit() has already found to be true. The sleep code there is just a way to throttle scrubbing as it is happening since it happens before each chunk.
I happen to have #29360 written last month. It wasn't merged because I was still working out some issues.
Looking at the tracker I see what you are trying to accomplish. Couldn't the administrator just make sure that the settings for osd_scrub_begin_week_day, osd_scrub_end_week_day, osd_scrub_begin_hour and osd_scrub_end_hour include enough of a cushion of time that all scrubbing will finish before critical time periods? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An interesting ramification of this change is that if an administrator asks for a scrub, it could run slowly if not being initiated during permitted hours. I sort of understand the behavior if a scrub starting while permitted happens to overrun the allowed times. But if an administrator initiates a scrub manually it isn't clear that extended sleep is what he/she would want.
When OSDs are on HDD and PGs have a huge number of objects, it make take hours to complete a scrub. And is some scenarios, the busy period per day may be far over 10 hours. If we further give too large cushion of time, in the long run OSD may not have enough time for timely scrub of all PGs. So I think giving as short period as possible may be preferred in those scenarios. |
What about additional check on scrubber.must_scrub and scrubber.must_repair? |
Yes, that is what I was thinking. You only need to check must_scrub, since it is always set when a scrub, deep scrub or repair is requested. The must_deep_scrub and must_repair are added when appropriate. |
@Jeegn-Chen Another way a scrub could happen even with scrub_time_permit() returns false would be when the scrub deadline is reached (scrub_max_interval). So a scrub could even start when it is not normally permitted if there hasn't been enough time to scrub even once during the scrub_min_interval due to load or time restrictions. |
a71c847
to
fa615db
Compare
retest this please |
@dzafman I've updated the implementation and the test. Could you take a look? |
1. always take osd_scrub_sleep for manually initiated scrubs 2. when scrub_time_permit() return true for scheduled ones, the existing osd_scrub_sleep is used 3. when scrub_time_permit() return false for scheduled ones, there may be 2 scenarios 3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep, let's take osd_scrub_sleep 3.2 otherwise, let's take osd_scrub_extended_sleep Fixes: http://tracker.ceph.com/issues/40955 Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
fa615db
to
3bfb5c2
Compare
@tchaikov Thanks for the comments on the coding style. I've updated the PR. |
scrubs
ones, the existing osd_scrub_sleep is used
ones, there may be 2 scenarios
3.1 if osd_scrub_extended_sleep <= osd_scrub_sleep,
let's take osd_scrub_sleep
3.2 otherwise, let's take osd_scrub_extended_sleep
Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen jeegnchen@tencent.com