Skip to content

Conversation

@ronen-fr
Copy link
Contributor

@ronen-fr ronen-fr commented Jun 20, 2024

This is a follow-up to the previous commits in this series of changes
to the scrub scheduling mechanism.

In this step:

  • the scrub queue now holds a copy of the ScrubJob
    and not a shared pointer to it.
    The PG's entry in the scrub queue and the scrub-job object that
    is part of the scrubber are now separate entities, that can be
    modified separately - each with its own locking mechanism.
  • the interaction between OsdScrub (the OSD's scrub scheduler
    object) and the scrub queue during scrub initiation is simplified:
    instead of fetching from the queue a list of scrub targets
    possibly ready to be scrubbed, the OsdScrub now picks only one
    candidate - the top of the queue.
  • the scrub registration process - initiated by
    schedule_scrub_with_osd() - is simplified.
  • the recalculation of the scrub schedule following scrub completion
    or an abort is improved.

The next step (next PR):

  • the scrub queue maintains two scheduling targets per each PG -
    one for the next shallow scrub, and one for the next deep scrub.
  • the 'planned scrub' flags are replaced with specific attributes
    in the scheduling target.

@github-actions github-actions bot added the core label Jun 20, 2024
@athanatos
Copy link
Contributor

I think this seems like a reasonable direction, though it's not entirely clear from this PR what the end goal is. It might be easier to understand with the followup changes alluded to here, but on balance I don't think it's worth making this PR larger. Let me know when you need a review.

@ronen-fr
Copy link
Contributor Author

@athanatos

I think this seems like a reasonable direction, though it's not entirely clear from this PR what the end goal is. It might be easier to understand with the followup changes alluded to here, but on balance I don't think it's worth making this PR larger. Let me know when you need a review.

Thanks a lot. Will do.

@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch from b6b510d to cfc0240 Compare June 21, 2024 13:30
@github-actions github-actions bot added the tests label Jun 26, 2024
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch 12 times, most recently from 42004a6 to f8b5d19 Compare July 2, 2024 14:23
@ronen-fr ronen-fr changed the title Wip rf targets j13 osd/scrub: no shared scrub-job ownership between PGs and the scrub queue Jul 2, 2024
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch 2 times, most recently from ee92ae4 to fa665a9 Compare July 2, 2024 15:20
@ronen-fr ronen-fr marked this pull request as ready for review July 2, 2024 15:21
@ronen-fr ronen-fr requested a review from a team as a code owner July 2, 2024 15:21
@ronen-fr ronen-fr requested review from athanatos and neha-ojha July 2, 2024 15:22
@ronen-fr
Copy link
Contributor Author

ronen-fr commented Jul 3, 2024


CI build fails.
The problem is with the Clang++ 14 used by the CI (Why?), which uses an old stdc++ version.

Update:
caused by issue: https://github.com/llvm/llvm-project/issues/44178
which boils down to this:

don't use <ranges> for clang<=15 & libstdc++
(the words of Eric Niebler - https://github.com/NVIDIA/stdexec/pull/1003)

@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch from fd0685c to 1604cd5 Compare July 5, 2024 14:34
@athanatos
Copy link
Contributor

make check seems to fail

*/
void update_job(
Scrub::ScrubJobRef sjob,
Scrub::ScrubJob& sjob,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have neither implementation nor callers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed...
Forgot to rebase with the '.h' changes when removing from the '.cc'.
Fixed.

* To be used if we are no longer the PG's primary, or if the PG is removed.
*/
void remove_from_osd_queue(Scrub::ScrubJobRef sjob);
void remove_from_osd_queue(Scrub::ScrubJob& sjob);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_from_osd_queue only needs the pgid, not a mutable reference to a whole ScrubJob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

*/
void delay_on_failure(
Scrub::ScrubJobRef sjob,
Scrub::ScrubJob& sjob,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't obviously make sense to me. There appears to be only one caller:

/*
 * note: arbitrary delay used in this early version of the
 * scheduler refactoring.
 */
void PgScrubber::penalize_next_scrub(Scrub::delay_cause_t cause)
{
  m_osds->get_scrub_services().delay_on_failure(
      *m_scrub_job, 5s, cause, ceph_clock_now());
}

The implementation

void OsdScrub::delay_on_failure(
      Scrub::ScrubJob& sjob,
      std::chrono::seconds delay,
      Scrub::delay_cause_t delay_cause,
      utime_t now_is)
{
  m_queue.delay_on_failure(sjob, delay, delay_cause, now_is);
}

simply invokes ScrubQueue::delay_on_failure

void ScrubQueue::delay_on_failure(
    Scrub::ScrubJob& sjob,
    std::chrono::seconds delay,
    Scrub::delay_cause_t delay_cause,
    utime_t now_is)
{
  dout(10) << fmt::format(
		  "pg[{}] delay_on_failure: delay:{} now:{:s}",
		  sjob.pgid, delay, now_is)
	   << dendl;
  sjob.delay_on_failure(delay, delay_cause, now_is);
}

which finally invokes ScrubJob::delay_on_failure. This seems like a holdover from the previous scheme. PgScrubber::requeue_penalized, by contrast, seems to be perfectly happy invoking ScrubJob::delay_on_failure directly:

void PgScrubber::requeue_penalized(Scrub::delay_cause_t cause)
{
  /// \todo fix the 5s' to use a cause-specific delay parameter
  m_scrub_job->delay_on_failure(5s, cause, ceph_clock_now());
  m_osds->get_scrub_services().enqueue_target(*m_scrub_job);
}

Let's remove OsdScrub::delay_on_failure and ScrubQueue::delay_on_failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and PgScrubber::penalize_next_scrub() actually has no further callers since PgScrubber::requeue_penalized calls ScrubJob::delay_on_failure and OsdScrub::enqueue_target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - the whole chain was irrelevant after the changes. Removed.

void register_with_osd(
Scrub::ScrubJobRef sjob,
const Scrub::sched_params_t& suggested);
void enqueue_target(Scrub::ScrubJob& sjob);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This takes a mutable reference because the implementation updates target_queued --ScrubQueue::enqueue_target actually asserts !target_queued. The comment here does not mention this behavior or that the caller is responsible for ensuring that target_queued is cleared. This is particularly confusing as pg_scrubber appears to be responible for clearing this flag. I'd really prefer that components outside of pg simply queue the ScrubJob and ignore any fields they don't specifically need to access for that purpose. External interfaces like this one should take const references to ScrubJob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified

dout(10) << fmt::format("pg[{}] not scrub-able by this OSD", sjob.pgid) << dendl;
return;
}
ceph_assert(!sjob.target_queued);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, let's not look at members irrelevant to this component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove. But note: it did serve a purpose in detecting omissions during development.
I'll add the check to all callers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding these checks to the point in PgScrubber where it does the scheduling would make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second thought: I see this differently. The ScrubQueue object maintains the queue, and should
be allowed to access what it queues.
Anyway - will remove it at this phase, as the element queued is changed in the next phase.

* To be used if we are no longer the PG's primary, or if the PG is removed.
*/
void remove_from_osd_queue(Scrub::ScrubJobRef sjob);
void remove_from_osd_queue(const Scrub::ScrubJob& sjob);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only pass pgid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


void delay_on_failure(
Scrub::ScrubJobRef sjob,
Scrub::ScrubJob& sjob,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch 2 times, most recently from 86754ad to efabc04 Compare July 6, 2024 15:26
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch from e6757ea to a01feb5 Compare July 7, 2024 12:58
@ronen-fr
Copy link
Contributor Author

ronen-fr commented Jul 7, 2024

make check seems to fail

added a commit to remove the irrelevant offending unit-test

@ronen-fr
Copy link
Contributor Author

ronen-fr commented Jul 7, 2024

jenkins test api

@ronen-fr ronen-fr requested a review from athanatos July 7, 2024 18:18
@Svelar
Copy link
Member

Svelar commented Jul 9, 2024

jenkins test make check arm64

@ronen-fr ronen-fr added the DNM label Jul 11, 2024
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch 2 times, most recently from 372ffd7 to 8311fb0 Compare July 11, 2024 15:30
@ronen-fr ronen-fr removed the DNM label Jul 12, 2024
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch from 8311fb0 to 0479028 Compare July 12, 2024 13:22
ronen-fr added 11 commits July 16, 2024 09:19
to convey the change in the scheduling data ownership: no longer
a scrub-job object shared between the Scrubber and the OSD
scrub queue. Instead - the scrub queue holds a copied snapshot
of the scheduling data.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
... and not a shared pointer to it.
The PG's entry in the scrub queue and the scrub-job object that
is part of the scrubber are now separate entities, that can be
modified separately - each with its own locking mechanism.

the interaction between OsdScrub (the OSD's scrub scheduler
object) and the scrub queue during scrub initiation is simplified:
instead of fetching from the queue a list of scrub targets
possibly ready to be scrubbed, the OsdScrub now picks only one
candidate - the top of the queue.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
following the change to the queue into holding a copy of the
ScrubJob, the registration process - initiated by
schedule_scrub_with_osd() - can now be simplified.

adjust_target_time() is relocated as is. It will be refactored
in the next commit.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
... to use populate_config_params() output

Also: fix determine_scrub_time() to use the same already-available
data. determine_scrub_time() renamed to determine_initial_schedule(),
to better reflect its purpose.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
moving the scrubbed sjob copy thru the scrubber - from scrub
session initiation to its termination (or abort - where we use the
handed "old version" of the sjob to update the new one).

Note that in this version - not all the information that was used to
determine the specifics of the initiated scrub is passed to the
scrubber and back. In this half-baked stage of the refactoring,
the resulting implementation handling of corner cases, still using
the "planned scrub" flags, is less than optimal.

The next step (dual targets, replacing the 'planned scrub' flags with
specific attributes in the scheduling target) fixes this.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
...and on_scrub_schedule_input_change()
by accepting a parameter to control whether the scheduled time for
periodic scrubs that are already due should be updated.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
or after an aborted scrub.

To note: one of the important changes in this commit:
merging the functionality of adjust_target_time() &
update_schedule() into a single function - adjust_schedule().

Regarding the handling of aborts:

Most of the time - all that is required following a scrub abort is to
requeue the scrub job - the one that triggered the aborted scrub -
with just a delay added to its n.b..

But we must take into account scenarios where "something" caused the
parameters prepared for the *next* scrub to show higher urgency or
priority. "Something" - as in an operator command requiring immediate
scrubbing, or a change in the pool/cluster configuration.
In such cases - the current requested flags and the parameters of
the aborted scrub must be merged.

Note that the current implementation is a temporary solution, to be
replaced by a per-level updating of the relevant target.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
following changes in scrub code

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Removing the use of C++ ranges, as
Clang versions below 16 are buggy when libstd++ ranges are used.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
removing this unit-test, as it must be rewritten to match the
changes in the implementation of the scrub scheduling mechanism,
and that implementation is still in flux.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Previous commits handled the following two cases correctly:
- requeueing a scrub job while the OSD is still the primary, and
- not restoring the scrub job to the queue if the PG is not there;
Here we handle the missed scenario: the PG is there (we were able to
lock it), but is no longer the primary.

Also - a configuration change must not cause a re-queue of a
scrub-job for a PG that is in the middle of scrubbing.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
@ronen-fr ronen-fr force-pushed the wip-rf-targets-j13 branch from 0479028 to 699dd28 Compare July 16, 2024 14:20
@ronen-fr
Copy link
Contributor Author

jenkins retest this please

@ronen-fr
Copy link
Contributor Author

jenkins test make check

@ronen-fr
Copy link
Contributor Author

jenkins test submodules

@ronen-fr
Copy link
Contributor Author

Merging based on my Teuthology tests

@ronen-fr ronen-fr merged commit 832ba50 into ceph:main Jul 18, 2024
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Aug 1, 2024
osd/scrub: no shared scrub-job ownership between PGs and the scrub queue

Reviewed-by: Samuel Just <sjust@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants