src/osd/PrimaryLogPG.cc: add check for norecover flag in maybe_kick_r…#54212

Closed
amathuria wants to merge 1 commit intoceph:mainfrom
amathuria:wip-amathuria-bz-2134786-fix-norecover

Conversation

@amathuria
Contributor

@amathuria amathuria commented Oct 26, 2023

…ecovery

Adds a check for the norecover flag in maybe_kick_recovery(). maybe_kick_recovery() starts recovery for a particular object when there is an attempt to read a missing object or write to a degraded object. However, this workflow currently does not check the norecover flag before starting recovery for an object.

Fixes: bz#2134786


Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
@amathuria amathuria requested a review from a team as a code owner October 26, 2023 13:15
@github-actions github-actions bot added the core label Oct 26, 2023
    return;

  if (osd->recovery_is_paused()) {
    dout(10) << "In maybe_kick_recovery: NORECOVER flag is set" << dendl;
Contributor

How frequent is the call to maybe_kick_recovery()? Will this spam the logs?

@athanatos
Contributor

Hmm, maybe_kick_recovery() is used mainly in the case where an IO has arrived and is blocked on a degraded or missing object. It's really not clear to me that norecover should apply in that situation. The goal of norecover is typically to mitigate short-term throughput/latency issues with client IO. With this change, client IOs on those objects would hang indefinitely, which seems worse.

@amathuria
Contributor Author

Hmm, maybe_kick_recovery() is used mainly in the case where an IO has arrived and is blocked on a degraded or missing object. It's really not clear to me that norecover should apply in that situation. The goal of norecover is typically to mitigate short-term throughput/latency issues with client IO. With this change, client IOs on those objects would hang indefinitely, which seems worse.

@athanatos I have added some more details in this tracker, https://tracker.ceph.com/issues/63334, in addition to the BZ. Basically, the issue was that recovery was happening while the norecover flag was set and client IO was in progress. From the logs it was evident that there were degraded objects: the autoscaler had kicked in and PG splitting was taking place, which was causing maybe_kick_recovery() to schedule recovery.
I think what you're saying makes sense, so should we leave things the way they are and document this? Or do you have another suggestion for how this can be addressed?

@athanatos
Contributor

It really doesn't make sense to increase the number of PGs with norecover or nobackfill set. Each newly split PG will need to be backfilled to a new location. It will also trigger peering, and therefore leave a few objects degraded that were being written to when the map change happened.

  1. If there is documentation that recommends setting nobackfill or norecover while increasing pg_num, remove it. We should probably prevent pg_num from being changed while nobackfill or norecover are set.
  2. The documentation for increasing pg_num should explicitly warn that it will trigger backfill/recovery IO proportional to the number of affected PGs. To limit the impact on client IO, pg_num should be changed in very small increments, waiting until the cluster is active+clean each time.
  3. We should document that IOs on degraded/missing objects will still trigger recovery even with norecover set -- the alternative would be for IO to hang completely.

@amathuria
Contributor Author

It really doesn't make sense to increase the number of PGs with norecover or nobackfill set. Each newly split PG will need to be backfilled to a new location. It will also trigger peering, and therefore leave a few objects degraded that were being written to when the map change happened.

1. If there is documentation that recommends setting nobackfill or norecover while increasing pg_num, remove it. We should probably prevent pg_num from being changed while nobackfill or norecover are set.

2. The documentation for increasing pg_num should explicitly warn that it will trigger backfill/recovery IO proportional to the number of affected PGs. To limit the impact on client IO, pg_num should be changed in _very small_ increments, waiting until the cluster is active+clean each time.

3. We should document that IOs on degraded/missing objects will still trigger recovery even with norecover set -- the alternative would be for IO to hang completely.

Thanks for these points! In the case where this issue was raised, the autoscaler was kicking in and increasing the number of PGs. Does it make sense to mention that as well?

@athanatos
Contributor

athanatos commented Oct 30, 2023

@amathuria Ah, nice observation. The autoscaler should not run if norecover or nobackfill are set. If it does, that's the most important thing to fix.

@amathuria amathuria closed this Nov 3, 2023
@amathuria
Contributor Author

@athanatos thanks for clearing that up. I will close this PR and open a new one for the autoscaler change and probably another one for all the potential documentation changes.

