src/osd/PrimaryLogPG.cc: add check for norecover flag in maybe_kick_r…#54212
…ecovery

Adds a check for whether the norecover flag is set in maybe_kick_recovery(). maybe_kick_recovery() starts recovery for a particular object when there is an attempt to read a missing object or write to a degraded object. However, this workflow currently does not check the norecover flag before starting recovery for the object.

Fixes: bz#2134786
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
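The behaviour described in the commit message can be sketched with a small self-contained model. This is only an illustration of the proposed guard, not the code in PrimaryLogPG.cc: `FakeOsd`, `FakePg`, and their members are hypothetical stand-ins, and only the names `maybe_kick_recovery()` and `recovery_is_paused()` come from the patch.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the OSD service; only the "paused" check is modelled.
struct FakeOsd {
  bool norecover = false;  // assumed to mirror the cluster-wide norecover flag
  bool recovery_is_paused() const { return norecover; }
};

// Hypothetical stand-in for a placement group.
struct FakePg {
  const FakeOsd* osd;
  std::set<std::string> missing;     // objects that are missing or degraded
  std::set<std::string> recovering;  // objects with recovery queued

  // Returns true if recovery was queued for `oid`.
  bool maybe_kick_recovery(const std::string& oid) {
    if (missing.count(oid) == 0)
      return false;                  // object is healthy, nothing to recover
    if (osd->recovery_is_paused())
      return false;                  // the guard this patch proposes
    recovering.insert(oid);
    return true;
  }
};
```

With the guard in place, an IO that touches a missing or degraded object no longer queues recovery for it while the flag is set. The model does not track the blocked IO itself.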
        return;

      if (osd->recovery_is_paused()) {
        dout(10) << "In maybe_kick_recovery: NORECOVER flag is set" << dendl;
How frequent is the call to maybe_kick_recovery()? Will this spam the logs?
Hmm, maybe_kick_recovery() is used mainly in the case where an IO has arrived and is blocked on a degraded or missing object. It's really not clear to me that norecover should apply in that situation. The goal of norecover is typically to mitigate short-term throughput/latency issues with client IO. With this change, client IOs on those objects would hang indefinitely, which seems worse.
@athanatos I have added some more details in this tracker: https://tracker.ceph.com/issues/63334 in addition to the BZ. Basically, the issue was that recovery was happening while the norecover flag was set and client IO was going on. From the logs it was evident that there were degraded objects: the autoscaler had kicked in and PG splitting was taking place, which was causing maybe_kick_recovery() to schedule recovery.
It really doesn't make sense to increase the number of PGs with norecover or nobackfill set. Each newly split PG will need to be backfilled to a new location. The split will also trigger peering, and therefore leave a few objects degraded: those that were being written to when the map change happened.
Thanks for these points! In the case where this issue was raised, the autoscaler was kicking in and increasing the number of PGs. Does it make sense to mention that as well?
@amathuria Ah, nice observation. The autoscaler should not run if norecover or nobackfill are set. If it does, that's the most important thing to fix.
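The rule suggested in the comment above can be expressed as a single predicate. The sketch below is only an illustration: `may_adjust_pg_num` and the representation of cluster flags as a set of strings are assumptions made for this example, and none of it is taken from the actual autoscaler code.

```cpp
#include <set>
#include <string>

// Hypothetical helper: decide whether a PG-count change may be applied,
// given the names of the cluster flags that are currently set.
bool may_adjust_pg_num(const std::set<std::string>& cluster_flags) {
  // Splitting PGs requires backfill and triggers peering, so hold off
  // while either flag is set.
  return cluster_flags.count("norecover") == 0 &&
         cluster_flags.count("nobackfill") == 0;
}
```

A caller would evaluate this before every proposed change, so that adjustments resume on their own once both flags are cleared.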
@athanatos thanks for clearing that up. I will close this PR and open a new one for the autoscaler change and probably another one for all the potential documentation changes.