New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src/osd/PG.cc: remove redundant call to trim_log() #23354
Conversation
@xiexingguo The failure seen in http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log, should be fixed with this PR. |
The pg log limits are being backported all the way to luminous? |
@liewegas yes. |
@liewegas the backport trackers have been opened, but the backports are yet to be done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that the race is due to MOSDPGTrim going through the strict queue as a peering message vs regular ops going through the non-strict queue.
This change is motived by the failure tracked in https://tracker.ceph.com/issues/25198. The failure highlights a case, when a call to trim_log() after the PG has recovered, races with the previous op, on a replica OSD. Since the previous operation has not completed, the last_complete value for that OSD is not valid, when we try to trim the log. It is also worth noting that the race is due to MOSDPGTrim going through the strict queue as a peering message vs regular ops going through the non-strict queue. During the investigation of this bug, we noticed that, with https://tracker.ceph.com/issues/23979, we allow pg log trimming to happen on the primary and replicas, whenever we cross the upper bound of the pg log. This also ensures that pg log trimming happens while processing any new op. Therefore, the function trim_log(), which earlier served the purpose of trimming logs on the primary and replicas, just before the PG went into the Recovered state, is no more required. This acted like a last line of defense to trim logs, when we did not need the logs any more. But, this call seems redundant now, because, we are limiting the pg log length at all times. Signed-off-by: Neha Ojha <nojha@redhat.com>
fcd0e48
to
283b0bd
Compare
@jdurgin added it to the commit message. |
retest this please |
@liewegas one round of testing referenced here #23354 (comment) |
Oh yeah. Should be good to merge then! |
* refs/pull/23354/head: src/osd/PG.cc: remove redundant call to trim_log() Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
This change is motived by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case, when a
call to trim_log() after the PG has recovered, races with the previous op,
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not valid, when we try to trim the
log.
During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas, whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.
Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas, just before the PG went into
the Recovered state, is no more required. This acted like a last line of
defense to trim logs, when we did not need the logs any more. But, this call
seems redundant now, because, we are limiting the pg log length at all times.
Fixes: https://tracker.ceph.com/issues/25198
Signed-off-by: Neha Ojha nojha@redhat.com