src/osd/PG.cc: remove redundant call to trim_log() #23354

neha-ojha · 2018-07-31T18:14:53Z

This change is motivated by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case in which a
call to trim_log() after the PG has recovered races with the previous op
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not yet valid when we try to trim the
log.

During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.

Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas just before the PG went into
the Recovered state, is no longer required. It acted as a last line of
defense to trim logs when we did not need them any more. But this call
seems redundant now, because we are limiting the pg log length at all times.
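
As a rough illustration of "limiting the pg log length at all times", here is a minimal sketch of a log that trims itself on every append once it crosses an upper bound. The names `BoundedLog`, `max_entries`, `tail` and `append` are hypothetical and are not Ceph's PGLog API; the real trimming logic is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified stand-in for a PG log; NOT Ceph's PGLog class.
struct BoundedLog {
  std::deque<std::uint64_t> entries;  // entry versions, oldest first
  std::size_t max_entries;            // assumed hard upper bound on log length
  std::uint64_t tail = 0;             // version of the most recently trimmed entry

  explicit BoundedLog(std::size_t max) : max_entries(max) {
    assert(max > 0);
  }

  // Append one entry for a new op, then trim from the oldest end if the
  // bound was crossed. Returns the number of entries trimmed.
  std::size_t append(std::uint64_t version) {
    entries.push_back(version);
    std::size_t trimmed = 0;
    while (entries.size() > max_entries) {
      tail = entries.front();
      entries.pop_front();
      ++trimmed;
    }
    return trimmed;
  }
};
```

Because trimming is tied to each append in this sketch, no separate trim step is needed once the log stops growing, which mirrors the reasoning for dropping the extra trim_log() call.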

Fixes: https://tracker.ceph.com/issues/25198
Signed-off-by: Neha Ojha <nojha@redhat.com>

neha-ojha · 2018-07-31T18:16:49Z

@xiexingguo The failure seen in http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log should be fixed with this PR.

neha-ojha · 2018-07-31T18:17:20Z

Rados run: http://pulpito.ceph.com/nojha-2018-07-31_04:04:03-rados-wip-remove-trimlog-distro-basic-smithi/

liewegas · 2018-07-31T18:23:09Z

The pg log limits are being backported all the way to luminous?

neha-ojha · 2018-07-31T18:24:18Z

@liewegas yes.

neha-ojha · 2018-07-31T18:25:18Z

@liewegas the backport trackers have been opened, but the backports are yet to be done.

jdurgin

Worth noting that the race is due to MOSDPGTrim going through the strict queue as a peering message vs regular ops going through the non-strict queue.

This change is motivated by the failure tracked in https://tracker.ceph.com/issues/25198. The failure highlights a case in which a call to trim_log() after the PG has recovered races with the previous op on a replica OSD. Since the previous operation has not completed, the last_complete value for that OSD is not yet valid when we try to trim the log. It is also worth noting that the race is due to MOSDPGTrim going through the strict queue as a peering message vs regular ops going through the non-strict queue.

During the investigation of this bug, we noticed that, with https://tracker.ceph.com/issues/23979, we allow pg log trimming to happen on the primary and replicas whenever we cross the upper bound of the pg log. This also ensures that pg log trimming happens while processing any new op.

Therefore, the function trim_log(), which earlier served the purpose of trimming logs on the primary and replicas just before the PG went into the Recovered state, is no longer required. It acted as a last line of defense to trim logs when we did not need them any more. But this call seems redundant now, because we are limiting the pg log length at all times.

Signed-off-by: Neha Ojha <nojha@redhat.com>

neha-ojha · 2018-07-31T18:46:11Z

@jdurgin added it to the commit message.

neha-ojha · 2018-07-31T19:25:14Z

retest this please

neha-ojha · 2018-07-31T19:58:33Z

@liewegas one round of testing referenced here #23354 (comment)

liewegas · 2018-07-31T21:23:55Z

Oh yeah. Should be good to merge then!

* refs/pull/23354/head:
  src/osd/PG.cc: remove redundant call to trim_log()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

neha-ojha added the core label Jul 31, 2018

liewegas added the bug-fix label Jul 31, 2018

neha-ojha requested review from jdurgin, liewegas and xiexingguo July 31, 2018 18:23

jdurgin approved these changes Jul 31, 2018


neha-ojha force-pushed the wip-remove-trimlog branch from fcd0e48 to 283b0bd Compare July 31, 2018 18:45

liewegas added wip-sage-testing needs-qa labels Jul 31, 2018

liewegas approved these changes Jul 31, 2018


liewegas merged commit 283b0bd into ceph:master Jul 31, 2018

liewegas added a commit that referenced this pull request Jul 31, 2018

Merge PR #23354 into master

0aba0f4

