Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src/osd/PG.cc: remove redundant call to trim_log() #23354

Merged
merged 1 commit into from Jul 31, 2018

Conversation

neha-ojha
Copy link
Member

This change is motived by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case, when a
call to trim_log() after the PG has recovered, races with the previous op,
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not valid, when we try to trim the
log.

During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas, whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.

Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas, just before the PG went into
the Recovered state, is no more required. This acted like a last line of
defense to trim logs, when we did not need the logs any more. But, this call
seems redundant now, because, we are limiting the pg log length at all times.

Fixes: https://tracker.ceph.com/issues/25198
Signed-off-by: Neha Ojha nojha@redhat.com

@neha-ojha neha-ojha added the core label Jul 31, 2018
@neha-ojha
Copy link
Member Author

neha-ojha commented Jul 31, 2018

@neha-ojha
Copy link
Member Author

@liewegas
Copy link
Member

The pg log limits are being backported all the way to luminous?

@neha-ojha
Copy link
Member Author

@liewegas yes.

@neha-ojha
Copy link
Member Author

@liewegas the backport trackers have been opened, but the backports are yet to be done.

Copy link
Member

@jdurgin jdurgin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that the race is due to MOSDPGTrim going through the strict queue as a peering message vs regular ops going through the non-strict queue.

This change is motived by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case, when a
call to trim_log() after the PG has recovered, races with the previous op,
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not valid, when we try to trim the
log. It is also worth noting that the race is due to MOSDPGTrim going through
the strict queue as a peering message vs regular ops going through the
non-strict queue.

During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas, whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.

Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas, just before the PG went into
the Recovered state, is no more required. This acted like a last line of
defense to trim logs, when we did not need the logs any more. But, this call
seems redundant now, because, we are limiting the pg log length at all times.

Signed-off-by: Neha Ojha <nojha@redhat.com>
@neha-ojha
Copy link
Member Author

@jdurgin added it to the commit message.

@neha-ojha
Copy link
Member Author

retest this please

@neha-ojha
Copy link
Member Author

@liewegas one round of testing referenced here #23354 (comment)

@liewegas
Copy link
Member

Oh yeah. Should be good to merge then!

@liewegas liewegas merged commit 283b0bd into ceph:master Jul 31, 2018
liewegas added a commit that referenced this pull request Jul 31, 2018
* refs/pull/23354/head:
	src/osd/PG.cc: remove redundant call to trim_log()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants