osd: advance the pg log pointer during recovery even the object is no… #6266
Hi @athanatos, does this look correct?
if (missing.missing.empty()) {
  log.complete_to = log.log.end();
  info.last_complete = info.last_update;
}
while (log.complete_to != log.log.end()) {
  if (missing.missing[missing.rmissing.begin()->second].need <=
      log.complete_to->version)
With this change we run the risk of (less efficiently) running through this loop to advance to the end of the log on the last recovered object. Perhaps we should leave this hunk behind and just add the second (nearly duplicate) case.
Good catch! Yeah, I will update the commit.
Other than that, does the analysis make sense for this problem?
…t in missing list
If we mark the unfound objects as lost, there is a chance that the missing list in the PG log is empty but the log pointer does not reflect that properly, which results in a crash. This patch makes the log pointer manipulation unconditional.
Fixes: 13468
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
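The situation described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Ceph implementation: versions are plain `int`s, and `ToyPGLog`, `rm_missing`, `raise_last_complete`, `recover_got_conditional` and `recover_got_unconditional` are hypothetical names made up for illustration. It shows how an object that was dropped from the missing set by some other path leaves `complete_to` stale when the pointer is only advanced for objects still in the missing set.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

// Toy model only: versions are ints, and every name here is hypothetical.
struct ToyPGLog {
  std::list<int> log;                    // log entry versions, oldest first
  std::list<int>::iterator complete_to;  // first entry not yet known complete
  std::map<std::string, int> missing;    // oid -> version still needed
  std::map<int, std::string> rmissing;   // version still needed -> oid
  int last_update;
  int last_complete = 0;

  explicit ToyPGLog(std::list<int> versions)
      : log(std::move(versions)),
        complete_to(log.begin()),
        last_update(log.empty() ? 0 : log.back()) {}
  ToyPGLog(const ToyPGLog&) = delete;  // a copy would leave complete_to dangling
  ToyPGLog& operator=(const ToyPGLog&) = delete;

  void add_missing(const std::string& oid, int need) {
    missing[oid] = need;
    rmissing[need] = oid;
  }

  // Stand-in for any path that drops an object from the missing set
  // without touching complete_to (for example, marking it lost).
  void rm_missing(const std::string& oid) {
    auto it = missing.find(oid);
    if (it == missing.end()) return;
    rmissing.erase(it->second);
    missing.erase(it);
  }

  // Advance complete_to up to the oldest version that is still missing.
  void raise_last_complete() {
    if (missing.empty()) {
      complete_to = log.end();
      last_complete = last_update;
      return;
    }
    while (complete_to != log.end()) {
      assert(!rmissing.empty());  // guarded by the empty() check above
      if (rmissing.begin()->first <= *complete_to) break;
      if (last_complete < *complete_to) last_complete = *complete_to;
      ++complete_to;
    }
  }

  // Pointer is only advanced when the object is still in the missing set.
  void recover_got_conditional(const std::string& oid) {
    if (missing.count(oid) == 0) return;
    rm_missing(oid);
    raise_last_complete();
  }

  // Pointer is advanced whether or not the object was still missing.
  void recover_got_unconditional(const std::string& oid) {
    rm_missing(oid);
    raise_last_complete();
  }
};
```

In this model, with a log of versions 1, 2, 3 and one object missing at version 2, removing that object through `rm_missing` and then calling `recover_got_conditional` leaves `complete_to` pointing at version 2 even though nothing is missing, while `recover_got_unconditional` moves it to the end of the log and sets `last_complete` to `last_update`.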
Hi @liewegas, updated according to the review comments; please help review. Thanks!
I think it'll cover the case. What I'm not sure about is whether it is better to patch things up here or deal with it explicitly in the mark unfound code path. It seems like it might be better there. Is this the recover_got() call in recover_primary() in the case pg_log_entry_t::LOST_REVERT? Why not put it there and leave recover_got alone?
recover_got was called with the following stack trace:
2015-10-10 08:13:34.208791 7f20c2c5f700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_got(hobject_t, eversion_t)' thread 7f20c2c5f700 time 2015-10-10 08:13:34.184066
That object had already been removed from missing (by the mark unfound as lost command):
And here we don't check or update the last_complete pointer. Another way to fix this might be to update the pointer in pg_missing_t::add_next_event after calling rm? Thanks @liewegas for the review.
Hi @liewegas, with the above trace, it looks like doing it from within recover_got is more appropriate for this case? Thanks.
Yeah, I think you're right!
@athanatos can you take a look?
passed testing.
I would like to merge this with a ceph-qa-suite test.
@guangyy still need a teuthology test for this before we can merge
@guangyy Please reach out in #sepia and #ceph-devel if you need help developing a test.
Hi @athanatos @liewegas, sorry for the delayed response. I worked on something else over the last month, so this one was delayed a bit. I already have #sepia access; let me add a test case this week.
Sounds good, let us know when the test is ready (link to the ceph-qa-suite PR).
// raise last_complete?
if (missing.missing.empty()) {
  log.complete_to = log.log.end();
  info.last_complete = info.last_update;
This looks like an unnecessary whitespace change?
Looks like #6841 is a better fix for the root cause; closing this one and using that for further tracking.