
include write error codes in the pg log #10170

Merged: 15 commits, Jul 13, 2016

jdurgin (Member) commented Jul 7, 2016

This depends on #9489. If you'd like to see more granular history, check out the wip-pg-log-errors-10 branch.

There is at least one issue left to fix: proper cleanup of the repop on shutdown, since the LogUpdateCtx holds a reference to it that isn't accounted for yet. Valgrind catches this.

```cpp
rollbacker.apply(this, &t);
info.last_update = pg_log.get_head();
if (handle_missing) {
PGLogEntryHandler rollbacker;
```
Contributor

It seems like it would be easier to just update append_new_log_entries to do the right thing. Also, append_new_log_entries could indicate whether stats should be invalidated based on the log entries. That would avoid needing to update the message.

jdurgin added 15 commits July 8, 2016 18:33
The only caller was removed in e7edf20

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Added but never used in e7edf20

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
The one place this was set was removed by e7edf20

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This will store write error codes for use in dup op detection.
A few places use checks that assume is_update() and is_delete() are
opposites; fix those to ignore or consider errors, as appropriate.

Refs: http://tracker.ceph.com/issues/14468
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
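To illustrate why error entries break the old assumption, here is a minimal C++ sketch. The LogEntry struct, its OpType enum, and the helpers changes_object_state() and removes_object() are hypothetical simplifications, not Ceph's pg_log_entry_t or its real code; only is_update() and is_delete() are named in the commit message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical simplification of a pg log entry (not Ceph's pg_log_entry_t).
struct LogEntry {
  enum class OpType { Modify, Delete, Error };

  OpType op;
  std::string oid;      // object the op targeted
  uint64_t version;     // position in the log
  int32_t return_code;  // 0 on success, negative errno for Error entries

  bool is_update() const { return op == OpType::Modify; }
  bool is_delete() const { return op == OpType::Delete; }
  bool is_error() const { return op == OpType::Error; }
};

// An Error entry records a failed write: it neither updates nor deletes
// the object, so it must not change the object's state.
bool changes_object_state(const LogEntry& e) {
  return e.is_update() || e.is_delete();
}

// Before error entries existed, "!is_update()" implied "is_delete()".
// Writing this as `return !e.is_update();` would now misclassify
// Error entries as deletes.
bool removes_object(const LogEntry& e) {
  return e.is_delete();
}
```

With an Error entry, both is_update() and is_delete() return false, so each call site has to decide explicitly whether errors should be ignored or considered.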
Errors should only be used for dup detection.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Dup detection only needs them indexed by version, and keeping them out
of the object index prevents error entries from contributing to the
missing set during recovery.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
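A minimal sketch of the idea: error entries stay in the version-ordered log but are skipped when building the per-object index, so they cannot make an object appear missing. Entry, IndexedLog, and missing_after() are hypothetical names, and deriving the missing set from the newest successful entry per object is a simplification of real recovery.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified log entry (not Ceph's pg_log_entry_t).
struct Entry {
  std::string oid;   // target object
  uint64_t version;  // log position
  bool is_error;     // op failed; only its return code is logged
  int return_code;
};

struct IndexedLog {
  std::vector<Entry> entries;                  // all entries, in version order
  std::map<std::string, const Entry*> by_oid;  // successful entries only

  // Rebuild the object index. Pointers refer into `entries`, so call this
  // again after modifying the vector.
  void index() {
    by_oid.clear();
    for (const Entry& e : entries) {
      if (!e.is_error)
        by_oid[e.oid] = &e;  // later versions overwrite earlier ones
    }
  }
};

// Objects a peer at `peer_last_update` would need to recover. Because the
// set is built from the object index, error entries never contribute.
std::set<std::string> missing_after(const IndexedLog& log,
                                    uint64_t peer_last_update) {
  std::set<std::string> missing;
  for (const auto& [oid, e] : log.by_oid)
    if (e->version > peer_last_update)
      missing.insert(oid);
  return missing;
}
```

For example, a failed write to foo at version 3 does not make foo missing for a peer that already has foo's last successful update at version 1.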
This is required to prevent re-ordering of guarded writes or deletes
in the presence of network failures and resends.

Use the existing submit_log_entries() method to initiate a repop that
only updates the pg log.

Keep the write error semantics close to the existing implementation:
if we have a buffer, return it, but do not persist the buffer for now.

Refs: http://tracker.ceph.com/issues/14468
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This is only needed for the lost/unfound use of submit_log_entries() etc.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This prevents reordering guarded writes or deletes.

Without this, the following sequence:

delete foo -> -ENOENT
write foo -> success
(client connection fails)
resend delete foo -> success, object deleted
resend write foo -> success - dup op, so no write performed

results in the object not existing, instead of containing data. After
this change, both delete and write are detected as dups and the
original ordering is preserved.

Fixes: http://tracker.ceph.com/issues/14468
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
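The resend sequence above can be reproduced with a toy model. In this sketch, ToyOsd, its completed-request map, and run_sequence() are hypothetical and only mimic dup detection by request id. The log_errors flag switches between remembering only successful ops (the old behaviour) and also remembering failed ones with their return codes.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy OSD holding a single object "foo"; not Ceph code.
class ToyOsd {
 public:
  explicit ToyOsd(bool log_errors) : log_errors_(log_errors) {}

  int write(int reqid, const std::string& data) {
    if (auto dup = lookup_dup(reqid)) return *dup;  // already applied
    data_ = data;
    record(reqid, 0);
    return 0;
  }

  int remove(int reqid) {
    if (auto dup = lookup_dup(reqid)) return *dup;
    if (!data_) {
      // Failed op: remembered for dup detection only if errors are logged.
      if (log_errors_) record(reqid, -ENOENT);
      return -ENOENT;
    }
    data_.reset();
    record(reqid, 0);
    return 0;
  }

  bool exists() const { return data_.has_value(); }

 private:
  std::optional<int> lookup_dup(int reqid) const {
    auto it = completed_.find(reqid);
    if (it == completed_.end()) return std::nullopt;
    return it->second;  // original return code
  }
  void record(int reqid, int result) { completed_[reqid] = result; }

  bool log_errors_;
  std::optional<std::string> data_;
  std::map<int, int> completed_;  // reqid -> original return code
};

// Replays the sequence from the commit message; returns whether foo exists.
bool run_sequence(bool log_errors) {
  ToyOsd osd(log_errors);
  osd.remove(1);         // delete foo -> -ENOENT
  osd.write(2, "data");  // write foo -> success
  // client connection fails; both ops are resent in order
  osd.remove(1);         // re-executed unless its error was recorded
  osd.write(2, "data");  // dup op, no write performed
  return osd.exists();
}
```

run_sequence(false) returns false (foo ends up deleted), while run_sequence(true) returns true: the resent delete is recognized as a dup and returns its original -ENOENT, so the original ordering is preserved.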
This is needed to turn on persisting write errors, since older OSDs
won't be able to handle them.

Other features for kraken could potentially use this as well.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
…rors in the pg log

Older OSDs can't handle the error entries.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
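A minimal sketch of the gating described in the two commits above: error entries are persisted only when every OSD in the acting set advertises support for them; otherwise the error is returned without being logged, as before. FEATURE_PG_LOG_ERRORS and both functions are hypothetical. Ceph's actual feature constants and negotiation are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit, standing in for whatever Ceph actually checks.
constexpr uint64_t FEATURE_PG_LOG_ERRORS = 1ULL << 0;

// Intersection of the feature bits advertised by every OSD in the acting set.
uint64_t common_features(const std::vector<uint64_t>& acting_features) {
  uint64_t common = ~0ULL;
  for (uint64_t f : acting_features) common &= f;
  return common;
}

// Persist error entries only if every replica can decode them.
bool should_log_write_error(const std::vector<uint64_t>& acting_features) {
  if (acting_features.empty()) return false;
  return (common_features(acting_features) & FEATURE_PG_LOG_ERRORS) != 0;
}
```

Taking the intersection means a single older OSD in the acting set is enough to fall back to the old behaviour.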
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This way, individual tests or test cases can change settings.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This only works reliably with the
objecter_retry_writes_after_first_reply setting, so make it part of
the test setup.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This prevents leaking repops that are referenced by LogUpdateCtx for
updates that were in flight.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
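A minimal sketch of the leak and its fix using reference counting. RepOp, ToyPg, submit_log_update(), on_ack(), and on_shutdown() are hypothetical stand-ins, not Ceph's repop or LogUpdateCtx types. A pending completion callback holds a reference to its repop, so callbacks still in flight at shutdown must be dropped explicitly. In this toy model, destroying ToyPg would also release them. The explicit on_shutdown() stands in for a PG object that outlives its in-flight operations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>

// Hypothetical stand-in for a replicated operation.
struct RepOp {
  int tid;
};

class ToyPg {
 public:
  // Start a log-only update; the pending callback keeps the repop alive
  // until the replicas acknowledge it.
  std::weak_ptr<RepOp> submit_log_update(int tid) {
    auto repop = std::make_shared<RepOp>(RepOp{tid});
    pending_[tid] = [repop] { /* would reply to the client here */ };
    return repop;
  }

  void on_ack(int tid) {
    auto it = pending_.find(tid);
    if (it == pending_.end()) return;
    it->second();        // run the completion
    pending_.erase(it);  // drops the callback and its repop reference
  }

  // Without this, callbacks for updates still in flight would keep their
  // repops referenced for as long as the PG object lives.
  void on_shutdown() { pending_.clear(); }

  std::size_t in_flight() const { return pending_.size(); }

 private:
  std::map<int, std::function<void()>> pending_;
};
```

Returning a weak_ptr lets a caller observe when a repop is actually freed without extending its lifetime.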
jdurgin changed the title from "[DNM] include write error codes in the pg log" to "include write error codes in the pg log" on Jul 13, 2016
jdurgin (Member, Author) commented Jul 13, 2016

This is ready to go now. I fixed the memory leak on shutdown, and the latest rados run has the same failures as master.

athanatos merged commit b4144fb into master on Jul 13, 2016
athanatos deleted the wip-pg-log-errors-11 branch on July 13, 2016 at 16:46