include write error codes in the pg log #10170

jdurgin · 2016-07-07T02:31:12Z

This depends on #9489 - if you'd like to see more granular history check out the wip-pg-log-errors-10 branch.

There is at least one issue left to fix - proper cleanup of the repop on shutdown, since the LogUpdateCtx has a ref to it that isn't accounted for yet. Valgrind catches this.

athanatos · 2016-07-07T16:48:35Z

src/osd/PG.cc

-  rollbacker.apply(this, &t);
-  info.last_update = pg_log.get_head();
+  if (handle_missing) {
+    PGLogEntryHandler rollbacker;


It seems like it would be easier to just update append_new_log_entries to do the right thing. Also, append_new_log_entries could indicate whether stats should be invalidated based on the log entries. That would avoid needing to update the message.


This will store write error codes for use in dup op detection. A few places use checks assuming is_update() or is_delete() are opposites - fix those to ignore or consider errors, as appropriate. Refs: http://tracker.ceph.com/issues/14468 Signed-off-by: Josh Durgin <jdurgin@redhat.com>
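To illustrate why such checks break once error entries exist, here is a minimal sketch with a hypothetical three-valued entry type. `ToyLogEntry` and its `Kind` enum are illustrative stand-ins, not the real pg log entry type. Only the predicate names `is_update()` and `is_delete()` come from the text above; `is_error()` is an assumed name.

```cpp
#include <cassert>

// Hypothetical stand-in for a pg log entry; not the real Ceph type.
struct ToyLogEntry {
  enum class Kind { Modify, Delete, Error };
  Kind kind;
  int return_code;   // 0 for successful ops, negative errno for errors

  bool is_update() const { return kind == Kind::Modify; }
  bool is_delete() const { return kind == Kind::Delete; }
  bool is_error() const { return kind == Kind::Error; }  // assumed name
};

// Exactly one of the three predicates holds for any entry.
void check_kinds_are_exclusive(const ToyLogEntry& e) {
  assert(e.is_update() + e.is_delete() + e.is_error() == 1);
}

// Old-style check: treats "not an update" as "a delete".
// With error entries this wrongly reports a removal.
bool object_removed_naive(const ToyLogEntry& e) {
  return !e.is_update();
}

// Check that accounts for error entries, which change no object state.
bool object_removed(const ToyLogEntry& e) {
  return e.is_delete();
}
```

In this toy model an error entry makes the naive check claim the object was removed, while the explicit `is_delete()` check does not.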


This is required to prevent re-ordering of guarded writes or deletes in the presence of network failures and resends. Use the existing submit_log_entries() method to initiate a repop that only updates the pg log. Keep the write error semantics close to the existing implementation - if we have a buffer, return it, but do not persist the buffer for now. Refs: http://tracker.ceph.com/issues/14468 Signed-off-by: Josh Durgin <jdurgin@redhat.com>


This prevents reordering guarded writes or deletes. Without this, the following sequence:

  delete foo -> -ENOENT
  write foo -> success
  (client connection fails)
  resend delete foo -> success, object deleted
  resend write foo -> success - dup op, so no write performed

results in the object not existing, instead of containing data. After this change, both delete and write are detected as dups and the original ordering is preserved.

Fixes: http://tracker.ceph.com/issues/14468
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
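The sequence described above can be replayed with a toy in-memory model. Everything in this sketch (`ToyPG`, `Op`, `submit`, the `record_errors` flag) is hypothetical and is not Ceph's code. It only assumes that dup detection looks up a request id in the log and returns the recorded result instead of re-executing the op.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model, not Ceph code.
enum class OpType { Write, Delete };

struct Op {
  int reqid;          // stays the same when the client resends the op
  OpType type;
  std::string oid;
};

struct ToyPG {
  bool record_errors;               // true models the behaviour after this change
  std::set<std::string> objects;    // objects that currently exist
  std::map<int, int> log;           // reqid -> return code originally sent

  explicit ToyPG(bool record_errors_) : record_errors(record_errors_) {}

  int submit(const Op& op) {
    // Dup detection: a request already in the log is not re-executed.
    auto it = log.find(op.reqid);
    if (it != log.end())
      return it->second;

    int r = 0;
    if (op.type == OpType::Write) {
      objects.insert(op.oid);
    } else if (objects.erase(op.oid) == 0) {
      r = -ENOENT;                  // guarded delete of a missing object
    }
    // Successes are always logged; errors only when record_errors is set.
    if (r == 0 || record_errors)
      log[op.reqid] = r;
    return r;
  }
};

// Replays the sequence and reports whether "foo" exists at the end.
bool foo_exists_after_resend(bool record_errors) {
  ToyPG pg(record_errors);
  Op del{1, OpType::Delete, "foo"};
  Op wr{2, OpType::Write, "foo"};
  int r = pg.submit(del);           // fails: foo does not exist yet
  assert(r == -ENOENT);
  r = pg.submit(wr);                // creates foo
  assert(r == 0);
  // Client connection fails here, so both ops are resent.
  pg.submit(del);
  pg.submit(wr);
  return pg.objects.count("foo") == 1;
}
```

With `record_errors` set to `false` the resent delete is executed and removes the object, and the resent write is skipped as a dup. With `true`, both resends are answered from the log and the object keeps its data.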


jdurgin · 2016-07-13T00:18:52Z

This is ready to go now. I fixed the memory leak on shutdown, and the latest rados run has the same failures as master.

jdurgin added bug-fix core labels Jul 7, 2016

jdurgin assigned athanatos Jul 7, 2016

athanatos reviewed Jul 7, 2016

jdurgin added 15 commits July 8, 2016 18:33

ReplicatedPG: remove unused mark_object_lost()

2609dd9

The only caller was removed in e7edf20 Signed-off-by: Josh Durgin <jdurgin@redhat.com>

ReplicatedPG: removed unused OnComplete struct

62864be

Added but never used in e7edf20 Signed-off-by: Josh Durgin <jdurgin@redhat.com>

ReplicatedPG: removed unused obc check

b675726

The one place this was set was removed by e7edf20 Signed-off-by: Josh Durgin <jdurgin@redhat.com>

PGLog: ignore error entries when constructing the missing set

ea66e5c

Errors should only be used for dup detection. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

PGLog: skip indexing errors by object

41861fa

Dup detection only needs them indexed by version, and keeping them out of the object index prevents error entries from contributing to the missing set during recovery. Signed-off-by: Josh Durgin <jdurgin@redhat.com>
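A minimal sketch of that indexing rule, under stated assumptions: `ToyEntry`, `ToyIndex`, and `needed_version` are hypothetical names, and the real PGLog structures differ. The point is only that error entries land in the version index but never in the per-object index that recovery consults.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical types for illustration; not the real PGLog structures.
struct ToyEntry {
  unsigned version;
  std::string oid;
  int return_code;                 // negative errno marks an error entry
  bool is_error() const { return return_code < 0; }
};

struct ToyIndex {
  std::map<unsigned, ToyEntry> by_version;           // every entry, for dup detection
  std::map<std::string, unsigned> latest_by_object;  // oid -> newest non-error version

  void add(const ToyEntry& e) {
    assert(by_version.count(e.version) == 0);        // versions are unique
    by_version.emplace(e.version, e);
    if (!e.is_error())
      latest_by_object[e.oid] = e.version;           // errors skip the object index
  }

  // Recovery-style question: which version of oid should a replica have?
  // Returns 0 if the object index knows nothing about oid.
  unsigned needed_version(const std::string& oid) const {
    auto it = latest_by_object.find(oid);
    return it == latest_by_object.end() ? 0 : it->second;
  }
};
```

In this model a failed write at a newer version does not change what `needed_version` reports, so it cannot make an object look missing.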

ReplicatedPG: skip stat invalidation when recording write errors

ed33ffc

This is only needed for the lost/unfound use of submit_log_entries() etc. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

OSDMonitor: add kraken feature bit

112eab0

This is needed to turn on persisting write errors, since older OSDs won't be able to handle them. Other features for kraken could potentially use this as well. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

ReplicatedPG: require kraken feature bit on osdmap to record write errors in the pg log

fa8aff9

Older OSDs can't handle the error entries. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

Objecter: add option for testing osd dup handling

6faa449

Signed-off-by: Josh Durgin <jdurgin@redhat.com>

test/librados: add a way to pass ceph config options

56f7115

This way individual tests or testcases can change settings Signed-off-by: Josh Durgin <jdurgin@redhat.com>

test/librados: add test that requires correct dup error detection

fbf3b79

This only works reliably with the objecter_retry_writes_after_first_reply setting, so make it part of the test setup. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

ReplicatedPG: clear log update waiters during shutdown

becdbe2

This prevents leaking repops that are referenced by LogUpdateCtx for updates that were in flight. Signed-off-by: Josh Durgin <jdurgin@redhat.com>

jdurgin force-pushed the wip-pg-log-errors-11 branch from 7d1aa7d to becdbe2 July 9, 2016 01:36

jdurgin changed the title ~~[DNM] include write error codes in the pg log~~ include write error codes in the pg log Jul 13, 2016

athanatos merged commit b4144fb into master Jul 13, 2016

athanatos deleted the wip-pg-log-errors-11 branch July 13, 2016 16:46
