
Handle EC recovery read errors #9304

Merged: 9 commits into ceph:master from wip-13937 on Oct 28, 2016

Conversation

@dzafman (Contributor) commented May 24, 2016:

I end up with an active+degraded PG that doesn't repair if I remove an object on all nodes in a way where it gets recognized as missing. With this fix the OSD doesn't crash. I still need to figure out how to get recovery to finish. I assume I can have an error that will have to be discovered later by scrubbing.

@tchaikov (Contributor) commented:

@dzafman is this supposed to be a (not yet completed) fix for http://tracker.ceph.com/issues/13937?

dzafman added a commit to dzafman/ceph that referenced this pull request on May 25, 2016
Fixes: ceph#9304

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman (Contributor, Author) commented May 25, 2016:

@tchaikov I updated the proper commit to show that this pull request does indeed try to address 13937.

@dzafman (Contributor, Author) commented May 25, 2016:

@athanatos Any suggestions on how to move forward are appreciated.

dzafman added a commit to dzafman/ceph that referenced this pull request on May 26, 2016
Fixes: ceph#9304

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman (Contributor, Author) commented May 26, 2016:

@athanatos I updated to make failed_push() common. I did the following to create the scenario:

- PG state: active+degraded
- ./rados -p ecpool rm benchmark_data_TrustyTahr_30314_object286 (lost object), interrupted with ^C
- ceph pg #.# scrub (waiting for clean): the scrub didn't start
- rados -p ecpool cleanup: hung or was very slow, interrupted with ^C
- ceph pg #.# mark_unfound_lost delete
- log: "pg has 1 objects unfound and apparently lost marking"
- PG state: active+clean+scrubbing

There are 9 removes "waiting for scrub", and some removes completed even with the PG going through this. The remove of the lost object itself is "waiting for missing object", but it should have been cleaned up by "mark_unfound_lost delete".

@athanatos (Contributor) commented:

@dzafman I'm not sure what you mean by the last part. Do you mean that the requests stayed hung?

@dzafman (Contributor, Author) commented May 26, 2016:

@athanatos Yes, they show up in the dump_ops_in_flight output.

@athanatos (Contributor) commented:

@dzafman Sounds like a bug in master to me, probably worth tracking down.

@tchaikov self-assigned this on Jun 3, 2016
dzafman added a commit to dzafman/ceph that referenced this pull request on Jun 3, 2016
Fixes: ceph#9304

Signed-off-by: David Zafman <dzafman@redhat.com>
dzafman added a commit that referenced this pull request on Jun 9, 2016
Fixes: #9304

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman force-pushed the wip-13937 branch 2 times, most recently from 3ae9b08 to 48e034b, on June 9, 2016 at 20:05
    @@ -500,6 +500,8 @@ void ECBackend::continue_recovery_op(
          assert(!op.recovery_progress.first);
          dout(10) << __func__ << ": canceling recovery op for obj " << op.hoid
                   << dendl;
    +     // XXX: Doing this on the read error handling just loops in recovery
    +     // How would this case ever actually happen?

@athanatos (Contributor) commented on this diff, Jul 22, 2016:
You can't just use cancel_pull() in the read error handling because you know something the parent doesn't: that some of the sources are faulty. Here, I think we are failing because a source went down (not in the up or acting set, or it would have triggered peering to restart) and we no longer have enough copies. I think it's ok to just cancel the recovery op here because the parent would already have updated its missing_loc etc when it got the log update. Either the object later becomes recoverable again and recovery will retry or not in which case it will remain unfound.

@dzafman (Contributor, Author) replied:

You added this code in d9106ce to fix tracker 8161. I tried to use an equivalent version in OnRecoveryReadComplete::finish() only to discover that it didn't work. I'll have to look into whether I can do something like my new _failed_push() to handle this case too.

@athanatos (Contributor) replied:

Yep, that's what I was talking about. Like I said, here, it's ok because ReplicatedPG/PG already knows that the source died and won't retry until the object is found again. You need to call something which updates missing_loc (which I think you do earlier in this series with _failed_push).

Is this code causing a bug?
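
To make the shape of the fix concrete, here is a minimal, self-contained sketch of the failed-push flow the reviewers describe: on a read error, the backend reports the failing shards to the parent, which drops them from missing_loc so recovery either retries from the remaining locations or leaves the object unfound. Every type and name below (pg_shard_t, hobject_t, MissingLoc, RecoveryParent) is a simplified stand-in for the Ceph classes, not the merged code:

    // Hedged sketch of the failed-push flow; simplified stand-in types.
    #include <list>
    #include <map>
    #include <set>
    #include <string>
    #include <tuple>

    struct pg_shard_t {
      int osd, shard;
      bool operator<(const pg_shard_t& o) const {
        return std::tie(osd, shard) < std::tie(o.osd, o.shard);
      }
    };
    using hobject_t = std::string;

    struct MissingLoc {
      // object -> shards still believed to hold a readable copy
      std::map<hobject_t, std::set<pg_shard_t>> locs;
      void remove_location(const hobject_t& oid, const pg_shard_t& shard) {
        locs[oid].erase(shard);
      }
      bool recoverable(const hobject_t& oid) const {
        auto it = locs.find(oid);
        return it != locs.end() && !it->second.empty();
      }
    };

    struct RecoveryParent {
      MissingLoc missing_loc;
      std::set<hobject_t> recovering;

      // On a recovery read error, stop tracking the op and drop the shards
      // that returned errors from missing_loc. Recovery can retry from the
      // remaining locations; if none remain, the object stays unfound until
      // a good copy reappears.
      void failed_push(const std::list<pg_shard_t>& from, const hobject_t& oid) {
        recovering.erase(oid);
        for (const auto& shard : from)
          missing_loc.remove_location(oid, shard);
      }
    };

    int main() {
      RecoveryParent pg;
      hobject_t oid = "obj";
      pg.recovering.insert(oid);
      pg.missing_loc.locs[oid] = { {0, 0}, {1, 1} };
      pg.failed_push({ {0, 0} }, oid);  // the shard on osd.0 hit a read error
      return pg.missing_loc.recoverable(oid) ? 0 : 1;  // still readable via osd.1
    }

Routing the error through the parent rather than cancelling locally is exactly the asymmetry described above: the backend knows which sources returned errors, and only the parent can record that in missing_loc.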

@dzafman (Contributor, Author) replied:

I get it now. I didn't see this code fail in my testing. I'll remove the comment.

@athanatos (Contributor) replied:

K. It probably wouldn't hurt to add an explanatory comment if you want to. I had to spend 20 minutes trying to figure out why it didn't get handled by check_recovery_sources...
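
For reference, the explanatory comment being asked for might read something like this (a suggestion consistent with the discussion above, not the text that was actually committed):

    // Cancelling the recovery op here is safe: the parent already learned
    // via the log update that the source went away and has updated its
    // missing_loc, so it will not retry until the object is found again.
    // Either the object later becomes recoverable and recovery retries,
    // or it remains unfound.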

@tchaikov (Contributor) commented:

reviewed.

@dzafman force-pushed the wip-13937 branch 2 times, most recently from 9b0191f to 82cdb93, on October 12, 2016 at 23:33
    {
      assert(recovering.count(soid));
      recovering.erase(soid);
      missing_loc.remove_location(soid, from);
      for (list<pg_shard_t>::const_iterator i = from.begin(); i != from.end(); ++i)

Review comment (Contributor):
could use range-based loop.
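
A range-based version of the loop above might look like the following; the loop body is an assumption (the original is truncated here), chosen to match the failed_push discussion earlier in the review:

    for (const auto& shard : from)
      missing_loc.remove_location(soid, shard);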

    recovery_ops.erase(hoid);

    list<pg_shard_t> fl;
    for (map<pg_shard_t, int>::iterator i = res.errors.begin();

Review comment (Contributor):
could use range-based loop.
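
Similarly for the map iteration; a sketch that assumes the loop collects the shards that reported errors into fl, which fits the surrounding failed-push flow:

    list<pg_shard_t> fl;
    for (const auto& err : res.errors)
      fl.push_back(err.first);  // err.first is the pg_shard_t that returned an error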

@athanatos (Contributor) commented:

I think this looks ok to me

Commit messages in this pull request:

- This reverts commit 5bc5533.
  Conflicts:
  	src/test/Makefile.am (no longer exists)
  	src/test/erasure-code/Makefile.am (no longer exists)
  Signed-off-by: David Zafman <dzafman@redhat.com>

- Remove unused struct
  Signed-off-by: David Zafman <dzafman@redhat.com>

- Signed-off-by: David Zafman <dzafman@redhat.com>

- Fixes: http://tracker.ceph.com/issues/13937
  Signed-off-by: David Zafman <dzafman@redhat.com>

- Signed-off-by: David Zafman <dzafman@redhat.com>

- Signed-off-by: David Zafman <dzafman@redhat.com>

- Caused by: 70e000a
  Signed-off-by: David Zafman <dzafman@redhat.com>

- Low space broke test, saw "flags nearfull,pauserd,pausewr...."
  Signed-off-by: David Zafman <dzafman@redhat.com>

- Fix for broken test-erasure-code.sh and test-erasure-eio.sh
  Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman assigned ghost on Oct 28, 2016
@dzafman changed the title from "DNM: Handle EC recovery read errors" to "Handle EC recovery read errors" on Oct 28, 2016
@ghost commented Oct 28, 2016:

Reviewed-by: Loic Dachary <ldachary@redhat.com>

@tchaikov (Contributor) commented:

lgtm also.

@tchaikov merged commit e7291ce into ceph:master on Oct 28, 2016
@tchaikov (Contributor) commented:

@dzafman shall we backport this to jewel?

@dzafman deleted the wip-13937 branch on November 2, 2016 at 00:25
@smithfarm (Contributor) commented:

@dzafman Ping re: jewel backport. Users are asking about it.
