Erasure code recovery should send additional reads if necessary #17920
Conversation
retest this please
Testing results: 2 unrelated failures out of 255 rados suite jobs.
retest this please
Sorry, I missed this PR. Will review it early tomorrow.
@@ -281,10 +289,10 @@ function TEST_rados_get_subread_eio_shard_0() {

 function TEST_rados_get_subread_eio_shard_1() {
     local dir=$1
-    setup_osds || return 1
+    setup_osds 4 || return 1
3 OSDs would suffice for an erasure pool of "2+1".
I'm taking 1 OSD down/out, which lets recovery begin onto the 4th OSD. Not sure this is critical to this test, but I'd rather use a more realistic scenario.
oh, i see.
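For context, a "2+1" pool means an erasure-code profile with k=2 data chunks and m=1 coding chunk, so each object spans 3 OSDs and the 4th OSD gives recovery somewhere to go. A rough sketch of such a setup (profile and pool names are made up, not from this PR):

```
# Hypothetical names; k=2 data chunks, m=1 coding chunk per object.
ceph osd erasure-code-profile set ec21profile k=2 m=1
ceph osd pool create ecpool 8 8 erasure ec21profile
```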
    local -a initial_osds=($(get_osds $poolname $objname))
    local last=$((${#initial_osds[@]} - 1))
    # Kill OSD
    kill_daemons $dir TERM osd.${initial_osds[$last]} >&2 < /dev/null || return 1
nit: if we want to get the last element of a bash array,
    local last_osd=${initial_osds[-1]}
would suffice. I checked bash 4.3.11, shipped with Ubuntu Trusty, and it works there too.
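The two forms can be compared in a quick sketch (the array contents here are hypothetical, not from the PR); the explicit-index form works in any bash with arrays, while the negative subscript needs bash 4.3 or newer:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the test's initial_osds array.
initial_osds=(3 7 1 5)

# Form used in the PR: compute the last index explicitly.
last=$((${#initial_osds[@]} - 1))
echo "${initial_osds[$last]}"   # prints 5

# Shorthand suggested in the review, available since bash 4.3.
echo "${initial_osds[-1]}"      # prints 5
```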
    ceph osd out ${initial_osds[$last]} || return 1

    # Cluster should recovery this object
nit: remove the empty line, and s/recovery/recover/.
src/osd/ECBackend.cc
Outdated
@@ -1204,6 +1203,7 @@ void ECBackend::handle_sub_read_reply(
     set<int> want_to_read, dummy_minimum;
     get_want_to_read_shards(&want_to_read);
     int err;
+    // TODO: Should we include non-acting nodes here when for_recovery is set?
yeah, i think we should.
src/osd/ECBackend.cc
Outdated
@@ -1485,19 +1484,12 @@ void ECBackend::call_write_ordered(std::function<void(void)> &&cb) {
   }
 }

-int ECBackend::get_min_avail_to_read_shards(
+void ECBackend::get_shards_and_have(
might want to rename it to get_all_avail_shards()?
lgtm, once the fixup commits are folded.
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
For now it doesn't include non-acting OSDs. Added a test for this case.
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
Fixes: http://tracker.ceph.com/issues/21382