Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erasure code recovery should send additional reads if necessary #17920

Merged
merged 4 commits into from Sep 29, 2017

Conversation

dzafman
Copy link
Contributor

@dzafman dzafman commented Sep 22, 2017

@dzafman
Copy link
Contributor Author

dzafman commented Sep 22, 2017

retest this please

@dzafman
Copy link
Contributor Author

dzafman commented Sep 23, 2017

Testing results
http://pulpito.ceph.com/dzafman-2017-09-22_10:23:26-rados-wip-21382-distro-basic-smithi/

2 unrelated failures out of 255 rados suite jobs:
1658401 saw valgrind issues
1658403 "2017-09-22 17:55:28.482691 mon.a mon.1 172.21.15.197:6789/0 101 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log

@dzafman
Copy link
Contributor Author

dzafman commented Sep 25, 2017

retest this please

@tchaikov
Copy link
Contributor

sorry, i missed this PR. will review it early tomorrow.

@tchaikov tchaikov self-assigned this Sep 25, 2017
@@ -281,10 +289,10 @@ function TEST_rados_get_subread_eio_shard_0() {

function TEST_rados_get_subread_eio_shard_1() {
local dir=$1
setup_osds || return 1
setup_osds 4 || return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 OSDs would suffice for an erasure pool of "2+1" .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking 1 OSD down/out which can begin recovery to the 4th OSD. Not sure this is critical to this test, but I'd rather be using a more realistic scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i see.

local -a initial_osds=($(get_osds $poolname $objname))
local last=$((${#initial_osds[@]} - 1))
# Kill OSD
kill_daemons $dir TERM osd.${initial_osds[$last]} >&2 < /dev/null || return 1
Copy link
Contributor

@tchaikov tchaikov Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, if we want to get the last element in a bash array,

local last_osd=${initial_osds[-1]}

would suffice. i check bash 4.3.11 shipped with ubuntu trusty. and it works also.

ceph osd out ${initial_osds[$last]} || return 1

# Cluster should recovery this object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, remove empty line. s/recovery/recover/

@@ -1204,6 +1203,7 @@ void ECBackend::handle_sub_read_reply(
set<int> want_to_read, dummy_minimum;
get_want_to_read_shards(&want_to_read);
int err;
// TODO: Should we include non-acting nodes here when for_recovery is set?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think we should.

@@ -1485,19 +1484,12 @@ void ECBackend::call_write_ordered(std::function<void(void)> &&cb) {
}
}

int ECBackend::get_min_avail_to_read_shards(
void ECBackend::get_shards_and_have(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to rename it to get_all_avail_shards()?

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, once the fixup commits are folded.

Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
For now it doesn't include non-acting OSDs
Added test for this case

Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants