qa: k=2 m=2 for ec tests (not m=1) #16789

Merged
merged 3 commits into from Aug 3, 2017

liewegas commented Aug 3, 2017

Setting m=1 is problematic in general because min_size
ends up being k+1 == k+m, which means we cannot tolerate
even a single OSD being down. This collides with
thrashosds.py tests that check map discontinuity, and
is likely to cause problems elsewhere as well.

Also, it's not something anyone would do in the real
world.
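
As a rough illustration of the arithmetic above (a sketch only, assuming min_size = k+1 as described; the helper name `ec_down_tolerance` is made up for this example and is not part of Ceph or teuthology):

```python
def ec_down_tolerance(k: int, m: int) -> int:
    """How many OSDs of a PG can be down while still meeting min_size.

    Assumes min_size = k + 1, as stated in the description above.
    """
    size = k + m        # total chunks (data + coding) per object
    min_size = k + 1    # assumption taken from the PR description
    return size - min_size


for k, m in [(2, 1), (2, 2), (4, 2)]:
    print(f"k={k} m={m}: size={k + m} min_size={k + 1} "
          f"tolerates {ec_down_tolerance(k, m)} OSD(s) down")
```

With m=1 the margin is zero, so taking any single OSD down drops the PG below min_size; with m=2 there is room for one.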

Fixes: http://tracker.ceph.com/issues/20844
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas added this to the luminous milestone Aug 3, 2017

@liewegas liewegas requested a review from jdurgin Aug 3, 2017

liewegas commented Aug 3, 2017

@jdurgin

It would be good to get more coverage of higher k+m in some tests too.

The main ec-profile used is the default one set by teuthology though, isn't it?

teuthology/ceph.conf.template: osd pool default erasure code profile = "plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd crush-failure-domain=osd"

That's used by the tasks like rados and radosbench that create their own pools
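
For reference, a small sketch of how the k and m values can be read out of a profile string like the one quoted above. The parser `parse_ec_profile` is hypothetical, written only for this example; it is not a teuthology or Ceph function. The chunk count k+m is also the minimum number of OSDs the pool needs when the failure domain is osd.

```python
def parse_ec_profile(profile: str) -> dict:
    """Split a space-separated 'key=value' profile string into a dict.

    Hypothetical helper for illustration; not an existing Ceph API.
    """
    return dict(token.split("=", 1) for token in profile.split())


# Profile string as quoted from teuthology/ceph.conf.template above.
profile = ("plugin=jerasure technique=reed_sol_van k=2 m=1 "
           "ruleset-failure-domain=osd crush-failure-domain=osd")

fields = parse_ec_profile(profile)
k, m = int(fields["k"]), int(fields["m"])
chunks = k + m  # each chunk lands on a distinct OSD when the failure domain is osd

print(f"plugin={fields['plugin']} k={k} m={m} chunks={chunks}")
```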

liewegas commented Aug 3, 2017

Hmm, we could fix that one to use m=2 too. But most tests seem to use teuthologyprofile.

I can switch a few of these to keep m=1 if they aren't mixed with the thrasher setting that does the map discontinuity. It will be hard to test higher ones without adding new sets with more osds (going to 3 nodes per test).

liewegas added some commits Aug 3, 2017

qa/suites/rados/thrash-erasure-code-big: add k=4 m=2
Get better coverage for larger codes.

Signed-off-by: Sage Weil <sage@redhat.com>

qa/suites/rados/thrash-erasure-code-big/cluster: 12 osds on 3 nodes not 4
smithi have 4 nvme partitions available, not 3.

Signed-off-by: Sage Weil <sage@redhat.com>

qa/suites/rados/thrash-erasure-code: do not test map gap with m=1
We test EC profiles with m=1 here, and mapgap can lead to incomplete pgs
because it takes an osd down and waits for healthy.

Fixes: http://tracker.ceph.com/issues/20844
Signed-off-by: Sage Weil <sage@redhat.com>

liewegas commented Aug 3, 2017

@jdurgin ok, here is a more minimal fix: stop doing the map gap test with the ec thrash collection (which has m=1 in it). Keep it in the thrash-erasure-code-big collection, and also add a larger code (k=4 m=2).

jdurgin approved these changes Aug 3, 2017

@jdurgin jdurgin merged commit b172642 into ceph:master Aug 3, 2017
