test: Thrasher: update pgp_num of all expanded pools if not yet #13367

Merged — 4 commits merged into ceph:master from wip-qa-jewel-x-singleton on Feb 13, 2017

Conversation

tchaikov (Contributor):

…fore waiting for healthy

otherwise, "ceph health" complains with:

all OSDs are running luminous or later but the 'require_luminous_osds'
osdmap flag is not set

"ceph.restart" task will timeout and fail at seeing this warning.

So we need to set the osdmap flag after upgrading all OSDs, and call "ceph.restart" again to check whether the cluster is healthy.

Signed-off-by: Kefu Chai kchai@redhat.com
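
For context, the osdmap flag named in that warning can be set with a single monitor command once the last OSD has been upgraded. A minimal sketch of that step, assuming a CephManager-style helper called `raw_cluster_cmd` that wraps the `ceph` CLI (the helper name and placement are assumptions, not necessarily what this commit did):

```python
# Minimal sketch (assumption): set the osdmap flag once every OSD runs
# luminous, so "ceph health" stops warning about 'require_luminous_osds'.
# `manager.raw_cluster_cmd` is assumed to run "ceph <args>" against a monitor.
def set_require_luminous(manager):
    # equivalent to running: ceph osd set require_luminous_osds
    manager.raw_cluster_cmd('osd', 'set', 'require_luminous_osds')
    # after this, the "ceph.restart" wait-for-healthy check can run again
```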

tchaikov (Contributor, Author) commented on Feb 11, 2017:
still has

2017-02-11T12:24:57.507 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172; mon.a has mon_osd_down_out_interval set to 0

@tchaikov tchaikov force-pushed the wip-qa-jewel-x-singleton branch 4 times, most recently from 98d6f20 to b14149d on February 12, 2017 13:30
liewegas (Member):
#13378 fixes the first part using the releases/luminous.yaml convention. not sure about the other patches here?

tchaikov (Contributor, Author) commented on Feb 12, 2017:

@liewegas i will drop the first commit. the other commits address the following warning:

2017-02-11T12:24:57.507 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172; mon.a has mon_osd_down_out_interval set to 0

please see the commit message for more info.

https://github.com/ceph/ceph/pull/13378/files#diff-e1cc51eb4bbd70baf5f2e815a28cdcebR9 addresses the

mon.a has mon_osd_down_out_interval set to 0

warning, while the last three commits do this in another way.

either way, we still need 75fa968, which handles

pool cephfs_data pg_num 182 > pgp_num 172;

@tchaikov tchaikov changed the title qa/suites/rados/upgrade/jewel-x-singleton: "require_luminous_osds" be… test: Thrasher: update pgp_num of all expanded pools if not yet Feb 12, 2017
otherwise wait_until_healthy will fail after a timeout upon seeing a warning like:

HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172

Signed-off-by: Kefu Chai <kchai@redhat.com>
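
To make the fix concrete: the Thrasher change raises pgp_num back up to pg_num for any pool that was expanded but not yet re-split, before waiting for the cluster to become healthy. Below is a minimal sketch of that idea; the helper names (`list_pools`, `get_pool_property`, `set_pool_property`) are assumptions standing in for the CephManager API, not the exact code that was merged:

```python
# Minimal sketch (helper names are assumptions): before wait_until_healthy,
# bring pgp_num in line with pg_num for every expanded pool so "ceph health"
# stops reporting "pool <name> pg_num X > pgp_num Y".
def fix_pgp_num(manager):
    for pool in manager.list_pools():
        pg_num = int(manager.get_pool_property(pool, 'pg_num'))
        pgp_num = int(manager.get_pool_property(pool, 'pgp_num'))
        if pgp_num < pg_num:
            # the pool was expanded (pg_num raised) without re-splitting;
            # raising pgp_num clears the HEALTH_WARN shown above
            manager.set_pool_property(pool, 'pgp_num', pg_num)
```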
liewegas (Member) left a review comment:

lgtm. fwiw i avoided the down_out_interval warning by just disabling that warning via a config option in luminous.yaml. this will avoid the problem in general, though!

@tchaikov tchaikov merged commit 148d488 into ceph:master Feb 13, 2017
@tchaikov tchaikov deleted the wip-qa-jewel-x-singleton branch February 13, 2017 06:54