
ceph-common: purge ceph.conf file #694

Merged: 1 commit merged into master from purge-ceph-conf-options on May 10, 2016

Conversation

@leseb (Member) commented Apr 7, 2016

Since #461 we have had the ability to override Ceph's default options. Previously we had to add a new line in the template and then another variable as well; doing a PR for a single option was a real pain. As a result, we now have tons of options to maintain across all the Ceph versions, which is yet another painful thing to do.
This commit removes all the Ceph options so they are handled by Ceph directly. If you want to set an option, use the `ceph_conf_overrides` variable in your `group_vars/all`.
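For illustration, a minimal sketch of what an override in `group_vars/all` could look like (the nesting shown, section name to option name, is an assumption about how the variable is rendered, and the values here are examples from this thread, not recommendations):

```yaml
# Sketch: each top-level key under ceph_conf_overrides names a ceph.conf
# section; each nested key/value becomes an option line in that section.
ceph_conf_overrides:
  global:
    osd pool default pg num: 128
  osd:
    osd max backfills: 2
```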

Risks: for those who have been managing their Ceph clusters with ceph-ansible, this is not a trivial change, as it will trigger a change in your `ceph.conf` and then restart all your Ceph services. Moreover, if you made any specific tweaks, you should update the `ceph_conf_overrides` variable to reflect your previous changes before running Ansible.

To avoid the service restarts you need to know a bit of Ansible, but the general idea is to run Ansible against a dummy host to generate the new `ceph.conf`, then scp this file to all your Ceph hosts, and you should be good.
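A hypothetical sketch of that workflow as a standalone play (the template path, inventory group name, and file mode are all assumptions):

```yaml
# Sketch only: pre-render ceph.conf once, then push it to every node so the
# next ceph-ansible run sees no file change and triggers no restart handler.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Render the new ceph.conf from the role template
      template:
        src: roles/ceph-common/templates/ceph.conf.j2   # assumed location
        dest: /tmp/ceph.conf

- hosts: ceph_nodes   # hypothetical inventory group covering all ceph hosts
  become: true
  tasks:
    - name: Push the pre-rendered ceph.conf outside of any handler
      copy:
        src: /tmp/ceph.conf
        dest: /etc/ceph/ceph.conf
        owner: root
        group: root
        mode: '0644'
```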

Closes: #693

Signed-off-by: Sébastien Han <seb@redhat.com>

@leseb force-pushed the purge-ceph-conf-options branch 2 times, most recently from 130893f to c587f0d on April 7, 2016 09:36
@leseb (Member, Author) commented Apr 7, 2016

@andrewschoen looks like we are seeing the same error again on the Ubuntu VM :(

@leseb (Member, Author) commented Apr 7, 2016

@jdurgin, @andrewschoen, @alfredodeza feel free to chime in :)

@andrewschoen (Contributor) commented:

test this please

@andrewschoen (Contributor) commented:

@leseb this looks ok to me, it certainly cleans things up. @jdurgin had mentioned to me that some of these defaults currently in ceph.conf are good for older versions of ceph, do we perhaps want to support version-specific configuration files?

@andrewschoen (Contributor) commented:

@leseb the trusty failure was because the CI was picking up a node without an extra device attached. I think I've got that cleared up, but we will probably need to run the tests one more time.

@vasukulkarni (Contributor) commented:

I think the rbd_client_log path/dir settings should also go away. With SELinux enabled they can create issues if they point outside the default directories, since policy exists only for the default directories (for hammer/jewel); for other distros it isn't an issue.

@andrewschoen (Contributor) commented:

test this please

@jdurgin (Member) commented Apr 7, 2016

looks good in terms of getting rid of settings that are suboptimal for jewel.

For other versions, I do think having a per-version config might make sense - e.g. the osd recovery settings commonly had to be adjusted in hammer, which is probably where a lot of these came from. Maybe those could just go in a sample playbook as overrides or something? I'm not too familiar with the structure of things in ansible.
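As a sketch of that suggestion (not part of this PR): a pre-Jewel deployment could restore the Hammer-era recovery tuning via overrides, using the values the old template shipped (listed later in this thread); the section placement below is an assumption.

```yaml
# Sketch: restore the old recovery defaults for a hammer cluster through
# group_vars instead of the removed template lines.
ceph_conf_overrides:
  osd:
    osd recovery max active: 5
    osd max backfills: 2
    osd recovery op priority: 2
```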

@leseb (Member, Author) commented Apr 10, 2016

Ok I'll try to add an example section for those who are looking at deploying Hammer then.

@andrewschoen (Contributor) commented:

Test this please

@leseb (Member, Author) commented Apr 22, 2016

test this please

@andrewschoen (Contributor) commented:

@leseb what else needs to be done for this one? Can I help?

@leseb (Member, Author) commented Apr 28, 2016

@andrewschoen I just need to rebase and then add some docs to address the Hammer use case, I guess.

```diff
@@ -223,10 +209,14 @@ dummy:

 #rbd_client_log_path: /var/log/ceph
 #rbd_client_log_file: "{{ rbd_client_log_path }}/qemu-guest-$pid.log" # must be writable by QEMU and allowed by SELinux or AppArmor
```
A Contributor commented on this diff:

I think we want to keep rbd_client_admin_socket_path, but lose the rest of the rbd_* options here.

@leseb (Member, Author) commented Apr 28, 2016

test this please

@leseb (Member, Author) commented Apr 28, 2016

test this please

```
[client]
rbd cache = {{ rbd_cache }}
```
A Contributor commented on this diff:

Looks like the defaults for rbd_cache will also need to be removed.

@mattt416 (Contributor) commented Apr 29, 2016

Hi @leseb,

The restarts on the back of config changes are obviously concerning, but I think what is potentially more concerning is the significant impact on behavior that will result from this change. I've tried to look at what will be removed and made a note of where the ceph-ansible and upstream Ceph values differ:

| Setting | ceph-ansible | ceph |
| --- | --- | --- |
| cephx require signatures | true | false |
| cephx cluster require signatures | true | false |
| osd pool default pg num | 128 | 8 |
| osd pool default pgp num | 128 | 8 |
| rbd concurrent management ops | 20 | 10 |
| rbd default map options | rw | '' |
| rbd default format | 2 | 1 |
| mon osd down out interval | 600 | 300 |
| mon osd min down reporters | 7 | 1 |
| mon clock drift allowed | 0.15 | 0.5 |
| mon clock drift warn backoff | 30 | 5 |
| mon osd report timeout | 900 | 300 |
| mon pg warn max per osd | 0 | 300 |
| mon osd allow primary affinity | true | false |
| filestore merge threshold | 40 | 10 |
| filestore split multiple | 8 | 2 |
| osd op threads | 8 | 2 |
| filestore op threads | 8 | 2 |
| osd recovery max active | 5 | 15 |
| osd max backfills | 2 | 10 |
| osd recovery op priority | 2 | 63 |
| osd recovery max chunk | 1048576 | 8 << 20 |
| osd scrub sleep | 0.1 | 0 |
| osd disk thread ioprio class | idle | '' |
| osd disk thread ioprio priority | 0 | -1 |
| osd deep scrub stride | 1048576 | 524288 |
| osd scrub chunk max | 5 | 25 |

These are defaults I'm still working on tracking down:

| Setting | ceph-ansible | ceph |
| --- | --- | --- |
| osd mkfs type | xfs | ? |
| osd crush update on start | true | ? |

Obviously we can override all of these locally (see the sketch below), but this is going to have a big impact on anyone who is not watching what is happening in this repo. I think some further discussion needs to be had here before this goes in.
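A sketch of what pinning a few of the removed values back could look like (section placement is my assumption; the rest of the table would follow the same pattern):

```yaml
ceph_conf_overrides:
  global:
    cephx require signatures: true
    cephx cluster require signatures: true
    osd pool default pg num: 128
    osd pool default pgp num: 128
  mon:
    mon osd down out interval: 600
    mon clock drift allowed: 0.15
```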

With all that said, I do think this is the right long-term approach. :)

--Matt

@leseb (Member, Author) commented Apr 29, 2016

Hi @mattt416,

You raise a really good point, and thanks for the in-depth analysis. However, I'm not really sure how to provide a compatible, non-breaking change.
What could be done for the next release of ceph-ansible is to highlight this as a huge change, so users getting the next version will hopefully read the release notes :).
The other option would be to keep the previous defaults in `ceph_conf_overrides`, but I'm not such a big fan of that.

Any other ideas?

@alfredodeza (Contributor) commented:

test this please

@leseb (Member, Author) commented May 3, 2016

test this please

@mattt416 (Contributor) commented May 4, 2016

Hi @leseb,

TBH I didn't even realise you were using tags! Having those in place helps a lot because it means you can be mindful of breaking changes when the major version is bumped.

We will discuss internally what to do for rpc-openstack -- whether we override locally to bring back a similar configuration or try to run with a more vanilla setup and only override things when we have a specific need.

Thanks!

--Matt

@leseb (Member, Author) commented May 4, 2016

test this please

@leseb (Member, Author) commented May 6, 2016

test this please

@leseb (Member, Author) commented May 7, 2016

test this please

@leseb (Member, Author) commented May 9, 2016

@mattt416 how did the discussion go?
I probably need to keep some examples for Hammer and before in the meantime.

@leseb (Member, Author) commented May 9, 2016

@jdurgin I'm currently documenting some of the variables that we would recommend keeping when deploying a cluster older than Jewel. Could you tag some of the variables we need to highlight? (Just by looking at @mattt416's table, maybe.)

Thanks!

@leseb merged commit 52b2f1c into master on May 10, 2016
@leseb deleted the purge-ceph-conf-options branch on May 10, 2016 16:24
@jdurgin (Member) commented May 11, 2016

@leseb I think the main ones are the recovery settings highlighted already in this pr. Others are pretty hardware-dependent, or not too important I think. You could point at the old config before this pr so folks could compare perhaps.

@leseb (Member, Author) commented May 11, 2016

@jdurgin good point, will do thanks!

leseb added a commit that referenced this pull request May 11, 2016
Highlight the variables that were used prior to this patch:
#694

Signed-off-by: Sébastien Han <seb@redhat.com>