
ceph-common: purge ceph.conf file #694

Merged: 1 commit merged into master from purge-ceph-conf-options on May 10, 2016

Conversation

@leseb (Member) commented Apr 7, 2016

Since #461 we have had the ability to override Ceph's default options. Previously we had to add a new line in the template and then another variable as well; doing a PR for a single option was a real pain. As a result, we now have tons of options to maintain across all the Ceph versions, which is yet another painful thing to do.
This commit removes all the Ceph options so they are handled by Ceph directly. If you want to set an option, use the `ceph_conf_overrides` variable in your `group_vars/all`.
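For illustration, a minimal sketch of what an override in `group_vars/all` could look like (the nesting shown, section name to option name, is an assumption about how the variable is rendered, and the values here are examples from this thread, not recommendations):

```yaml
# Sketch: each top-level key under ceph_conf_overrides names a ceph.conf
# section; each nested key/value becomes an option line in that section.
ceph_conf_overrides:
  global:
    osd pool default pg num: 128
  osd:
    osd max backfills: 2
```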

Risks: for those who have been managing their Ceph clusters with ceph-ansible, this is not a trivial change, as it will trigger a change in your `ceph.conf` and then restart all your Ceph services. Moreover, if you made any specific tweaks, you should update the `ceph_conf_overrides` variable to reflect your previous changes before running Ansible.

To avoid the service restarts you need to know a bit of Ansible, but the general idea is to run Ansible against a dummy host to generate the new `ceph.conf`, then scp this file to all your Ceph hosts, and you should be good.
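A hypothetical sketch of that workflow as a standalone play (the template path, inventory group name, and file mode are all assumptions):

```yaml
# Sketch only: pre-render ceph.conf once, then push it to every node so the
# next ceph-ansible run sees no file change and triggers no restart handler.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Render the new ceph.conf from the role template
      template:
        src: roles/ceph-common/templates/ceph.conf.j2   # assumed location
        dest: /tmp/ceph.conf

- hosts: ceph_nodes   # hypothetical inventory group covering all ceph hosts
  become: true
  tasks:
    - name: Push the pre-rendered ceph.conf outside of any handler
      copy:
        src: /tmp/ceph.conf
        dest: /etc/ceph/ceph.conf
        owner: root
        group: root
        mode: '0644'
```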

Closes: #693

Signed-off-by: Sébastien Han <seb@redhat.com>

@leseb force-pushed the purge-ceph-conf-options branch 2 times, most recently from 130893f to c587f0d on April 7, 2016 09:36
@leseb (Member, Author) commented Apr 7, 2016

@andrewschoen looks like we are seeing the same error again on the Ubuntu VM :(

@leseb (Member, Author) commented Apr 7, 2016

@jdurgin, @andrewschoen, @alfredodeza feel free to chime in :)

@andrewschoen (Contributor) commented:

test this please

@andrewschoen (Contributor) commented:

@leseb this looks ok to me, it certainly cleans things up. @jdurgin had mentioned to me that some of these defaults currently in ceph.conf are good for older versions of ceph, do we perhaps want to support version-specific configuration files?

@andrewschoen (Contributor) commented:

@leseb the trusty failure was because the CI was picking up a node without an extra device attached. I think I've got that cleared up, but we will probably need to run the tests one more time.

@vasukulkarni (Contributor) commented:

I think the rbd_client_log path/dir settings should also go away. With SELinux enabled they can create issues if they point outside the default directories, since policy exists only for the default directories (for hammer/jewel); for other distros it isn't an issue.

@andrewschoen (Contributor) commented:

test this please

@jdurgin (Member) commented Apr 7, 2016

looks good in terms of getting rid of settings that are suboptimal for jewel.

For other versions, I do think having a per-version config might make sense - e.g. the osd recovery settings commonly had to be adjusted in hammer, which is probably where a lot of these came from. Maybe those could just go in a sample playbook as overrides or something? I'm not too familiar with the structure of things in ansible.
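As a sketch of that suggestion (not part of this PR): a pre-Jewel deployment could restore the Hammer-era recovery tuning via overrides, using the values the old template shipped (listed later in this thread); the section placement below is an assumption.

```yaml
# Sketch: restore the old recovery defaults for a hammer cluster through
# group_vars instead of the removed template lines.
ceph_conf_overrides:
  osd:
    osd recovery max active: 5
    osd max backfills: 2
    osd recovery op priority: 2
```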

@leseb (Member, Author) commented Apr 10, 2016

Ok I'll try to add an example section for those who are looking at deploying Hammer then.

@andrewschoen (Contributor) commented:

Test this please

@leseb (Member, Author) commented Apr 22, 2016

test this please

@andrewschoen (Contributor) commented:

@leseb what else needs to be done for this one? Can I help?

@leseb (Member, Author) commented Apr 28, 2016

@andrewschoen I just need to rebase and then add some docs to address the Hammer use case, I guess.

```diff
@@ -223,10 +209,14 @@ dummy:

 #rbd_client_log_path: /var/log/ceph
 #rbd_client_log_file: "{{ rbd_client_log_path }}/qemu-guest-$pid.log" # must be writable by QEMU and allowed by SELinux or AppArmor
```
A Contributor commented on this diff:

I think we want to keep rbd_client_admin_socket_path, but lose the rest of the rbd_* options here.

@leseb (Member, Author) commented Apr 28, 2016

test this please

@leseb (Member, Author) commented Apr 28, 2016

test this please

```
[client]
rbd cache = {{ rbd_cache }}
```
A Contributor commented on this diff:

Looks like the defaults for rbd_cache will also need to be removed.

@mattt416 (Contributor) commented Apr 29, 2016

Hi @leseb,

The restarts on the back of config changes are obviously concerning, but I think what is potentially more concerning is the significant impact on behavior that will result from this change. I've tried to look at what will be removed and made a note of where the ceph-ansible and upstream Ceph values differ:

| Setting | ceph-ansible | ceph |
| --- | --- | --- |
| cephx require signatures | true | false |
| cephx cluster require signatures | true | false |
| osd pool default pg num | 128 | 8 |
| osd pool default pgp num | 128 | 8 |
| rbd concurrent management ops | 20 | 10 |
| rbd default map options | rw | '' |
| rbd default format | 2 | 1 |
| mon osd down out interval | 600 | 300 |
| mon osd min down reporters | 7 | 1 |
| mon clock drift allowed | 0.15 | 0.5 |
| mon clock drift warn backoff | 30 | 5 |
| mon osd report timeout | 900 | 300 |
| mon pg warn max per osd | 0 | 300 |
| mon osd allow primary affinity | true | false |
| filestore merge threshold | 40 | 10 |
| filestore split multiple | 8 | 2 |
| osd op threads | 8 | 2 |
| filestore op threads | 8 | 2 |
| osd recovery max active | 5 | 15 |
| osd max backfills | 2 | 10 |
| osd recovery op priority | 2 | 63 |
| osd recovery max chunk | 1048576 | 8 << 20 |
| osd scrub sleep | 0.1 | 0 |
| osd disk thread ioprio class | idle | '' |
| osd disk thread ioprio priority | 0 | -1 |
| osd deep scrub stride | 1048576 | 524288 |
| osd scrub chunk max | 5 | 25 |

These are defaults I'm still working on tracking down:

| Setting | ceph-ansible | ceph |
| --- | --- | --- |
| osd mkfs type | xfs | ? |
| osd crush update on start | true | ? |

Obviously we can override all of these locally (see the sketch below), but this is going to have a big impact on anyone who is not watching what is happening in this repo. I think some further discussion needs to be had here before this goes in.
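A sketch of what pinning a few of the removed values back could look like (section placement is my assumption; the rest of the table would follow the same pattern):

```yaml
ceph_conf_overrides:
  global:
    cephx require signatures: true
    cephx cluster require signatures: true
    osd pool default pg num: 128
    osd pool default pgp num: 128
  mon:
    mon osd down out interval: 600
    mon clock drift allowed: 0.15
```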

With all that said, I do think this is the right long-term approach. :)

--Matt

@leseb (Member, Author) commented Apr 29, 2016

Hi @mattt416,

You raise a really good point, and thanks for the in-depth analysis. However, I'm not really sure how to provide a compatible, non-breaking change.
What could be done for the next release of ceph-ansible is to highlight this as a huge change, so users getting the next version will hopefully read the release notes :).
The other option would be to keep the previous defaults in `ceph_conf_overrides`, but I'm not such a big fan of that.

Any other ideas?

@alfredodeza (Contributor) commented:

test this please

@leseb (Member, Author) commented May 3, 2016

test this please

@mattt416 (Contributor) commented May 4, 2016

Hi @leseb,

TBH I didn't even realise you were using tags! Having those in place helps a lot because it means you can be mindful of breaking changes when the major version is bumped.

We will discuss internally what to do for rpc-openstack -- whether we override locally to bring back a similar configuration or try to run with a more vanilla setup and only override things when we have a specific need.

Thanks!

--Matt

@leseb (Member, Author) commented May 4, 2016

test this please

@leseb (Member, Author) commented May 6, 2016

test this please

@leseb (Member, Author) commented May 7, 2016

test this please

@leseb (Member, Author) commented May 9, 2016

@mattt416 how did the discussion go?
I probably need to keep some examples for Hammer and before in the meantime.

@leseb (Member, Author) commented May 9, 2016

@jdurgin I'm currently documenting some of the variables that we would recommend keeping when deploying a cluster older than Jewel. Could you tag some of the variables we need to highlight? (Just by looking at @mattt416's table, maybe.)

Thanks!

@leseb merged commit 52b2f1c into master on May 10, 2016
@leseb deleted the purge-ceph-conf-options branch on May 10, 2016 16:24
@jdurgin (Member) commented May 11, 2016

@leseb I think the main ones are the recovery settings highlighted already in this pr. Others are pretty hardware-dependent, or not too important I think. You could point at the old config before this pr so folks could compare perhaps.

@leseb (Member, Author) commented May 11, 2016

@jdurgin good point, will do thanks!

leseb added a commit that referenced this pull request May 11, 2016
Highlight the variables that were used prior to this patch:
#694

Signed-off-by: Sébastien Han <seb@redhat.com>