Allow ceph-mon systemd overrides to be specified #1654

Merged
leseb merged 1 commit into ceph:master on Aug 22, 2017


andymcc
Contributor

@andymcc andymcc commented Jul 5, 2017

ceph-mon can fail to start under certain circumstances (for example,
when running in a container) because the default systemd service
configuration causes namespace issues.

To work around this we can override the systemd service settings by
placing an overrides file in the ceph-mon@.service.d directory. This can
be generic so as to allow any potential changes required to the ceph-mon
service file.

The overrides file is only set up when the "ceph_mon_systemd_overrides"
config_template override variable is specified.
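
As a rough illustration of the idea (not the role's actual implementation, which relies on Ansible's config_template), the sketch below renders a drop-in from a nested dict and writes it only when overrides are supplied. The function names `render_dropin` and `write_dropin` and the default file name `ceph-mon-overrides.conf` are hypothetical.

```python
from pathlib import Path


def render_dropin(overrides):
    """Render {"Section": {"Key": value}} as systemd drop-in text."""
    lines = []
    for section, options in overrides.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            # systemd spells booleans as true/false, not Python's True/False.
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(f"{key}={value}")
        lines.append("")
    return "\n".join(lines)


def write_dropin(unit, overrides, root="/etc/systemd/system",
                 filename="ceph-mon-overrides.conf"):
    """Write the drop-in for `unit`; do nothing when no overrides are given."""
    if not overrides:
        return None
    dropin_dir = Path(root) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / filename
    path.write_text(render_dropin(overrides))
    return path
```

Calling `write_dropin("ceph-mon@.service", {"Service": {"PrivateDevices": False}})` would produce a file containing `[Service]` and `PrivateDevices=false`. systemd only picks up a new drop-in after `systemctl daemon-reload`.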

@leseb
Member

leseb commented Jul 5, 2017

Don't we have the same problem for the other daemons in containers?

@andymcc
Contributor Author

andymcc commented Jul 5, 2017

I imagine you would, yes! Happy to apply this to each role in that case. Do you think the approach is decent enough, or does it need more work?

@leseb
Member

leseb commented Jul 5, 2017

@andymcc I'm not sure, as I don't really understand what the problem is :).
Can you show me an error example?

Thanks!

@andymcc
Contributor Author

andymcc commented Jul 5, 2017

Sure! So the issue is very similar to: #1380

As an example, the unit file shipped in the package sets PrivateDevices=true (on CentOS 7 at least; the Ubuntu systemd files don't set it to true). Unless you specify "PrivateDevices=false" in the [Service] section, the service fails as follows:

~ systemctl status ceph-mon@aio1-ceph-mon-container-04c839c6.service
● ceph-mon@aio1-ceph-mon-container-04c839c6.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2017-07-05 14:52:25 UTC; 5s ago
  Process: 7833 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=226/NAMESPACE)
 Main PID: 7833 (code=exited, status=226/NAMESPACE)

Once we add a systemd override file at /etc/systemd/system/ceph-mon@.service.d/*.conf
containing PrivateDevices=false, it overrides the value in the packaged unit file and the service starts normally:

~ systemctl status ceph-mon@aio1-ceph-mon-container-04c839c6.service
● ceph-mon@aio1-ceph-mon-container-04c839c6.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ceph-mon@.service.d
           └─test.conf
   Active: active (running) since Wed 2017-07-05 14:55:18 UTC; 1s ago
 Main PID: 7870 (ceph-mon)
   CGroup: /lxc/aio1_ceph-mon_container-04c839c6/system.slice/system-ceph\x2dmon.slice/ceph-mon@aio1-ceph-mon-container-04c839c6.service
           └─7870 /usr/bin/ceph-mon -f --cluster ceph --id aio1-ceph-mon-container-04c839c6 --setuser ceph --setgroup ceph

Jul 05 14:55:18 aio1-ceph-mon-container-04c839c6 systemd[1]: Started Ceph cluster monitor daemon.
Jul 05 14:55:18 aio1-ceph-mon-container-04c839c6 systemd[1]: Starting Ceph cluster monitor daemon...
Jul 05 14:55:18 aio1-ceph-mon-container-04c839c6 ceph-mon[7870]: starting mon.aio1-ceph-mon-container-04c839c6 rank 0 at 172.29.238.125:6789/0 mon_data /var/lib/ceph/mon/ceph-aio1-cep...71435da7042
Hint: Some lines were ellipsized, use -l to show in full.
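
The drop-in behaviour shown in the two status outputs above can be modelled roughly as "read the packaged unit, then read the drop-in, and let later values win". The sketch below does that with Python's `configparser`. It is a simplification: `PACKAGED_UNIT` is an abridged stand-in for the real unit file, `effective_settings` is a hypothetical helper, and real systemd treats list-valued settings differently (they accumulate rather than being replaced).

```python
import configparser

# Abridged stand-in for the packaged unit; only the relevant settings are shown.
PACKAGED_UNIT = """\
[Service]
ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
PrivateDevices=true
"""

# Contents of a drop-in such as /etc/systemd/system/ceph-mon@.service.d/test.conf
DROP_IN = """\
[Service]
PrivateDevices=false
"""


def effective_settings(*unit_texts):
    """Merge unit file texts in order; later values replace earlier ones."""
    # interpolation=None keeps configparser from choking on "%i".
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    for text in unit_texts:
        parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


merged = effective_settings(PACKAGED_UNIT, DROP_IN)
print(merged["Service"]["PrivateDevices"])  # prints "false"
```

On a real host, `systemctl cat ceph-mon@<id>.service` shows the packaged unit followed by any drop-ins, which is a quick way to confirm that the override was picked up.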

=================

TL;DR: I think making it generic, so that you can override any of the service's systemd settings if you want, means that this solves more than just one use case.

I haven't deployed OSDs or other services inside a container, but I'm guessing it would impact the other services too. The example in defaults/main.yml is what I would use to resolve the ceph-mon-inside-a-container issue that we're seeing.

@leseb
Member

leseb commented Jul 27, 2017

jenkins test this please

@leseb
Member

leseb commented Jul 27, 2017

@andymcc sorry for the late reply. I understand the issue now: it looks like you are running your deployment in LXC containers. Would you mind working on support for the other daemons too?

Thanks!

@andymcc
Contributor Author

andymcc commented Jul 27, 2017

Will fix up for the other daemons and resubmit!

@leseb
Member

leseb commented Aug 1, 2017

@andymcc sorry I missed your last update, looking into this.

Member

@leseb leseb left a comment


Since you modified defaults/main.yml, you need to run generate_group_vars_sample.sh.

@@ -19,3 +19,13 @@ copy_admin_key: false

ceph_mds_docker_extra_env: -e CLUSTER={{ cluster }} -e MDS_NAME={{ ansible_hostname }}
ceph_config_keys: [] # DON'T TOUCH ME

Member

Please add one more line, just like you did below.

@@ -5,3 +5,13 @@

ceph_mgr_docker_extra_env: -e CLUSTER={{ cluster }} -e MGR_NAME={{ ansible_hostname }}
ceph_config_keys: [] # DON'T TOUCH ME

Member

ditto

@@ -120,3 +120,13 @@ ceph_mon_docker_extra_env: -e CLUSTER={{ cluster }} -e FSID={{ fsid }} -e MON_NA
mon_docker_privileged: false
mon_docker_net_host: true
ceph_config_keys: [] # DON'T TOUCH ME

Member

ditto

@@ -227,3 +227,13 @@ ceph_osd_docker_prepare_env: -e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ jou
#
ceph_osd_docker_extra_env: -e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }}
ceph_osd_docker_run_script_path: "/usr/share" # script called by systemd to run the docker command

Member

ditto

@@ -51,3 +51,13 @@ ceph_rgw_civetweb_port: "{{ radosgw_civetweb_port }}"
ceph_rgw_docker_extra_env: -e CLUSTER={{ cluster }} -e RGW_CIVETWEB_PORT={{ ceph_rgw_civetweb_port }}
ceph_config_keys: [] # DON'T TOUCH ME
rgw_config_keys: "/" # DON'T TOUCH ME

Member

ditto

Contributor Author

Fixing now. Sorry, I got sidetracked and forgot about this :)

Ceph services can fail to start under certain circumstances (for
example, when running in a container) because the default systemd
service configuration causes namespace issues.

To work around this we can override the systemd service settings by
placing an overrides file in the ceph-<service>@.service.d directory.
This can be generic so as to allow any potential changes required to
the ceph-<service> service files.

The overrides file is only set up when the
"ceph_<service>_systemd_overrides" config_template override variable is
specified.

The available systemd override variables are as follows:
ceph_mds_systemd_overrides
ceph_mgr_systemd_overrides
ceph_mon_systemd_overrides
ceph_osd_systemd_overrides
ceph_rbd_mirror_systemd_overrides
ceph_rgw_systemd_overrides
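
To make the relationship between these variables and the drop-in directories concrete, here is a small sketch that reports which directories would be populated for a given set of variables. The variable names come from the list above. The unit template names (in particular `ceph-radosgw@.service` and `ceph-rbd-mirror@.service`) are assumptions based on the `ceph-<service>@.service` pattern and should be checked against the units installed on the target host. `dropin_dirs` is a hypothetical helper.

```python
# Variable names are taken from the commit message; unit names are assumed.
OVERRIDE_VARS = {
    "ceph_mds_systemd_overrides": "ceph-mds@.service",
    "ceph_mgr_systemd_overrides": "ceph-mgr@.service",
    "ceph_mon_systemd_overrides": "ceph-mon@.service",
    "ceph_osd_systemd_overrides": "ceph-osd@.service",
    "ceph_rbd_mirror_systemd_overrides": "ceph-rbd-mirror@.service",
    "ceph_rgw_systemd_overrides": "ceph-radosgw@.service",
}


def dropin_dirs(host_vars, root="/etc/systemd/system"):
    """Return {variable: drop-in directory} for each override that is set."""
    return {
        name: f"{root}/{unit}.d"
        for name, unit in OVERRIDE_VARS.items()
        # An undefined or empty variable means no overrides file is created.
        if host_vars.get(name)
    }
```

For example, passing only a non-empty `ceph_mon_systemd_overrides` yields a single entry pointing at `/etc/systemd/system/ceph-mon@.service.d`, and the other services are left untouched.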
@andymcc
Contributor Author

andymcc commented Aug 16, 2017

Should be fixed now - and rebased on the change to move out the starting of the ceph-osd services.

Member

@leseb leseb left a comment


Thanks!

@leseb
Member

leseb commented Aug 21, 2017

jenkins test this please

@leseb
Member

leseb commented Aug 22, 2017

CI failures due to ceph/ceph-container#754. Safe to merge.

@leseb leseb merged commit 38d575c into ceph:master Aug 22, 2017
hswong3i added a commit to alvistack/ansible-role-ceph_mon that referenced this pull request May 23, 2019