
ceph-ansible: "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring" #1380

Closed · gator1 opened this issue Mar 16, 2017 · 27 comments


gator1 commented Mar 16, 2017

Hi,

I set up seven KVM CentOS-7-x86_64-Minimal-1611 VMs.
They use a bridged network on the host with static IP addresses.

I set up a user cephuser in the wheel group, but ceph-ansible doesn't like that user.

Now I use root instead. I tried to set up three monitors (mon-node1-3) and three OSDs (osd-node1-3).

I run ansible-playbook site.yml -u root with the attached all file, and with site.yml copied unchanged from the sample.

I checked /etc/ceph on mon-node1 and there is no keyring file.

Do I need a certain utility for this to work? ssh-keygen works.

Please help.

all.txt


TASK [ceph-mon : wait for ceph.client.admin.keyring exists] ********************
fatal: [mon-node1]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node3]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node2]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
to retry, use: --limit @/root/ceph-ansible/site.retry

PLAY RECAP *********************************************************************
mon-node1 : ok=41 changed=1 unreachable=0 failed=1
mon-node2 : ok=39 changed=1 unreachable=0 failed=1
mon-node3 : ok=39 changed=1 unreachable=0 failed=1
osd-node1 : ok=2 changed=0 unreachable=0 failed=0
osd-node2 : ok=2 changed=0 unreachable=0 failed=0
osd-node3 : ok=2 changed=0 unreachable=0 failed=0


gator1 commented Mar 17, 2017

The monitors kind of started, but Ceph needs the keys on the monitors.

There must be some simple issue. From the deploy node (called mgmt) I can ssh root@mon-node1 without a password. Does that mean I did everything correctly regarding the keys?

[root@mon-node1 ceph]# ceph -s
2017-03-16 17:37:42.112015 7f0f40feb700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2017-03-16 17:37:42.113360 7f0f387f8700 0 -- :/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f3c05dbf0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f3c05eeb0).fault
2017-03-16 17:37:45.113535 7f0f386f7700 0 -- :/3462189818 >> 10.145.82.103:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24001f90).fault
2017-03-16 17:37:48.113716 7f0f387f8700 0 -- :/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f240052b0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24006570).fault
2017-03-16 17:37:54.114394 7f0f385f6700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f240052b0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24004620).fault
2017-03-16 17:37:57.114570 7f0f387f8700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.103:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24002350).fault
2017-03-16 17:38:03.114993 7f0f386f7700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.103:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24007c40).fault
2017-03-16 17:38:09.115629 7f0f385f6700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f24007e10).fault
2017-03-16 17:38:12.115703 7f0f386f7700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.103:6789/0 pipe(0x7f0f240052b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f2400aa70).fault
2017-03-16 17:38:15.115913 7f0f385f6700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f2400b340).fault
2017-03-16 17:38:21.118907 7f0f385f6700 0 -- 10.145.82.101:0/3462189818 >> 10.145.82.102:6789/0 pipe(0x7f0f24000c80 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0f2400b340).fault

WingkaiHo commented:

Check the Monitor options in group_vars/all.yml:

Set either monitor_interface or monitor_address.
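
For example, the relevant lines in group_vars/all.yml might look like this (a minimal sketch; the interface name and address below are placeholders for your own environment):

# group_vars/all.yml (excerpt)
monitor_interface: eth0            # NIC whose address the monitors bind to
# or, alternatively:
# monitor_address: 10.145.82.101   # explicit address of this monitor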


gator1 commented Mar 22, 2017

I used eth0 for the monitor interface.

WingkaiHo commented:

Which version did you select? In 10.2.5 the admin key is created before ceph-mon starts; the monitor service file looks like this:
...
After=network-online.target local-fs.target time-sync.target ceph-create-keys@%i.service
Wants=network-online.target local-fs.target time-sync.target ceph-create-keys@%i.service
...

In version 11.x it is created by the ansible playbook instead, in roles/ceph-mon/tasks/ceph_keys.yml, by the task "collect admin and bootstrap keys (for or after kraken release)".


gator1 commented Apr 4, 2017

I tried 9 and 10, not 11. I am not sure what "In 10.2.5 the admin key is created before ceph-mon starts; the monitor service file looks like this..." means. Does it mean 10 and 9 won't work? I can try 11.


gator1 commented Apr 4, 2017

I tried 11 and got pretty much the same results. It can't generate the keyring in /etc/ceph on the monitor nodes.

TASK [ceph-mon : wait for ceph.client.admin.keyring exists] ********************
fatal: [mon-node2]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node3]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node1]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}


gator1 commented Apr 4, 2017

More output:

TASK [ceph-mon : start the monitor service] ************************************
ok: [mon-node2]
ok: [mon-node3]
ok: [mon-node1]

TASK [ceph-mon : include] ******************************************************
included: /root/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml for mon-node1, mon-node2, mon-node3

TASK [ceph-mon : collect admin and bootstrap keys (for or after kraken release)] ***
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be disabled
by setting deprecation_warnings=False in ansible.cfg.
ok: [mon-node1]
ok: [mon-node2]
ok: [mon-node3]

TASK [ceph-mon : wait for ceph.client.admin.keyring exists] ********************
fatal: [mon-node2]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node3]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [mon-node1]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}

RUNNING HANDLER [ceph.ceph-common : restart ceph mons] *************************

RUNNING HANDLER [ceph.ceph-common : restart ceph osds] *************************

RUNNING HANDLER [ceph.ceph-common : restart ceph mdss] *************************

RUNNING HANDLER [ceph.ceph-common : restart ceph rgws] *************************

RUNNING HANDLER [ceph.ceph-common : restart ceph nfss] *************************
to retry, use: --limit @/root/ceph-ansible/site.retry

PLAY RECAP *********************************************************************
mon-node1 : ok=42 changed=11 unreachable=0 failed=1
mon-node2 : ok=40 changed=11 unreachable=0 failed=1
mon-node3 : ok=40 changed=11 unreachable=0 failed=1
osd-node1 : ok=2 changed=0 unreachable=0 failed=0
osd-node2 : ok=2 changed=0 unreachable=0 failed=0
osd-node3 : ok=2 changed=0 unreachable=0 failed=0


leseb commented Apr 4, 2017

@gator1 please check whether your ceph-mon daemons are running. Look at the system logs for ceph-mon as well.
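
For example (assuming systemd units named after the monitor host, as ceph-ansible sets them up on CentOS 7):

systemctl status ceph-mon@mon-node1
journalctl -u ceph-mon@mon-node1 --no-pager | tail -n 50
tail -n 50 /var/log/ceph/ceph-mon.mon-node1.log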



gator1 commented Apr 4, 2017

on a monitor node:

[root@mon-node2 ~]# ceph -s

2017-04-04 09:51:13.217859 7f16a0b18700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory


gator1 commented Apr 4, 2017

ceph-mon is running. Does ceph-ansible create the ceph user?

[root@mon-node2 ceph]# ps -aux | grep ceph
root 2551 0.0 0.0 112648 956 pts/0 S+ 10:02 0:00 grep --color=auto ceph
ceph 31453 0.0 1.7 372568 66504 ? Ssl 01:46 0:06 /usr/bin/ceph-mon -f --cluster ceph --id mon-node2 --setuser ceph --setgroup ceph
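
For what it's worth, the ceph user and group are normally created by the Ceph RPM packages at install time rather than by ceph-ansible itself. A quick way to confirm, using standard commands:

getent passwd ceph
id ceph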


gator1 commented Apr 4, 2017

The ceph-mon log in /var/log doesn't seem to have any error other than:

2017-04-04 01:46:46.474482 7f448d3497c0 0 set uid:gid to 167:167 (ceph:ceph)
2017-04-04 01:46:46.474516 7f448d3497c0 0 ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7), process ceph-mon, pid 31453
2017-04-04 01:46:46.474585 7f448d3497c0 0 pidfile_write: ignore empty --pid-file
2017-04-04 01:46:46.515899 7f448d3497c0 0 load: jerasure load: lrc load: isa
2017-04-04 01:46:46.516709 7f448d3497c0 1 leveldb: Recovering log #6
2017-04-04 01:46:46.518279 7f448d3497c0 1 leveldb: Delete type=0 #6

2017-04-04 01:46:46.518318 7f448d3497c0 1 leveldb: Delete type=3 #4

2017-04-04 01:46:46.519499 7f448d3497c0 0 starting mon.mon-node2 rank 1 at 10.145.82.102:6789/0 mon_data /var/lib/ceph/mon/ceph-mon-node2 fsid ba09e207-d873-4bca-ad02-ee76d76e335d
2017-04-04 01:46:46.521027 7f448d3497c0 1 mon.mon-node2@-1(probing) e0 preinit fsid ba09e207-d873-4bca-ad02-ee76d76e335d
2017-04-04 01:46:46.521119 7f448d3497c0 1 mon.mon-node2@-1(probing) e0 initial_members mon-node1,mon-node2,mon-node3, filtering seed monmap
2017-04-04 01:46:46.521212 7f448d3497c0 1 mon.mon-node2@-1(probing).mds e0 Unable to load 'last_metadata'
2017-04-04 01:46:46.521954 7f448d3497c0 0 mon.mon-node2@-1(probing) e0 my rank is now 0 (was -1)
2017-04-04 01:47:46.521511 7f4484e37700 0 mon.mon-node2@0(probing).data_health(0) update_stats avail 73% total 6334 MB, used 1666 MB, avail 4667 MB

I attach the entire log here
ceph-mon.mon-node2.txt


leseb commented Apr 5, 2017

@gator1 try to run ceph-create-keys --cluster ceph --id <monitor hostname> on one of the monitors.

OrFriedmann commented:

Usually this error occurs when you use the wrong public address, the wrong cluster address, or maybe the wrong monitor interface.
The public address and the cluster address are CIDRs.
You can check the generated ceph.conf to see whether the CIDR is correct.


gator1 commented Apr 5, 2017

Running ceph-create-keys --cluster ceph --id mon-node1 and
ceph-create-keys --cluster ceph --id mon-node2 on each node keeps printing:
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
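
A monitor stuck in u'probing' has not yet found its peers, so ceph-create-keys cannot collect any keys. One way to inspect what each monitor actually sees (assuming the default admin socket path) is to run, on the mon node:

ceph daemon mon.mon-node1 mon_status

or, equivalently:

ceph --admin-daemon /var/run/ceph/ceph-mon.mon-node1.asok mon_status

The "quorum" and "outside_quorum" fields in the output show which peers the monitor has heard from.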


gator1 commented Apr 5, 2017

@OrFriedmann:

Here is the ceph.conf that was created.
I only have eth0 on each node, so the monitor interface is eth0 and the public and cluster networks are the same, 10.145.82.0/24.

This is a network without DHCP; each node is assigned a static IP.

The setup is on a physical server (36 cores, 128 GB RAM, 20 TB total SSD) running Ubuntu 16.04.
All the Ceph nodes are VMs running CentOS 7, with the network bridged on the host.


# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
fsid = ba09e207-d873-4bca-ad02-ee76d76e335d
max open files = 131072
mon initial members = mon-node1,mon-node2,mon-node3
mon host = 10.145.82.101,10.145.82.102,10.145.82.103
public network = 10.145.82.0/24
cluster network = 10.145.82.0/24

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 4096
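
The conf itself looks consistent; one quick way to rule out a connectivity or firewall problem between the monitors (a minimal check using bash's built-in /dev/tcp, with the addresses from the conf above) is, from mon-node1:

timeout 2 bash -c '</dev/tcp/10.145.82.102/6789' && echo "mon-node2:6789 reachable"
timeout 2 bash -c '</dev/tcp/10.145.82.103/6789' && echo "mon-node3:6789 reachable"

If either check hangs or fails, the mons cannot probe each other, which matches the u'probing' state above.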


piwi3910 commented Apr 8, 2017

Did you get it fixed? I seem to have the same issue. I tried it first on 4 VMs and it worked perfectly; now the same config on hardware gives the same issue you have.


piwi3910 commented Apr 8, 2017

I got it fixed by using the FQDN for the mons and making sure the mon interface is the one that maps to the FQDN.


gator1 commented Apr 10, 2017

I don't have FQDNs; there is no DNS server on my network. I finally bypassed this by following this link:
http://www.virtualtothecore.com/en/adventures-ceph-storage-part-4-deploy-the-nodes-in-the-lab/

I did everything in that link about the system; I am not sure which step did it, maybe SELinux?

Now ceph -s reports HEALTH_WARN. It seems it will be another long battle to fix that.


leseb commented Jul 3, 2017

No activity. Closing this, feel free to re-open.


Masber commented Aug 22, 2017

I am having the same issue.

The ceph-ansible deployment fails:

TASK [ceph-mon : collect admin and bootstrap keys] *****************************
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ok: [ceph-4-3]
ok: [ceph-4-2]

TASK [ceph-mon : wait for ceph.client.admin.keyring exists] ********************
fatal: [ceph-4-2]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}
fatal: [ceph-4-3]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for file /etc/ceph/ceph.client.admin.keyring"}

And I can't create keyrings:

[root@ceph-4-2 ~]# ceph-create-keys --cluster ceph --id ceph-4-2
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'

ansible hosts file:

[root@kolla ceph-ansible]# cat /etc/ansible/hosts
[mons]
ceph-4-3
ceph-4-2

[osds]
ceph-4-3
ceph-4-2

[rgws]
ceph-4-2

[mgr]
ceph-4-2

this is the ceph.conf file:

# Please do not change this file directly since it is managed by Ansible and will be overwritten

[global]
fsid = 1439d592-555b-4ebc-89f7-4d93df2ae4b0
max open files = 131072
mon initial members = ceph-4-2
mon host = 10.1.0.42

public network = 10.1.0.0/16
cluster network = 10.1.0.0/16

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 5120


[client.rgw.ceph-4-2]
host = ceph-4-2
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-4-2/keyring
rgw socket path = /tmp/radosgw-ceph-4-2.sock
log file = /var/log/ceph/ceph-rgw-ceph-4-2.log
rgw data = /var/lib/ceph/radosgw/ceph-rgw.ceph-4-2
rgw frontends = civetweb port=192.168.20.42:8080 num_threads=100

Any ideas?


Masber commented Aug 22, 2017

I fixed my issue by changing the number of monitors from 2 to 1.
I think it would be nice if the ansible playbook could warn the user about this kind of error.


leseb commented Aug 22, 2017

@Masber normally we advise going with an odd number of monitors (quorum requires a strict majority, so an even count adds no failure tolerance over the odd count below it). So please stick with this recommendation.


ist0ne commented Oct 9, 2017

I got it fixed by running:

systemctl disable firewalld
systemctl stop firewalld

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

reboot

acomisario commented:

@ist0ne that was the answer!

aizuddin85 commented:

Use these rules to allow Ceph to form the cluster while keeping the firewall running. The rules cover the OSD, MON, RADOSGW, and MDS ports (the 7480 rule needs --permanent too, otherwise the reload drops it):

firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --zone=public --add-port=7480/tcp --permanent
firewall-cmd --permanent --zone=public --add-port=443/tcp
firewall-cmd --permanent --zone=public --add-port=80/tcp
firewall-cmd --reload
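
After the reload, the open ports can be verified with:

firewall-cmd --zone=public --list-ports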

mflannery commented:

I ran into the same issue. Executing the firewall commands above fixed it; the SELinux changes further up did not. Try the firewall first.

Cheers!
Mike
