BUG: Can not deploy any type of Ceph cluster #700

Open · votdev opened this issue Jun 20, 2023 · 2 comments

votdev commented Jun 20, 2023

I can no longer deploy a multi-node or single-node cluster, regardless of whether ses6, ses7, ses7p, or pacific is used.

For example, the following commands are used to deploy the cluster:

$ sesdev create ses7 --single-node --non-interactive ses7-mini
$ sesdev create pacific --single-node --non-interactive pacific-mini
$ sesdev create ses7p --non-interactive ses7p-default

One of the following errors always occurs and aborts the deployment. The first trace below is from the single-node ses7-mini deployment:

    master: ++ ceph-salt status
    master: Cluster:   1 minions, 0 hosts managed by cephadm
    master: OS:        SUSE Linux Enterprise Server 15 SP2
    master: Ceph RPMs: Not installed
    master: Config:    OK
    master: ++ zypper repos --details
    master: #  | Alias              | Name                                      | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                                                            | Service
    master: ---+--------------------+-------------------------------------------+---------+-----------+---------+----------+--------+----------------------------------------------------------------------------------------------------------------+--------
    master:  1 | SUSE_CA            | SUSE Internal CA Certificate (SLE_15_SP2) | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.suse.de/ibs/SUSE:/CA/SLE_15_SP2/                                                               |
    master:  2 | base               | base                                      | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Basesystem/15-SP2/x86_64/product/                    |
    master:  3 | devel-repo-1       | devel-repo-1                              | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/Devel:/Storage:/7.0/images/repo/SUSE-Enterprise-Storage-7-POOL-x86_64-Media1/ |
    master:  4 | product            | product                                   | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://dist.suse.de/ibs/SUSE/Products/SLE-Product-SLES/15-SP2/x86_64/product/                                  |
    master:  5 | product-update     | product-update                            | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://dist.suse.de/ibs/SUSE/Updates/SLE-Product-SLES/15-SP2/x86_64/update/                                    |
    master:  6 | server-apps        | server-apps                               | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Server-Applications/15-SP2/x86_64/product/           |
    master:  7 | server-apps-update | server-apps-update                        | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Server-Applications/15-SP2/x86_64/update/             |
    master:  8 | storage            | storage                                   | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/Storage/7/x86_64/product/                                       |
    master:  9 | storage-update     | storage-update                            | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/Storage/7/x86_64/update/                                         |
    master: 10 | update             | update                                    | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Basesystem/15-SP2/x86_64/update/                      |
    master: ++ zypper info cephadm
    master: ++ grep -E '(^Repo|^Version)'
    master: Repository     : storage-update
    master: Version        : 15.2.16.99+g96ce9b152f5-150200.3.37.1
    master: ++ ceph-salt --version
    master: ceph-salt 15.2.19+1649909331.ge2933b3
    master: ++ stdbuf -o0 ceph-salt -ldebug apply --non-interactive
    master: Syncing minions with the master...
    master: Checking if minions respond to ping...
    master: Pinging 1 minions...
    master: Checking if ceph-salt formula is available...
    master: Checking if minions have functioning DNS...
    master: Running DNS lookups on 1 minions...
    master: Checking if there is an existing Ceph cluster...
    master: No Ceph cluster deployed yet
    master: Installing python3-ntplib on master.ses7-mini.test...
    master: Probing external time server pool.ntp.org (attempt 1 of 10)...
    master: Checking for FQDN environment on 1 minions...
    master: All 1 minions have non-FQDN environment. Good.
    master: Resetting execution grains...
    master: Starting...
    master: Starting the execution of: salt -G 'ceph-salt:member' state.apply ceph-salt
    master: 
    master: 
    master: Finished execution of ceph-salt formula
    master: 
    master: Summary: Total=1 Succeeded=0 Warnings=0 Failed=1
    master: "ceph-salt apply" exit code: 0
    master: ++ echo '"ceph-salt apply" exit code: 0'
    master: ++ set +x
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: Warning: Permanently added 'master.ses7-mini.test' (ECDSA) to the list of known hosts.
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
    master: ...
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
    master: ...
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
    master: ...
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh master.ses7-mini.test cephadm ls
    master: MONs in cluster (actual/expected): 1/1 (3020 seconds to timeout)
    master: MGRs in cluster (actual/expected): 1/1 (3020 seconds to timeout)
    master: +++ set +x
    master: ++ ceph status
    master: 2023-06-20T15:18:13.533+0200 7f596648b700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    master: 2023-06-20T15:18:13.533+0200 7f596648b700 -1 AuthRegistry(0x7f596005e778) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 AuthRegistry(0x7f596648a060) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    master: 2023-06-20T15:18:13.541+0200 7f5965489700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    master: 2023-06-20T15:18:13.541+0200 7f596648b700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    master: [errno 13] RADOS permission denied (error connecting to the cluster)
    master: +++ err_report 635
    master: +++ local hn
    master: +++ set +x
    master: Error in provisioner script trapped!
    master: => hostname: master
    master: => script: /tmp/vagrant-shell
    master: => line number: 635
    master: Bailing out!
Command '['vagrant', 'up', '--no-destroy-on-error', '--provision']' failed: ret=1 stderr:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

The second trace, from the ses7p-default deployment, fails in a different way:
    master: Populating default values...
    master: Adding master.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/cephadm add master.ses7p-default.test
    master: Adding master.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/admin add master.ses7p-default.test
    master: Adding master.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/minions add node1.ses7p-default.test
    master: Adding node1.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node1.ses7p-default.test
    master: Adding node1.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/minions add node2.ses7p-default.test
    master: Adding node2.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node2.ses7p-default.test
    master: Adding node2.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/minions add node3.ses7p-default.test
    master: Adding node3.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/cephadm add node3.ses7p-default.test
    master: Adding node3.ses7p-default.test...
    master: 1 minion added.
    master: ++ ceph-salt config /ceph_cluster/roles/bootstrap set node1.ses7p-default.test
    master: Value set.
    master: ++ ceph-salt config /cephadm_bootstrap/mon_ip set 10.20.83.201
    master: Value set.
    master: ++ ceph-salt config /ssh/ generate
    master: Key pair generated.
    master: ++ ceph-salt config /time_server/servers add master.ses7p-default.test
    master: Value added.
    master: ++ ceph-salt config /time_server/external_servers add pool.ntp.org
    master: Value added.
    master: ++ ceph-salt config /time_server/subnet set 10.20.83.0/24
    master: Value set.
    master: ++ ceph-salt config /cephadm_bootstrap/ceph_image_path set registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph
    master: Value set.
    master: ++ ceph-salt config /cephadm_bootstrap/dashboard/username set admin
    master: Value set.
    master: ++ ceph-salt config /cephadm_bootstrap/dashboard/password set admin
    master: Value set.
    master: ++ ceph-salt config /cephadm_bootstrap/dashboard/force_password_update disable
    master: Disabled.
    master: ++ ceph-salt config ls
    master: o- / ......................................................................................................................... [...]
    master:   o- ceph_cluster ............................................................................................................ [...]
    master:   | o- minions ........................................................................................................ [Minions: 4]
    master:   | | o- master.ses7p-default.test ................................................................................ [cephadm, admin]
    master:   | | o- node1.ses7p-default.test ............................................................................. [bootstrap, cephadm]
    master:   | | o- node2.ses7p-default.test ........................................................................................ [cephadm]
    master:   | | o- node3.ses7p-default.test ........................................................................................ [cephadm]
    master:   | o- roles ................................................................................................................. [...]
    master:   |   o- admin ........................................................................................................ [Minions: 1]
    master:   |   | o- master.ses7p-default.test ........................................................................ [Other roles: cephadm]
    master:   |   o- bootstrap ...................................................................................... [node1.ses7p-default.test]
    master:   |   o- cephadm ...................................................................................................... [Minions: 4]
    master:   |   | o- master.ses7p-default.test .......................................................................... [Other roles: admin]
    master:   |   | o- node1.ses7p-default.test ....................................................................... [Other roles: bootstrap]
    master:   |   | o- node2.ses7p-default.test ............................................................................... [No other roles]
    master:   |   | o- node3.ses7p-default.test ............................................................................... [No other roles]
    master:   |   o- tuned ............................................................................................................... [...]
    master:   |     o- latency .................................................................................................... [no minions]
    master:   |     o- throughput ................................................................................................. [no minions]
    master:   o- cephadm_bootstrap ....................................................................................................... [...]
    master:   | o- advanced .............................................................................................................. [...]
    master:   | o- ceph_conf ............................................................................................................. [...]
    master:   | o- ceph_image_path ................................... [registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph]
    master:   | o- dashboard ............................................................................................................. [...]
    master:   | | o- force_password_update .......................................................................................... [disabled]
    master:   | | o- password .......................................................................................................... [admin]
    master:   | | o- ssl_certificate ................................................................................................. [not set]
    master:   | | o- ssl_certificate_key ............................................................................................. [not set]
    master:   | | o- username .......................................................................................................... [admin]
    master:   | o- mon_ip ....................................................................................................... [10.20.83.201]
    master:   o- containers .............................................................................................................. [...]
    master:   | o- registries_conf ................................................................................................... [enabled]
    master:   | | o- registries ........................................................................................................ [empty]
    master:   | o- registry_auth ......................................................................................................... [...]
    master:   |   o- password ........................................................................................................ [not set]
    master:   |   o- registry ........................................................................................................ [not set]
    master:   |   o- username ........................................................................................................ [not set]
    master:   o- ssh ............................................................................................................ [Key Pair set]
    master:   | o- private_key ............................................................... [36:de:a7:d5:d8:ea:30:7b:fe:63:5b:a9:45:83:23:dc]
    master:   | o- public_key ................................................................ [36:de:a7:d5:d8:ea:30:7b:fe:63:5b:a9:45:83:23:dc]
    master:   o- time_server ......................................................................................................... [enabled]
    master:     o- external_servers ........................................................................................................ [1]
    master:     | o- pool.ntp.org ........................................................................................................ [...]
    master:     o- servers ................................................................................................................. [1]
    master:     | o- master.ses7p-default.test ........................................................................................... [...]
    master:     o- subnet ...................................................................................................... [10.20.83.0/24]
    master: ++ ceph-salt export --pretty
    master: {
    master:     "bootstrap_minion": "node1.ses7p-default.test",
    master:     "bootstrap_mon_ip": "10.20.83.201",
    master:     "container": {
    master:         "images": {
    master:             "ceph": "registry.suse.de/devel/storage/7.0/pacific/containers/ses/7.1/ceph/ceph"
    master:         },
    master:         "registries_enabled": true
    master:     },
    master:     "dashboard": {
    master:         "password": "admin",
    master:         "password_update_required": false,
    master:         "username": "admin"
    master:     },
    master:     "minions": {
    master:         "admin": [
    master:             "master.ses7p-default.test"
    master:         ],
    master:         "all": [
    master:             "node1.ses7p-default.test",
    master:             "master.ses7p-default.test",
    master:             "node2.ses7p-default.test",
    master:             "node3.ses7p-default.test"
    master:         ],
    master:         "cephadm": [
    master:             "node1.ses7p-default.test",
    master:             "master.ses7p-default.test",
    master:             "node2.ses7p-default.test",
    master:             "node3.ses7p-default.test"
    master:         ],
    master:         "latency": [],
    master:         "throughput": []
    master:     },
    master:     "ssh": {
    master:         "private_key": "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEA2Kw94bCDaxEMkoY5ieTv2t4GK8torc/yp9AqiUH31OB+3nWn\nenM3u2vGrA9PIhSS/65lgUkZTvCLXarWpXqOJiuQxvinLc12MEFhn0ZJW+MamzM2\n93yuQFF3n65TqhPpYtr6Qb76xdjLpEmEFEcc6woohNBRpjI8jnQDXABeGELKpM73\nrx+DYUXOeE8MD+mX2PjvGj4xvfZ+ENUQfNasEd+rLC6m4YZHuQtQndwvSMVxiIri\netwsheCyyvywRfL9zPhAG3S/XFmpJz1CvGOCKiqNVfLcSWEYx0VuYFgKaA27N/40\nN7424z4CgvTHzNLwlACuCbKooEwmFwjmivk+UQIDAQABAoIBABPMxqFiiNnefpp/\nsU+eeAQ1TJVRMt1KTtOCwHZXTMtbYgCYeg/kqkPCZS75Pasgu++pO02hlVJbRTMP\nruqDjOykR8hE9fsHpuyhNudwC/ldgyOKXjQmxMIsJ600CCF3TRkrZ1nddstgZLic\nPrl/J597Z8k+Q63XMqU2aQ2t22tmOQ4gGz8VomjiZ3G3VCZgA352j1HUsouYU8mU\n+DB5d563yezs6pf9cNWqpV6HpLXnRzx9GpIxfDx5GAji8/8+4CTEV4Ei4iCoX75i\n/fQ649x2s/zNUFhPHbEgc3ZEpDndxaZoy1yV4/eYNN8ySeRWdrIf1Ylo8MmVJ6U/\nOAeny1kCgYEA43aVZlDlQO9m6kJq2bPxDvIxXgdwSjH2HJWEbbudD/+9njCVZjst\niDSdrC4/ad1/zzwI6H2Dlv76cY6EY+39bFfrgTcwHT2Yy4B/nLkcUexlM13LFZic\n6WPnb+DA4rhzKcHrua0AWSwpvCnckhV1pcNaZE5U0iBzFl3Bppn3hRkCgYEA89sa\nln6pKgJlsrNDo1gUTcjssKw2nwrqC7SHtSdpxQN1gueBDrn/ZS9rDKCDYAordlAF\nDItxF1kdeLdGLaIBh/YnXsBuMUIpuG9pg+nXGXFNwBxLe+wwsLSgW9Xfv6BCEUIo\n3a+CJzIvdRFwsKtlegYupV5lJ7BfUQ3tJ/+XMfkCgYEA3lHODjXtDL2xMi/+bZAR\ngVE47TWKDAqvCRseV35zMer9Mzs7GrOmeiUrItoFAv0Kacu8zTe4QQIwWIM6ZM18\nz8NTHHWLYlkNGYIbuFu5EV1jQIRg9Ve3reoGj/P1suMjNGIketNbrsyach3cRzAQ\nUBcTJ0zkXIh41BiJKMP+CCkCgYEArFB5OzsJgnvrLRlrhDMrNcPzLOykNEJcHCVX\nd/T/0o2dLgE0uxlHlVKqjGOoMec9yv7EcpbeNSdtoe2wE3LVLiQMsfG8a+Za4M8p\nemN08a+Ux1m3JTxDM7qPThWVZC10QgnEItJwYA4gZtMKFG0o6c8Qix5m0GLbF8WF\nfawoRNECgYAcgzUDaisvnn5kGpVttqyov77Qvk8Y86Pc+Xa96TDYYWs8ndFxjVC8\nY596EgD1wP8UqA3JnDNTLeSYJ/KJm1VUFEdIomTG2dlbX5IvpO5dPmpLXxmXmtbB\nXB2rYZ1yUBnb7M7wxV7YtBD4I6voFbSxMBi2OowMNz4uMtiqBemVcg==\n-----END RSA PRIVATE KEY-----",
    master:         "public_key": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYrD3hsINrEQyShjmJ5O/a3gYry2itz/Kn0CqJQffU4H7edad6cze7a8asD08iFJL/rmWBSRlO8Itdqtaleo4mK5DG+KctzXYwQWGfRklb4xqbMzb3fK5AUXefrlOqE+li2vpBvvrF2MukSYQURxzrCiiE0FGmMjyOdANcAF4YQsqkzvevH4NhRc54TwwP6ZfY+O8aPjG99n4Q1RB81qwR36ssLqbhhke5C1Cd3C9IxXGIiuJ63CyF4LLK/LBF8v3M+EAbdL9cWaknPUK8Y4IqKo1V8txJYRjHRW5gWApoDbs3/jQ3vjbjPgKC9MfM0vCUAK4JsqigTCYXCOaK+T5R"
    master:     },
    master:     "time_server": {
    master:         "enabled": true,
    master:         "external_time_servers": [
    master:             "pool.ntp.org"
    master:         ],
    master:         "server_hosts": [
    master:             "master.ses7p-default.test"
    master:         ],
    master:         "subnet": "10.20.83.0/24"
    master:     }
    master: }
    master: ++ ceph-salt status
    master: Cluster:   4 minions, 0 hosts managed by cephadm
    master: OS:        SUSE Linux Enterprise Server 15 SP3
    master: Ceph RPMs: Not installed
    master: Config:    OK
    master: ++ zypper repos --details
    master: #  | Alias              | Name                                      | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                                                                       | Service
    master: ---+--------------------+-------------------------------------------+---------+-----------+---------+----------+--------+---------------------------------------------------------------------------------------------------------------------------+--------
    master:  1 | SUSE_CA            | SUSE Internal CA Certificate (SLE_15_SP3) | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.suse.de/ibs/SUSE:/CA/SLE_15_SP3/                                                                          |
    master:  2 | base               | base                                      | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Basesystem/15-SP3/x86_64/product/                               |
    master:  3 | devel-repo-1       | devel-repo-1                              | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/Devel:/Storage:/7.0:/Pacific/images/repo/SUSE-Enterprise-Storage-7.1-POOL-x86_64-Media1/ |
    master:  4 | product            | product                                   | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://dist.suse.de/ibs/SUSE/Products/SLE-Product-SLES/15-SP3/x86_64/product/                                             |
    master:  5 | product-update     | product-update                            | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://dist.suse.de/ibs/SUSE/Updates/SLE-Product-SLES/15-SP3/x86_64/update/                                               |
    master:  6 | server-apps        | server-apps                               | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/SLE-Module-Server-Applications/15-SP3/x86_64/product/                      |
    master:  7 | server-apps-update | server-apps-update                        | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Server-Applications/15-SP3/x86_64/update/                        |
    master:  8 | storage            | storage                                   | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Products/Storage/7.1/x86_64/product/                                                |
    master:  9 | storage-update     | storage-update                            | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/Storage/7.1/x86_64/update/                                                  |
    master: 10 | update             | update                                    | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.nue.suse.com/ibs/SUSE/Updates/SLE-Module-Basesystem/15-SP3/x86_64/update/                                 |
    master: ++ grep -E '(^Repo|^Version)'
    master: ++ zypper info cephadm
    master: Repository     : storage-update
    master: Version        : 16.2.13.66+g54799ee0666-150300.3.11.1
    master: ++ ceph-salt --version
    master: ceph-salt 16.2.4+1671578301.g6193518
    master: ++ stdbuf -o0 ceph-salt -ldebug apply --non-interactive
    master: Syncing minions with the master...
    master: Checking if minions respond to ping...
    master: Pinging 4 minions...
    master: Checking if ceph-salt formula is available...
    master: Checking if minions have functioning DNS...
    master: Running DNS lookups on 4 minions...
    master: Checking if there is an existing Ceph cluster...
    master: No Ceph cluster deployed yet
    master: Installing python3-ntplib on master.ses7p-default.test...
    master: Probing external time server pool.ntp.org (attempt 1 of 10)...
    master: Checking for FQDN environment on 4 minions...
    master: All 4 minions have non-FQDN environment. Good.
    master: Resetting execution grains...
    master: Starting...
    master: Starting the execution of: salt -G 'ceph-salt:member' state.apply ceph-salt
    master: 
    master: 
    master: Finished execution of ceph-salt formula
    master: 
    master: Summary: Total=4 Succeeded=0 Warnings=0 Failed=4
    master: "ceph-salt apply" exit code: 0
    master: ++ echo '"ceph-salt apply" exit code: 0'
    master: ++ set +x
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: Warning: Permanently added 'node1.ses7p-default.test' (ECDSA) to the list of known hosts.
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3200 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3170 seconds to timeout)
    master: ...
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3140 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3110 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3080 seconds to timeout)
    master: ...
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3050 seconds to timeout)
    master: ...
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 0/1 (3020 seconds to timeout)
    master: MGRs in cluster (actual/expected): 0/1 (3020 seconds to timeout)
    master: ...
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ jq '[ .[].name | select(startswith("mon")) ] | length'
    master: +++ set +x
    master: +++ jq '[ .[].name | select(startswith("mgr")) ] | length'
    master: +++ ssh node1.ses7p-default.test cephadm ls
    master: +++ set +x
    master: MONs in cluster (actual/expected): 1/1 (2990 seconds to timeout)
    master: MGRs in cluster (actual/expected): 1/1 (2990 seconds to timeout)
    master: ++ ceph status
    master: Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
    master: +++ err_report 659
    master: +++ local hn
    master: +++ set +x
    master: Error in provisioner script trapped!
    master: => hostname: master
    master: => script: /tmp/vagrant-shell
    master: => line number: 659
    master: Bailing out!
Command '['vagrant', 'up', '--no-destroy-on-error', '--provision']' failed: ret=1 stderr:
==> master: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
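
For context, the repeated "+++ ssh ... cephadm ls" / "jq" lines in the traces above are the provisioner's daemon-count polling. Below is a minimal reconstruction of that loop: the ssh/jq pipeline is verbatim from the trace, while the loop structure, variable names, and the 30-second interval (inferred from the 3200, 3170, 3140, and so on countdown) are assumptions.

    # Sketch of the MON polling visible in the trace (MGRs are counted the
    # same way with startswith("mgr")); loop details are assumed, the
    # pipeline is copied from the log.
    EXPECTED_MONS=1
    TIMEOUT=3200
    while [ "$TIMEOUT" -gt 0 ]; do
        ACTUAL_MONS=$(ssh master.ses7-mini.test cephadm ls \
            | jq '[ .[].name | select(startswith("mon")) ] | length')
        echo "MONs in cluster (actual/expected): ${ACTUAL_MONS}/${EXPECTED_MONS} (${TIMEOUT} seconds to timeout)"
        [ "$ACTUAL_MONS" -ge "$EXPECTED_MONS" ] && break
        sleep 30
        TIMEOUT=$((TIMEOUT - 30))
    done
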
votdev added the bug label Jun 20, 2023
tserong commented Jun 21, 2023

This looks like the same thing we've hit intermittently inside sesdev CI, notably the most recent failure of #696 (output at http://see.prv.suse.net:8080/blue/organizations/jenkins/sesdev-integration/detail/PR-696/1/pipeline). The notes in #689 and #691 are also relevant.

A couple of things stand out to me in the output above:

    master: Finished execution of ceph-salt formula
    master: 
    master: Summary: Total=1 Succeeded=0 Warnings=0 Failed=1
    master: "ceph-salt apply" exit code: 0

    master: Finished execution of ceph-salt formula
    master: 
    master: Summary: Total=4 Succeeded=0 Warnings=0 Failed=4
    master: "ceph-salt apply" exit code: 0

That summary information is produced by ceph-salt at https://github.com/ceph/ceph-salt/blob/619351846592062c245e22555fea399d8f3d5c02/ceph_salt/execute.py#L1288. The counters indicate the number of minions on which salt -G 'ceph-salt:member' state.apply ceph-salt succeeded. So in both your test runs, that state.apply failed on all the minions.
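
One way to surface the per-minion failures is to re-run that same state.apply directly on the master, outside the executor. The command is verbatim from the trace; --state-output is a standard salt CLI option, assuming the salt version deployed here supports it:

    # Re-run the formula directly so the failed states are printed instead
    # of being reduced to a Failed=N counter by the ceph-salt executor.
    salt -G 'ceph-salt:member' state.apply ceph-salt --state-output=mixed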

Given that all the minions failed, surely it's incorrect for ceph-salt apply to give us exit code 0 indicating success! Also, why is ceph-salt not providing any diagnostic information about what exactly failed on the minions? So I reckon that's two ceph-salt bugs right there.
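
Until that's fixed in ceph-salt, a hypothetical guard in the calling script could parse the Summary line itself rather than trusting the exit code. This is only a sketch (the sed pattern matches the Summary line shown above; nothing like this exists in sesdev today):

    # Hypothetical workaround: derive success from the Summary line
    # instead of the (currently always 0) exit code.
    OUT=$(stdbuf -o0 ceph-salt -ldebug apply --non-interactive | tee /dev/stderr)
    FAILED=$(printf '%s\n' "$OUT" | sed -n 's/.*Summary:.*Failed=\([0-9]*\).*/\1/p')
    if [ "${FAILED:-1}" -ne 0 ]; then
        echo "ceph-salt apply reported Failed=${FAILED:-unknown}" >&2
        exit 1
    fi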

As for the subsequent failures running ceph status, the first ("auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory") will be due to /tmp/ceph.client.admin/keyring not having been copied to /etc/ceph/ yet; ceph-salt is meant to copy that file. The second ("Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)") is presumably due to a missing /etc/ceph/ceph.conf, which should have been created by cephadm bootstrap.
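
Both of those are easy to confirm by hand; for example (hostname from the first reproduction, paths taken from the error messages):

    # Check the two prerequisites "ceph status" needs on the admin node:
    ssh master.ses7-mini.test 'ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring'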

Honestly, it feels to me like what's happening here is that ceph-salt invokes cephadm bootstrap, but somehow execution returns to ceph-salt before cephadm bootstrap has actually completed. At least, looking at the logs I have in http://see.prv.suse.net:8080/blue/organizations/jenkins/sesdev-integration/detail/PR-696/1/artifacts, I can see there's still stuff happening in cephadm.out after ceph status fails. I don't really know how that's possible, but that's my current working hunch.
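
If that hunch is right, a bootstrap process should still be alive on the bootstrap node at the moment ceph status fails. Something like this could confirm it (pgrep is standard procps; the hostname is from the second reproduction):

    # Is "cephadm bootstrap" still running when the provisioner has
    # already moved on to "ceph status"?
    ssh node1.ses7p-default.test 'pgrep -af cephadm || echo "no cephadm process running"'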

tserong commented Jun 21, 2023

@votdev can you please try something for me? Remove any of my experimental patches applied to sesdev, then rerun sesdev create, adding the --salt parameter. This forces sesdev to invoke salt state.apply directly to apply the ceph-salt formula, rather than running everything through the ceph-salt executor. For example:

sesdev create ses7p --salt --non-interactive ses7p-default

Does that give you a successful deployment?

tserong added a commit that referenced this issue Jun 21, 2023
After looking through ceph-salt logs from previous failed Jenkins
runs, and also at Volker's issue which seems to be the same thing
(see #700), my current
suspicion is that the ceph-salt executor is correctly starting
`salt -G 'ceph-salt:member' state.apply ceph-salt` but is then
failing to pick up some (or all) of the event notifications, which
results in it returning too soon, while `cephadm bootstrap` is
still running.

Assuming I'm on the right track here, let's ask `sesdev` to run
that salt command directly, to remove the ceph-salt executor
from the picture entirely.

Signed-off-by: Tim Serong <tserong@suse.com>