Bootstrap process hangs up for hours #2139

Closed
cactus-ale opened this issue Aug 5, 2023 · 2 comments

What happened:

After running the following bootstrap command:
cephadm bootstrap --mon-ip 192.168.1.1 --cluster-network 192.168.1.0/24
And looking at the logs with:
journalctl -f | grep -e ceph -e mon

The bootstrap process gets stuck for hours at this point:

Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: 125533a2-339a-11ee-8709-fbd7ec17aaf0
Verifying IP 192.168.1.1 port 3300 ...
Verifying IP 192.168.1.1 port 6789 ...
Mon IP `192.168.1.1` is in CIDR network `192.168.1.0/24`
Mon IP `192.168.1.1` is in CIDR network `192.168.1.0/24`
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
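
The output stops at "Waiting for mon...". One quick check from the host is whether anything accepts TCP connections on the two monitor ports named above (3300 and 6789). The following is a minimal sketch, not part of cephadm: `port_open` is a hypothetical helper, and a successful connection only shows that something is listening, not that the monitor is healthy.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Address and ports are copied from the bootstrap output above.
    for port in (3300, 6789):
        state = "open" if port_open("192.168.1.1", port) else "closed or unreachable"
        print(f"192.168.1.1:{port} is {state}")
```

If both ports report open while the bootstrap is still waiting, the monitor process is at least listening, and the problem is more likely on the client side of the check than in the daemon startup.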

And I get the following in the logs:

Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:14 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Aug 05 14:12:15 cactus-router systemd[1]: Created slice Slice /system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0.
Aug 05 14:12:15 cactus-router systemd[1]: Started Ceph mon.cactus-router for 125533a2-339a-11ee-8709-fbd7ec17aaf0.
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.919+0000 7f361db0db80  0 set uid:gid to 167:167 (ceph:ceph)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.919+0000 7f361db0db80  0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process ceph-mon, pid 7
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: SST files in /var/lib/ceph/mon/ceph-cactus-router/store.db dir, Total Num: 0, files:
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: Write Ahead Log file in /var/lib/ceph/mon/ceph-cactus-router/store.db: 000004.log size: 823 ;
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb:                                 Options.wal_dir: /var/lib/ceph/mon/ceph-cactus-router/store.db
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.923+0000 7f361db0db80  4 rocksdb: [db/version_set.cc:4724] Recovering from manifest file: /var/lib/ceph/mon/ceph-cactus-router/store.db/MANIFEST-000003
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.931+0000 7f361db0db80  4 rocksdb: [db/version_set.cc:4764] Recovered from manifest file:/var/lib/ceph/mon/ceph-cactus-router/store.db/MANIFEST-000003 succeeded,manifest_file_number is 3, next_file_number is 5, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.935+0000 7f361db0db80  4 rocksdb: [file/delete_scheduler.cc:69] Deleted file /var/lib/ceph/mon/ceph-cactus-router/store.db/000004.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  0 starting mon.cactus-router rank 0 at public addrs [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] at bind addrs [v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0] mon_data /var/lib/ceph/mon/ceph-cactus-router fsid 125533a2-339a-11ee-8709-fbd7ec17aaf0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  1 mon.cactus-router@-1(???) e0 preinit fsid 125533a2-339a-11ee-8709-fbd7ec17aaf0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  0 mon.cactus-router@-1(probing) e0  my rank is now 0 (was -1)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.939+0000 7f361db0db80  1 mon.cactus-router@0(probing) e0 win_standalone_election
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  0 log_channel(cluster) log [INF] : mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting full_ratio = 0.95
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 do_prune osdmap full prune enabled
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.943+0000 7f361db0db80  1 mon.cactus-router@0(leader) e0 _apply_compatset_features enabling new quorum features: compat={},rocompat={},incompat={4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code,7=support shec erasure code}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 0..0) refresh upgraded, format 3 -> 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(probing) e1 win_standalone_election
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  0 log_channel(cluster) log [INF] : mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  0 log_channel(cluster) log [DBG] : monmap e1: 1 mons at {cactus-router=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0]} removed_ranks: {}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mgrc update_daemon_metadata mon.cactus-router metadata {addrs=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0],arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,compression_algorithms=none, snappy, zlib, zstd, lz4,container_hostname=cactus-router,container_image=quay.io/ceph/ceph:v17,cpu=Intel(R) Core(TM) i5-4430S CPU @ 2.70GHz,device_ids=sda=ATA_SAMSUNG_MZ7TY256_S307NWAH817321,device_paths=sda=/dev/disk/by-path/pci-0000:00:1f.2-ata-1,devices=sda,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=cactus-router,kernel_description=#85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023,kernel_version=5.15.0-78-generic,mem_swap_kb=4194300,mem_total_kb=7942232,os=Linux}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting full_ratio = 0.95
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.947+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 do_prune osdmap full prune enabled
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader) e1 _apply_compatset_features enabling new quorum features: compat={},rocompat={},incompat={8=support monmap features,9=luminous ondisk layout,10=mimic ondisk layout,11=nautilus ondisk layout,12=octopus ondisk layout,13=pacific ondisk layout,14=quincy ondisk layout}
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).mds e1 new map
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).mds e1 print_map
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 0..0) refresh upgraded, format 3 -> 0
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  1 mon.cactus-router@0(leader).osd e0 _set_cache_ratios kv ratio 0.25 inc ratio 0.375 full ratio 0.375
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).osd e1 crush map has features 288514050185494528, adjusting msgr requires
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.951+0000 7f360e562700  0 mon.cactus-router@0(leader).osd e1 crush map has features 288514050185494528, adjusting msgr requires
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.955+0000 7f360e562700  1 mon.cactus-router@0(leader).paxosservice(auth 1..1) refresh upgraded, format 0 -> 3
Aug 05 14:12:15 cactus-router bash[64494]: debug 2023-08-05T14:12:15.955+0000 7f360e562700  0 log_channel(cluster) log [DBG] : mgrmap e1: no daemons active
Aug 05 14:12:15 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.947488+0000 mon.cactus-router (mon.0) 1 : cluster [INF] mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.953648+0000 mon.cactus-router (mon.0) 2 : cluster [INF] mon.cactus-router is new leader, mons cactus-router in quorum (ranks 0)
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.953954+0000 mon.cactus-router (mon.0) 3 : cluster [DBG] monmap e1: 1 mons at {cactus-router=[v2:192.168.1.1:3300/0,v1:192.168.1.1:6789/0]} removed_ranks: {}
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.957073+0000 mon.cactus-router (mon.0) 4 : cluster [DBG] fsmap
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.960709+0000 mon.cactus-router (mon.0) 5 : cluster [DBG] osdmap e1: 0 total, 0 up, 0 in
Aug 05 14:12:16 cactus-router bash[64494]: cluster 2023-08-05T14:12:15.961244+0000 mon.cactus-router (mon.0) 6 : cluster [DBG] mgrmap e1: no daemons active
Aug 05 14:12:20 cactus-router bash[64494]: debug 2023-08-05T14:12:20.939+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1019970681 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:25 cactus-router bash[64494]: debug 2023-08-05T14:12:25.955+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020053908 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:30 cactus-router bash[64494]: debug 2023-08-05T14:12:30.955+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020054723 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408
Aug 05 14:12:35 cactus-router bash[64494]: debug 2023-08-05T14:12:35.959+0000 7f361356c700  1 mon.cactus-router@0(leader).osd e1 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232 full_alloc: 348127232 kv_alloc: 322961408

The last lines, which all look alike, keep repeating indefinitely.
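
For anyone triaging this: the KillMode warnings in the log name unit files for four different fsids, while the bootstrap output reports only 125533a2-339a-11ee-8709-fbd7ec17aaf0 as the cluster fsid. Below is a minimal sketch for listing the other fsids, assuming the journal output has been saved as plain text lines. The function names are hypothetical and not part of cephadm. It only lists the fsids; whether units from other fsids are related to the hang is not established here.

```python
import re

# Matches the fsid embedded in a templated unit file name such as
# ".../ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service".
FSID_IN_UNIT = re.compile(
    r"ceph-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})@\.service"
)


def distinct_fsids(journal_lines):
    """Return the fsids named in unit-file paths, in first-seen order."""
    seen = []
    for line in journal_lines:
        for fsid in FSID_IN_UNIT.findall(line):
            if fsid not in seen:
                seen.append(fsid)
    return seen


def other_fsids(journal_lines, current_fsid):
    """Return every fsid in the journal except the one being bootstrapped."""
    return [f for f in distinct_fsids(journal_lines) if f != current_fsid]


if __name__ == "__main__":
    # Shortened copies of lines from the log above.
    sample = [
        "systemd[1]: /etc/systemd/system/ceph-9ef7732a-3362-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none.",
        "systemd[1]: /etc/systemd/system/ceph-4c48ec0e-3393-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none.",
        "systemd[1]: /etc/systemd/system/ceph-2480dcfa-3392-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none.",
        "systemd[1]: /etc/systemd/system/ceph-125533a2-339a-11ee-8709-fbd7ec17aaf0@.service:24: Unit configured to use KillMode=none.",
    ]
    for fsid in other_fsids(sample, "125533a2-339a-11ee-8709-fbd7ec17aaf0"):
        print(fsid)
```

Run against the full journal, this gives a short list of fsids to compare with what is present under /etc/systemd/system on the host.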

What you expected to happen:

The bootstrap should complete in a few minutes.

Environment:

  • OS: Ubuntu 22.04.2 LTS
  • Kernel: Linux 5.15.0-78-generic 85-Ubuntu SMP x86_64
  • Docker version: 20.10.21
  • Ceph version: 17.2.6

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.


This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
