store-volume stuck in start-pre/auto-restart loop on deisctl start platform #2317

Closed
EvanK opened this issue Oct 29, 2014 · 11 comments

@EvanK

EvanK commented Oct 29, 2014

I built a fresh 0.15.0 cluster locally from the bare-metal docs, using the deisctl binary provided by @carmstrong (per our conversation in #2300).

Upon deisctl start platform, it hangs and loops on starting store-volume:

$ deisctl start platform
● ▴ ■
■ ● ▴ Starting Deis...
▴ ■ ●

Storage subsystem...
deis-store-monitor.service: activating/start-pre                                 
deis-store-monitor.service: active/running                                 
deis-store-daemon.service: activating/start-pre                                 
deis-store-daemon.service: active/running                                 
deis-store-metadata.service: activating/start-pre                                 
deis-store-metadata.service: active/running                                 
deis-store-gateway.service: activating/start-pre                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-gateway.service: active/running                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
deis-store-volume.service: activating/auto-restart                                 
deis-store-volume.service: activating/start-pre                                 
^C $

The following commands were run on the first node of the cluster:

core@deis-staging-node1 ~ $ nse deis-store-monitor
Error: No such image or container: deis-store-monitor
nsenter: option '--target' requires an argument

Usage:
 nsenter [options] <program> [args...]

Options:
 -t, --target <pid>     target process to get namespaces from
 -m, --mount [=<file>]  enter mount namespace
 -u, --uts   [=<file>]  enter UTS namespace (hostname etc)
 -i, --ipc   [=<file>]  enter System V IPC namespace
 -n, --net   [=<file>]  enter network namespace
 -p, --pid   [=<file>]  enter pid namespace
 -U, --user  [=<file>]  enter user namespace
 -S, --setuid <uid>     set uid in user namespace
 -G, --setgid <gid>     set gid in user namespace
 -r, --root  [=<dir>]   set the root directory
 -w, --wd    [=<dir>]   set the working directory
 -F, --no-fork          do not fork before exec'ing <program>

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see nsenter(1).
core@deis-staging-node1 ~ $ ceph -s
-bash: ceph: command not found
core@deis-staging-node1 ~ $ deisctl list
UNIT        MACHINE     LOAD  ACTIVE    SUB
deis-builder.service    43576a9c.../10.60.61.36 loaded  inactive  dead
deis-cache.service    7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-controller.service   129a0735.../10.60.61.35 loaded  inactive  dead
deis-database.service   7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-logger.service   43576a9c.../10.60.61.36 loaded  inactive  dead
deis-logspout.service   129a0735.../10.60.61.35 loaded  inactive  dead
deis-logspout.service   43576a9c.../10.60.61.36 loaded  inactive  dead
deis-logspout.service   7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-publisher.service    129a0735.../10.60.61.35 loaded  inactive  dead
deis-publisher.service    43576a9c.../10.60.61.36 loaded  inactive  dead
deis-publisher.service    7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-registry.service   129a0735.../10.60.61.35 loaded  inactive  dead
deis-router@1.service   43576a9c.../10.60.61.36 loaded  inactive  dead
deis-router@2.service   129a0735.../10.60.61.35 loaded  inactive  dead
deis-router@3.service   7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-store-daemon.service 129a0735.../10.60.61.35 loaded  active    running
deis-store-daemon.service 43576a9c.../10.60.61.36 loaded  active    running
deis-store-daemon.service 7ebe28ba.../10.60.61.37 loaded  active    running
deis-store-gateway.service  129a0735.../10.60.61.35 loaded  active    running
deis-store-metadata.service 129a0735.../10.60.61.35 loaded  active    running
deis-store-metadata.service 43576a9c.../10.60.61.36 loaded  active    running
deis-store-metadata.service 7ebe28ba.../10.60.61.37 loaded  active    running
deis-store-monitor.service  129a0735.../10.60.61.35 loaded  inactive  dead
deis-store-monitor.service  43576a9c.../10.60.61.36 loaded  inactive  dead
deis-store-monitor.service  7ebe28ba.../10.60.61.37 loaded  inactive  dead
deis-store-volume.service 129a0735.../10.60.61.35 loaded  activating  start-pre
deis-store-volume.service 43576a9c.../10.60.61.36 loaded  activating  start-pre
deis-store-volume.service 7ebe28ba.../10.60.61.37 loaded  activating  start-pre
core@deis-staging-node1 ~ $ journalctl -u deis-store-monitor
-- Logs begin at Wed 2014-10-29 19:51:29 UTC, end at Wed 2014-10-29 20:13:27 UTC. --
Oct 29 20:01:46 deis-staging-node1 systemd[1]: Starting deis-store-monitor...
Oct 29 20:01:48 deis-staging-node1 sh[992]: Unable to find image 'ubuntu-debootstrap:14.04' locally
Oct 29 20:01:49 deis-staging-node1 sh[992]: ubuntu-debootstrap:14.04: The image you are pulling has been verified
Oct 29 20:02:18 deis-staging-node1 sh[992]: Status: Downloaded newer image for ubuntu-debootstrap:14.04
Oct 29 20:02:22 deis-staging-node1 sh[1091]: Pulling repository deis/store-monitor
Oct 29 20:02:51 deis-staging-node1 sh[1091]: Status: Downloaded newer image for deis/store-monitor:v0.15.0
Oct 29 20:02:51 deis-staging-node1 systemd[1]: Started deis-store-monitor.
Oct 29 20:02:52 deis-staging-node1 sh[1158]: store-monitor: Ceph hasn't yet been deployed. Trying to deploy...
Oct 29 20:02:53 deis-staging-node1 sh[1158]: store-monitor: obtained the lock to proceed with setting up.
Oct 29 20:02:55 deis-staging-node1 sh[1158]: creating /etc/ceph/ceph.client.admin.keyring
Oct 29 20:02:55 deis-staging-node1 sh[1158]: creating /etc/ceph/ceph.mon.keyring
Oct 29 20:02:56 deis-staging-node1 sh[1158]: monmaptool: monmap file /etc/ceph/monmap
Oct 29 20:02:56 deis-staging-node1 sh[1158]: monmaptool: set fsid to 6e7e9954-b370-42e3-ac3d-7f9534b3cad0
Oct 29 20:02:56 deis-staging-node1 sh[1158]: monmaptool: writing epoch 0 to /etc/ceph/monmap (1 monitors)
Oct 29 20:02:56 deis-staging-node1 sh[1158]: store-monitor: setup complete.
Oct 29 20:02:57 deis-staging-node1 sh[1158]: creating /tmp/ceph.mon.keyring
Oct 29 20:02:57 deis-staging-node1 sh[1158]: importing contents of /etc/ceph/ceph.client.admin.keyring into /tmp/ceph.mon.keyring
Oct 29 20:02:57 deis-staging-node1 sh[1158]: importing contents of /etc/ceph/ceph.mon.keyring into /tmp/ceph.mon.keyring
Oct 29 20:02:57 deis-staging-node1 sh[1158]: ceph-mon: set fsid to 6e7e9954-b370-42e3-ac3d-7f9534b3cad0
Oct 29 20:02:58 deis-staging-node1 sh[1158]: ceph-mon: created monfs at /var/lib/ceph/mon/ceph-deis-staging-node1 for mon.deis-staging-node1
Oct 29 20:02:58 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.296119 7fd05e474800  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-mon, pid 1
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.909360 7fd05e474800  0 starting mon.deis-staging-node1 rank 0 at 10.60.61.35:6789/0 mon_data /var/lib/ceph/mon/ceph-d
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.909580 7fd05e474800  1 mon.deis-staging-node1@-1(probing) e0 preinit fsid 6e7e9954-b370-42e3-ac3d-7f9534b3cad0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.909637 7fd05e474800  1 mon.deis-staging-node1@-1(probing) e0  initial_members deis-staging-node1, filtering seed monm
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.910183 7fd05e474800  0 mon.deis-staging-node1@-1(probing) e0  my rank is now 0 (was -1)
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.910225 7fd05e474800  1 mon.deis-staging-node1@0(probing) e0 win_standalone_election
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:58.941217 7fd05e474800  0 log [INF] : mon.deis-staging-node1@0 won leader election with quorum 0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.033994 7fd05e474800  1 mon.deis-staging-node1@0(probing) e1 win_standalone_election
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.071850 7fd05e474800  0 log [INF] : mon.deis-staging-node1@0 won leader election with quorum 0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.107422 7fd05e474800  0 log [INF] : monmap e1: 1 mons at {deis-staging-node1=10.60.61.35:6789/0}
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.197467 7fd05e474800  0 log [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.279343 7fd05e474800  0 mon.deis-staging-node1@0(leader).mds e1 print_map
Oct 29 20:02:59 deis-staging-node1 sh[1158]: epoch        1
Oct 29 20:02:59 deis-staging-node1 sh[1158]: flags        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: created        2014-10-29 20:02:59.197624
Oct 29 20:02:59 deis-staging-node1 sh[1158]: modified        2014-10-29 20:02:59.197644
Oct 29 20:02:59 deis-staging-node1 sh[1158]: tableserver        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: root        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: session_timeout        60
Oct 29 20:02:59 deis-staging-node1 sh[1158]: session_autoclose        300
Oct 29 20:02:59 deis-staging-node1 sh[1158]: max_file_size        1099511627776
Oct 29 20:02:59 deis-staging-node1 sh[1158]: last_failure        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: last_failure_osd_epoch        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: compat        compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separa
Oct 29 20:02:59 deis-staging-node1 sh[1158]: max_mds        1
Oct 29 20:02:59 deis-staging-node1 sh[1158]: in
Oct 29 20:02:59 deis-staging-node1 sh[1158]: up        {}
Oct 29 20:02:59 deis-staging-node1 sh[1158]: failed
Oct 29 20:02:59 deis-staging-node1 sh[1158]: stopped
Oct 29 20:02:59 deis-staging-node1 sh[1158]: data_pools        0
Oct 29 20:02:59 deis-staging-node1 sh[1158]: metadata_pool        1
Oct 29 20:02:59 deis-staging-node1 sh[1158]: inline_data        disabled
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.279501 7fd05e474800  0 log [INF] : mdsmap e1: 0/0/1 up
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.388875 7fd05e474800  1 mon.deis-staging-node1@0(leader).osd e1 e1: 0 osds: 0 up, 0 in
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.426147 7fd05e474800  0 mon.deis-staging-node1@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requ
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.426205 7fd05e474800  0 mon.deis-staging-node1@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requ
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.426227 7fd05e474800  0 mon.deis-staging-node1@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requ
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.426247 7fd05e474800  0 mon.deis-staging-node1@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requ
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.429233 7fd05e474800  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.615104 7fd05e474800  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.827875 7fd05e474800  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.827951 7fd05e474800  0 log [INF] : mdsmap e1: 0/0/1 up
Oct 29 20:02:59 deis-staging-node1 sh[1158]: 2014-10-29 20:02:59.827979 7fd05e474800  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Oct 29 20:03:00 deis-staging-node1 sh[1158]: 2014-10-29 20:03:00.185887 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e1 handle_command mon_command({"prefix": "mon getmap"} v 0) v
Oct 29 20:03:01 deis-staging-node1 sh[1158]: 2014-10-29 20:03:01.551380 7fd057ea5700  1 mon.deis-staging-node1@0(leader) e1  adding peer 10.60.61.37:6789/0 to list of hints
Oct 29 20:03:01 deis-staging-node1 sh[1158]: 2014-10-29 20:03:01.552321 7fd057ea5700  1 mon.deis-staging-node1@0(leader) e1  adding peer 10.60.61.37:6789/0 to list of hints
Oct 29 20:03:01 deis-staging-node1 sh[1158]: 2014-10-29 20:03:01.552872 7fd057ea5700  0 mon.deis-staging-node1@0(leader).monmap v1 adding/updating deis-staging-node3 at 10.60.61.37:6
Oct 29 20:03:01 deis-staging-node1 sh[1158]: 2014-10-29 20:03:01.622897 7fd057ea5700  0 log [INF] : mon.deis-staging-node1 calling new monitor election
Oct 29 20:03:01 deis-staging-node1 sh[1158]: 2014-10-29 20:03:01.622936 7fd057ea5700  1 mon.deis-staging-node1@0(electing).elector(2) init, last seen epoch 2
Oct 29 20:03:03 deis-staging-node1 sh[1158]: 2014-10-29 20:03:03.552883 7fd05e46f700  0 -- 10.60.61.35:6789/0 >> 10.60.61.37:6789/0 pipe(0x3348f00 sd=20 :6789 s=2 pgs=1 cs=1 l=0 c=0x
Oct 29 20:03:03 deis-staging-node1 sh[1158]: 2014-10-29 20:03:03.554090 7fd0565a1700  0 -- 10.60.61.35:6789/0 >> 10.60.61.37:6789/0 pipe(0x3348a00 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x
Oct 29 20:03:03 deis-staging-node1 sh[1158]: 2014-10-29 20:03:03.554122 7fd0565a1700  0 -- 10.60.61.35:6789/0 >> 10.60.61.37:6789/0 pipe(0x3348a00 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.642436 7fd0586a6700  1 mon.deis-staging-node1@0(electing).elector(3) init, last seen epoch 3
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.687234 7fd057ea5700  0 log [INF] : mon.deis-staging-node1@0 won leader election with quorum 0,1
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.741480 7fd057ea5700  0 log [INF] : monmap e2: 2 mons at {deis-staging-node1=10.60.61.35:6789/0,deis-staging-node3=10.
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.741523 7fd057ea5700  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.741563 7fd057ea5700  0 log [INF] : mdsmap e1: 0/0/1 up
Oct 29 20:03:06 deis-staging-node1 sh[1158]: 2014-10-29 20:03:06.741616 7fd057ea5700  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Oct 29 20:03:11 deis-staging-node1 sh[1158]: 2014-10-29 20:03:11.721553 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e2 handle_command mon_command({"prefix": "mon getmap"} v 0) v
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.274294 7fd057ea5700  1 mon.deis-staging-node1@0(leader) e2  adding peer 10.60.61.36:6789/0 to list of hints
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.275211 7fd057ea5700  1 mon.deis-staging-node1@0(leader) e2  adding peer 10.60.61.36:6789/0 to list of hints
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.433560 7fd057ea5700  1 mon.deis-staging-node1@0(leader) e2  adding peer 10.60.61.36:6789/0 to list of hints
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.434784 7fd057ea5700  0 mon.deis-staging-node1@0(leader).monmap v2 adding/updating deis-staging-node2 at 10.60.61.36:6
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.512530 7fd057ea5700  0 log [INF] : mon.deis-staging-node1 calling new monitor election
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.512697 7fd057ea5700  1 mon.deis-staging-node1@0(electing).elector(4) init, last seen epoch 4
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.566597 7fd0565a1700  0 -- 10.60.61.35:6789/0 >> 10.60.61.37:6789/0 pipe(0x3348a00 sd=21 :6789 s=2 pgs=2 cs=1 l=0 c=0x
Oct 29 20:03:12 deis-staging-node1 sh[1158]: 2014-10-29 20:03:12.568262 7fd0564a0700  0 -- 10.60.61.35:6789/0 >> 10.60.61.37:6789/0 pipe(0x3348a00 sd=21 :34408 s=1 pgs=2 cs=2 l=0 c=0
Oct 29 20:03:14 deis-staging-node1 sh[1158]: 2014-10-29 20:03:14.434353 7fd05e46f700  0 -- 10.60.61.35:6789/0 >> 10.60.61.36:6789/0 pipe(0x3348f00 sd=20 :6789 s=2 pgs=2 cs=1 l=0 c=0x
Oct 29 20:03:14 deis-staging-node1 sh[1158]: 2014-10-29 20:03:14.437502 7fd05639f700  0 -- 10.60.61.35:6789/0 >> 10.60.61.36:6789/0 pipe(0x3349e00 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x
Oct 29 20:03:14 deis-staging-node1 sh[1158]: 2014-10-29 20:03:14.437525 7fd05639f700  0 -- 10.60.61.35:6789/0 >> 10.60.61.36:6789/0 pipe(0x3349e00 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x
Oct 29 20:03:17 deis-staging-node1 sh[1158]: 2014-10-29 20:03:17.534436 7fd0586a6700  1 mon.deis-staging-node1@0(electing).elector(5) init, last seen epoch 5
Oct 29 20:03:17 deis-staging-node1 sh[1158]: 2014-10-29 20:03:17.745428 7fd057ea5700  0 log [INF] : mon.deis-staging-node1@0 won leader election with quorum 0,1,2
Oct 29 20:03:18 deis-staging-node1 sh[1158]: 2014-10-29 20:03:18.303369 7fd057ea5700  0 log [INF] : monmap e3: 3 mons at {deis-staging-node1=10.60.61.35:6789/0,deis-staging-node2=10.
Oct 29 20:03:18 deis-staging-node1 sh[1158]: 2014-10-29 20:03:18.303404 7fd057ea5700  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:18 deis-staging-node1 sh[1158]: 2014-10-29 20:03:18.303430 7fd057ea5700  0 log [INF] : mdsmap e1: 0/0/1 up
Oct 29 20:03:18 deis-staging-node1 sh[1158]: 2014-10-29 20:03:18.303469 7fd057ea5700  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Oct 29 20:03:19 deis-staging-node1 sh[1158]: 2014-10-29 20:03:19.119803 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd create"} v 0) v
Oct 29 20:03:19 deis-staging-node1 sh[1158]: 2014-10-29 20:03:19.860985 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e2 e2: 1 osds: 0 up, 0 in
Oct 29 20:03:19 deis-staging-node1 sh[1158]: 2014-10-29 20:03:19.963551 7fd057ea5700  0 log [INF] : osdmap e2: 1 osds: 0 up, 0 in
Oct 29 20:03:20 deis-staging-node1 sh[1158]: 2014-10-29 20:03:20.328879 7fd057ea5700  0 log [INF] : pgmap v3: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:20 deis-staging-node1 sh[1158]: 2014-10-29 20:03:20.929718 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd create"} v 0) v
Oct 29 20:03:21 deis-staging-node1 sh[1158]: 2014-10-29 20:03:21.762605 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e3 e3: 2 osds: 0 up, 0 in
Oct 29 20:03:21 deis-staging-node1 sh[1158]: 2014-10-29 20:03:21.802858 7fd057ea5700  0 log [INF] : osdmap e3: 2 osds: 0 up, 0 in
Oct 29 20:03:23 deis-staging-node1 sh[1158]: 2014-10-29 20:03:23.075962 7fd057ea5700  0 log [INF] : pgmap v4: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:23 deis-staging-node1 sh[1158]: 2014-10-29 20:03:23.326621 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd create"} v 0) v
Oct 29 20:03:24 deis-staging-node1 sh[1158]: 2014-10-29 20:03:24.656524 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e4 e4: 3 osds: 0 up, 0 in
Oct 29 20:03:24 deis-staging-node1 sh[1158]: 2014-10-29 20:03:24.745846 7fd057ea5700  0 log [INF] : osdmap e4: 3 osds: 0 up, 0 in
Oct 29 20:03:25 deis-staging-node1 sh[1158]: 2014-10-29 20:03:25.761992 7fd057ea5700  0 log [INF] : pgmap v5: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:26 deis-staging-node1 sh[1158]: 2014-10-29 20:03:26.493162 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:29 deis-staging-node1 sh[1158]: 2014-10-29 20:03:29.251171 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:29 deis-staging-node1 sh[1158]: 2014-10-29 20:03:29.251410 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd crush add", "ar
Oct 29 20:03:30 deis-staging-node1 sh[1158]: 2014-10-29 20:03:30.619165 7fd055d99700  0 -- 10.60.61.35:6789/0 >> 10.60.61.36:0/1255056910 pipe(0x34d7980 sd=25 :6789 s=0 pgs=0 cs=0 l=
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.242501 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e5 e5: 3 osds: 0 up, 0 in
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.431100 7fd057ea5700  0 log [INF] : osdmap e5: 3 osds: 0 up, 0 in
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.787264 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.787921 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.788082 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:31 deis-staging-node1 sh[1158]: 2014-10-29 20:03:31.872661 7fd057ea5700  0 log [INF] : pgmap v6: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.054032 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "auth get-or-create"
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.105108 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd crush add", "ar
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.668762 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e6 e6: 3 osds: 1 up, 1 in
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.713645 7fd057ea5700  0 log [INF] : osd.0 10.60.61.37:6800/1 boot
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.713920 7fd057ea5700  0 log [INF] : osdmap e6: 3 osds: 1 up, 1 in
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.779161 7fd057ea5700  0 mon.deis-staging-node1@0(leader) e3 handle_command mon_command({"prefix": "osd crush add", "ar
Oct 29 20:03:34 deis-staging-node1 sh[1158]: 2014-10-29 20:03:34.888923 7fd057ea5700  0 log [INF] : pgmap v7: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:36 deis-staging-node1 sh[1158]: 2014-10-29 20:03:36.035741 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e7 e7: 3 osds: 1 up, 1 in
Oct 29 20:03:36 deis-staging-node1 sh[1158]: 2014-10-29 20:03:36.067707 7fd057ea5700  0 log [INF] : osdmap e7: 3 osds: 1 up, 1 in
Oct 29 20:03:36 deis-staging-node1 sh[1158]: 2014-10-29 20:03:36.356612 7fd057ea5700  0 log [INF] : pgmap v8: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:40 deis-staging-node1 sh[1158]: 2014-10-29 20:03:39.713730 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e8 e8: 3 osds: 2 up, 2 in
Oct 29 20:03:40 deis-staging-node1 sh[1158]: 2014-10-29 20:03:40.857174 7fd057ea5700  0 log [INF] : osd.1 10.60.61.36:6801/1 boot
Oct 29 20:03:41 deis-staging-node1 sh[1158]: 2014-10-29 20:03:40.857851 7fd057ea5700  0 log [INF] : osdmap e8: 3 osds: 2 up, 2 in
Oct 29 20:03:42 deis-staging-node1 sh[1158]: 2014-10-29 20:03:42.875474 7fd057ea5700  0 log [INF] : pgmap v9: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Oct 29 20:03:45 deis-staging-node1 sh[1158]: 2014-10-29 20:03:45.378339 7fd057ea5700  0 log [INF] : pgmap v10: 192 pgs: 192 creating; 0 bytes data, 5474 MB used, 152 MB / 5820 MB ava
Oct 29 20:03:52 deis-staging-node1 sh[1158]: 2014-10-29 20:03:52.487545 7fd057ea5700  1 mon.deis-staging-node1@0(leader).osd e9 e9: 3 osds: 3 up, 3 in
Oct 29 20:03:52 deis-staging-node1 sh[1158]: 2014-10-29 20:03:52.538201 7fd057ea5700  0 log [INF] : osd.2 10.60.61.35:6801/1 boot
Oct 29 20:03:53 deis-staging-node1 sh[1158]: 2014-10-29 20:03:52.538435 7fd057ea5700  0 log [INF] : osdmap e9: 3 osds: 3 up, 3 in
Oct 29 20:03:59 deis-staging-node1 sh[1158]: 2014-10-29 20:03:57.516726 7fd057ea5700  0 log [INF] : pgmap v11: 192 pgs: 192 creating; 0 bytes data, 5474 MB used, 152 MB / 5820 MB ava
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053693 7fd0586a6700  0 mon.deis-staging-node1@0(leader).data_health(6) update_stats avail 1% total 5960684 used 56655
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053770 7fd0586a6700 -1 mon.deis-staging-node1@0(leader).data_health(6) reached critical levels of available space on 
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053772 7fd0586a6700  0 ** Shutdown via Data Health Service **
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053821 7fd056ea3700 -1 mon.deis-staging-node1@0(leader) e3 *** Got Signal Interrupt ***
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053834 7fd056ea3700  1 mon.deis-staging-node1@0(leader) e3 shutdown
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054000 7fd056ea3700  0 quorum service shutdown
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054002 7fd056ea3700  0 mon.deis-staging-node1@0(shutdown).health(6) HealthMonitor::service_shutdown 1 services
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054065 7fd056ea3700  0 quorum service shutdown
Oct 29 20:04:01 deis-staging-node1 docker[1860]: Error response from daemon: No such container: deis-store-monitor
Oct 29 20:04:01 deis-staging-node1 docker[1860]: 2014/10/29 20:04:01 Error: failed to remove one or more containers
core@deis-staging-node1 ~ $ 
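
A note on the failed commands above: the nse helper looks up the target container's PID with docker inspect and hands it to nsenter, so when the deis-store-monitor container does not exist there is no PID to pass and nsenter complains about the missing --target argument; likewise the ceph CLI only exists inside the store containers, not on the CoreOS host. A rough sketch of how such a helper is typically defined (the exact definition shipped in the Deis docs may differ):

nse() {
  # Enter the namespaces of a running container by name. When the container
  # does not exist, docker inspect prints an error and expands to nothing,
  # which produces the "--target requires an argument" usage error seen above.
  sudo nsenter --mount --uts --ipc --net --pid \
    --target $(docker inspect --format '{{ .State.Pid }}' $1)
}

nse deis-store-monitor    # once inside, ceph -s works; the ceph CLI only
                          # exists in the store containers, not on the host
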
@carmstrong
Contributor

@EvanK It looks like the monitors are crashing due to not having enough disk space available:

Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053693 7fd0586a6700  0 mon.deis-staging-node1@0(leader).data_health(6) update_stats avail 1% total 5960684 used 56655
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053770 7fd0586a6700 -1 mon.deis-staging-node1@0(leader).data_health(6) reached critical levels of available space on 
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053772 7fd0586a6700  0 ** Shutdown via Data Health Service **
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053821 7fd056ea3700 -1 mon.deis-staging-node1@0(leader) e3 *** Got Signal Interrupt ***
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053834 7fd056ea3700  1 mon.deis-staging-node1@0(leader) e3 shutdown
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054000 7fd056ea3700  0 quorum service shutdown
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054002 7fd056ea3700  0 mon.deis-staging-node1@0(shutdown).health(6) HealthMonitor::service_shutdown 1 services
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054065 7fd056ea3700  0 quorum service shutdown
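
The "Shutdown via Data Health Service" line is the monitor protecting itself: ceph-mon periodically checks free space on the partition holding its data directory (/var/lib/ceph/mon) and exits once it drops below a critical threshold, which at 1% available it clearly has. The thresholds live in ceph.conf; the values below are my recollection of the Firefly-era defaults, so treat them as assumptions, and note that freeing or adding disk space is the real fix rather than lowering the limit:

[mon]
# warn once the mon data partition has less than this percent free
mon data avail warn = 30
# stop the monitor once it has less than this percent free
mon data avail crit = 5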

@bacongobbler
Member

The logs are cut off, but it seems like one of the monitors is crashing due to limited disk space.

edit: jinx

@EvanK
Author

EvanK commented Oct 29, 2014

Huh, that does look like a likely culprit:

core@deis-staging-node1 ~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          5.7G  5.5G   84M  99% /
devtmpfs        2.0G     0  2.0G   0% /dev
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           2.0G  484K  2.0G   1% /run
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda9       5.7G  5.5G   84M  99% /
/dev/sda3      1008M  288M  670M  31% /usr
tmpfs           2.0G     0  2.0G   0% /media
tmpfs           2.0G     0  2.0G   0% /tmp
/dev/sda6       108M   52K   99M   1% /usr/share/oem
/dev/sda9       5.7G  5.5G   84M  99% /var/lib/docker/btrfs

These machines were provisioned with an 8G disk each; should I be aiming for 16 or 32?

@carmstrong
Contributor

These machines were provisioned with an 8G disk each; should I be aiming for 16 or 32?

Yep, I think each OSD will provision 5GB.
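
If it helps, a quick pre-flight check along these lines (run from the machine you use deisctl from, assuming SSH access to the nodes; the IPs are the ones from the transcript above) makes it easy to confirm each node has comfortable headroom before deisctl start platform:

for node in 10.60.61.35 10.60.61.36 10.60.61.37; do
  # show free space on the root filesystem of each node
  ssh core@$node 'echo "$(hostname): $(df -h --output=avail / | tail -1) available on /"'
done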

@carmstrong
Contributor

@EvanK I'm going to close this as I don't think this is an issue with store. If you have trouble once you remedy the disk space issue, let us know. Thanks!

@codyaray

I'm seeing the same issue, but with deis-store-metadata.service unable to start for some reason:

deis-store-metadata.service: activating/start-pre
deis-store-metadata.service: activating/auto-restart

Right now I'm just providing the same information that EvanK provided. I'm (very) new to Deis/CoreOS/etc, so let me know if you need any more information.

core@ip-10-21-1-98:/# deisctl list
-bash: deisctl: command not found

core@ip-10-21-1-98 ~ $ nse deis-store-monitor
groups: cannot find name for group ID 11

root@ip-10-21-1-98:/# ceph -s
    cluster f118f0e8-f0bd-4ce9-bacb-b493bfeb306d
     health HEALTH_OK
     monmap e5: 5 mons at {ip-10-21-1-97=10.21.1.97:6789/0,ip-10-21-1-98=10.21.1.98:6789/0,ip-10-21-1-99=10.21.1.99:6789/0,ip-10-21-2-69=10.21.2.69:6789/0,ip-10-21-2-70=10.21.2.70:6789/0}, election epoch 12, quorum 0,1,2,3,4 ip-10-21-1-97,ip-10-21-1-98,ip-10-21-1-99,ip-10-21-2-69,ip-10-21-2-70
     osdmap e9: 5 osds: 5 up, 5 in
      pgmap v55: 192 pgs, 3 pools, 0 bytes data, 0 objects
            28700 MB used, 459 GB / 488 GB avail
                 192 active+clean

core@ip-10-21-1-98 ~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           98G  5.6G   92G   6% /
devtmpfs        3.7G     0  3.7G   0% /dev
tmpfs           3.7G     0  3.7G   0% /dev/shm
tmpfs           3.7G  276K  3.7G   1% /run
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
/dev/xvda9       98G  5.6G   92G   6% /
/dev/xvda3     1008M  286M  671M  30% /usr
tmpfs           3.7G     0  3.7G   0% /tmp
tmpfs           3.7G     0  3.7G   0% /media
/dev/xvda6      108M   56K   99M   1% /usr/share/oem
/dev/xvda9       98G  5.6G   92G   6% /var/lib/docker/btrfs

-- Logs begin at Tue 2014-12-16 01:58:24 UTC, end at Tue 2014-12-16 02:56:22 UTC. --
Dec 16 02:25:10 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Starting deis-store-monitor...
Dec 16 02:25:10 ip-10-21-1-98.us-west-2.compute.internal sh[1089]: Unable to find image 'ubuntu-debootstrap:14.04' locally
Dec 16 02:25:10 ip-10-21-1-98.us-west-2.compute.internal sh[1089]: Pulling repository ubuntu-debootstrap
Dec 16 02:25:21 ip-10-21-1-98.us-west-2.compute.internal sh[1175]: Pulling repository deis/store-monitor
Dec 16 02:27:52 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Started deis-store-monitor.
Dec 16 02:27:52 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: store-monitor: Ceph hasn't yet been deployed. Trying to deploy...
Dec 16 02:27:53 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: store-monitor: obtained the lock to proceed with setting up.
Dec 16 02:27:54 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: creating /etc/ceph/ceph.client.admin.keyring
Dec 16 02:27:54 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: creating /etc/ceph/ceph.mon.keyring
Dec 16 02:27:55 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: monmaptool: monmap file /etc/ceph/monmap
Dec 16 02:27:55 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: monmaptool: set fsid to f118f0e8-f0bd-4ce9-bacb-b493bfeb306d
Dec 16 02:27:55 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: monmaptool: writing epoch 0 to /etc/ceph/monmap (1 monitors)
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: store-monitor: setup complete.
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: creating /tmp/ceph.mon.keyring
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: importing contents of /etc/ceph/ceph.client.admin.keyring into /tmp/ceph.mon.keyring
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: importing contents of /etc/ceph/ceph.mon.keyring into /tmp/ceph.mon.keyring
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: ceph-mon: set fsid to f118f0e8-f0bd-4ce9-bacb-b493bfeb306d
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ip-10-21-1-98 for mon.ip-10-21-1-98
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.718503 7fcdd7d0b800  0 ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae), process ceph-mon, pid 1
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.738510 7fcdd7d0b800  0 starting mon.ip-10-21-1-98 rank 0 at 10.21.1.98:6789/0 mon_data /var/lib/ceph/mon/ceph-ip-10-21-1-98 fsid f118f0e8-f0bd-4ce9-bac
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.738731 7fcdd7d0b800  1 mon.ip-10-21-1-98@-1(probing) e0 preinit fsid f118f0e8-f0bd-4ce9-bacb-b493bfeb306d
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.738845 7fcdd7d0b800  1 mon.ip-10-21-1-98@-1(probing) e0  initial_members ip-10-21-1-98, filtering seed monmap
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.740575 7fcdd7d0b800  0 mon.ip-10-21-1-98@-1(probing) e0  my rank is now 0 (was -1)
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.740618 7fcdd7d0b800  1 mon.ip-10-21-1-98@0(probing) e0 win_standalone_election
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.743322 7fcdd7d0b800  0 log [INF] : mon.ip-10-21-1-98@0 won leader election with quorum 0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.751617 7fcdd7d0b800  1 mon.ip-10-21-1-98@0(probing) e1 win_standalone_election
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.753731 7fcdd7d0b800  0 log [INF] : mon.ip-10-21-1-98@0 won leader election with quorum 0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.756577 7fcdd7d0b800  0 log [INF] : monmap e1: 1 mons at {ip-10-21-1-98=10.21.1.98:6789/0}
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.763517 7fcdd7d0b800  0 log [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.774972 7fcdd7d0b800  0 mon.ip-10-21-1-98@0(leader).mds e1 print_map
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: epoch        1
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: flags        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: created        2014-12-16 02:27:56.763570
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: modified        2014-12-16 02:27:56.763599
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: tableserver        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: root        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: session_timeout        60
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: session_autoclose        300
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: max_file_size        1099511627776
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: last_failure        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: last_failure_osd_epoch        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: compat        compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned enc
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: max_mds        1
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: in
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: up        {}
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: failed
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: stopped
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: data_pools        0
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: metadata_pool        1
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: inline_data        disabled
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.775097 7fcdd7d0b800  0 log [INF] : mdsmap e1: 0/0/1 up
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.785792 7fcdd7d0b800  1 mon.ip-10-21-1-98@0(leader).osd e1 e1: 0 osds: 0 up, 0 in
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.794148 7fcdd7d0b800  0 mon.ip-10-21-1-98@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requires
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.794159 7fcdd7d0b800  0 mon.ip-10-21-1-98@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requires
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.794163 7fcdd7d0b800  0 mon.ip-10-21-1-98@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requires
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.794166 7fcdd7d0b800  0 mon.ip-10-21-1-98@0(leader).osd e1 crush map has features 1107558400, adjusting msgr requires
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.797009 7fcdd7d0b800  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.811738 7fcdd7d0b800  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.832671 7fcdd7d0b800  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.832724 7fcdd7d0b800  0 log [INF] : mdsmap e1: 0/0/1 up
Dec 16 02:27:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:27:56.832768 7fcdd7d0b800  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.414479 7fcdd1797700  0 mon.ip-10-21-1-98@0(leader) e1 handle_command mon_command({"prefix": "mon getmap"} v 0) v1
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.555584 7fcdd1797700  1 mon.ip-10-21-1-98@0(leader) e1  adding peer 10.21.2.69:6789/0 to list of hints
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.556915 7fcdd1797700  1 mon.ip-10-21-1-98@0(leader) e1  adding peer 10.21.2.69:6789/0 to list of hints
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.558135 7fcdd1797700  0 mon.ip-10-21-1-98@0(leader).monmap v1 adding/updating ip-10-21-2-69 at 10.21.2.69:6789/0 to monitor cluster
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.569846 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98 calling new monitor election
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.569909 7fcdd1797700  1 mon.ip-10-21-1-98@0(electing).elector(2) init, last seen epoch 2
Dec 16 02:28:01 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:01.592602 7fcdd1797700  0 mon.ip-10-21-1-98@0(electing) e2 handle_command mon_command({"prefix": "mon getmap"} v 0) v1
Dec 16 02:28:03 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:03.558690 7fcdcfe93700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304ea00 sd=21 :6789 s=2 pgs=1 cs=1 l=0 c=0x301e940).fault, initiating reconnect
Dec 16 02:28:03 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:03.560898 7fcdcfa8f700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f680 sd=23 :6789 s=0 pgs=0 cs=0 l=0 c=0x301eec0).accept connect_seq 0 vs exi
Dec 16 02:28:03 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:03.560912 7fcdcfa8f700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f680 sd=23 :6789 s=0 pgs=0 cs=0 l=0 c=0x301eec0).accept peer reset, then tri
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.578043 7fcdd1f98700  1 mon.ip-10-21-1-98@0(electing).elector(3) init, last seen epoch 3
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.602640 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98@0 won leader election with quorum 0,1
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.615977 7fcdd1797700  0 log [INF] : monmap e2: 2 mons at {ip-10-21-1-98=10.21.1.98:6789/0,ip-10-21-2-69=10.21.2.69:6789/0}
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.616042 7fcdd1797700  0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.616105 7fcdd1797700  0 log [INF] : mdsmap e1: 0/0/1 up
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.616179 7fcdd1797700  0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.726003 7fcdd1797700  0 mon.ip-10-21-1-98@0(leader) e2 handle_command mon_command({"prefix": "mon getmap"} v 0) v1
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.824446 7fcdd1797700  1 mon.ip-10-21-1-98@0(leader) e2  adding peer 10.21.1.97:6789/0 to list of hints
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.825312 7fcdd1797700  1 mon.ip-10-21-1-98@0(leader) e2  adding peer 10.21.1.97:6789/0 to list of hints
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.860145 7fcdd1797700  1 mon.ip-10-21-1-98@0(leader) e2  adding peer 10.21.1.97:6789/0 to list of hints
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.860989 7fcdd1797700  0 mon.ip-10-21-1-98@0(leader).monmap v2 adding/updating ip-10-21-1-97 at 10.21.1.97:6789/0 to monitor cluster
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.870273 7fcdd1797700  0 mon.ip-10-21-1-98@0(leader) e3  my rank is now 1 (was 0)
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.871784 7fcdcfc91700  0 -- 10.21.1.98:6789/0 >> 10.21.1.97:6789/0 pipe(0x304fe00 sd=20 :6789 s=0 pgs=0 cs=0 l=0 c=0x301f700).accept connect_seq 2 vs exi
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.871798 7fcdcfc91700  0 -- 10.21.1.98:6789/0 >> 10.21.1.97:6789/0 pipe(0x304fe00 sd=20 :6789 s=0 pgs=0 cs=0 l=0 c=0x301f700).accept we reset (peer sent
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.873622 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98 calling new monitor election
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.873670 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing).elector(4) init, last seen epoch 4
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.878402 7fcdcfb90700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f180 sd=21 :38834 s=2 pgs=4 cs=1 l=0 c=0x301e520).fault, initiating reconnec
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.880846 7fcdcf98e700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f900 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x301f5a0).accept connect_seq 0 vs exi
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.880861 7fcdcf98e700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f900 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x301f5a0).accept peer reset, then tri
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.881547 7fcdcfa8f700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f180 sd=20 :38836 s=4 pgs=4 cs=0 l=0 c=0x301e520).connect got RESETSESSION b
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.915089 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e3  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.915310 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e3  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:06 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:06.946611 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e3  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:08 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:08.861399 7fcdcfc91700  0 -- 10.21.1.98:6789/0 >> 10.21.1.97:6789/0 pipe(0x3050080 sd=22 :57341 s=2 pgs=4 cs=1 l=0 c=0x301de40).fault, initiating reconnec
Dec 16 02:28:08 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:08.863578 7fcdd7d07700  0 -- 10.21.1.98:6789/0 >> 10.21.1.97:6789/0 pipe(0x3050080 sd=22 :57345 s=1 pgs=4 cs=2 l=0 c=0x301de40).connect got RESETSESSION
Dec 16 02:28:08 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:08.865302 7fcdcfe93700  0 -- 10.21.1.98:6789/0 >> 10.21.1.97:6789/0 pipe(0x3050f80 sd=23 :6789 s=0 pgs=0 cs=0 l=0 c=0x301ec00).accept connect_seq 0 vs exi
Dec 16 02:28:08 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:08.946996 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e3  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:10 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:10.947109 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e3  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:11 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:11.875963 7fcdd1f98700  1 mon.ip-10-21-1-98@1(electing).elector(5) init, last seen epoch 5
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:11.979152 7fcdd1797700  0 mon.ip-10-21-1-98@1(peon) e3 handle_command mon_command({"prefix": "mon getmap"} v 0) v1
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.089747 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon) e3  adding peer 10.21.1.99:6789/0 to list of hints
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.090649 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon) e3  adding peer 10.21.1.99:6789/0 to list of hints
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.126691 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon) e3  adding peer 10.21.1.99:6789/0 to list of hints
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.139511 7fcdcf98e700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x304f900 sd=21 :6789 s=2 pgs=6 cs=1 l=0 c=0x301e520).fault with nothing to send,
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.142160 7fcdcf88d700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x3050800 sd=24 :6789 s=0 pgs=0 cs=0 l=0 c=0x301fb20).accept connect_seq 0 vs exi
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.142174 7fcdcf88d700  0 -- 10.21.1.98:6789/0 >> 10.21.2.69:6789/0 pipe(0x3050800 sd=24 :6789 s=0 pgs=0 cs=0 l=0 c=0x301fb20).accept peer reset, then tri
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.144016 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98 calling new monitor election
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.144849 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing).elector(6) init, last seen epoch 6
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.947325 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e4  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:12 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:12.948268 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e4  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:14 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:14.120802 7fcdd7d07700  0 -- 10.21.1.98:6789/0 >> 10.21.1.99:6789/0 pipe(0x3050a80 sd=22 :6789 s=2 pgs=1 cs=1 l=0 c=0x301fc80).fault, initiating reconnect
Dec 16 02:28:14 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:14.123299 7fcdcf98e700  0 -- 10.21.1.98:6789/0 >> 10.21.1.99:6789/0 pipe(0x3050580 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x3236840).accept connect_seq 0 vs exi
Dec 16 02:28:14 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:14.123353 7fcdcf98e700  0 -- 10.21.1.98:6789/0 >> 10.21.1.99:6789/0 pipe(0x3050580 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x3236840).accept peer reset, then tri
Dec 16 02:28:14 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:14.123584 7fcdcfd92700  0 -- 10.21.1.98:6789/0 >> 10.21.1.99:6789/0 pipe(0x3050a80 sd=21 :51837 s=4 pgs=1 cs=0 l=0 c=0x301fc80).connect got RESETSESSION b
Dec 16 02:28:14 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:14.948663 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e4  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:16 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:16.948844 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing) e4  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:18 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:18.949044 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon) e4  adding peer 10.21.2.70:6789/0 to list of hints
Dec 16 02:28:18 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:18.971019 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98 calling new monitor election
Dec 16 02:28:18 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:18.971121 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing).elector(8) init, last seen epoch 8
Dec 16 02:28:20 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:20.950189 7fcdcfa8f700  0 -- 10.21.1.98:6789/0 >> 10.21.2.70:6789/0 pipe(0x304f400 sd=20 :6789 s=2 pgs=2 cs=1 l=0 c=0x301f440).fault, initiating reconnect
Dec 16 02:28:20 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:20.952438 7fcdcfd92700  0 -- 10.21.1.98:6789/0 >> 10.21.2.70:6789/0 pipe(0x304f900 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x32366e0).accept connect_seq 0 vs exi
Dec 16 02:28:20 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:20.952453 7fcdcfd92700  0 -- 10.21.1.98:6789/0 >> 10.21.2.70:6789/0 pipe(0x304f900 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x32366e0).accept peer reset, then tri
Dec 16 02:28:24 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:24.982479 7fcdd1797700  0 log [INF] : mon.ip-10-21-1-98 calling new monitor election
Dec 16 02:28:24 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:24.982573 7fcdd1797700  1 mon.ip-10-21-1-98@1(electing).elector(10) init, last seen epoch 10
Dec 16 02:28:34 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:34.887792 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e2 e2: 1 osds: 0 up, 0 in
Dec 16 02:28:36 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:36.028116 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e3 e3: 4 osds: 0 up, 0 in
Dec 16 02:28:37 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:37.198632 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e4 e4: 4 osds: 0 up, 0 in
Dec 16 02:28:37 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:37.228379 7fcdd1797700  0 mon.ip-10-21-1-98@1(peon) e5 handle_command mon_command({"prefix": "osd crush add", "args": ["root=default", "host=ip-10-21-2-70
Dec 16 02:28:37 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:37.336740 7fcdd1797700  0 mon.ip-10-21-1-98@1(peon) e5 handle_command mon_command({"prefix": "osd crush add", "args": ["root=default", "host=ip-10-21-1-97
Dec 16 02:28:37 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:37.773735 7fcdd1797700  0 mon.ip-10-21-1-98@1(peon) e5 handle_command mon_command({"prefix": "osd create"} v 0) v1
Dec 16 02:28:38 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:38.242589 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e5 e5: 5 osds: 1 up, 1 in
Dec 16 02:28:39 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:39.283132 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e6 e6: 5 osds: 4 up, 4 in
Dec 16 02:28:39 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:39.623625 7fcdd1797700  0 mon.ip-10-21-1-98@1(peon) e5 handle_command mon_command({"prefix": "osd crush add", "args": ["root=default", "host=ip-10-21-1-98
Dec 16 02:28:40 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:40.440852 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e7 e7: 5 osds: 4 up, 4 in
Dec 16 02:28:41 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:41.550343 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e8 e8: 5 osds: 5 up, 5 in
Dec 16 02:28:42 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:42.633855 7fcdd1797700  1 mon.ip-10-21-1-98@1(peon).osd e9 e9: 5 osds: 5 up, 5 in
Dec 16 02:28:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:28:56.740259 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5867096 avail 96401672
Dec 16 02:29:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:29:56.740574 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5867916 avail 96400916
Dec 16 02:30:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:30:56.740867 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5868096 avail 96400768
Dec 16 02:31:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:31:56.741208 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5868536 avail 96400328
Dec 16 02:32:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:32:56.741511 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5867544 avail 96401224
Dec 16 02:33:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:33:56.741810 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5868236 avail 96400564
Dec 16 02:34:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:34:56.742103 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5868812 avail 96400020
Dec 16 02:35:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:35:56.742419 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5869520 avail 96399344
Dec 16 02:36:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:36:56.742714 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5869964 avail 96398900
Dec 16 02:37:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:37:56.743017 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5870652 avail 96398244
Dec 16 02:38:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:38:56.743312 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5871252 avail 96397708
Dec 16 02:39:56 ip-10-21-1-98.us-west-2.compute.internal sh[1250]: 2014-12-16 02:39:56.743605 7fcdd1f98700  0 mon.ip-10-21-1-98@1(peon).data_health(12) update_stats avail 94% total 102429676 used 5872468 avail 96396524

Where would more logs be stored?

-Cody

@codyaray

Oh, also: this is 1.1.0 on EC2, using the contrib/ec2 provisioner setup.

codyaray@mbpro:bin$ deisctl --version
1.1.0

@carmstrong
Contributor

@codyaray Your cluster is perfectly healthy. Can you get the logs from deis-store-metadata on the machine where it's crashing? journalctl -fu deis-store-metadata.

@codyaray

Here they are:

core@ip-10-21-1-98 ~ $ journalctl -fu deis-store-metadata
-- Logs begin at Tue 2014-12-16 01:58:24 UTC. --
Dec 16 05:00:05 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Failed to start deis-store-metadata.
Dec 16 05:00:05 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Unit deis-store-metadata.service entered failed state.
Dec 16 05:00:10 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: deis-store-metadata.service holdoff time over, scheduling restart.
Dec 16 05:00:10 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Stopping deis-store-metadata...
Dec 16 05:00:10 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: Starting deis-store-metadata...
Dec 16 05:00:11 ip-10-21-1-98.us-west-2.compute.internal sh[18605]: Pulling repository deis/store-metadata
Dec 16 05:00:11 ip-10-21-1-98.us-west-2.compute.internal sh[18605]: 2014/12/16 05:00:11 Tag v0.13.1 not found in repository deis/store-metadata
Dec 16 05:00:11 ip-10-21-1-98.us-west-2.compute.internal systemd[1]: deis-store-metadata.service: control process exited, code=exited status=1

So it sounds like some components are running an old version then? I installed this a long time ago and thought I had updated it.

-Cody

@carmstrong
Contributor

So it sounds like some components are running an old version then?

Yep, sounds like it. store-metadata didn't even exist until Deis v0.15, so I'm not surprised it can't find it :)

I'm guessing you used a more recent deisctl to run deisctl install, even though the components you were running were older. I actually don't think you can successfully upgrade Deis v0.13 to the latest without reprovisioning your machines (or letting CoreOS update), as you'll need Ceph FS support in the kernel.
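
For anyone else landing here with the same mismatch, the in-place upgrade path documented around the 1.x releases was roughly the sequence below. This is sketched from memory, so defer to the upgrade documentation; it takes the platform down while it runs, and as noted above a v0.13-era host may also need a newer CoreOS for Ceph FS support:

deisctl stop platform
deisctl uninstall platform
# install the deisctl binary matching the release you are upgrading to, then:
deisctl refresh-units
deisctl install platform
deisctl start platform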

@carmstrong
Contributor

@codyaray You may want to try following the upgrade documentation again, because it sounds like you missed a step. If you run into trouble, please open a fresh ticket. Thanks!
