fleetd fails on nodes with ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([167 != 168]) [168] #1533

ChristopheSchmitz · 2016-04-05T05:15:14Z

Hi There,

I successfully installed etcd (etcd-v2.2.0-linux-amd64) on a 3-node Debian cluster and run it on each machine with something like:

./etcd --name rollup-bX \
     --initial-advertise-peer-urls http://172.17.3.20X:2380 \
     --listen-peer-urls http://172.17.3.20X:2380 \
     --listen-client-urls http://172.17.3.20X:2379,http://127.0.0.1:2379 \
     --advertise-client-urls http://172.17.3.20X:2379 \
     --initial-cluster-token etcd-cluster-4 \
     --initial-cluster rollup-b1=http://172.17.3.201:2380,rollup-b2=http://172.17.3.202:2380,rollup-b3=http://172.17.3.203:2380 \
     --initial-cluster-state new

where X=1 for the first node, X=2 for the second node, and X=3 for the third node.

It seems to work fine, for example:

vagrant@rollup-box-01:~/etcd-v2.2.0-linux-amd64$ ./etcdctl cluster-health
member 1d8e4c9184f09415 is healthy: got healthy result from http://172.17.3.201:2379
member 5e0ed77c2d33c7ef is healthy: got healthy result from http://172.17.3.203:2379
member d768a871f39c51f6 is healthy: got healthy result from http://172.17.3.202:2379
cluster is healthy

Now I am trying to run fleetd (tag v0.11.5 on git) on each of the 3 nodes. I run sudo FLEET_PUBLIC_IP=172.17.3.20X ./fleetd (X=1 for node 1, ..., X=3 for node 3).

One of the nodes displays:

INFO server.go:157: Establishing etcd connectivity
INFO server.go:168: Starting server components
INFO engine.go:185: Engine leadership acquired

but the other two display this error message every few seconds:

ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([2742 != 2748]) [2748]

Finally, listing the machines with fleetctl on each node shows only one of my nodes:

vagrant@rollup-box-01:~/fleet$ fleetctl --endpoint=http://127.0.0.1:2379 list-machines
MACHINE IP METADATA
62c87d2f... 172.17.3.203 -

Any idea why I get those error messages, and why only 1 machine is listed?
BTW, I am running those machines locally with Vagrant / VirtualBox. I can get it to work with CoreOS (well, it works out of the box) but I need to get it to work with Debian.

Thanks!

jonboulle · 2016-04-05T13:28:05Z

This looks like #1181 - could you confirm that the machines have different machine-ids?
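
One way to run that check is sketched below. It is only a sketch under assumptions: the helper name `find_duplicate_ids` is hypothetical, and the commented usage assumes each node is reachable over ssh as the `vagrant` user and stores its ID in `/etc/machine-id`.

```shell
#!/bin/sh
# find_duplicate_ids (hypothetical helper): reads "node machine-id" pairs on
# stdin, one per line, and prints every machine-id that is used by more than
# one node. It prints nothing when all IDs are unique.
find_duplicate_ids() {
    awk '{ count[$2]++ } END { for (id in count) if (count[id] > 1) print id }'
}

# Possible usage (assumes ssh access to the three nodes from the report):
#   for n in 172.17.3.201 172.17.3.202 172.17.3.203; do
#       printf '%s %s\n' "$n" "$(ssh vagrant@"$n" cat /etc/machine-id)"
#   done | find_duplicate_ids
```

Any ID printed by the helper is shared by at least two nodes, which would match the single entry shown by `fleetctl list-machines`.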

ChristopheSchmitz · 2016-04-06T01:26:54Z

Thanks Jonboulle,

Indeed, I didn't realize the 3 nodes have the same machine-id! Thanks :)
I have to find out why Vagrant is doing that (any hint welcome :) )

ChristopheSchmitz · 2016-04-06T03:42:40Z

Yes, changing the machine-id definitely solved my issue. Thanks again, I am closing this issue.
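
The thread does not say how the machine-id was changed, so the following is only one possible approach, not the method actually used. A likely cause of the duplicates (an assumption, not confirmed here) is that all three VMs were created from the same base box, which carries a single `/etc/machine-id`. The helper name `new_machine_id` is hypothetical; it prints a random ID in the 32-character lowercase hex format that `/etc/machine-id` uses.

```shell
#!/bin/sh
# new_machine_id (hypothetical helper): print a random machine ID made of
# 16 bytes from /dev/urandom, formatted as 32 lowercase hex characters.
new_machine_id() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Possible usage on each affected node (run as root; the path assumes a
# systemd-style layout):
#   new_machine_id > /etc/machine-id
#   then restart fleetd so it registers under the new ID
```

On hosts running systemd, removing `/etc/machine-id` and running `systemd-machine-id-setup` is the more conventional way to get a fresh ID.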

tixxdz · 2016-04-06T07:37:30Z

@jonboulle I guess this is the same issue as #615.

jonboulle added the kind/support label Apr 5, 2016

ChristopheSchmitz closed this as completed Apr 6, 2016
