This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fleetd fails on nodes with ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([167 != 168]) [168] #1533

Closed
ChristopheSchmitz opened this issue Apr 5, 2016 · 4 comments

Comments

@ChristopheSchmitz

Hi There,

I successfully installed etcd (etcd-v2.2.0-linux-amd64) on a 3-node Debian cluster and run it on each machine with something like:

./etcd --name rollup-bX \
     --initial-advertise-peer-urls http://172.17.3.20X:2380 \
     --listen-peer-urls http://172.17.3.20X:2380 \
     --listen-client-urls http://172.17.3.20X:2379,http://127.0.0.1:2379 \
     --advertise-client-urls http://172.17.3.20X:2379 \
     --initial-cluster-token etcd-cluster-4 \
     --initial-cluster rollup-b1=http://172.17.3.201:2380,rollup-b2=http://172.17.3.202:2380,rollup-b3=http://172.17.3.203:2380 \
     --initial-cluster-state new

where X=1 for the first node, X=2 for the second node, and X=3 for the third node.

It seems to work fine, for example:

vagrant@rollup-box-01:~/etcd-v2.2.0-linux-amd64$ ./etcdctl cluster-health
member 1d8e4c9184f09415 is healthy: got healthy result from http://172.17.3.201:2379
member 5e0ed77c2d33c7ef is healthy: got healthy result from http://172.17.3.203:2379
member d768a871f39c51f6 is healthy: got healthy result from http://172.17.3.202:2379
cluster is healthy

Now I am trying to run fleetd (tag v0.11.5 on git) on each of the 3 nodes. I run sudo FLEET_PUBLIC_IP=172.17.3.20X ./fleetd (X=1 for node 1, ... X=3 for node 3).

One of the nodes displays:

INFO server.go:157: Establishing etcd connectivity
INFO server.go:168: Starting server components
INFO engine.go:185: Engine leadership acquired

but the other two display this error message every few seconds:

ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([2742 != 2748]) [2748]

Finally, listing the machines with fleetctl on any of the nodes shows only one of my nodes:

vagrant@rollup-box-01:~/fleet$ fleetctl --endpoint=http://127.0.0.1:2379 list-machines
MACHINE IP METADATA
62c87d2f... 172.17.3.203 -

Any idea why I get those error messages, and why only 1 machine is listed?
BTW, I am running these machines locally with Vagrant / VirtualBox. I can get it to work with CoreOS (well, it works out of the box), but I need to get it to work with Debian.

Thanks!

@jonboulle
Contributor

This looks like #1181 - could you confirm that the machines have different machine-ids?
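One way to run this check is sketched below. It assumes each node keeps its ID in /etc/machine-id (the usual location on systemd-based Debian) and that the IDs have already been collected as "node machine-id" pairs; the function name `find_duplicate_machine_ids` is made up for illustration and is not part of fleet.

```shell
# Print every machine-id that is shared by more than one node.
# Input on stdin: one "<node> <machine-id>" pair per line.
# Output: "<machine-id>: <node> <node> ..." for each duplicated ID.
find_duplicate_machine_ids() {
    awk '{ nodes[$2] = nodes[$2] " " $1; count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id ":" nodes[id] }'
}

# Hypothetical usage, assuming SSH access to the three nodes from this thread:
# for ip in 172.17.3.201 172.17.3.202 172.17.3.203; do
#     printf '%s %s\n' "$ip" "$(ssh "$ip" cat /etc/machine-id)"
# done | find_duplicate_machine_ids
```

Empty output means every node reported a distinct ID.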

@ChristopheSchmitz
Author

Thanks Jonboulle,

Indeed, I didn't realize the 3 nodes had the same machine-id! Thanks :)
I have to find out why Vagrant is doing that (any hints welcome :) )

@ChristopheSchmitz
Author

Yes, changing the machine-id definitely solved my issue. Thanks again, I am closing this issue.
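For anyone hitting the same problem, here is a minimal sketch of generating a replacement ID. It assumes the machine-id format of 32 lowercase hexadecimal characters followed by a newline, and that the duplicates likely come from all VMs being cloned from the same base box image. The function name `write_new_machine_id` and the default output path are made up for illustration; the thread does not say how the ID was actually changed.

```shell
# Write a fresh random machine-id (32 lowercase hex characters plus a newline)
# to the given file. The default target is a scratch file in the current
# directory, so nothing under /etc is overwritten by accident.
write_new_machine_id() {
    target="${1:-./machine-id.new}"
    # Read 16 random bytes, print them as hex, and strip spaces and newlines.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n' > "$target"
    printf '\n' >> "$target"
}

# Hypothetical usage on one node (as root), then restart fleetd:
# write_new_machine_id /etc/machine-id
```

Where available, `systemd-machine-id-setup` or `dbus-uuidgen` may be a more appropriate way to regenerate the ID; the function above only illustrates the expected format.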

@tixxdz
Contributor

tixxdz commented Apr 6, 2016

@jonboulle I guess this is the same issue as #615
