You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.
After a short etcd blip, fleet has issues on its agent and engine, but the process remains up. This is affecting v0.13.0
The symptoms are as follows: the Monitor detects the server failed heartbeat, asks all components to shut down, but the shutdown of all components never completes. This means that most `components are dead, the server process is still up, but serves: {"error":{"code":503,"message":"fleet server unable to communicate with etcd"}}
The full error log is here:
Dec 07 19:28:55 eu2-prod-core-hasu fleetd[3250]: ERROR engine.go:221: Engine leadership lost, renewal failed: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:56 eu2-prod-core-hasu fleetd[3250]: ERROR job.go:109: failed fetching all Units from etcd: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:56 eu2-prod-core-hasu fleetd[3250]: ERROR reconcile.go:120: Failed fetching Units from Registry: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:56 eu2-prod-core-hasu fleetd[3250]: ERROR reconcile.go:73: Unable to determine agent's desired state: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:59 eu2-prod-core-hasu fleetd[3250]: ERROR job.go:109: failed fetching all Units from etcd: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:59 eu2-prod-core-hasu fleetd[3250]: ERROR engine.go:236: Failed fetching Units from Registry: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:59 eu2-prod-core-hasu fleetd[3250]: ERROR reconciler.go:59: Failed getting current cluster state: client: etcd cluster is unavailable or misconfigured
Dec 07 19:28:59 eu2-prod-core-hasu fleetd[3250]: WARN engine.go:117: Engine completed reconciliation in 4.004575849s
Dec 07 19:29:01 eu2-prod-core-hasu fleetd[3250]: ERROR job.go:109: failed fetching all Units from etcd: client: etcd cluster is unavailable or misconfigured
Dec 07 19:29:01 eu2-prod-core-hasu fleetd[3250]: ERROR reconcile.go:120: Failed fetching Units from Registry: client: etcd cluster is unavailable or misconfigured
Dec 07 19:29:01 eu2-prod-core-hasu fleetd[3250]: ERROR reconcile.go:73: Unable to determine agent's desired state: client: etcd cluster is unavailable or misconfigured
Dec 07 19:29:02 eu2-prod-core-hasu fleetd[3250]: ERROR server.go:237: Server monitor triggered: Monitor timed out before successful heartbeat
Dec 07 19:29:03 eu2-prod-core-hasu fleetd[3250]: ERROR engine.go:221: Engine leadership lost, renewal failed: client: etcd cluster is unavailable or misconfigured
Dec 07 19:30:02 eu2-prod-core-hasu fleetd[3250]: ERROR server.go:248: Timed out waiting for server to shut down
After a short
etcd
blip,fleet
has issues on its agent and engine, but the process remains up. This is affectingv0.13.0
The symptoms are as follows: the
Monitor
detects the server failed heartbeat, asks allcomponents
to shut down, but the shutdown of all components never completes. This means that most `components are dead, the server process is still up, but serves:{"error":{"code":503,"message":"fleet server unable to communicate with etcd"}}
The full error log is here:
The curious bit is this code:
https://github.com/coreos/fleet/blob/v0.13.0/server/server.go#L248
I think that after
Timed out waiting for server to shut down
the server should just crash immediately.The text was updated successfully, but these errors were encountered: