Default settings for Node removal are not optimal #830

abronan · 2015-05-21T22:16:07Z

The default settings for TTLs and node removal on the storage backends are not optimal, and two annoying things can happen:

First, detecting a node failure takes time with Consul and Etcd: if the TTL is high, Swarm keeps seeing failed nodes as alive until their entries expire.
Second, the following scenario can occur:
- Swarm agent registers onto discovery service
- Manager sees the new entry: it is added to the list of known agents
- Swarm agent fails
- Manager removes the agent after the TTL expires
- Swarm agent is back to life at the same time or directly after
- The Manager may need up to a full heartbeat interval to register that node again, although it is alive and working
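
To put rough numbers on both problems, here is a minimal sketch. It assumes the manager polls the discovery backend once per heartbeat rather than watching it, and that an agent's entry expires `ttl` seconds after its last refresh. The function name and the sample values are hypothetical and are not Swarm's actual defaults.

```python
def worst_case_delays(ttl, heartbeat):
    """Return upper bounds (stale, missing) in seconds.

    stale:   how long a failed agent can still look alive. Its entry
             survives up to `ttl` after the last refresh, and the manager
             only notices the expiry at its next poll.
    missing: how long a recovered agent can stay unknown. If it
             re-registers right after a poll, it is seen at the next one.
    """
    stale = ttl + heartbeat
    missing = heartbeat
    return stale, missing


if __name__ == "__main__":
    # Illustrative values only (assumption, not taken from Swarm).
    stale, missing = worst_case_delays(ttl=60, heartbeat=20)
    print(f"failed node may look alive for up to {stale}s")
    print(f"recovered node may be missing for up to {missing}s")
```

Under these assumptions, lowering the TTL shortens the first window, while only a shorter heartbeat shortens the second.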

One solution could be to reduce the default heartbeat and refresh intervals to a low value, at the risk of increased pressure on the metadata storage cluster, especially if it is used for both discovery and metadata.
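
The cost side of that trade-off can be estimated with a back-of-the-envelope sketch. It assumes each agent rewrites its entry once per refresh interval and each manager lists all entries once per heartbeat; the function name and the cluster sizes are hypothetical, and real backends may batch or watch instead of polling.

```python
def store_ops_per_second(agents, managers, refresh, heartbeat):
    """Estimate the steady-state load on the discovery store.

    writes: one entry refresh per agent per `refresh` seconds.
    reads:  one full listing per manager per `heartbeat` seconds.
    """
    writes = agents / refresh
    reads = managers / heartbeat
    return writes, reads


if __name__ == "__main__":
    # Illustrative cluster: 1000 agents, 5 managers (assumed values).
    for interval in (25, 5, 1):
        writes, reads = store_ops_per_second(1000, 5, interval, interval)
        print(f"interval={interval}s -> {writes:.0f} writes/s, {reads:.1f} reads/s")
```

In this model the load grows inversely with the interval, so cutting the refresh time by a factor of five multiplies the write rate by five.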

abronan added area/discovery kind/bug labels May 21, 2015

abronan mentioned this issue May 23, 2015: Discovery temporarily loses nodes #837 (Closed)

aluzzardi added the priority/P1 label May 27, 2015

aluzzardi added this to the 0.3.0 milestone May 27, 2015

aluzzardi assigned abronan May 27, 2015

abronan mentioned this issue Jun 1, 2015: store: Fix ephemeral behavior with Consul #884 (Merged)

aluzzardi closed this as completed in #884 Jun 4, 2015
