This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Default settings for Node removal are not optimal #830

Closed
abronan opened this issue May 21, 2015 · 0 comments · Fixed by #884

Comments

@abronan
Contributor

abronan commented May 21, 2015

The default TTL and node-removal settings on the storage backends are not optimal, and two annoying things can happen:

  • With Consul and etcd, detecting a node failure takes time: if the TTL is high, Swarm keeps seeing failed nodes as alive until their entries expire.
  • The following scenario can also happen:
    • The Swarm agent registers with the discovery service
    • The Manager sees the new entry and adds it to its list of known agents
    • The Swarm agent fails
    • The Manager removes the agent once the TTL expires
    • The Swarm agent comes back to life at that moment or right after
    • It may take the Manager up to one heartbeat interval to register that node again, even though it is alive and working

One solution would be to lower the default heartbeat and refresh intervals, at the cost of increased load on the metadata storage cluster, especially if the same cluster is used for both discovery and metadata.


2 participants