Proposal: Improved Node Management #1486

Closed
aluzzardi opened this Issue Dec 3, 2015 · 13 comments

@aluzzardi
Contributor

aluzzardi commented Dec 3, 2015

When Swarm fails to connect to a node for the first time, it gives up and never retries.

This was designed to avoid continuously retrying to connect to invalid nodes.

This causes multiple problems (#1185 #1331) such as:

  • If the manager is started before a node, it will never see that node
  • It's very hard for the user to debug

Proposal

To fix those problems we could redesign the way Swarm does node management.

This is a possible approach I think we could explore:

  • Always add discovered nodes to the cluster
  • Do not attempt to synchronously connect to engines from discovery at all (this will make the discovery code much simpler since we won't need the goroutine anymore)
  • Let the Engine refresh loop, which currently handles retrying on failure, handle the initial connection as well. This means that if an engine is not reachable at start, the refresh loop will try to reconnect after a random delay within a configured range (see the sketch after this list).
  • Make sure the rest of the code base can handle unhealthy nodes. Until we make the first connection, we do not know the ID of the Engine, which will be a problem since we index Engines by ID.
  • Perhaps we could do something fancier than the random-delay retry, such as exponential back-off.
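
For illustration, here is a minimal sketch of a refresh loop that owns the initial connection and retries after a random delay. The Engine type, the connect step, and the delay bounds are hypothetical placeholders, not the actual Swarm code.

package main

import (
	"errors"
	"log"
	"math/rand"
	"time"
)

// Engine is a stand-in for Swarm's engine type.
type Engine struct {
	Addr      string
	connected bool
}

// connect stands in for the real connection/validation step.
func (e *Engine) connect() error {
	return errors.New("no route to host") // pretend the node is unreachable
}

// refreshLoop handles both the initial connection and later refreshes.
// If an engine is unreachable (at startup or later), it retries after a
// random delay within [minDelay, maxDelay] instead of giving up.
func (e *Engine) refreshLoop(minDelay, maxDelay time.Duration) {
	for {
		if err := e.connect(); err != nil {
			e.connected = false
			delay := minDelay + time.Duration(rand.Int63n(int64(maxDelay-minDelay)))
			log.Printf("%s unreachable (%v), retrying in %s", e.Addr, err, delay)
			time.Sleep(delay)
			continue
		}
		e.connected = true
		time.Sleep(minDelay) // regular refresh interval once healthy
	}
}

func main() {
	e := &Engine{Addr: "10.0.0.123:4242"}
	go e.refreshLoop(30*time.Second, 60*time.Second)
	select {} // keep the sketch running
}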

Reporting

Debugging node registration issues is really difficult for users and requires fiddling with logs. A proposal was already made (#1136) and I think it could be built on top of the improved node management.

Since we'd have a list of all nodes (even if we failed to connect to them), we could report all of them in docker info, along with their status and error message if any:

$ docker info
Nodes: 1
 dev: 10.0.0.123:4242
  └ Status: Down
  └ Error: 10.0.0.123: No route to host

This could include extra information such as:

  • Last seen (last successful heartbeat)
  • Number of retries
  • Scheduled next retry
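
Purely as an illustration of those fields, the extended output could look something like this (the exact field names and layout are only a sketch of the proposal, not an existing or committed format):

$ docker info
Nodes: 1
 dev: 10.0.0.123:4242
  └ Status: Down
  └ Error: 10.0.0.123: No route to host
  └ Last seen: 2015-12-03T10:04:22Z
  └ Retries: 3
  └ Next retry: in 4m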

Reporting must also take care of duplicate IDs: the problem confuses users very often (#1467). At the very least we should report it in docker info.

Existing Attempts

There have been a couple of attempts to fix this issue (#1044 #1195), however both approaches try to solve the problem at the discovery level rather than the engine level.

The downsides are:

  • It duplicates the reconnection logic in both the engine and discovery. For instance, now that the Engine retries connecting at a random interval, we would have to reimplement that for discovery as well
  • It makes the discovery code path more complicated. It's already a beast. If we implement this proposal, however, we would actually make it simpler since we wouldn't need to connect to engines in goroutines anymore
  • It doesn't provide a way to report the status to the user, since the Cluster is not aware of those nodes.

/cc @docker/swarm-maintainers @MHBauer @lokikiller @ahmetalpbalkan @riuvshin

@lokikiller

lokikiller commented Dec 3, 2015
Treating it as an unhealthy node may be a good idea.

@dongluochen
Contributor

dongluochen commented Dec 3, 2015
Great proposal! We may extend node management to cover node removal (#1341), node maintenance (#1230), etc. I think a node's life cycle can be modeled by a state machine, with the corresponding actions triggered based on state, events, and timers. It would require persistent storage for node states, which is not available today.
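
As a rough sketch of that state machine idea (the state and event names below are only placeholders; a concrete set of states is proposed later in this thread):

package main

import "fmt"

// NodeState and NodeEvent are hypothetical; the real design would define
// the concrete states, events, and the actions/timers attached to them.
type NodeState int
type NodeEvent int

const (
	StatePending NodeState = iota
	StateHealthy
	StateUnhealthy
	StateDisconnected
)

const (
	EventValidated NodeEvent = iota
	EventConnectionFailed
	EventRefreshOK
	EventLeftDiscovery
)

// transitions maps (state, event) to the next state; anything not listed
// leaves the state unchanged.
var transitions = map[NodeState]map[NodeEvent]NodeState{
	StatePending: {
		EventValidated:     StateHealthy,
		EventLeftDiscovery: StateDisconnected,
	},
	StateHealthy: {
		EventConnectionFailed: StateUnhealthy,
		EventLeftDiscovery:    StateDisconnected,
	},
	StateUnhealthy: {
		EventRefreshOK:     StateHealthy,
		EventLeftDiscovery: StateDisconnected,
	},
}

func next(s NodeState, e NodeEvent) NodeState {
	if to, ok := transitions[s][e]; ok {
		return to
	}
	return s
}

func main() {
	s := StatePending
	s = next(s, EventValidated)        // pending -> healthy
	s = next(s, EventConnectionFailed) // healthy -> unhealthy
	fmt.Println(s == StateUnhealthy)   // true
}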

@jimmyxian
Contributor

jimmyxian commented Dec 3, 2015
Cool! Totally agreed.
I tried to implement exposing unhealthy nodes in info and hit the same problem (when Swarm fails to connect to a node for the first time, it gives up and never retries, so we cannot get the unhealthy info for that node).
We should always add discovered nodes to the cluster and reconnect at the engine level, so that we can expose unhealthy nodes in docker info. 👍

@ahmetb
Contributor

ahmetb commented Dec 3, 2015
Great proposal, sounds good to me. Unhealthy node addresses definitely should be listed on /info and extra info such as last heartbeat, connection error, next retry would be useful as well. I'm not sure “number of retries” is going to be relevant.

@MHBauer
Member

MHBauer commented Dec 16, 2015
Agree with the concept. Agree with the suggestion of a state machine for managing node condition. Agree that we should stay aware of unhealthy nodes. I can see that the details of implementing this will be considerable.

@dongluochen
Contributor

dongluochen commented Dec 17, 2015
Here is an implementation plan for the node management improvement. It covers node join, node maintenance, and the refresh loop. Reporting and node removal are not covered. Feedback is welcome.

When a node is added to the cluster through discovery, it's added to a pending list indexed by IP address. If the node is not reachable or has a duplicate ID, it stays in the pending state, and the error message is associated with the node in docker info so the user can correct it. When the cluster validates the node, it retrieves its ID, adds it to the engine map for task scheduling, and removes it from the pending list (see the sketch after this paragraph). If the node's ID is already registered by another node, there are two possibilities. The first is that a node changed its IP address because of a DHCP renewal; this should resolve automatically when the previous IP expires in discovery. The second is ID duplication from a VM clone; docker info will report this error and the user should remove the ID file from the conflicting node.
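
A minimal sketch of that bookkeeping, assuming hypothetical Cluster and Node types (this is not the actual Swarm code; the validate step stands in for the first successful connection):

package main

import (
	"errors"
	"fmt"
)

// Node is a placeholder for a discovered node.
type Node struct {
	Addr    string
	ID      string // unknown until we can talk to the engine
	LastErr error  // surfaced while the node is still pending
}

// Cluster keeps unvalidated nodes keyed by address and validated
// engines keyed by ID, since the ID is only known after a connection.
type Cluster struct {
	pending map[string]*Node
	engines map[string]*Node
}

// addFromDiscovery always records the node, even if it is unreachable.
func (c *Cluster) addFromDiscovery(addr string) {
	if _, ok := c.pending[addr]; !ok {
		c.pending[addr] = &Node{Addr: addr}
	}
}

// validate moves a node from the pending list to the engine map once its
// ID is known, or records the duplicate-ID error so it can be reported.
func (c *Cluster) validate(addr, id string) error {
	n, ok := c.pending[addr]
	if !ok {
		return nil // already validated
	}
	if _, dup := c.engines[id]; dup {
		n.LastErr = errors.New("duplicate engine ID " + id)
		return n.LastErr // stays pending until the user fixes the ID file
	}
	n.ID = id
	c.engines[id] = n
	delete(c.pending, addr)
	return nil
}

func main() {
	c := &Cluster{pending: map[string]*Node{}, engines: map[string]*Node{}}
	c.addFromDiscovery("10.0.0.123:4242")
	_ = c.validate("10.0.0.123:4242", "ENGINE-ID-1")
	fmt.Println(len(c.pending), len(c.engines)) // 0 1
}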

A maintenance state (not implemented yet) indicates node upgrade/maintenance. We may add node management commands, such as setting a node to the maintenance state. A node in maintenance will not be selected by the cluster to run containers. (The maintenance itself should be done directly on the node, through the docker interface or otherwise.) This state can be used for maintenance/debug work, or to put a node into an inactive state to drain/migrate running containers and collect data before deleting it.

Nodes are moved to the disconnected state when they leave discovery.


                                add node thru discovery
                                          |     
                                          |    
                                          v            
                                    +-----------+
                                    |           |
                                    |  pending  |<-----+ 
                                    |           |      |      
                                    +-----------+      |
                                         |             |
                                         | validate    | 
                                         v             | 
+-----------+    refresh            +-----------+      |
|           |+--------------------->|           |      |
| unhealthy |                       |  healthy  |      |
|           |<---------------------+|           |      |
+-----------+  connection failure   +-----------+      |
       |                                   |           |
       |                                   |           | 
       | maintain                 maintain |           | 
       |           +-------------+         |           |
       +---------->|             |<--------+           |        
                   | maintenance |                     | 
                   |(not here yet|+--------------------+                  
                   +-------------+   maintenance done 


    remove node from discovery
                |
                |
                v
          +-------------+ 
          |             |        
          | disconnected|                   
          |             |                 
          +-------------+   

Several attributes will be added to the node state. The next refresh time is calculated from the current state and the number of failed retries. A node that failed recently can be probed frequently so it can recover fast (for example, the VM reboot case). The retry interval follows a back-off strategy to reduce resource waste until it reaches a limit such as 4 hours (sketched below, after the attribute list). The user can jumpstart it, maybe by moving the node in and out of the maintenance state.

  • UpdateAt
  • Last error
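
A small sketch of how the next retry time could be derived from those attributes; the field names and the base interval are illustrative only, not the actual implementation:

package main

import (
	"fmt"
	"time"
)

// nodeState carries the attributes listed above.
type nodeState struct {
	UpdateAt    time.Time // time of the last refresh attempt
	LastError   error
	FailedTries int
}

// nextRetry backs off exponentially from a base interval, capped at 4 hours,
// so a recently failed node is probed often (e.g. a rebooting VM) while a
// long-dead node wastes few resources.
func (s nodeState) nextRetry(base time.Duration) time.Time {
	delay := base
	for i := 0; i < s.FailedTries && delay < 4*time.Hour; i++ {
		delay *= 2
	}
	if delay > 4*time.Hour {
		delay = 4 * time.Hour
	}
	return s.UpdateAt.Add(delay)
}

func main() {
	s := nodeState{UpdateAt: time.Now(), FailedTries: 3}
	fmt.Println(s.nextRetry(30 * time.Second)) // ~4 minutes after the last attempt
}
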
@abronan
Contributor

abronan commented Jan 8, 2016
@dongluochen Do you think we can close this one, or should we keep it open because of the maintenance state?

@dongluochen
Contributor

dongluochen commented Jan 8, 2016
Thanks @abronan. I'd like to keep it for the maintenance state. I think we may come to it soon.

@dongluochen
Contributor

dongluochen commented Jan 25, 2016
The majority of this proposal is completed. Maintenance-state support will be added later, when we have a swarm API that can control nodes.

@ghostsquad

ghostsquad commented Jul 1, 2016
How do you remove a node from a swarm? (As part of testing and developing node management automation.)

@abronan
Contributor

abronan commented Jul 1, 2016
Hi @ghostsquad, if you're using Consul, etcd, or ZooKeeper for node discovery: just stop the docker daemon on the node you want to remove. The node will immediately be left out of any scheduling decisions, and it will be removed from the list in info after some time (once the discovery TTL value expires). I admit this is kind of hard for automation if you want an imperative way to remove a node and confirm the removal.

Alternatively, you can take a look at swarmkit and docker swarm mode (docker engine 1.12 RC3), which include node management and allow you to add and remove nodes explicitly, which is much easier for automation. Hope this helps!

@ghostsquad

ghostsquad commented Jul 2, 2016
@abronan awesome. I'll upgrade and try it out.

@jeffbaier

jeffbaier commented Jul 2, 2016
Where does swarmkit fit in? Is swarm being replaced by swarmkit, or do they fill different niches?
