
cluster-mgr HA considerations #53

Open
mapuri opened this issue Mar 7, 2016 · 6 comments

@mapuri
Contributor

mapuri commented Mar 7, 2016

This bug tracks the items needed for cluster-mgr's HA. It is a big list and will be addressed by one or more future PRs, preferably each introducing a small feature.

Current behavior:

  • state management:
    • clustermgr (and its subsystems) keeps some state like global extra-vars and per-node host-group (once a node is commissioned)
      • this state is lost on a process restart
    • clustermgr is able to rebuild certain state, like a node's current inventory state (from the collins db) and a node's current monitoring state (from serf's client interface; see the sketch after this list)
      • if collins or serf also dies then this state can be lost
    • clustermgr forgets some state, like the extra-vars specified when a node is commissioned, which are expected to be provided every time a node is commissioned
  • dependency on external processes:
    • collins is used as inventory database and for node lifecycle management (primary state transition and logging events).
    • while collins offers a rich feature set around node management itself (like power cycle etc), we are not using these features
    • serf is used as the node health monitoring service, which helps provide a single point of node management.
  • number of clustermgr instances that can run:
    • clustermgr is able to run behind a VIP, allowing multiple instances to run at a time with only one instance serving the requests.
      • cluster-mgr has a minimal config that needs to be provided at process start. It is expected that all instances are started with similar config.
    • collins is run as a container (with its own local mysql db), which prevents node state from being available everywhere atm, so all clustermgr instances need to access the same collins instance
      • if we can extract the mysql db from the collins container and run it on a distributed filesystem, it might be possible to make this info available everywhere
    • serf runs on every node, so node monitoring state is available everywhere. However, care needs to be taken to ensure that only one cluster-mgr instance acts on it. Right now clustermgr doesn't do much on monitoring state changes.
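
For reference, a minimal sketch of how the monitoring state mentioned above could be rebuilt from serf's RPC client interface (github.com/hashicorp/serf/client); the agent address and the logging are illustrative assumptions, not what clustermgr actually does:

```go
package main

import (
	"log"

	"github.com/hashicorp/serf/client"
)

func main() {
	// Connect to the local serf agent's RPC endpoint (7373 is the default port).
	c, err := client.NewRPCClient("127.0.0.1:7373")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Each member's Status ("alive", "failed", "left") is the monitoring
	// state that a restarted clustermgr instance could rebuild.
	members, err := c.Members()
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range members {
		log.Printf("node %s (%s): %s", m.Name, m.Addr, m.Status)
	}
}
```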

Desired behavior and considerations for HA (++: high priority, +: low priority)

  • state availability:
    • (++) the state should be available wherever a cluster-mgr instance is running
  • state restore on process restarts
    • (++) if there are multiple instances, a new (or restarted) instance should be able to restore state from other instances.
    • (+) if there is just a single instance, a restarted instance should be able to restore state from the local host.

Possible approaches (this space will be changing for a bit as I explore and document different approaches):

  • use a distributed memory cache (like golang/groupcache) that each clustermgr instance can use to keep and share its state (see the first sketch after this list).
    • pros:
      • it's provided as a client/server lib with no need for a separate server
    • cons:
      • it's not clear how peer additions/deletions are handled. Need to study the lib more.
      • it is not clear how the cache is updated (or flushed). Need to study the lib more.
      • persistence of state in a single-instance environment will be tricky
  • start with an embedded db (like boltdb; see the second sketch after this list)
    • pros:
      • it's provided as an embedded lib with no need for a separate server
      • persistence of state is built-in
    • cons:
      • distribution of state will need to be done separately
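
A rough sketch of the groupcache approach, assuming the golang/groupcache API as of 2016; the peer addresses, group name, and loader are made up for illustration. Note that peers are set statically via pool.Set, which is exactly the open question about peer additions/deletions above:

```go
package main

import (
	"log"
	"net/http"

	"github.com/golang/groupcache"
)

func main() {
	// Each clustermgr instance would embed a peer like this.
	self := "http://10.0.0.1:8080" // hypothetical address of this instance
	pool := groupcache.NewHTTPPool(self)
	pool.Set("http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080")

	// The getter runs on a cache miss and must load the value from an
	// authoritative source (e.g. collins or local config).
	state := groupcache.NewGroup("clusterm-state", 64<<20, groupcache.GetterFunc(
		func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
			return dest.SetString("value-for-" + key) // placeholder loader
		}))

	var val string
	if err := state.Get(nil, "global-extra-vars", groupcache.StringSink(&val)); err != nil {
		log.Fatal(err)
	}
	log.Printf("got: %s", val)

	// NewHTTPPool registers itself on http.DefaultServeMux under /_groupcache/.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

And a minimal sketch of the boltdb approach, assuming the github.com/boltdb/bolt API; the file path, bucket, and key names are hypothetical:

```go
package main

import (
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	// A local file per clustermgr instance; state survives process restarts
	// on the same host, which covers the single-instance case above.
	db, err := bolt.Open("/var/lib/clusterm/state.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Persist a piece of state, e.g. the global extra-vars blob.
	if err := db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("extra-vars"))
		if err != nil {
			return err
		}
		return b.Put([]byte("global"), []byte(`{"env": "prod"}`))
	}); err != nil {
		log.Fatal(err)
	}

	// Restore it on the next process start.
	if err := db.View(func(tx *bolt.Tx) error {
		if b := tx.Bucket([]byte("extra-vars")); b != nil {
			log.Printf("restored global extra-vars: %s", b.Get([]byte("global")))
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```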
@danehans
Contributor

danehans commented Apr 26, 2016

collins is run as a container (with its own local mysql db), which prevents node state from being available everywhere atm, so all clustermgr instances need to access the same collins instance

Consider using galera to replicate the sql db. I have used galera in past projects and it works well.

clustermgr (and its subsystems) keeps some state like global extra-vars and per-node host-group (once a node is commissioned). This state is lost on a process restart

What are your thoughts on having clustermgr use a K/V store such as etcd or boltdb to store persistent data? In the etcd scenario, we could leverage etcd's existing HA deployment model for clustermgr.
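
Should that route be taken despite the bootstrap concern raised in the reply below, a minimal sketch of what storing clusterm state in etcd could look like, using the etcd v2 client API; the endpoints and key path are made up for illustration:

```go
package main

import (
	"log"
	"time"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	// Point at an existing etcd cluster; endpoints are hypothetical.
	cfg := client.Config{
		Endpoints:               []string{"http://10.0.0.1:2379", "http://10.0.0.2:2379"},
		Transport:               client.DefaultTransport,
		HeaderTimeoutPerRequest: 3 * time.Second,
	}
	c, err := client.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// Store and read back a piece of clusterm state.
	if _, err := kapi.Set(context.Background(), "/contiv/cluster/extra-vars/global",
		`{"env": "prod"}`, nil); err != nil {
		log.Fatal(err)
	}
	resp, err := kapi.Get(context.Background(), "/contiv/cluster/extra-vars/global", nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("restored: %s", resp.Node.Value)
}
```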

If all contiv/cluster services run as containers, I think this affects our approach to HA. For example, we should think in terms of "How do we ensure x # of clustermgr/collins/etc containers are always running in the contiv cluster"

@mapuri
Contributor Author

mapuri commented Apr 28, 2016

What are your thoughts on having clustermgr use a K/V store such as etcd or boltdb to store persistent data? In the etcd scenario, we could leverage etcd's existing HA deployment model for clustermgr.

Yeah, we have discussed this internally before as well and it is a perfectly reasonable question. The challenge is that cluster manager is a service that is supposed to bootstrap a KV store in a cluster, so if we need (or ask the user) to set up etcd just to bring up cluster manager, it kind of defeats the purpose of simplifying cluster provisioning. And it is a chicken-and-egg problem.

For example, we should think in terms of "How do we ensure x # of clustermgr/collins/etc containers are always running in the contiv cluster"

yes, this kind of goes back to assigning semantics to host-group management in clusterm, as I just commented here (#87 (comment)). This is not implemented yet, but basically, since clusterm is aware of how many master nodes are alive, it can take care of ensuring that a certain number of master nodes are always online, which in turn makes sure that infrastructure services (like collins, the K8s API server, or swarm replicas) keep functioning well even in case of node failures.

Note that this may sound like what k8s or swarm does for containers. But the difference is that cluster manager does it for these services instead, i.e. it ensures that a certain number of K8s API server instances are always running.

Does this make sense?

@danehans
Contributor

Yeah, we have discussed this internally before as well and it is a perfectly reasonable question. The challenge is that cluster manager is a service that is supposed to bootstrap a KV store in a cluster, so if we need (or ask the user) to set up etcd just to bring up cluster manager, it kind of defeats the purpose of simplifying cluster provisioning. And it is a chicken-and-egg problem.

My thought is not necessarily to use a single etcd cluster for clusterm and the cluster scheduler. I agree that there should be clear separation between the two etcd clusters. However, I think there is value in minimizing the number of storage permutations in the overall solution. Creating an etcd backend for Collins is not a high priority, but it is a dev effort that would benefit us and the user.

Note that this may sound like what k8s or swarm does for containers. But the difference is that cluster manager does it for these services instead, i.e. it ensures that a certain number of K8s API server instances are always running

From a k8s standpoint, ^ is covered by podmaster: https://github.com/kubernetes/kubernetes/blob/master/docs/admin/high-availability/podmaster.yaml

@mapuri
Contributor Author

mapuri commented Apr 29, 2016

From a k8s standpoint, ^ is covered by podmaster:

hmm, podmaster seems to depend on etcd. Who ensures etcd is up and running?

@danehans
Contributor

I think that's where clusterm comes in. Let the cluster schedulers be responsible for managing their services (if supported within the cluster scheduler) and have clusterm handle the infra svcs the schedulers rely on. Thoughts?

@mapuri
Contributor Author

mapuri commented Apr 29, 2016

yep, I think we are on the same page here :)

quoting myself from above (I didn't cover kv-stores and plugins in there, but I was intending it for all infra services):

as clusterm is aware of how many master nodes are alive, it can take care of ensuring that a certain number of master nodes are always online, which in turn makes sure that infrastructure services (like collins, the K8s API server, or swarm replicas) keep functioning well even in case of node failures.

@mapuri mapuri added this to the 0.2 milestone May 5, 2016