
cluster-mgr HA considerations #53

Open
mapuri opened this issue Mar 7, 2016 · 6 comments

@mapuri
Contributor

mapuri commented Mar 7, 2016

This bug tracks the items needed for cluster-mgr's HA. It is a big list and will be addressed by one or more future PRs, preferably each introducing a small feature.

Current behavior:

  • state management:
    • clustermgr (and its subsystems) keeps some state like global extra-vars and per-node host-group (once a node is commissioned)
      • this state is lost on a process restart
    • clustermgr is able to rebuild certain state, like a node's current inventory state (from the collins db) and a node's current monitoring state (from serf's client interface; see the sketch after this list)
      • if collins or serf also dies then this state can be lost
    • clustermgr forgets some state, like the extra-vars specified when a node is commissioned, which are expected to be provided every time a node is commissioned
  • dependency on external processes:
    • collins is used as inventory database and for node lifecycle management (primary state transition and logging events).
    • while collins offers a rich feature set around node management itself (like power cycle etc), we are not using these features
    • serf is used as the node health monitoring service, which helps provide a single point of node management.
  • number of clustermgr instances that can run:
    • clustermgr is able to run behind a VIP, allowing multiple instances to run at a time with only one instance serving the requests.
      • cluster-mgr has a minimal config that needs to be provided at process start. It is expected that all instances are started with similar config.
    • collins is run as a container (with its own local mysql db), which prevents node state from being available everywhere atm, so all clustermgr instances need to access the same collins instance
      • if we can extract the mysql db from the collins container and run it on a distributed filesystem, it might be possible to make this info available everywhere
    • serf runs on every node, so node monitoring state is available everywhere. However, care needs to be taken to ensure that only one cluster-mgr instance acts on it. Right now clustermgr doesn't do much on monitoring state changes.
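
For reference, a minimal sketch of how the monitoring state mentioned above could be rebuilt from serf's RPC client interface (github.com/hashicorp/serf/client); the agent address and the logging are illustrative assumptions, not what clustermgr actually does:

```go
package main

import (
	"log"

	"github.com/hashicorp/serf/client"
)

func main() {
	// Connect to the local serf agent's RPC endpoint (7373 is the default port).
	c, err := client.NewRPCClient("127.0.0.1:7373")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Each member's Status ("alive", "failed", "left") is the monitoring
	// state that a restarted clustermgr instance could rebuild.
	members, err := c.Members()
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range members {
		log.Printf("node %s (%s): %s", m.Name, m.Addr, m.Status)
	}
}
```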

Desired behavior and considerations for HA (++: high priority, +: low priority)

  • state availability:
    • (++) the state should be available wherever a cluster-mgr instance is running
  • state restore on process restarts
    • (++) if there are multiple instances, a new (or restarted) instance should be able to restore state from other instances.
    • (+) if there is just a single instance, a restarted instance should be able to restore state from the local host.

Possible approaches (this space will be changing for a bit as I explore and document different approaches):

  • use a distributed memory cache (like golang/groupcache) that each clustermgr instance can use to keep and share its state (see the first sketch after this list).
    • pros:
      • it's provided as a client/server lib with no need for a separate server
    • cons:
      • it's not clear how peer additions/deletions are handled. Need to study the lib more.
      • it is not clear how the cache is updated (or flushed). Need to study the lib more.
      • persistence of state in a single-instance environment will be tricky
  • start with an embedded db (like boltdb; see the second sketch after this list)
    • pros:
      • it's provided as an embedded lib with no need for a separate server
      • persistence of state is built-in
    • cons:
      • distribution of state will need to be done separately
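
A rough sketch of the groupcache approach, assuming the golang/groupcache API as of 2016; the peer addresses, group name, and loader are made up for illustration. Note that peers are set statically via pool.Set, which is exactly the open question about peer additions/deletions above:

```go
package main

import (
	"log"
	"net/http"

	"github.com/golang/groupcache"
)

func main() {
	// Each clustermgr instance would embed a peer like this.
	self := "http://10.0.0.1:8080" // hypothetical address of this instance
	pool := groupcache.NewHTTPPool(self)
	pool.Set("http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080")

	// The getter runs on a cache miss and must load the value from an
	// authoritative source (e.g. collins or local config).
	state := groupcache.NewGroup("clusterm-state", 64<<20, groupcache.GetterFunc(
		func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
			return dest.SetString("value-for-" + key) // placeholder loader
		}))

	var val string
	if err := state.Get(nil, "global-extra-vars", groupcache.StringSink(&val)); err != nil {
		log.Fatal(err)
	}
	log.Printf("got: %s", val)

	// NewHTTPPool registers itself on http.DefaultServeMux under /_groupcache/.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

And a minimal sketch of the boltdb approach, assuming the github.com/boltdb/bolt API; the file path, bucket, and key names are hypothetical:

```go
package main

import (
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	// A local file per clustermgr instance; state survives process restarts
	// on the same host, which covers the single-instance case above.
	db, err := bolt.Open("/var/lib/clusterm/state.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Persist a piece of state, e.g. the global extra-vars blob.
	if err := db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("extra-vars"))
		if err != nil {
			return err
		}
		return b.Put([]byte("global"), []byte(`{"env": "prod"}`))
	}); err != nil {
		log.Fatal(err)
	}

	// Restore it on the next process start.
	if err := db.View(func(tx *bolt.Tx) error {
		if b := tx.Bucket([]byte("extra-vars")); b != nil {
			log.Printf("restored global extra-vars: %s", b.Get([]byte("global")))
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```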
@danehans
Contributor

danehans commented Apr 26, 2016

collins is run as a container (with its own local mysql db), which prevents node state from being available everywhere atm, so all clustermgr instances need to access the same collins instance

Consider using galera to replicate the sql db. I have used galera in past projects and it works well.

clustermgr (and its subsystems) keeps some state like global extra-vars and per-node host-group (once a node is commissioned). This state is lost on a process restart

What are your thoughts on having clustermgr use a K/V store such as etcd or boltdb to store persistent data? In the etcd scenario, we could leverage etcd's existing HA deployment model for clustermgr.
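
Should that route be taken despite the bootstrap concern raised in the reply below, a minimal sketch of what storing clusterm state in etcd could look like, using the etcd v2 client API; the endpoints and key path are made up for illustration:

```go
package main

import (
	"log"
	"time"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	// Point at an existing etcd cluster; endpoints are hypothetical.
	cfg := client.Config{
		Endpoints:               []string{"http://10.0.0.1:2379", "http://10.0.0.2:2379"},
		Transport:               client.DefaultTransport,
		HeaderTimeoutPerRequest: 3 * time.Second,
	}
	c, err := client.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// Store and read back a piece of clusterm state.
	if _, err := kapi.Set(context.Background(), "/contiv/cluster/extra-vars/global",
		`{"env": "prod"}`, nil); err != nil {
		log.Fatal(err)
	}
	resp, err := kapi.Get(context.Background(), "/contiv/cluster/extra-vars/global", nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("restored: %s", resp.Node.Value)
}
```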

If all contiv/cluster services run as containers, I think this affects our approach to HA. For example, we should think in terms of "How do we ensure x # of clustermgr/collins/etc containers are always running in the contiv cluster"

@mapuri
Contributor Author

mapuri commented Apr 28, 2016

What are your thoughts on having clustermgr use a K/V store such as etcd or boltdb to store persistent data? In the etcd scenario, we could leverage etcd's existing HA deployment model for clustermgr.

Yeah, we have discussed this internally before as well and it is a perfectly reasonable question. The challenge is that cluster manager is a service that is supposed to bootstrap a KV store in a cluster, so if we need (or ask the user) to set up etcd just to bring up cluster manager, it kind of defeats the purpose of simplifying cluster provisioning. And it is a chicken-and-egg problem.

For example, we should think in terms of "How do we ensure x # of clustermgr/collins/etc containers are always running in the contiv cluster"

yes, this kind of goes back to assigning semantics to host-group management in clusterm, as I just commented here (#87 (comment)). This is not implemented yet, but basically, since clusterm is aware of how many master nodes are alive, it can take care of ensuring that a certain number of master nodes are always online, which in turn makes sure that infrastructure services (like collins, the K8s API server, or swarm replicas) keep functioning well even in case of node failures.

Note that this may sound like what k8s or swarm does for containers. But the difference is that cluster manager does it for these services instead, i.e. it ensures that a certain number of K8s API server instances are always running.

Does this make sense?

@danehans
Contributor

Yeah, we have discussed this internally before as well and it is a perfectly reasonable question. The challenge is that cluster manager is a service that is supposed to bootstrap a KV store in a cluster, so if we need (or ask the user) to set up etcd just to bring up cluster manager, it kind of defeats the purpose of simplifying cluster provisioning. And it is a chicken-and-egg problem.

My thought is not necessarily to use a single etcd cluster for clusterm and the cluster scheduler. I agree that there should be clear separation between the two etcd clusters. However, I think there is value in minimizing the number of storage permutations in the overall solution. Creating an etcd backend for Collins is not a high priority, but it is a dev effort that would benefit us and the user.

Note that this may sound like what k8s or swarm does for containers. But the difference is that cluster manager does it for these services instead, i.e. it ensures that a certain number of K8s API server instances are always running

From a k8s standpoint, ^ is covered by podmaster: https://github.com/kubernetes/kubernetes/blob/master/docs/admin/high-availability/podmaster.yaml

@mapuri
Contributor Author

mapuri commented Apr 29, 2016

From a k8s standpoint, ^ is covered by podmaster:

hmm, podmaster seems to depend on etcd. Who ensures etcd is up and running?

@danehans
Contributor

I think that's where clusterm comes in. Let the cluster schedulers be responsible for managing their services (if supported within the cluster scheduler) and have clusterm handle the infra svcs the schedulers rely on. Thoughts?

@mapuri
Contributor Author

mapuri commented Apr 29, 2016

yep, I think we are on the same page here :)

quoting myself from above (I didn't cover kv-stores and plugins in there, but I was intending it for all infra services):

as clusterm is aware of how many master nodes are alive, it can take care of ensuring that a certain number of master nodes are always online, which in turn makes sure that infrastructure services (like collins, the K8s API server, or swarm replicas) keep functioning well even in case of node failures.

@mapuri mapuri added this to the 0.2 milestone May 5, 2016