
Proposal: Implement stateful TPR Flock #14

Open
tamalsaha opened this issue Mar 2, 2017 · 5 comments

@tamalsaha
Member

tamalsaha commented Mar 2, 2017

Birds of a feather flock together

When we run stateful apps (apps that store data on disk) like GlusterFS or various databases, we face a choice of which Kubernetes object to use for provisioning them. Here are the requirements:

  • Stable routable network ID (stable across restarts). Must support reverse PTR records.
  • Must be safe to run applications without authentication using the stable network ID.
  • Run multiple replicas with persistent storage.
  • It should be possible to run the different replicas on different nodes to achieve high availability. (Optional)

This can't be achieved on cloud providers that do not have native support for persistent storage or for which Kubernetes does not have a volume controller (eg, DigitalOcean, Linode, etc).

Here is my proposal on how to meet the above requirements in a cloud provider agnostic way.

StatefulSet: If the underlying cloud provider has native support for cloud disks and there is built-in support in Kubernetes (aws/gce/azure), then we can use StatefulSet. We can provision disks manually and bind them to claims. We might also be able to provision them using dynamic provisioning. Moreover, StatefulSets allow using the pod name as a stable network ID. Users can also use pod placement options to ensure that pods are distributed across nodes. This allows for HA.
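For illustration, a minimal sketch of this StatefulSet approach might look like the following (the names, image, and anti-affinity settings are placeholders, not part of this proposal; assumes a 1.6-era API):

```yaml
# Sketch only: StatefulSet with per-pod disks from volumeClaimTemplates and
# pods spread across nodes via anti-affinity. All names/images are placeholders.
apiVersion: apps/v1beta1          # StatefulSet API group around Kubernetes 1.5/1.6
kind: StatefulSet
metadata:
  name: gluster
spec:
  serviceName: gluster            # headless Service that gives pods stable DNS names
  replicas: 3
  template:
    metadata:
      labels:
        app: gluster
    spec:
      affinity:
        podAntiAffinity:          # ask the scheduler to keep replicas on separate nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: gluster
            topologyKey: kubernetes.io/hostname
      containers:
      - name: gluster
        image: gluster/gluster-centos        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi           # bound to a cloud disk via the default StorageClass
```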

DaemonSet: Cloud providers that do not support built-in storage and/or have no native support in Kubernetes (eg, DigitalOcean, Linode) can't use StatefulSets to run stateful apps. Stateful apps running in these clusters must use hostPath to store data or risk losing it when pods restart. StatefulSet can't dynamically provision hostPath-bound PVCs. In these cases, we could use a DaemonSet. We have to use a hostPath or emptyDir type PV with the DaemonSet. If DaemonSets run with the pod network, no stable ID is possible. If DaemonSets run with the host network, they might use the node IP. Node names are generally not routable, and node IPs are not stable either, since most of the time they are allocated via DHCP. Also, for cloud providers like DigitalOcean, the host network is shared and not safe to run without authentication.
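For completeness, a DaemonSet along these lines might look roughly like this (names and the data directory are illustrative only):

```yaml
# Sketch only: DaemonSet storing data on each node via hostPath. One pod per
# node, so reusing the same path is safe, but pods get no stable network ID.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: gluster
spec:
  template:
    metadata:
      labels:
        app: gluster
    spec:
      containers:
      - name: gluster
        image: gluster/gluster-centos        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:
          path: /var/lib/gluster             # placeholder directory on the node
```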

Luckily, we can achieve something similar to StatefulSet in such providers. The underlying process is based on how named headless services work as described here: https://kubernetes.io/docs/admin/dns/ .
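Concretely, the headless Service piece might look like this (service name, port, and labels are placeholders):

```yaml
# Sketch only: headless Service. Pods that set spec.hostname and spec.subdomain
# (subdomain = this Service's name) get DNS A records of the form
# <hostname>.<subdomain>.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: gluster
spec:
  clusterIP: None            # headless: DNS resolves to individual pod IPs
  selector:
    app: gluster
  ports:
  - port: 24007              # placeholder port
```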

On these types of providers, we have to run N ReplicaSets with replicas=1. We can use a fixed hostPath. We choose N nodes and index them from 0..N-1. We apply a nodeSelector to these ReplicaSets to ensure that the ReplicaSet with index i always runs on the node with index i. Since they are on separate nodes, they can safely use the same host path. For the network ID, we set both hostname and subdomain in the PodTemplate for these ReplicaSets. This gives the pods a DNS name the same way StatefulSet pods get one. Since these pods use the pod network, it should be safe to run applications without authentication. Now we have N pods with stable names, running on different nodes using hostPath. Voila!
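A rough sketch of one member (index 0) of such a flock, assuming the target nodes were labeled beforehand with a hypothetical flock-index label and that the headless Service above is named gluster:

```yaml
# Sketch only: member 0 of the flock. One ReplicaSet per index with replicas=1,
# pinned to the matching node via nodeSelector; all names are placeholders.
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: gluster-0
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: gluster
        index: "0"
    spec:
      hostname: gluster-0          # stable DNS: gluster-0.gluster.<namespace>.svc.cluster.local
      subdomain: gluster           # must match the headless Service name
      nodeSelector:
        flock-index: "0"           # hypothetical label, e.g. `kubectl label node <node> flock-index=0`
      containers:
      - name: gluster
        image: gluster/gluster-centos        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:
          path: /var/lib/gluster   # the fixed path is safe since each index runs on its own node
```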

To simplify the full process, we can create a new TPR called Flock. We would implement GlusterFS or KubeDB using this TPR. The Flock controller will be in charge of translating it into the appropriate Kubernetes objects based on flags set on the controller.
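As a rough starting point, the TPR registration and one Flock object could look like this (the API group and the Flock schema below are guesses for illustration, not a settled design):

```yaml
# Sketch only: register the ThirdPartyResource (the pre-CRD mechanism).
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: flock.appscode.com         # placeholder group; yields kind "Flock" in group appscode.com
description: "Stateful flock of pods with stable DNS names and hostPath storage"
versions:
- name: v1alpha1
---
# Sketch only: an instance the Flock controller would translate into a
# StatefulSet or per-index ReplicaSets, depending on controller flags.
# Field names here are hypothetical.
apiVersion: appscode.com/v1alpha1
kind: Flock
metadata:
  name: gluster
spec:
  replicas: 3
  hostPath: /var/lib/gluster
  template:
    containers:
    - name: gluster
      image: gluster/gluster-centos
```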

@tamalsaha tamalsaha changed the title from "StatefulSet | DaemonSet for stateful apps" to "Proposal: Implement stateful TPR Flock" on Mar 2, 2017
@sadlil
Contributor

sadlil commented Mar 2, 2017

we have to run N ReplicaSets with replicas=1. We can use a fixed hostPath. We choose N nodes and index them from 0..N-1. We apply a nodeSelector to these ReplicaSets to ensure that the ReplicaSet with index i always runs on the node with index i.

What if Flock.Spec.Replica > node count? We can't use a fixed hostPath if multiple pods run on the same node.

@mirshahriar
Contributor

@sadlil, we need to make sure that Flock.Spec.Replica is not greater than the total node count.

@tamalsaha
Member Author

Filed this proposal in Kube. kubernetes/community#424 . At worst, they think I am crazy.

@tamalsaha
Member Author

tamalsaha commented Mar 2, 2017

Based on my conversation on Slack, it might be possible to implement a dynamic hostPath PV provisioner, and the StatefulSet could use that.

https://github.com/kubernetes-incubator/external-storage/tree/master/docs/demo/hostpath-provisioner
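If that works out, the consuming side would presumably just be a StorageClass that points at the provisioner plus ordinary claims; a sketch (the provisioner name is taken from the linked demo and should be verified):

```yaml
# Sketch only: StorageClass backed by the demo hostPath provisioner.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: hostpath
provisioner: example.com/hostpath    # name the demo provisioner registers; verify against the repo
---
# A PVC (or a StatefulSet volumeClaimTemplate) then just requests that class.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data-gluster-0
  annotations:
    volume.beta.kubernetes.io/storage-class: hostpath   # pre-1.6 way to select the class
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 10Gi
```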

tamal [7:09 AM]
@deads2k  on a separate note, we are exploring the idea of creating a new TPR to address the limitation of StatefulSet that it can't use hostpath. Here is the proposal: https://github.com/kubernetes/community/issues/424  .  I would be glad if you can read it and give some feedback. (edited)

deads2k [7:10 AM] 
Why can't a statefulset use a hostpath in combination with an SA and a PSP?

tamal [7:11 AM] 
What is PSP?

deads2k [7:11 AM] 
@tamal podsecuritypolicy which is designed to control access to things like hostpath

tamal [7:13 AM] 
Ok. I will read that. If # of pods > 1, can StatefulSet guarantee that the hostpath will not overlap between pods?

liggitt [7:13 AM] 
scheduling can be configured to spread across nodes on statefulset selection, right? (edited)

tamal [7:14 AM] 
But how do I guarantee that the same pod goes to the same node?

[7:14]  
Say, pod-0 always goes to node-X

tamal [7:19 AM] 
@deads2k  , I don't think PSP can do what I need.

claytonc [8:13 AM] 
@tamal that use case sounds a bit like a hostpath dynamic provisioner

tamal [8:14 AM] 
Yes. But I could not think of a way to do that as StatefulSets work today

[8:15]  
We need a way to store the pod index -> node mapping. Passing the node selector in StatefulSet. (edited)

tamal [8:34 AM] 
@claytonc  , how do you think we can write a hostpath dynamic provisioner ?

claytonc [8:39 AM] 
@tamal set the volume class on your stateful set to have a specific name

[8:40]  
then write a loop that creates a hostpath PV and binds it to PVCs asking for that storage class when created

[8:40]  
and ensure that loop creates unique values for hostpath PV

tamal [8:41 AM] 
But how do I guarantee that the same pod goes to the same node?

claytonc [8:52 AM] 
set a unique volume label on the PV

[8:52]  
that corresponds to the hostname of the node the pod goes to

tamal [9:01 AM] 
@claytonc, i see how PV - PVC - Pod can be connected using the unique label that corresponds to hostname

[9:02]  
But I am still missing how I make sure the scheduler always picks the same node when the pod restarts

claytonc [9:36 AM] 
scheduler picks nodes for recreated pods that match the volume’s label

tamal [10:05 AM] 
Thanks @claytonc  . I need to read these details.  Dynamic PVC provisioners are compiled in Kube?

[10:05]  
or are they separate binaries?

claytonc [10:05 AM] 
they can be run anywhere - the simplest pattern might just be a bash script for loop

[10:05]  
there’s work to make provisioners easier

[10:06]  
to script

[10:06]  
not sure how far that has gotten

tamal [10:07 AM] 
Do you mind pointing me to the current ones? I can pattern it around the existing ones.

mrick [10:18 AM] 
@tamal check out the out of tree dynamic provisioners https://github.com/kubernetes-incubator/external-storage/tree/master/docs/demo/hostpath-provisioner

@tamalsaha
Member Author

tamalsaha commented Mar 3, 2017

The example just runs the provisioner on one node. We have to run it with a DaemonSet on all nodes.
Also, we need to know in which directory to store data, since we have to mount it inside the provisioner docker image.

Also, we need to enable PSP: https://kubernetes.io/docs/user-guide/pod-security-policy/#controlling-volumes , so that pods can use hostPath volumes via the dynamic provisioner.
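A minimal PodSecurityPolicy along those lines might look like this (assuming the PodSecurityPolicy admission controller is enabled; everything else is left permissive purely for illustration):

```yaml
# Sketch only: PodSecurityPolicy that permits hostPath (and PVC) volumes for
# the provisioner / flock pods.
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: allow-hostpath
spec:
  volumes:
  - hostPath
  - persistentVolumeClaim
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
```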
