This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

Add PersistVolumne Support #1861

Merged · 9 commits merged into coreos:master on Jan 20, 2018

Conversation

@rjtsdl (Contributor) commented Jan 17, 2018

Successor of #1695

Copy of the plan:

  • add PVC for the etcd pod (Add PersistVolumne Support #1861)
  • handle the node restart case
    Behavior: first, the eviction behavior applies. After that, when the node comes back, the Unknown pod(s) disappear from the pod list.
  • handle the node partition case
  • handle the node eviction case
    Behavior: the pod(s) scheduled on the evicted node become Unknown after a certain period. Then new pod(s) are scheduled on healthy nodes to reach the desired replica count. The Unknown pod(s) do not disappear, though.

@etcd-bot (Collaborator)

Can one of the admins verify this patch?

2 similar comments

@xiang90 (Collaborator) commented Jan 18, 2018

/cc @msohn @georgekuruvillak

Can you guys also help with this PR?

As a side note, I noticed the SAP fork of etcd operator. etcd operator can already work with Swift through its S3-compatible API, and we want to get PV support in as well.

We want to put some love into the project to move everyone back to upstream.

@hongchaodeng (Member)

@etcd-bot ok to test

@rjtsdl (Contributor, Author) commented Jan 18, 2018

@hongchaodeng, can you explain a bit what the node partition case is?
For the other cases, I have tested and documented the behavior in the description. Check whether it makes sense.

@rjtsdl rjtsdl changed the title [WIP] add PersistVolumne Support Add PersistVolumne Support Jan 18, 2018
@brendandburns

LGTM.

@xiang90 (Collaborator) commented Jan 18, 2018

@rjtsdl

I have not really looked into the code, but I want to mention that if PV is enabled, etcd operator should:

  1. always use write-once mode to avoid two pods using the same etcd metadata (see the sketch below). This can happen if k8s thinks a node is dead (when it is not) and creates a replacement pod on some other node.
  2. disable disaster detection. Even if only 1 of 3 pods is left (or even 0 of 3), the data is on PVs; we can just wait for the other members to come back with their data.
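
For point 1, a minimal sketch (an assumption on my part, not code from this PR) of a claim spec that enforces single-node attachment through the ReadWriteOnce access mode; the 1Gi size is illustrative:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// writeOncePVCSpec returns a claim spec that can only be attached read-write
// by a single node, so a replacement pod scheduled on another node cannot
// mount the same etcd data directory.
func writeOncePVCSpec() corev1.PersistentVolumeClaimSpec {
	return corev1.PersistentVolumeClaimSpec{
		AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
		// On newer k8s.io/api versions the Resources field type is
		// VolumeResourceRequirements; ResourceRequirements matches the
		// API vendored at the time of this PR.
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("1Gi"),
			},
		},
	}
}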


// By default, kubernetes will mount a service account token into the etcd pods.
// AutomountServiceAccountToken indicates whether pods running with the service account should have an API token automatically mounted.
AutomountServiceAccountToken *bool `json:"automountServiceAccountToken,omitempty"`
Member:
Can you remove the changes related to AutomountServiceAccountToken?
This PR should be about PersistentVolumeClaimSpec.

Contributor Author:
Sure thing

@@ -219,6 +219,7 @@ func (r *Restore) createSeedMember(ec *api.EtcdCluster, svcAddr, clusterName str
}
ms := etcdutil.NewMemberSet(m)
backupURL := backupapi.BackupURLForRestore("http", svcAddr, clusterName)

Member:
remove blank line

@@ -264,10 +295,12 @@ func newEtcdPod(m *etcdutil.Member, initialCluster []string, clusterName, state,
livenessProbe,
readinessProbe)

volumes := []v1.Volume{
{Name: "etcd-data", VolumeSource: v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}},
if cs.Pod != nil {
Member:
These changes aren't needed?

Contributor Author:
It is taken care of by the function AddEtcdVolumeToPod in the same file.

Member:
Sorry, I don't understand your comment.
Why add this change here?

Contributor Author:
Here it is:

func AddEtcdVolumeToPod(pod *v1.Pod, pvc *v1.PersistentVolumeClaim) {
	vol := v1.Volume{Name: etcdVolumeName}
	if pvc != nil {
		vol.VolumeSource = v1.VolumeSource{
			PersistentVolumeClaim: &v1.PersistentVolumeClaimVolumeSource{ClaimName: pvc.Name},
		}
	} else {
		vol.VolumeSource = v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}
	}
	pod.Spec.Volumes = append(pod.Spec.Volumes, vol)
}

If it is not a PV, it just appends an EmptyDir volume.

Member:
I still don't understand it.
Why is this related to PV or emptyDir? This is what I saw in this PR on GitHub:

	if cs.Pod != nil {
		container = containerWithRequirements(container, cs.Pod.Resources)
	}

It only deals with resources.

Contributor Author (@rjtsdl, Jan 20, 2018):

Ah, I thought you were asking about lines 267 & 268.

I don't think this line has anything to do with this PR's scope. It was there mostly because of my cherry-pick from the old PR.

I do think it is a nice thing to have, but it doesn't belong in this PR's scope. I can remove it, though.

Member:
Please remove this.
It's a duplicate; resource requirements/pod policy are applied outside this func:

applyPodPolicy(clusterName, pod, cs.Pod)

@xiang90 (Collaborator) commented Jan 18, 2018

@rjtsdl

It seems that this PR only adds the PV as non-stable storage (same as emptyDir) for the etcd pod, rather than using PV as stable storage?

If we consider PV as stable storage, we should not add/remove members when the size field is unchanged. If one pod dies, we should create another pod with the same FQDN + PV; no member changes need to be involved. When the size changes, we add/remove members as before.

@rjtsdl (Contributor, Author) commented Jan 18, 2018

@xiang90 you are right. It is treated as non-stable storage. We should make it stable storage.

@xiang90 (Collaborator) commented Jan 18, 2018

@rjtsdl

I am fine with marking this as alpha, where PV is used as temporary storage. Then we can iterate on it.

@hongchaodeng (Member) commented Jan 18, 2018

@rjtsdl
Let me clarify the four points in the plan:

    1. Add PVC for the etcd pod. This is basically what is achieved in this PR.
    2. When a node restart happens, the current etcd pods would fail and won't come up anymore. This PR DOES NOT handle that. Besides changing the restart policy to "Always", the following cases need to be addressed:
    • 2.1. kubelet can't report to the master and the node becomes "unready". etcd-operator would need to tolerate this case for some period of time. If the node restarts, the etcd pods restart, and everything returns to a healthy state, then etcd-operator wouldn't do anything. Otherwise, etcd-operator needs to treat the node, and thus the etcd pods on it, as unhealthy and do healing.
    • 2.2. The previous case assumes the etcd pods still exist. If not, e.g. they were evicted, we handle it like case 4.
    3. If a node partition happens, it could hit case 2.1 (the node becomes "unready") and case 2.2 (the etcd pods get deleted). The same solution applies, but we need to think about and test this case.
    4. A node eviction happens, and the etcd pods get deleted. Let's assume the PVs are still there. In this case, etcd-operator should know the current membership, reschedule the "failed" members onto healthy nodes, and mount each member's corresponding PV back; then everything should be fine because network identity and storage stay the same (see the sketch after this list).
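
For case 4, a rough sketch of the reattachment idea, using only upstream Kubernetes types. The "etcd-data" volume name matches this PR's diff, but the helper itself and its parameters are illustrative, not the operator's actual rescheduling code:

import (
	corev1 "k8s.io/api/core/v1"
)

// remountMemberPVC rebuilds the data volume of a replacement pod for an
// existing member: it keeps the same hostname/subdomain (hence the same FQDN)
// and mounts the member's existing PersistentVolumeClaim instead of a fresh
// emptyDir.
func remountMemberPVC(pod *corev1.Pod, memberName, clusterName, claimName string) {
	pod.Spec.Hostname = memberName   // keep the member's peer URL stable
	pod.Spec.Subdomain = clusterName // <member>.<cluster>.<namespace>.svc
	pod.Spec.Volumes = append(pod.Spec.Volumes, corev1.Volume{
		Name: "etcd-data",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: claimName,
			},
		},
	})
}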

Gopkg.lock Outdated
@@ -328,6 +328,6 @@
[solve-meta]
analyzer-name = "dep"
analyzer-version = 1
inputs-digest = "ff4a4b53ef454edf0ae2ec3bc497baf74d561af328faa123eb472d87954a906c"
inputs-digest = "b19769641bf117041022e3de0c490313047f3b7aaef54e00a7354573fff9b738"
Member:
No need to change this?

Contributor Author:
It was updated automatically after I ran update_vendor.sh. Do I need to revert it?

Member:
yes

@xiang90 (Collaborator) commented Jan 18, 2018

When node restart happens, current etcd pods would fail and won't come up anymore.

It is unfair to say this PR does not handle it; this PR adds PV as non-stable storage.

If the node goes down, the operator will still remove the member and add a new one, as in the emptyDir case, no?

@hongchaodeng (Member)

I'm fine to use it to replace ephemeral storage initially.

@xiang90 (Collaborator) commented Jan 18, 2018

I'm fine to use it to replace ephemeral storage initially.

Just to be clear: not to replace the current ephemeral storage, but to add another ephemeral storage option besides the current emptyDir, right? Eventually we should make PV a stable storage option.

@hongchaodeng (Member)

@xiang90
Yes. That's the plan in #1323 (comment) which is explained above. Those items are expected to be solved one by one.

@rjtsdl (Contributor, Author) commented Jan 18, 2018

That would be awesome. I will address the comments, then we can merge it as an initial step.

sgotti and others added 5 commits January 18, 2018 13:45
this patch implements part 1 of the etcd data on persistent volumes design.

when pod pvsource is defined in the spec it'll create a PVC for every etcd
member and use it as the volume for etcd data.

pvc without a member will be removed during the reconcile.
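
The last point, removing claims whose member is gone, could look roughly like the helper below. This is a hypothetical sketch of the reconcile-time cleanup described in the commit message, reusing the PR's PVCNameFromMemberName helper; the operator's real loop may differ:

// unusedPVCs returns the names of claims that no longer belong to any live
// etcd member and can therefore be deleted during reconciliation.
func unusedPVCs(pvcNames []string, memberNames []string) []string {
	owned := make(map[string]bool, len(memberNames))
	for _, m := range memberNames {
		owned[PVCNameFromMemberName(m)] = true
	}
	var orphans []string
	for _, name := range pvcNames {
		if !owned[name] {
			orphans = append(orphans, name)
		}
	}
	return orphans
}
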
@@ -375,9 +375,26 @@ func (c *Cluster) setupServices() error {
return k8sutil.CreatePeerService(c.config.KubeCli, c.cluster.Name, c.cluster.Namespace, c.cluster.AsOwner())
}

func (c *Cluster) IsPodPVEnabled() bool {
Collaborator:
please add doc string on public method.
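
For illustration, the method with a doc string could look like the sketch below. The body is a guess from context, since the diff excerpt only shows the signature, and it assumes the new PersistentVolumeClaimSpec field hangs off the cluster's pod policy:

// IsPodPVEnabled reports whether the cluster's pod policy requests a
// PersistentVolumeClaim to back etcd data instead of an emptyDir.
func (c *Cluster) IsPodPVEnabled() bool {
	if podPolicy := c.cluster.Spec.Pod; podPolicy != nil {
		return podPolicy.PersistentVolumeClaimSpec != nil
	}
	return false
}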

@@ -78,6 +78,10 @@ func GetPodNames(pods []*v1.Pod) []string {
return res
}

func PVCNameFromMemberName(memberName string) string {
Collaborator:
please add doc string on public method.
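
Again for illustration, a doc-string style for this helper; the "etcd-data-" naming scheme is an assumption, not copied from the PR:

// PVCNameFromMemberName returns the name of the PersistentVolumeClaim that
// backs the given etcd member.
func PVCNameFromMemberName(memberName string) string {
	return "etcd-data-" + memberName
}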

@@ -209,6 +213,18 @@ func newEtcdServiceManifest(svcName, clusterName, clusterIP string, ports []v1.S
return svc
}

func AddEtcdVolumeToPod(pod *v1.Pod, pvc *v1.PersistentVolumeClaim) {
Collaborator:
please add doc string on public method.

@@ -231,6 +249,19 @@ func NewSeedMemberPod(clusterName string, ms etcdutil.MemberSet, m *etcdutil.Mem
return pod
}

func NewEtcdPodPVC(m *etcdutil.Member, pvcSpec v1.PersistentVolumeClaimSpec, clusterName, namespace string, owner metav1.OwnerReference) *v1.PersistentVolumeClaim {
Collaborator:
please add doc string on public method.
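
A plausible shape for this constructor, with a doc string, is sketched below. The claim name reuses PVCNameFromMemberName, while the label keys and owner-reference handling are assumptions for illustration, since the diff excerpt only shows the signature:

// NewEtcdPodPVC creates the PersistentVolumeClaim for one etcd member,
// passing the user-supplied pvcSpec through unchanged.
func NewEtcdPodPVC(m *etcdutil.Member, pvcSpec v1.PersistentVolumeClaimSpec, clusterName, namespace string, owner metav1.OwnerReference) *v1.PersistentVolumeClaim {
	pvc := &v1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:      PVCNameFromMemberName(m.Name),
			Namespace: namespace,
			Labels: map[string]string{
				"app":          "etcd",
				"etcd_cluster": clusterName,
			},
		},
		Spec: pvcSpec,
	}
	pvc.OwnerReferences = append(pvc.OwnerReferences, owner)
	return pvc
}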

@@ -138,6 +138,9 @@ type PodPolicy struct {
// bootstrap the cluster (for example `--initial-cluster` flag).
// This field cannot be updated.
EtcdEnv []v1.EnvVar `json:"etcdEnv,omitempty"`

// PersistentVolumeClaimSpec ...
Collaborator:
complete the comment?

@xiang90 (Collaborator) commented Jan 19, 2018

lgtm defer to @hongchaodeng


// PersistentVolumeClaimSpec is the spec to describe PVC for the etcd container
// This field is optional. If no PVC spec, etcd container will use emptyDir as volume
PersistentVolumeClaimSpec *v1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`
Collaborator:
Please mention that this field is alpha and does not yet use PV as stable storage the way it eventually should.

@@ -78,6 +78,11 @@ func GetPodNames(pods []*v1.Pod) []string {
return res
}

// PVCNameFromMemberName the way we get PVC name from the member name
func PVCNameFromMemberName(memberName string) string {
Member:
PVCNameFromMemberName -> PVCNameFromMember


DNSPolicy: v1.DNSClusterFirstWithHostNet,
Hostname: m.Name,
Subdomain: clusterName,
AutomountServiceAccountToken: func(b bool) *bool { return &b }(false),
Member:
Revert this. It is not related to this PR.

Contributor Author:
I need to delete AutomountServiceAccountToken. That is basically the whole change here.

Member:
I need to delete AutomountServiceAccountToken

What's the reason for it?

Contributor Author:
Because of this comment from your previous review:

Can you remove the changes related to AutomountServiceAccountToken?
This PR should be about PersistentVolumeClaimSpec.

Member:
Can you explain why AutomountServiceAccountToken is related to PV?

Contributor Author:
I just added it back.

I had deleted it while trying to remove the changes to AutomountServiceAccountToken. Apparently, it shouldn't have been removed here.

@hongchaodeng (Member) left a comment:

lgtm after nit


@rjtsdl (Contributor, Author) commented Jan 20, 2018

@hongchaodeng, thanks for explaining. Removed the line.

@hongchaodeng hongchaodeng merged commit ffbd359 into coreos:master Jan 20, 2018
@hongchaodeng (Member)

Thanks everyone!
Especially @sgotti for initiating it and @rjtsdl for finishing it!
