Merge pull request #8 from dippynark/improve-remediation-and-cluster-deletion

Improve remediation and cluster deletion
dippynark authored Apr 3, 2021
2 parents 731d0dd + 5691b90 commit c6a5b47
Showing 4 changed files with 36 additions and 15 deletions.
1 change: 1 addition & 0 deletions config/rbac/role.yaml
@@ -37,6 +37,7 @@ rules:
- pods
verbs:
- create
- delete
- get
- list
- watch
21 changes: 14 additions & 7 deletions controllers/kubernetesmachine_controller.go
@@ -131,7 +131,7 @@ type KubernetesMachineReconciler struct {
// +kubebuilder:rbac:groups=infrastructure.dippynark.co.uk,resources=kubernetesmachines,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=infrastructure.dippynark.co.uk,resources=kubernetesmachines/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=cluster.x-k8s.io,resources=machines,verbs=get;list;watch
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;create
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch;create;delete
// +kubebuilder:rbac:groups=core,resources=pods/exec,verbs=create
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch;create
// +kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create
@@ -215,7 +215,7 @@ func (r *KubernetesMachineReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result

// Handle deleted machines
if !kubernetesMachine.ObjectMeta.DeletionTimestamp.IsZero() {
return r.reconcileDelete(ctx, machine, kubernetesMachine)
return r.reconcileDelete(ctx, machine, kubernetesMachine, cluster)
}

// Make sure infrastructure is ready
@@ -511,6 +511,13 @@ func (r *KubernetesMachineReconciler) reconcileNormal(ctx context.Context, clust
return ctrl.Result{}, nil
}
if kindContainerStatus.State.Terminated != nil {

if kubernetesMachine.Spec.AllowRecreation {
// Delete Pod to allow it to be recreated
log.Info("Deleting Pod due to terminated kind container")
return ctrl.Result{}, r.Delete(ctx, machinePod)
}

kubernetesMachine.Status.SetFailureReason(capierrors.UnsupportedChangeMachineError)
kubernetesMachine.Status.SetFailureMessage(errors.Errorf("kind container has terminated: %s", kindContainerStatus.State.Terminated.Reason))

@@ -552,11 +559,11 @@ func (r *KubernetesMachineReconciler) reconcileNormal(ctx context.Context, clust
return ctrl.Result{}, nil
}

func (r *KubernetesMachineReconciler) reconcileDelete(ctx context.Context, machine *clusterv1.Machine, kubernetesMachine *capkv1.KubernetesMachine) (ctrl.Result, error) {
// If the deleted machine is a control-plane node, exec kubeadm reset so the
// etcd member hosted on the machine gets removed in a controlled way

if util.IsControlPlaneMachine(machine) {
func (r *KubernetesMachineReconciler) reconcileDelete(ctx context.Context, machine *clusterv1.Machine, kubernetesMachine *capkv1.KubernetesMachine, cluster *clusterv1.Cluster) (ctrl.Result, error) {
// If the deleted machine is a control plane node, exec kubeadm reset so the etcd member hosted on
// the machine gets removed in a controlled way. If the cluster has been deleted then we skip this
// step to stop it hanging forever in the case of control plane failure
if cluster.ObjectMeta.DeletionTimestamp.IsZero() && util.IsControlPlaneMachine(machine) {
// Check if machine pod exists
machinePod := &corev1.Pod{}
err := r.Client.Get(ctx, types.NamespacedName{
25 changes: 19 additions & 6 deletions docs/flavors.md
@@ -11,10 +11,17 @@ The default flavor creates a Kubernetes cluster with the controller Nodes manage
[KubeadmControlPlane](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20191017-kubeadm-based-control-plane.md)
resource and the worker Nodes managed by a
[MachineDeployment](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-deployment.html)
resource. The controller Nodes write etcd state to the container file system and the corresponding
KubernetesMachines will fail if the underlying Pods fails, relying on the
resource.

The controller Nodes write etcd state to an
[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume and the
corresponding KubernetesMachines will fail if the underlying Pods fail, relying on the
KubeadmControlPlane for remediation.

The worker Nodes will fail if their underlying Pods fail, relying on the MachineDeployment and a
[MachineHealthCheck](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-health-check.html)
for remediation.
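
As a sketch of the remediation wiring referenced above, a MachineHealthCheck targeting the worker Machines might look like the following. This is illustrative only: the apiVersion, names, label selector, and timeouts are assumptions, not values taken from this repository.

```yaml
# Hypothetical MachineHealthCheck for the example cluster's workers.
# Verify the apiVersion against the Cluster API release in use.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: example-worker-unhealthy
spec:
  clusterName: example
  # Stop remediating if too many Machines are unhealthy at once
  maxUnhealthy: 40%
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: example-worker
  unhealthyConditions:
    # Remediate Nodes whose Ready condition is Unknown or False
    # for longer than the timeout
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```

When a Machine matches an unhealthy condition for longer than the timeout, the MachineHealthCheck deletes it and the owning MachineDeployment creates a replacement.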

```sh
CLUSTER_NAME="example"
export KUBERNETES_CONTROL_PLANE_SERVICE_TYPE="LoadBalancer"
@@ -48,14 +55,20 @@ by a
[KubeadmControlPlane](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20191017-kubeadm-based-control-plane.md)
resource and the worker Nodes managed by a
[MachineDeployment](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-deployment.html)
resource. PersistentVolumes are dynamically provisioned for the controller Nodes to write etcd state
and the corresponding KubernetesMachines are configured to recreate the underlying Pod if it is
deleted as described in [persistence.md](persistence.md).
resource.

PersistentVolumes are dynamically provisioned for the controller Nodes to write etcd state and the
corresponding KubernetesMachines are configured to recreate their underlying Pods if they fail as
described in [persistence.md](persistence.md).

The worker Nodes will fail if their underlying Pods fail, relying on the MachineDeployment and a
[MachineHealthCheck](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-health-check.html)
for remediation.

```sh
CLUSTER_NAME="example"
export KUBERNETES_CONTROL_PLANE_SERVICE_TYPE="LoadBalancer"
export ETCD_STORAGE_CLASS_NAME="ssd"
export ETCD_STORAGE_CLASS_NAME="premium-rwo"
export ETCD_STORAGE_SIZE="1Gi"
clusterctl config cluster $CLUSTER_NAME \
--infrastructure kubernetes \
4 changes: 2 additions & 2 deletions docs/persistence.md
@@ -1,7 +1,7 @@
# Persistence

By default when the Pod backing a KubernetesMachine is deleted, the KubernetesMachine (and therefore
the managing Machine) will be set to failed. However, if the
By default when the Pod backing a KubernetesMachine fails or is deleted, the KubernetesMachine (and
therefore the managing Machine) will be set to failed. However, if the
`kubernetesMachine.spec.allowRecreation` field is set to `true`, the Pod will instead be recreated
with the same name. For controller Machines, by mounting a PersistentVolume at the etcd data
directory, the Pod can recover without data loss and without the managing KubernetesMachine failing:
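
A minimal sketch of such a KubernetesMachine follows. Only `spec.allowRecreation` is confirmed by this change set; the apiVersion suffix and the persistence wiring are assumptions about the provider's schema, labeled as such below.

```yaml
# Hypothetical KubernetesMachine for a controller Machine.
# The API version suffix (v1alpha1) is an assumption.
apiVersion: infrastructure.dippynark.co.uk/v1alpha1
kind: KubernetesMachine
metadata:
  name: example-controller
spec:
  # Recreate the backing Pod with the same name instead of
  # marking the KubernetesMachine as failed
  allowRecreation: true
  # Illustrative only: some mechanism (e.g. a volume sourced from a
  # PersistentVolumeClaim, mounted at the etcd data directory,
  # /var/lib/etcd by default) must place etcd state on durable storage
  # so the recreated Pod recovers without data loss.
```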
