Description
Currently, rolling updates in Kubernetes can cause downtime for a Lattice service. This can be solved with a Kubernetes pod readiness gate that indicates a pod has been successfully synced into Lattice.
This feature is already supported in aws-load-balancer-controller, whose documentation describes the problem well:
The pod readiness gate is needed under certain circumstances to achieve full zero downtime rolling deployments. Consider the following example:
- Low number of replicas in a deployment
- Start a rolling update of the deployment
- Rollout of new pods takes less time than it takes the AWS Load Balancer controller to register the new pods and for their health state to turn »Healthy« in the target group
- At some point during this rolling update, the target group might only have registered targets that are in »Initial« or »Draining« state; this results in service outage
In order to avoid this situation, the AWS Load Balancer controller can set the readiness condition on the pods that constitute your ingress or service backend. The condition status on a pod will be set to True only when the corresponding target in the ALB/NLB target group shows a health state of »Healthy«. This prevents the rolling update of a deployment from terminating old pods until the newly created pods are »Healthy« in the ALB/NLB target group and ready to take traffic.
I believe the Lattice controller would need to be modified to sync pods into Lattice target groups before they are Ready (i.e., once they reach ContainersReady). It would also need to inject a pod readiness gate into newly created pods, and then update that gate's condition status once the pod has been synced into Lattice.