Need to support rolling update in the ingress controller. #75

Closed
emilverwoerd opened this issue Nov 26, 2018 · 17 comments
Labels
feature New feature or request

Comments

@emilverwoerd

Describe the bug
When performing a rolling update of any kind of service you want the site or service to stay online. But during an update a 502 Bad Gateway is returned. The problem occurs due to the fact that Application Gateway is using the internal IP addresses of the nodes in the backend pool instead of the Cluster IP of the specified service.

So what happens is that Kubernetes spins up different nodes with new IP addresses, depending on the replica count, and the original IP addresses that are used by the Application Gateway are removed. A couple of minutes later the backend pool is updated with the new IP addresses. But we want the Cluster IP address to be used in the backend pool so Kubernetes can perform correct load balancing.

To Reproduce
Redeploy a service and check if it is online

@asridharan
Contributor

@emilverwoerd are you using the latest ingress controller helm chart? There was a bug in AKS because of which the ingress controller would stop receiving pod update events for minutes at a time, resulting in a delay in updating the backend pools in the application gateway.

To comment on your solution, using the cluster IP instead of the pod IP for the backend IP address is a very bad idea. The cluster IP is a VIP (virtual IP address) that is used for layer 4 (TCP) load balancing. Adding the cluster IP as the backend to the application gateway instead of the actual pod IPs would break session affinity if it is enabled in the application gateway. The correct solution is to observe the deployments associated with a service, not just the endpoints, and update the backend pools accordingly.
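
To make the distinction concrete, here is a hypothetical Service and its Endpoints object (names and addresses are made up): the ClusterIP is a virtual IP load-balanced by kube-proxy, while the Endpoints object lists the actual pod IPs, and it is the pod IPs that end up in the application gateway backend pool.

apiVersion: v1
kind: Service
metadata:
  name: my-service            # hypothetical service name
spec:
  clusterIP: 10.0.113.25      # layer 4 VIP handled by kube-proxy
  selector:
    app: my-app
  ports:
  - port: 80
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
- addresses:
  - ip: 10.240.0.15           # actual pod IPs; these change on every rolling update
  - ip: 10.240.0.23
  ports:
  - port: 80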

Meanwhile, if you haven't updated to the latest helm chart, could you kindly update to the latest and retry the rolling update to observe the behavior?

@asridharan
Contributor

This is a classic case of supporting blue-green deployments. We should try and support this if it is not already working.

@asridharan asridharan added the feature New feature or request label Nov 26, 2018
@emilverwoerd
Author

emilverwoerd commented Nov 26, 2018

@emilverwoerd are you using the latest ingress controller helm chart? There was a bug in AKS because of which the ingress controller would stop receiving pod update events for minutes at a time, resulting in a delay in updating the backend pools in the application gateway.

To comment on your solution, using the cluster IP instead of the pod IP for the backend IP address is a very bad idea. The cluster IP is a VIP (virtual IP address) that is used for layer 4 (TCP) load balancing. Adding the cluster IP as the backend to the application gateway instead of the actual pod IPs would break session affinity if it is enabled in the application gateway. The correct solution is to observe the deployments associated with a service, not just the endpoints, and update the backend pools accordingly.

Meanwhile, if you haven't updated to the latest helm chart, could you kindly update to the latest and retry the rolling update to observe the behavior?

We are currently using version 0.1.4 with the latest helm chart, so that should be the latest. What we are experiencing is that the site is temporarily unavailable because the old pods are terminated and the new pods are running, but it takes some time for Application Gateway to update the backend pools. So for a brief amount of time Application Gateway is still configured with the IP addresses of the old containers, which are not running anymore; when Kubernetes is done upgrading the containers, Application Gateway is not ready yet, and this can take a couple of moments.

So the old containers should only be terminated once Azure Application Gateway is done updating the backend pools.

@asridharan
Contributor

@emilverwoerd just checked the commits, and the fix for the AKS event subscription was already present in ingress controller helm chart 0.1.4, so you should already have that fix:
#66

So I think the issue will still persist for you even after upgrading to 0.1.5. Could you kindly share your deployment spec here (the relevant parts such as "rolling update strategy", readiness probes and any preStop life cycle hooks you might have added)? Please read below for possible solutions that others have tried.

I dug around a bit as to how other ingress controllers deal with zero downtime. I found these two blog posts:
https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes
and
https://techblog.topdesk.com/continuous-integration/rolling-updates-kubernetes/

The problem you are facing seems to be a common issue with other ingress controllers as well (nginx controllers are the ones cited in the articles above). If you follow the articles, to achieve zero downtime upgrades "with ingress" there are two components that are required in your deployment spec. The first is the readinessProbe and the second is the preStop check in the container lifeCycle spec. The readinessProbe will allow Kubernetes to know when your pods are ready (assuming your pods handle SIGTERM well), and the preStop will give a grace period for the ingress controller to update the application gateway once the Endpoints object is updated.

NOTE: An update from the ingress controller to the application gateway takes at least 40 seconds to take effect, so setting a sleep of 40 seconds in the preStop stage should help reduce your outages during upgrades.
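
For reference, a minimal sketch of those two pieces in a deployment spec (the probe path, port, timings, and names here are assumptions; adjust them to your application):

containers:
- name: my-app                          # hypothetical container name
  image: myregistry/my-app:latest       # hypothetical image
  readinessProbe:                       # lets Kubernetes know when a new pod can receive traffic
    httpGet:
      path: /healthz                    # assumed health endpoint
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  lifecycle:
    preStop:                            # delays SIGTERM so the application gateway can catch up
      exec:
        command: ["sleep", "40"]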

@emilverwoerd
Author

Okay, I will add my deployment spec here, but I don't understand how you can perform a rolling update correctly if AG isn't done updating before the pods are recreated. That was also the reason I thought it would be better to use the Cluster IP of the service, since it isn't changing and the IPs of the pods are. But I also understand that AG isn't part of the Kubernetes platform, so that is not possible. But then, to perform a correct update, you should wait with container termination until AG is ready. I will try the preStop as a workaround and will let you know if it works.

service.txt

@asridharan
Contributor

@emilverwoerd the AG would be updated only after the new pods are created, since Kubernetes will update the Endpoints object for a service only after the new pods have been created. The problem here is not that we are not updating the AG with the new pods; the problem is that there is a delay between the point when we update the backend pool in the AG with the new pods and the time when the update actually takes effect (~40 sec). During this time, if the old pods are taken down by Kubernetes while the backend pool is not yet updated, it will lead to an intermittent outage as you are seeing. This is a problem with any ingress controller, nothing specific to AG.

The preStop hook tells Kubernetes to execute a command before sending the SIGTERM to the old pod. Adding a sleep in this preStop will tell Kubernetes to wait for x seconds before sending the SIGTERM, allowing the update in the application gateway to go through before the old pods are taken down. This hopefully will make the upgrades smoother.

Hope that explains the proposed solution?

@vramakrishnan
Contributor

vramakrishnan commented Nov 28, 2018

@emilverwoerd Can you please share your findings with this config in your deployment spec?
We will work on reducing the 40 sec update time subsequently.

lifecycle:
  preStop:
    exec:
      command: ["sleep", "40"]

Also, can you please share your config for these settings in your deployment spec?

rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1

This article also has good insights.
https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/

@emilverwoerd
Author

@emilverwoerd Can you please share your findings with this config in your deployment spec?
We will work on reducing the 40 sec update time subsequently.

lifecycle:
  preStop:
    exec:
      command: ["sleep", "40"]

Also, can you please share your config for these settings in your deployment spec?

rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1

This article also has good insights.
https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/

I tried to add the sleep command, but when doing that it gives issues with the readinessProbe and the pod is terminated. Also, updating the backend pool still takes quite some time, so it has no different effect than without the sleep command.

We use the following spec for our rollingUpdate:

rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

@asridharan
Contributor

asridharan commented Nov 30, 2018

@emilverwoerd could you provide your subscription ID? I want to make sure you are using Application Gateway v2 and not v1. Did you create the AKS cluster and application gateway through the templates?

We will try reproducing this problem at our end as well, but without the readiness probe Kubernetes wouldn't know when to update the Endpoints object with the new pods, so the whole rolling update process would be flaky. So I would think we need to get the readiness probes working for rolling updates.

@emilverwoerd
Author

@asridharan we are on the following subscription '282d71e4-f66b-4e8f-8e49-4faea8667362' but we are using Gateway v2. And we created the cluster through our ARM templates so I could send you those templates if you want.

thx in advance for checking it out

@PandaXass

@emilverwoerd @asridharan I think the terminationGracePeriodSeconds needs to be specified in the deployment yaml file to make sleeping 40s work, since by default K8s will send the TERM signal anyway after 30s.
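
For example, a sketch combining the two settings (the 60 second value is an assumption; anything comfortably larger than the preStop sleep should work):

spec:
  terminationGracePeriodSeconds: 60     # default is 30s, so raise it above the preStop sleep
  containers:
  - name: my-app                        # hypothetical container name
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "40"]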

@kernelv5

Our environment is connected to Azure by VPN. My nginx-ingress external IP is in the same range as my local network. When I hit nginx-ingress I can easily access my application, but through Application Gateway I am getting a 502. I tested Application Gateway to the nginx-ingress IP with Network Troubleshoot from Azure, and that also works fine. Just curious to know: in order to route traffic from Azure Application Gateway to the ingress, is application-gateway-kubernetes-ingress mandatory, or can I go with nginx-ingress as well?

@asridharan asridharan changed the title Backend pool uses IP-address of node instead of Cluster IP address Need to support rolling update in the ingress controller. Apr 22, 2019
@asridharan
Contributor

@kernelv5 sorry for the late response, but one thing you want to check is whether the AG subnet is able to route to the subnets that your pods are connected to. If AG is not able to connect to your pods then you might be getting a 502 error.

@Baklap4

Baklap4 commented Nov 15, 2019

I think this is currently still an issue. I 'fixed' it somehow by using the preStop hook in combination with terminationGracePeriodSeconds.

The sleep I have is around 45 seconds and the termination grace period around 90 seconds, which seems to work for our case.

Would be nice if this gets implemented...

@hugoderene

hugoderene commented Nov 25, 2019

We are dealing with the same issues. The workaround suggested by @Baklap4 functions, but is far from ideal.

The ingress controller pod (image: mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:tag) is initiating a reconfiguration of the appgw as soon as any of the ‘connected’ resources changes (= expected behaviour). This reconfiguration process is first fully completed, before a new reconfiguration process is initiated.

During a rolling redeployment several changes (stopping pods, creating pods, deleting pods) are happening relatively shortly after each other. A redeployment therefore causes a discrepancy between the configuration of the appgw and the actual situation in the cluster. This discrepancy is resulting in 502s and/or 503s which is not very ‘rolling’.

An example to illustrate:

  • The first step of a redeployment procedure is to stop one or multiple existing pod(s).

  • Immediately after this first step, a reconfiguration process is initiated by the ingress controller pod. This process takes about 45-60 seconds.

  • During these 45-60 seconds other existing pods might be stopped and new pods might be created. The addresses of the pods that have been stopped during the ‘second wave’ are not accessible anymore. However they are still being served via the application gateway.

  • Only when the first reconfiguration process has completed and a second reconfiguration process is initiated will the unavailable addresses be updated in the appgw.

Is there a possibility to somehow supersede the reconfiguration process initiated by the ingress controller pod as soon as there is another change happening within the connected resources in the cluster?

@akshaysngupta
Member

Closing this issue.
Please follow this document to reduce the 502s during rolling updates.

@Baklap4

Baklap4 commented Feb 24, 2020

@akshaysngupta Where are the tracking issues for the long-term solution? Making the backend pool pick up changes faster?
