
Use health checks from kubernetes deployment #71

Closed
wesselkranenborg opened this issue Nov 12, 2018 · 16 comments

@wesselkranenborg

Is your feature request related to a problem? Please describe.
We deploy our services to Kubernetes with a custom health check endpoint (e.g. /private/ping). Kubernetes correctly uses this endpoint, but the Application Gateway uses the default health probes described here: https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview. Because no custom health probes are configured, the Application Gateway backend health checks are not the same checks that Kubernetes performs, which can lead to incorrect health results in the gateway.

Describe the solution you'd like
Use the health check endpoint from the Kubernetes deployment to configure a custom probe in the Application Gateway as part of the ingress configuration.

@asridharan
Contributor

@wesselkranenborg if I understand correctly, you want the ingress controller to learn the custom health check endpoint for a given service and configure custom health probes on the Application Gateway?
If my assumption is correct, could you give an example of the custom health check endpoint here?

Thanks,
Avinash

@asridharan asridharan added the feature New feature or request label Nov 19, 2018
@wesselkranenborg
Author

That is indeed correct. For example, we use a /private/ping endpoint on all our services, which is only exposed internally within the cluster (and thus reachable by the Application Gateway).

This endpoint is configured as the health check endpoint in the Kubernetes deployment and returns a 200 OK if the service is healthy, otherwise another status code. It checks whether the service is running correctly, has access to the database and other dependent services, etc.
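For illustration, such a probe in the deployment's pod spec might look roughly like this (a minimal sketch; the container name and image are hypothetical placeholders, and /private/ping is the endpoint mentioned above):

      containers:
      - name: my-service                        # hypothetical container name
        image: myregistry/my-service:latest     # placeholder image
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /private/ping   # internal health endpoint, returns 200 OK when healthy
            port: 80
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3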

@ilooner

ilooner commented Nov 21, 2018

+1 for this feature. I am running into the same issue now.

I tried to work around this by manually configuring the custom health check endpoint within the http-settings that are automatically added to the application gateway by the ingress controller. This manual configuration works until your pods are redeployed. When that happens the ingress controller rewrites the http-settings and consequently removes any custom health check endpoint you configured.

@dtsato

dtsato commented Feb 21, 2019

+1 for this feature. The default nginx ingress controller supports this feature using a custom annotation: https://github.com/nginxinc/kubernetes-ingress/tree/master/examples/health-checks

@wesselkranenborg
Author

We received this notification in our Azure Service Health. This makes this feature even more important to implement.

This is to notify that Application Gateway v2 SKU will be shortly introducing changes in probing behavior which may require customers to take action to prevent any application downtime. Currently v2 SKU does not perform any default health probes. This implies that if customers do not specify a probe configuration, then all backend instances are assumed to be healthy and traffic is sent to them irrespective of the backend’s health status. Application Gateway is planning to change this behavior by introducing default probes similar to the current GA SKU. Once these changes take effect, probing behavior on both V2 SKU and the current GA SKU will be identical. Details on the health probing behavior are documented here - https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview.

No action is required at this time. We will be sending future communication before introduction of default probes reaches production, along with instructions on how to avoid failures. This communication is advanced notification of intended changes to the default probing behavior. Please feel free to reach out to us at appgwxzone@microsoft.com for comments.

@wesselkranenborg
Author

This notification is now placed in the service health for region South Central US.

We have important information for your Application Gateway or WAF V2 resources. Beginning on May 1 2019, we will roll out default health probes for the Standard_V2 and WAF_V2 SKUs such that the behavior of the health probes becomes identical to that for the Standard and WAF (V1) SKUs, as documented here: https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview#default-health-probe

Recommended action: We have identified one or more Application Gateway / WAF V2 gateways that you own that appear to be failing health probes, either due to a non-healthy HTTP status code (i.e not in the range 200 – 399) or due to a timeout for the health probe.

Since these default health probe failures can potentially cause application downtime (if all backend instances are unhealthy), it is recommended to check your backend endpoint health for these identified gateways and ensure that your backend endpoints are reachable and not returning HTTP failures on the default health probe URL (http://: or https://: depending on whether HTTPs is configured for backend http settings – please refer to the documentation: https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview#default-health-probe).

We also recommend configuring custom health probes (https://docs.microsoft.com/en-us/azure/application-gateway/application-gateway-probe-overview#custom-health-probe) to have more control and flexibility on the health probe configuration.

@asridharan
Contributor

@wesselkranenborg just acknowledging that we are working on adding a custom health probe annotation and on learning the custom health probe from the liveness checks defined for pods. Will update once we have PRs associated with these features.

@akshaysngupta
Member

#158

@akshaysngupta
Member

Please try out the release candidate and let us know if you find any issues.

helm install -f <helm-config.yaml> application-gateway-kubernetes-ingress/ingress-azure --version 0.4.0-rc1
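For reference, a minimal helm-config.yaml might look roughly like this (a sketch only; the exact keys and required fields can differ between chart versions, so check the sample config shipped with the release. All values below are placeholders):

appgw:
    subscriptionId: <subscription-id>
    resourceGroup: <resource-group-name>
    name: <application-gateway-name>
armAuth:
    type: aadPodIdentity
    identityResourceID: <identity-resource-id>
    identityClientID: <identity-client-id>
rbac:
    enabled: false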

@dan-jackson-github

Thanks @akshaysngupta, is there any documentation on how to get this working?

@dan-jackson-github

dan-jackson-github commented May 15, 2019

As an update @akshaysngupta, I have deployed the latest release and seen that the health probes are created based on my liveness and readiness configuration. However, in my setup I am using the nginx ingress controller to actually forward on requests that come in from Application Gateway (because I don't want to be locked into using Application Gateway fully; it also means I can serve multiple namespaces in my AKS environment using one ingress control plane and AG). Because the health probe is configured with the hostname from the ingress manifest, when the health check is executed it is sent to the nginx ingress controller running on the backend node, but nginx then forwards it on to my backend pod, which tries to execute the health check (this is because of the value of the hostname). I could get the behaviour I want if I could override the hostname that the health probes are configured with and set it to localhost or 127.0.0.1, which would then execute the health check on the nginx ingress controller itself. Is this possible?

@asridharan
Contributor

@dan-jackson-github we are working on the documentation for health probes here:
#179

The ask is a bit unusual since it is very specific to your configuration of running the AG ingress controller together with the nginx ingress controller. If you have access to the nginx ingress controller pod spec, adding liveness checks in that spec would be the most straightforward approach here.
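As a sketch of that suggestion (assuming the community nginx ingress controller, which by default serves a health endpoint at /healthz on port 10254), a liveness check on the controller's container could look like:

        livenessProbe:
          httpGet:
            path: /healthz     # nginx ingress controller health endpoint (assumed default)
            port: 10254
          initialDelaySeconds: 10
          periodSeconds: 10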

@akshaysngupta
Member

You can provide a host field in the readinessProbe/livenessProbe.

Here is the merged doc:
https://azure.github.io/application-gateway-kubernetes-ingress/docs/tutorial.html#adding-health-probes-to-your-service
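For example, something along these lines in the pod spec (a sketch; the path and port are placeholders for the backend in question, while host is a standard field of the httpGet probe):

        readinessProbe:
          httpGet:
            host: 127.0.0.1    # overrides the host used for the health check
            path: /healthz     # placeholder path
            port: 80
          periodSeconds: 10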

@asridharan
Contributor

Closing this out. #158 covers this.

@pameruoso

Hi, I can't get this working: https://azure.github.io/application-gateway-kubernetes-ingress/troubleshooting/

Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  creationTimestamp: "2019-11-25T14:07:12Z"
  generation: 10
  labels:
    run: nginx
  name: nginx
  namespace: default
  resourceVersion: "1249139"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/nginx
  uid: e0470631-0f8c-11ea-b26f-9e9afe712f2b
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2019-11-25T14:07:12Z"
    lastUpdateTime: "2019-12-06T12:50:34Z"
    message: ReplicaSet "nginx-68cf44b44d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-12-06T12:51:04Z"
    lastUpdateTime: "2019-12-06T12:51:04Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 10
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
Service:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2019-11-25T14:08:02Z"
  labels:
    run: nginx
  name: nginx
  namespace: default
  resourceVersion: "5452"
  selfLink: /api/v1/namespaces/default/services/nginx
  uid: fe67e689-0f8c-11ea-b26f-9e9afe712f2b
spec:
  clusterIP: 10.0.63.93
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Ingress rule:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    appgw.ingress.kubernetes.io/backend-path-prefix: /
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"kubernetes.io/ingress.class":"azure/application-gateway"},"name":"test","namespace":"default"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"nginx","servicePort":80}}]}}]}}
    kubernetes.io/ingress.class: azure/application-gateway
  creationTimestamp: "2019-11-25T14:11:53Z"
  generation: 2
  name: test
  namespace: default
  resourceVersion: "1247987"
  selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/test
  uid: 884dab27-0f8d-11ea-b26f-9e9afe712f2b
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: nginx
          servicePort: 80
        path: /
status:
  loadBalancer:
    ingress:
    - ip: XXX

No matter if I change from a liveness to a readiness probe, in the Application Gateway I always see this:

[screenshot: Application Gateway health probe settings showing the default configuration]

It is always the default configuration, no matter if I scale up or down or change the paths.

Using azure ingress 1.0.0

@pameruoso

Got it working. I didn't know that the following is required; after specifying this inside the deployment it works:

ports:
- containerPort: 80
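Putting it together, the relevant part of the container spec from the deployment above then looks roughly like this (a sketch; the containerPort was the missing piece that let the ingress controller pick up the probe instead of falling back to the defaults):

      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80     # without this, only the default probe was configured
        readinessProbe:
          httpGet:
            path: /             # picked up as the Application Gateway probe path
            port: 80
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
          failureThreshold: 3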
