Allow to disable loadbalancer health probes #1394

Open
tarioch opened this issue Jan 17, 2020 · 24 comments
Labels: action-required · feature-request · Feedback

Comments

@tarioch commented Jan 17, 2020

Currently there is no way to disable the load balancer health probes. It would be good if an annotation could be added to disable them, either globally or for specific ports.

@jnoller (Contributor) commented Jan 17, 2020

Can you provide a use case/business justification for completely disabling all health probes?

@tarioch (Author) commented Jan 17, 2020

Sure. Our use case is that we're exposing Jenkins JNLP. On the one hand, the health probes flood the log files with unnecessary entries; on the other, and more importantly, the probes seem to be "too aggressive" and drop connections that should still be fine, connections Jenkins is otherwise able to keep open.

Right now we applied the workaround from https://stackoverflow.com/a/54257960: basically changing externalTrafficPolicy to Local and adding an explicit healthCheckNodePort.

Since we made that change the connection has stayed very stable, where before it got interrupted every couple of hours.
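
For illustration, a minimal sketch of that workaround (the service name, ports, and the healthCheckNodePort value are assumptions for illustration, not taken from the original post):

apiVersion: v1
kind: Service
metadata:
  name: jenkins-jnlp
spec:
  type: LoadBalancer
  # With Local traffic policy, the Azure LB probes healthCheckNodePort
  # (answered by kube-proxy) instead of the service port itself.
  externalTrafficPolicy: Local
  healthCheckNodePort: 32000   # assumed value; must fall in the cluster's NodePort range
  selector:
    app: jenkins
  ports:
  - name: jnlp
    protocol: TCP
    port: 50000
    targetPort: 50000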

@przemolb commented Feb 9, 2020

I would also like to be able to disable all health probes. In our case it is log flooding (and our developers hate it when analysing logs during incidents), but also just to have the option. What is the use case/business justification for enforcing health probes?

@pag08007

We also ran into a similar issue. We were deploying an application that listened for TCP connections on a specific port and triggered an event whenever a connection was made. The health probes were triggering our events and as a result were spamming our logs with fake errors.

@alex-doerfler

For us this annotation would be helpful as well. We forward the TCP traffic to an outgoing connection that is charged by bandwidth, so the health probes cause significant costs here. Therefore we had to use the workaround mentioned by @tarioch.

@przemolb

Any progress on this?

@github-actions

Action required from @Azure/aks-pm


@ghost added the Needs Attention 👋 label Jul 26, 2020
@TomGeske added the feature-request label and removed the Needs Attention 👋 and action-required labels Jul 27, 2020
@TomGeske

+@palma21

@ghost added the action-required label Jan 23, 2021
@antonmatsiuk

Another use case is the Bitnami Helm chart for MySQL. Health probes flood the log with "Got an error reading communication packets" messages.

@motmot80

Another use case: using the load balancer for UDP services with no HTTP or TCP endpoint.

@FanerYedermann

> Another use case: using the load balancer for UDP services with no HTTP or TCP endpoint.

Almost the same case here. I have a raw socket that I don't want spammed.

@dnovvak commented May 25, 2021

Any update on this?
Our use case is connection quality measurement using TCP/UDP sockets. Health probes from the load balancer disrupt the measurements.

@ghost removed the action-required label Jul 18, 2021
@palma21 added the Feedback label Jul 18, 2021
@BobClaerhout

We are experiencing this issue as well. We have an MQTT port behind a load balancer. The port requires authentication, and the health probe provides neither the authentication (of course) nor the correct protocol, which results in the business application logging a faulty incoming request.
Since this was updated 4 days ago, is it being worked on now? If so, what would be the release timeline?

@vishalsawale9 commented Aug 3, 2021

I have a similar requirement. I'm hosting an HTTPS application on an AKS cluster with gunicorn as the WSGI gateway running Flask. I'm continuously getting socket errors in the pods even though the app is up and running. I suspect the health probes are occupying a port, causing those errors almost every 2-3 seconds.

@TomasTokaMrazek commented Dec 3, 2021

This is currently a severe blocker for our deployment. We have a service exposing non-traditional protocols, like WebSockets and a custom communication protocol over TCP. The health probe sends some data rather than an empty netcat-style connection, so every few seconds there is an exception and a stack trace in our logs.

I understand that disabling the health probe for ports is not best practice, but it's a fast solution to the issue discussed here. Another solution would be to let us specify a custom probe, just as Kubernetes allows via the readinessProbe and livenessProbe configuration.

I propose a simple LB annotation, service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names. It's not ideal, but since we do not have health probes for UDP anyway, it shouldn't matter that much.

Example

apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: default
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "SomeSubnet"
    service.beta.kubernetes.io/azure-load-balancer-disable-health-probe-for-port-names: "binary,binary-secure,jms-tcp"
spec:
  selector:
    app: app
  type: LoadBalancer
  ports:
  - name: servlet-http
    protocol: TCP
    port: 9763
    targetPort: 9763
  - name: servlet-https
    protocol: TCP
    port: 9443
    targetPort: 9443
  - name: binary
    protocol: TCP
    port: 9611
    targetPort: 9611
  - name: binary-secure
    protocol: TCP
    port: 9711
    targetPort: 9711
  - name: jms-tcp
    protocol: TCP
    port: 5672
    targetPort: 5672

I dug up some other annotations related to health probes, but they don't seem to work, or I don't understand what they do.

service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path
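
For reference, a minimal sketch of how those two annotations appear to be intended, based on the cloud-provider-azure load balancer docs: they switch the probe to HTTP(S) and set its request path rather than disabling it (the /healthz path and port below are assumptions for illustration, not from this thread):

apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    # Probe the backends over HTTP instead of raw TCP (values assumed)
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: "http"
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz"
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
  - name: servlet-http
    protocol: TCP
    port: 9763
    targetPort: 9763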

@mplachter

Would love to have a way to disable load balancer health checks for specific ports. One example use case: gRPC ports don't like being probed over plain TCP by clients that never send the gRPC preface. This causes a ton of log flooding that is just noise.

Another valid option would be to allow configuring a different health check for a specific port, e.g. an HTTP health check against the downstream service instead of probing the gRPC TCP port for availability; see the sketch below.
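
A sketch of what that could look like with the per-port probe annotations described in the cloud-provider-azure docs (the gRPC port number and health path are assumptions for illustration):

apiVersion: v1
kind: Service
metadata:
  name: grpc-app
  annotations:
    # Probe this one port over HTTP against a health endpoint,
    # instead of opening raw TCP connections to the gRPC port (assumed values)
    service.beta.kubernetes.io/port_50051_health-probe_protocol: "http"
    service.beta.kubernetes.io/port_50051_health-probe_request-path: "/healthz"
spec:
  type: LoadBalancer
  selector:
    app: grpc-app
  ports:
  - name: grpc
    protocol: TCP
    port: 50051
    targetPort: 50051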

@Wuzyn commented Sep 21, 2022

Hi all, I'm having the same issue. I'm hosting an SFTP server on AKS.

@hterik commented Oct 17, 2022

If you put a LoadBalancer in front of an HTTP server, as many have done above, you need to be aware of the following.
The LoadBalancer health probe runs from each node in the cluster. It opens a TCP connection, holds it open, sends nothing, and waits 15 seconds before closing it. I don't know whether it's the responsibility of the server or the prober to close it sooner, but on most servers I've seen it simply occupies one thread. That means your server must be able to hold at least one connection open per node, concurrently. Switching to some kind of asyncio server helps a lot; otherwise you need to increase the number of threads to at least the number of nodes in your cluster.

A better solution is to consider an Ingress controller when dealing with HTTP.
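
As a concrete illustration of the thread-count point above, a sketch of sizing a gunicorn deployment for per-node probes (the image name and numbers are assumptions, not from this thread):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
      - name: app
        image: registry.example.com/flask-app:latest  # hypothetical image
        # --threads should be at least the node count, since each node's
        # LB probe can hold one connection open for up to 15 seconds
        command: ["gunicorn", "--workers", "2", "--threads", "32", "--bind", "0.0.0.0:8000", "app:app"]
        ports:
        - containerPort: 8000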

@solacens

solacens commented Feb 1, 2023

My infrastructure requires multiple rules across different ports, so if I need multiple copies of my microservices, I would need multiple copies of the ingress controller for TCP forwarding. As a result, I turned to the Azure CNI provided LoadBalancer type.

After that, clients somehow experienced intermittent 502 BAD GATEWAY responses, and I strongly suspect it is related to health probe misdetection by the kubernetes or kubernetes-internal load balancer underneath. I would like to rule out this possibility by disabling the probes.

@fethullahmisir

fethullahmisir commented Feb 23, 2023

I had the same problem and was able to disable the health probe for my SFTP server port with this annotation:

service.beta.kubernetes.io/port_{port}_no_probe_rule: "true"

where {port} must be replaced by the service port, e.g. service.beta.kubernetes.io/port_22_no_probe_rule.

From the docs: https://cloud-provider-azure.sigs.k8s.io/topics/loadbalancer/#loadbalancer-annotations

I think this issue can be closed, as disabling health probes is already supported.
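
A minimal sketch of that annotation applied to an SFTP service on port 22 (the service name and selector are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: sftp
  annotations:
    # Keep the LB rule for port 22 but create no health probe for it
    service.beta.kubernetes.io/port_22_no_probe_rule: "true"
spec:
  type: LoadBalancer
  selector:
    app: sftp
  ports:
  - name: sftp
    protocol: TCP
    port: 22
    targetPort: 22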

@TomasTokaMrazek

Was this always possible, or was it recently added as a new function to the AKS LB?

@fethullahmisir

The docs state that it's possible since AKS version v1.24. I don't know when v1.24 was released.
