pod watches stop sending data after a while, but remain connected in 1.18.4 #1755
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This was still happening as of a few days ago.
@blushingpenguin I was experiencing this as well. At one point it was very intermittent, and then it started happening consistently. I was advised by the AKS team to configure my watches to be lower than 4 minutes, which is the default request timeout for the Azure load balancer; this did resolve my issue. Unfortunately these silent failures are problematic in that there are internal workloads within the cluster that have watches longer than 4 minutes. From my understanding this is the load balancer on the control plane side, so its configuration is out of the user's control, but maybe we could get some clarification from someone on the AKS team, @juan-lee @palma21. FWIW, I was on 1.17 and using Node.js; I would see it very infrequently in previous versions as well. Not sure if/how the k8s version correlates to the issue.
@blushingpenguin I'm experiencing it when I have long-lived applications in the cluster where the application deploys k8s Jobs using in-cluster comms. It times out after 4 minutes silently. I think the reason is that the load balancer that AKS is using for in-cluster calls to the API has "TCP reset" disabled, but I don't think there is any way to change this.
Seeing this too. I think it happens when the watcher receives no events for > 4 minutes. I wrote a C# program that reproduces the problem: https://gist.github.com/bergeron/10c88fd26aa683619fc75cfd85f63acf. When I modify the watched resource at least every 4 minutes, everything works. But when I modify the watched resource less often than every 4 minutes, the watcher stops receiving events.
This happens both when running the watcher outside the Kubernetes cluster (connecting to the API server via *.azmk8s.io:443) and when running it inside a pod in the cluster (connecting via KUBERNETES_SERVICE_HOST:KUBERNETES_SERVICE_PORT, or 10.0.0.1:443). I didn't find the AKS version to have any effect: it reproduced on AKS 1.16.7, 1.16.13, 1.16.15, 1.17.9, 1.17.11, 1.18.6, 1.18.8, and 1.19.0. I don't see this problem on kubeadm 1.15; I haven't tested other Kubernetes environments. Setting timeoutSeconds to < 4 minutes on the watch call seems to work, though: the connection closes before it starts missing events and our application can restart the watch.
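A minimal sketch of that timeoutSeconds workaround using the official Python client (illustrative only, not from the thread; the namespace, the watched resource, and the 200 second value are placeholders, and the C# gist above achieves the same thing with the timeoutSeconds parameter on the watch request):

```python
from kubernetes import client, config, watch

config.load_incluster_config()  # assumption: running inside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
resource_version = None
while True:
    # Ask the API server to end the watch after 200s, comfortably below the
    # ~240s SLB idle timeout, then immediately re-establish it from the last
    # observed resourceVersion so no events are missed.
    kwargs = {"timeout_seconds": 200}
    if resource_version is not None:
        kwargs["resource_version"] = resource_version
    for event in w.stream(v1.list_namespaced_config_map, namespace="default", **kwargs):
        resource_version = event["object"].metadata.resource_version
        print(event["type"], event["object"].metadata.name)
```

Because the server closes the watch cleanly before the load balancer's idle timer can fire, the client never sits on a half-dead connection; a production version would also handle a 410 Gone response by restarting without a resourceVersion.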
I used the simple C# app from @bergeron and I noticed that there is a TCP RST packet after ~5 min idle. This behavior is actually by design: the AKS underlay SLB sends a RST to the client after ~4 minutes of idle, and the client must handle the signal and re-establish the connection. There are several ways to handle this:
For #2, with the k8s C# client there is no way of turning on TCP keepalives, as it uses HttpClient: dotnet/runtime#31267. I also don't see a TCP RST (if there were a RST, the connection would fail on the client, which would be fine as it just needs to restart the watch; however, as no RST is sent, the connection actually stays open from the client's perspective).
@nilo19 I never see a TCP RST. It almost seems like the SLB doesn't have RST enabled. I'm on AKS 1.17.7. Note that all of this is in-cluster comms. If I run the following Python script:

```python
import time
import logging

from kubernetes import config

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.DEBUG)

config.load_incluster_config()

from kubernetes import client

v1 = client.CoreV1Api()

logging.info('Calling 1st time')
v1.list_namespaced_pod('default')

logging.info('Sleeping 5 minutes')
time.sleep(300)

logging.info('Calling 2nd time')
# this call will time out after 15 minutes
v1.list_namespaced_pod('default')

logging.info('OK')
```

I get the following logs.
The following happens:
@thomasfrederikhoeck are you capturing the tcpdumps on the node or the pod itself? |
@marwanad sorry, I don't understand what you mean. I run it as a Deployment on the cluster where I'm using the in-cluster config.
My bad if I wasn't clear enough. I was curious about the part "I never see a TCP RST." - how are you validating that? Edit: all I'm saying is that looking at those logs isn't enough to tell you whether you receive a RST packet or not - see kubernetes-client/python#1132. You need to capture a tcpdump.
@marwanad pardon my ignorance, but what I meant was that the client never receives a message that the connection has been closed. Isn't the issue you are pointing to the other way around, where the connection is intentionally closed by the client using the python kubernetes package (but then never really closed)? I guess it would be pretty easy for someone from the AKS team to check whether the SLB used for the k8s API has TCP Reset enabled or not, right?
@thomasfrederikhoeck the TCP reset on our LB is on by default and the idle timeout is 4 min. I created a watch and let it idle for 4 minutes, and then I could collect a TCP RST using tcpdump. I believe the issue is caused by the k8s client failing to handle the RST packet.
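For anyone who wants to check this from the client side rather than with tcpdump, here is a rough, hypothetical probe (not from this thread; the service IP, the unauthenticated /version request, and the 5 minute sleep are all illustrative assumptions). It holds one connection to the API server idle past the ~4 minute window and then reuses it, so you can see whether the drop shows up as a reset or as a silent hang:

```python
import socket
import ssl
import time

HOST = "10.0.0.1"   # assumed in-cluster kubernetes service IP
PORT = 443
REQUEST = (b"GET /version HTTP/1.1\r\n"
           b"Host: kubernetes\r\n"
           b"Connection: keep-alive\r\n\r\n")

# Probe-only TLS context: certificate verification is disabled because we only
# care about TCP/TLS connectivity here, not about trusting the response.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

sock = ctx.wrap_socket(socket.create_connection((HOST, PORT), timeout=10),
                       server_hostname=HOST)
sock.sendall(REQUEST)
print("first response:", sock.recv(4096)[:60])

time.sleep(5 * 60)  # idle past the ~4 minute SLB idle timeout

sock.settimeout(30)
try:
    sock.sendall(REQUEST)
    print("second response:", sock.recv(4096)[:60])
except socket.timeout:
    print("silent drop: no RST was delivered, the socket only looks open")
except OSError as exc:
    print("connection was torn down (RST or similar):", exc)
```

If the reset is delivered, the second send/receive fails immediately; if it is not, the probe hangs until its own 30 second timeout fires, which matches the "connection stays open from the client's perspective" behaviour reported earlier in this thread.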
Hi @thomasfrederikhoeck, just want to let you know that I picked up this issue and did some testing. I'm still doing more testing, but I posted the current test code and results in this repo: https://github.com/yangl900/knet#client-python-tests. The findings I have so far are:
I'm still trying to see if there is a workaround to enable keepalive for the Python client. You can deploy the same script to run the test yourself:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["configmaps"]
  verbs: ["get", "watch", "list", "update", "patch", "delete", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
subjects:
# You can specify more than one "subject"
- kind: ServiceAccount
  name: default # "name" is case sensitive
  apiGroup: ""
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role # this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: knet-apiserver-watcher-py
spec:
  selector:
    matchLabels:
      knet: apiserver-watcher-py
  replicas: 1
  template:
    metadata:
      labels:
        knet: apiserver-watcher-py
    spec:
      containers:
      - name: watcher
        image: yangl/apiserver-watcher-py
        imagePullPolicy: Always
      - name: tcpdump
        image: corfr/tcpdump
        command:
        - "/usr/sbin/tcpdump"
        - "-i"
        - "any"
        - "-nn"
```
Hi @blushingpenguin @bergeron @arsnyder16 - just want to let you know that with the recent AKS infrastructure release you should now always see a RST. We still recommend that clients turn on TCP keepalive, but I understand not all k8s clients do that, e.g. Python does not have it today. Please let me know if you still experience any issues with watches. I have some testing results posted at https://github.com/yangl900/knet and will keep posting.
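For the Python case specifically, one workaround people use (a sketch under the assumptions that the official kubernetes Python package routes its requests through urllib3, which it does, and that the client runs on Linux, where the TCP_KEEP* constants exist) is to add keepalive options to urllib3's default socket options before the API client is created:

```python
import socket

from urllib3.connection import HTTPConnection
from kubernetes import client, config

# Connections urllib3 opens pick up these class-level defaults (HTTPSConnection
# inherits them), so keepalive probes start well inside the ~4 minute idle
# window. The interval values below are only examples.
HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
    (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),   # first probe after 60s idle
    (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30),  # then every 30s
    (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5),     # give up after 5 missed probes
]

config.load_incluster_config()  # assumption: running inside the cluster
v1 = client.CoreV1Api()
# watches and list calls made through v1 now run over keepalive-enabled sockets
```

This has to run before the first connection is made, and it changes a process-wide default rather than a per-client setting, which is why it is a workaround rather than a proper client option.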
@yangl900 sorry for the late reply. I had a chance to check it out now and everything is working smoothly for me. Thank you! :-)
Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days. |
What happened:
After upgrading from 1.16.10 to 1.18.4, watches stop working after a period of time. The connection from the pod to the API server remains open (i.e. it can be seen with lsof), but no further watch data is sent.
What you expected to happen:
With versions before 1.16.10 (back to 1.13), the API server connection would periodically drop (which I believe was expected); in that case we'd just reconnect.
How to reproduce it (as minimally and precisely as possible):
I don't have an easy reproduction (the code is a custom job scheduler written in .NET using the k8s csharp client) -- I can probably boil it down if reproduction is necessary.
Anything else we need to know?:
Environment:
- Kubernetes version (kubectl version): 1.18.4
- Cluster size: 3 x b2ms
- Workloads: dotnet / node microservices