Sporadic "Kernel error" when a lot of pods are scheduled #1521

Closed
zerkms opened this issue Aug 2, 2023 · 17 comments

zerkms (Contributor) commented Aug 2, 2023

What happened?

When I drain a node and its pods get rescheduled onto a different machine, more often than not the node that receives the sudden spike of pods to schedule gets this in its logs:

E0802 21:30:35.086351       1 service_endpoints_sync.go:73] Error syncing ipvs svc iptables rules to permit traffic to service VIP's: failed to sync ipset: ipset v7.15: Kernel error received: Resource busy                                                                             
E0802 21:33:04.158225       1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-SRC-FRNJANT7TJPCNM5F due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component                                                          
E0802 21:33:15.078398       1 network_policy_controller.go:259] Aborting sync. Failed to sync network policy chains: %!v(MISSING)failed to perform ipset restore: ipset v7.15: Error in line 693: Kernel error received: Resource busy                               

And when it happens pods lose network connectivity (at least to service IPs).

I think the corresponding commands that update kernel resources should have their own, tighter retry queues: e.g. retry in a loop for up to 10 seconds when the kernel returns "Resource busy" (a sketch of what I mean is below).

On the other hand, I don't entirely understand what "Resource busy" actually means in this context. Is it a race condition around updating/removing kernel objects?
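
To illustrate the retry idea above, here is a minimal sketch of the kind of bounded loop I have in mind (purely illustrative, not kube-router's actual code: the helper name and driving the ipset CLI via os/exec are assumptions):

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// restoreWithRetry feeds an ipset-save style ruleset to `ipset restore` and,
// instead of failing the whole sync immediately, keeps retrying for up to
// `window` while the kernel keeps answering "Resource busy".
func restoreWithRetry(ruleset string, window time.Duration) error {
	deadline := time.Now().Add(window)
	for {
		cmd := exec.Command("ipset", "restore")
		cmd.Stdin = strings.NewReader(ruleset)
		var stderr bytes.Buffer
		cmd.Stderr = &stderr
		if err := cmd.Run(); err == nil {
			return nil
		} else if !strings.Contains(stderr.String(), "Resource busy") || time.Now().After(deadline) {
			// Anything other than the transient kernel error (or running out of
			// time) should surface right away.
			return fmt.Errorf("ipset restore failed: %v: %s", err, stderr.String())
		}
		time.Sleep(250 * time.Millisecond)
	}
}

func main() {
	// Trivial example ruleset; a real sync would pass the full generated state.
	rules := "create example-set hash:ip\nadd example-set 10.0.0.1\n"
	if err := restoreWithRetry(rules, 10*time.Second); err != nil {
		fmt.Println(err)
	}
}
```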

What did you expect to happen?
I think it should not happen at all.

How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:

  1. Schedule a significant number of pods on one node

**Screenshots / Architecture Diagrams / Network Topologies**
If applicable, add those here to help explain your problem.

**System Information (please complete the following information):**

  • Kube-Router Version (kube-router --version): 1.5.4
  • Kube-Router Parameters: --run-router=true --run-firewall=true --run-service-proxy=true --bgp-graceful-restart=true --kubeconfig=/var/lib/kube-router/kubeconfig --hairpin-mode=true --metrics-port=8080
  • Kubernetes Version (kubectl version): 1.26.4
  • Cloud Type: bare metal
  • Kubernetes Deployment Type: kubeadm
  • Kube-Router Deployment Type: DaemonSet
  • Cluster Size: ~15 nodes

**Logs, other output, metrics**
Please provide logs, other kinds of output, or observed metrics here.

Additional context
Add any other context about the problem here.

zerkms added the bug label Aug 2, 2023
aauren (Collaborator) commented Aug 4, 2023

Hmm... So this shouldn't happen, because we have our own mutex surrounding the ipset logic, and all the functions that you posted above have the correct locking logic.

However, I can't argue with the logs. Can you please run kube-router with -v 1 as one of the options, re-create the scenario, and post the logs? I'm especially interested in lines like:

  • Attempting to attain ipset mutex lock
  • Attained ipset mutex lock, continuing...
  • Returned ipset mutex lock

and the ordering of those from the various controllers surrounding the errors.
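
Roughly, those messages come from a pattern like the following (a minimal standard-library sketch of the idea only; the real kube-router types, function names, and logging differ):

```go
package main

import (
	"log"
	"sync"
)

// ipsetHandler sketches the shared-mutex pattern behind the log lines above:
// every controller that wants to touch ipsets takes the same lock first.
type ipsetHandler struct {
	mu sync.Mutex
}

func (h *ipsetHandler) withIpsetLock(controller string, fn func() error) error {
	log.Printf("%s: Attempting to attain ipset mutex lock", controller)
	h.mu.Lock()
	log.Printf("%s: Attained ipset mutex lock, continuing...", controller)
	defer func() {
		h.mu.Unlock()
		log.Printf("%s: Returned ipset mutex lock", controller)
	}()
	// The ipset save/restore work happens while the lock is held.
	return fn()
}

func main() {
	h := &ipsetHandler{}
	_ = h.withIpsetLock("network_policy_controller", func() error {
		// ... sync ipsets here ...
		return nil
	})
}
```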

zerkms (Contributor, Author) commented Aug 7, 2023

I have found that I cannot reproduce it on any of my development clusters, only in prod. Hence I will do that the next time a kernel security bulletin is issued (and a new Canonical kernel is released): that usually happens at least every 2 weeks.

zerkms (Contributor, Author) commented Aug 23, 2023

Here is the entire log, which includes events from slightly before and slightly after the incident when a pod could not connect to the service IP.

Timestamps when the pod tried and failed to connect: 21:45:37.945 and 21:45:41.917.
And then a bit later it succeeded at: 21:45:59.723809+00:00
Service IP it tried to connect to: 10.32.131.122

And the entire (mildly anonymised, but with no lines removed) log: https://www.dropbox.com/scl/fi/ima6kknivqs17nj075t1d/kube-router-v1.log?rlkey=aigkfmrl08n35ksik3nddw2nz&dl=0

The log is attached via Dropbox because GitHub does not accept comments longer than 64 kB.

aauren (Collaborator) commented Aug 28, 2023

Hmm... So I don't see anything in the logs. The error that you initially opened the issue for doesn't appear in the logs at all. There's only one error, and it's a pretty common one, where an ipset fails to delete due to a reference not having cleared the kernel yet; it got corrected less than 2 seconds later in the next sync.

In terms of not having reachability to a pod for the time window that you mentioned, I don't see anything in the logs that would cause that. The failed ipset does get cleared right around there, but I think that is a red herring, as there are numerous successful, error-free syncs between the time it broke and that error.

You do have a pretty high amount of churn on your host, but all of the controllers seem to be syncing fairly quickly.

In terms of the service IP, that IP doesn't show up in the logs at all, because most things are logged by service name, so you might be able to diagnose that side better than I would be able to.
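
For context, that one error is the tolerant-cleanup case: when the kernel still holds a reference to a set (for example via an iptables rule that hasn't been flushed yet), the destroy fails and the set effectively gets cleaned up on a later sync instead. A rough sketch of handling it that way (illustrative only; the set name is taken from your original report, and driving the ipset CLI like this is an assumption, not kube-router's actual code):

```go
package main

import (
	"bytes"
	"log"
	"os/exec"
	"strings"
)

// destroyStaleSet tries to delete an ipset that is no longer needed. If the
// kernel still references it, the error is treated as non-fatal and the set
// is left for a later sync pass to clean up.
func destroyStaleSet(name string) {
	var stderr bytes.Buffer
	cmd := exec.Command("ipset", "destroy", name)
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		if strings.Contains(stderr.String(), "it is in use by a kernel component") {
			log.Printf("set %s is still referenced by the kernel, will retry on the next sync", name)
			return
		}
		log.Printf("failed to destroy set %s: %v: %s", name, err, stderr.String())
	}
}

func main() {
	destroyStaleSet("KUBE-SRC-FRNJANT7TJPCNM5F")
}
```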

zerkms (Contributor, Author) commented Aug 28, 2023

> In terms of the service IP, that IP doesn't show up in the logs at all, because most things are logged by service name, so you might be able to diagnose that side better than I would be able to.

It shows:

I0823 21:45:20.292157       1 service_endpoints_sync.go:594] Found a destination 10.31.14.64:5432 (Weight: 1) in service tcp:10.32.131.122:5432 (Flags: [hashed entry]) which is no longer needed so cleaning up

At this point in time it is removed.

And apparently this is when it's added back:

I0823 21:46:00.111395       1 network_services_controller.go:914] Received update to endpoint: org-d-alarms-xtr-k8s-1-prod/db-pgbouncer from watch API
I0823 21:46:00.111533       1 ecmp_vip.go:325] Received update to endpoint: org-d-alarms-xtr-k8s-1-prod/db-pgbouncer from watch API
I0823 21:46:00.111587       1 ecmp_vip.go:184] Updating service org-d-alarms-xtr-k8s-1-prod/db-pgbouncer triggered by endpoint update event
I0823 21:46:00.112829       1 network_services_controller.go:948] Syncing IPVS services sync for update to endpoint: org-d-alarms-xtr-k8s-1-prod/db-pgbouncer
I0823 21:46:00.112874       1 network_services_controller.go:454] Performing requested sync of ipvs services

And in between there is no IPVS service available.

And yes, sorry for not mentioning which namespace/service name that IP is associated with.

zerkms (Contributor, Author) commented Aug 28, 2023

And a tiny update on my previous statement:

> And then a bit later it succeeded at: 21:45:59.723809+00:00

This timestamp is when the application started. It definitely tried to connect some time later than that. As the application does not log exactly when it connected to the database during initialisation, I can only tell that it happened between 21:45:59.723809+00:00 (more precisely, 100% later than 2023-08-23T21:45:59.992235+00:00) and 21:46:02.477+00:00 (into which the I0823 21:46:00.112874 entry falls neatly).

aauren (Collaborator) commented Sep 4, 2023

@zerkms - Sorry for missing the service IP in the logs, I must have made a typo or something.

From what I can see, without knowing more about this specific service that you're deploying, it looks like Kubernetes likely told us that the pod was no longer ready or healthy or deployed or some such, and so we withdrew it from the service. Later on it came back so we put it back.

So as far as I can see, again without knowing more, it looks like kube-router did what it was supposed to do. However, I think this error is a bit off topic from the original issue reported. The first one was about kube-router encountering a kernel error where it wasn't able to update IPVS. This one is about something different.

I'd recommend that we keep this thread about the kernel error (for which I can't find any evidence in the log you provided). If you want to pursue this other error, we should probably open another issue with more information about how db-pgbouncer is deployed, along with an even higher log level, maybe log level 3?

zerkms (Contributor, Author) commented Sep 4, 2023

> From what I can see, without knowing more about this specific service that you're deploying, it looks like Kubernetes likely told us that the pod was no longer ready or healthy or deployed or some such, and so we withdrew it from the service. Later on it came back so we put it back.

That's how I read it too. BUT!! There are 100% healthy pods available there (and they happily serve traffic during the same time frames). And it's not a single service: as you can see in that log, a large batch of them is removed. And those services don't belong to the same (or similar) applications; they are just random services from across the entire cluster.

> So as far as I can see, again without knowing more, it looks like kube-router did what it was supposed to do. However, I think this error is a bit off topic from the original issue reported. The first one was about kube-router encountering a kernel error where it wasn't able to update IPVS. This one is about something different.

Agree. Should we close this (as I don't have any more details for the original one) and create a new one?

> If you want to pursue this other error, we should probably open another issue with more information about how db-pgbouncer is deployed, along with an even higher log level, maybe log level 3?

As I mentioned above, db-pgbouncer is just one that I picked as an example; these are completely different services (see the ports). And yep, I can collect more logs at level 3 (I'm surprised nobody else has ever experienced this; I see it in several different clusters).

What they have in common is that they are all pods from the same node: 10.31.14.xxx

I0823 21:45:20.285541       1 service_endpoints_sync.go:594] Found a destination 10.31.14.58:8080 (Weight: 1) in service tcp:10.32.54.139:80 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.286338       1 service_endpoints_sync.go:594] Found a destination 10.31.14.65:5432 (Weight: 1) in service tcp:10.32.157.238:5432 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.288207       1 service_endpoints_sync.go:594] Found a destination 10.31.14.249:9153 (Weight: 1) in service tcp:10.32.0.10:9153 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.291732       1 service_endpoints_sync.go:594] Found a destination 10.31.14.51:4443 (Weight: 1) in service tcp:10.32.229.27:443 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.292157       1 service_endpoints_sync.go:594] Found a destination 10.31.14.64:5432 (Weight: 1) in service tcp:10.32.131.122:5432 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.293237       1 service_endpoints_sync.go:594] Found a destination 10.31.14.35:8443 (Weight: 1) in service tcp:10.32.18.253:443 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.298051       1 service_endpoints_sync.go:594] Found a destination 10.31.14.204:8080 (Weight: 1) in service tcp:10.32.195.94:80 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.298707       1 service_endpoints_sync.go:594] Found a destination 10.31.14.249:53 (Weight: 1) in service tcp:10.32.0.10:53 (Flags: [hashed entry]) which is no longer needed so cleaning up
I0823 21:45:20.299980       1 service_endpoints_sync.go:594] Found a destination 10.31.14.249:53 (Weight: 1) in service udp:10.32.0.10:53 (Flags: [hashed entry]) which is no longer needed so cleaning up

If I had to guess, it looks to me like kube-router under those circumstances removes services that still have at least one healthy pod running (for whatever reason).

Btw, is it suspicious that the coredns IP/port appears there twice: 10.31.14.249:53?

github-actions bot commented Oct 5, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Oct 5, 2023
zerkms (Contributor, Author) commented Oct 5, 2023

I think it's not stale; I will bring more logs with an extra-verbose flag next week, on the next kernel upgrade cycle.

github-actions bot removed the Stale label Oct 6, 2023
github-actions bot commented Nov 5, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Nov 5, 2023
zerkms (Contributor, Author) commented Nov 5, 2023

Okay, I forgot about it, sorry :-D Nonetheless, within the next couple of weeks, on the next upgrade cycle, I will provide more logs and will stop bumping the report.

github-actions bot removed the Stale label Nov 6, 2023
github-actions bot commented Dec 6, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Dec 6, 2023
zerkms (Contributor, Author) commented Dec 6, 2023

Ok, I need some more time.

github-actions bot removed the Stale label Dec 7, 2023
zerkms (Contributor, Author) commented Dec 15, 2023

Okay, it looks like I cannot reproduce it anymore on 1.27.x.

It was easy to reproduce, reliably and 100% of the time, on 1.26.x though.

Hence closing :-)

zerkms closed this as completed Dec 15, 2023
vladimirtiukhtin commented Mar 13, 2024

@aauren I am having the same issue. NetworkPolicies do not work for me. I run 1.26.4. In the logs I see:

Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-3E7NRCUJY5FMHIWS due to ipset v7.17: Set cannot be destroyed: it is in use by a kernel component

aauren (Collaborator) commented Mar 13, 2024

@vladimirtiukhtin can you open a new issue with all of the fields the template asks for, filled in with as many other details as possible? Maybe some debug logs and reproduction instructions?
