
Crippling xtables lock contention on kubernetes #933

Closed

SleepyBrett opened this issue Jan 26, 2018 · 4 comments

Comments

@SleepyBrett
Contributor

Expected Behavior

Kube-proxy should be able to get a word in edgewise: acquire the xtables lock in a reasonable time and update our iptables rules.

Current Behavior

We currently have three Kubernetes 1.7.6 clusters in prod: one fairly large one (about 50 m4.10xl nodes in AWS, 818 services) and two smaller clusters that run about 200 services apiece but are otherwise identical.

As part of our move to Kubernetes 1.9 by way of 1.8, we performed an upgrade from flannel 0.8.0 to 0.9.1. Things went smoothly on the two smaller clusters; however, on the large cluster, as soon as a flannel pod was upgraded on a node, the kube-proxy on that node started reporting:

kube-proxy-w8hxt:kube-proxy E0126 20:16:49.022132 1 proxier.go:1601] Failed to execute iptables-restore: failed to acquire old iptables lock: timed out waiting for the condition

As part of our troubleshooting we also moved to flannel 0.10.0; same issue.

I don't have any in-depth knowledge of how the iptables xtables lock works, but on an upgraded box we were seeing upwards of 4 iptables processes pending at all times (/usr/local/bin/iptables commands with --wait flags). They don't seem to have a FIFO-type queuing arrangement; I imagine it's just whichever process checks and finds the lock free that runs.
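As far as I can tell, the lock is just an exclusive flock on /run/xtables.lock and --wait simply retries until it gets it, which would explain the lack of fairness. Here is a minimal Go sketch of that mechanism as I understand it (an illustration only, not iptables source; the lock path and retry interval are assumptions):

```go
// Rough sketch of how I understand the xtables lock to work; this is an
// illustration, not iptables source. The lock file path and retry interval
// are assumptions.
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

const xtablesLockPath = "/run/xtables.lock"

// acquireXtablesLock mimics `iptables --wait`: poll for an exclusive flock
// until it succeeds or the timeout expires. Because every waiter just polls,
// whichever process happens to retry first after a release wins -- no FIFO.
func acquireXtablesLock(timeout time.Duration) (*os.File, error) {
	f, err := os.OpenFile(xtablesLockPath, os.O_CREATE, 0600)
	if err != nil {
		return nil, err
	}
	deadline := time.Now().Add(timeout)
	for {
		if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err == nil {
			return f, nil // lock is held until the file is closed
		}
		if time.Now().After(deadline) {
			f.Close()
			return nil, fmt.Errorf("timed out waiting for the xtables lock")
		}
		time.Sleep(200 * time.Millisecond)
	}
}

func main() {
	lock, err := acquireXtablesLock(5 * time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer lock.Close()
	fmt.Println("holding the xtables lock; any concurrent iptables --wait will block")
}
```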

Digging through the changelog we found this pull request: #808

We suspected this was the root cause and that a 5s check for these rules was causing excessive contention on our nodes.

Possible Solution

I've built a replacement container against 0.10.0 in which I quickly changed the 5s check to a 5m check and deployed it on one of the nodes that was previously affected; so far, so good.

I think hardcoding this value is a mistake; a flag or other configuration could be provided to adjust this sync timer. I'm perfectly happy to work up the pull request for this feature, but would like some guidance on how you would like this new parameter provided (I suspect a command-line flag that defaults to 5 seconds).
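Something along these lines is what I have in mind. This is a rough sketch only; the flag name, default, and ensureRules helper are placeholders, not flannel's actual code:

```go
// Rough sketch of the kind of change I have in mind -- not flannel's actual
// code. The flag name, default, and ensureRules helper are placeholders.
package main

import (
	"flag"
	"log"
	"time"
)

var ipTablesResyncPeriod = flag.Duration("iptables-resync", 5*time.Second,
	"how often to re-check that flannel's iptables rules are in place")

func main() {
	flag.Parse()
	ticker := time.NewTicker(*ipTablesResyncPeriod)
	defer ticker.Stop()
	for range ticker.C {
		// A longer period means far fewer iptables invocations fighting
		// kube-proxy for the xtables lock on busy nodes.
		if err := ensureRules(); err != nil {
			log.Printf("failed to ensure iptables rules: %v", err)
		}
	}
}

// ensureRules stands in for the rule check/apply logic that currently runs
// every 5 seconds.
func ensureRules() error {
	return nil
}
```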

Steps to Reproduce (for bugs)

  1. Have a largish cluster running kube-proxy in iptables mode with a large number of services and endpoints.
  2. Install flannel and monitor the kube-proxy logs (a log-scanning sketch follows this list).
  3. Potentially delete some pods to trigger an iptables rebuild.
  4. Observe the lock-timeout errors.
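For step 2, a rough sketch of one way to watch for the errors; the "k8s-app=kube-proxy" label is an assumption about how kube-proxy is deployed in your cluster, and the error string is the one quoted above:

```go
// Rough sketch only: shells out to kubectl and counts the lock-timeout error
// across kube-proxy pods. The "k8s-app=kube-proxy" label is an assumption
// about how kube-proxy is deployed in your cluster.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	cmd := exec.Command("kubectl", "-n", "kube-system", "logs",
		"-l", "k8s-app=kube-proxy", "--tail=1000")
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	hits := 0
	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "failed to acquire old iptables lock") {
			hits++
			fmt.Println(line)
		}
	}
	if err := cmd.Wait(); err != nil {
		log.Printf("kubectl exited with error: %v", err)
	}
	fmt.Printf("saw %d lock-timeout errors in the sampled logs\n", hits)
}
```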

Context

I think I covered the context above.

Your Environment

  • Flannel version: 0.9.1, 0.10.0
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: not sure at the moment; I suspect it's not relevant
  • Kubernetes version (if used): 1.7.6 w/ kube-proxy in iptables mode
  • Operating System and version: CoreOS, current beta channel
  • Link to your project (optional):
@tomdee
Contributor

tomdee commented Jan 26, 2018

Thanks for the great issue report. A command-line arg would be the right way to configure this. Allowing very large values would also provide a mechanism for disabling it. It would be great to add a note to troubleshooting.md about this too. I think it's worth getting this fix into the next release, but longer term we'll probably want to do something better.

@tomdee
Contributor

tomdee commented Jan 29, 2018

Going to leave this open for now to help track a longer term solution.

@squeed
Contributor

squeed commented Jan 29, 2018

Switch to nftables :-) ?

@stale

stale bot commented Jan 26, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 26, 2023
@stale stale bot closed this as completed Feb 16, 2023