kube-router duplicates rules in the KUBE-ROUTER-INPUT chain #1676

Closed
TPXP opened this issue May 21, 2024 · 3 comments · Fixed by #1678

TPXP commented May 21, 2024

What happened?
Hello, we're happily using kube-router to handle network policies on our Kubernetes cluster (--run-router=false --run-firewall=true --run-service-proxy=false). We recently upgraded to v2.0.1 and v2.1.2, and it seems that kube-router is slow to sync firewall rules on some of our nodes.

We believe this could cause packet drops that impact the performance of our applications. Our investigation led us to look at iptables on our servers. The rules look mostly OK, except for the KUBE-ROUTER-INPUT chain in the ip filter table: on some of our machines, some rules are repeated:

-A KUBE-ROUTER-INPUT -d 10.96.0.0/12 -m comment --comment "allow traffic to primary/secondary cluster IP range - BVHX2PIHDXXEO43X" -j RETURN
-A KUBE-ROUTER-INPUT -p tcp -m comment --comment "allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M" -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j RETURN
-A KUBE-ROUTER-INPUT -p udp -m comment --comment "allow LOCAL UDP traffic to node ports - 76UCBPIZNGJNWNUZ" -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j RETURN
-A KUBE-ROUTER-INPUT -p tcp -m comment --comment "allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M" -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j RETURN
-A KUBE-ROUTER-INPUT -p udp -m comment --comment "allow LOCAL UDP traffic to node ports - 76UCBPIZNGJNWNUZ" -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j RETURN
[... the last 2 lines are repeated thousands of times ...]

Our servers run Ubuntu 22.04 LTS, so we're using nftables under the hood. Asking nftables what it holds gives curious results:

# nft list chain ip filter KUBE-ROUTER-INPUT
table ip filter {
        chain KUBE-ROUTER-INPUT {
                ip daddr 10.96.0.0/12  counter packets 0 bytes 0 return
                meta l4proto tcp  fib daddr type local tcp dport 30000-32767 counter packets 1079 bytes 65033 return
                meta l4proto udp  fib daddr type local udp dport 30000-32767 counter packets 21 bytes 3481 return
                meta l4proto tcp  fib daddr type local tcp dport 30000-32767 counter packets 0 bytes 0 return
                meta l4proto udp  fib daddr type local udp dport 30000-32767 counter packets 0 bytes 0 return
[... the last 2 lines are repeated thousands of times ...]
# nft list chain ip filter KUBE-ROUTER-INPUT | grep 30000-32767 | wc -l
3481
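
For what it's worth, grouping the saved rules by their full text makes the duplication easy to quantify; something like this should show any rule with a count above 1:

# iptables-save -t filter | grep -- '-A KUBE-ROUTER-INPUT' | sort | uniq -c | sort -rn | head -n 5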

Also, it seems the rules are frequently replaced/re-created, since the packet counter at the end of every rule line is frequently reset to 0. While I'm not sure about it, I think this rule churn could cause the packet losses we're investigating.
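
One way to watch for this, assuming the counter resets really do reflect rules being deleted and re-added, is to poll the chain with counters and see whether the pkts column keeps dropping back to 0:

# watch -n 5 "iptables -t filter -L KUBE-ROUTER-INPUT -v -n --line-numbers | head -n 20"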

Another consequence of this is that kube-router takes much longer to sync iptables rules. On impacted nodes, a sync can take over 10 seconds, which, we think, may prevent newly assigned pods from reaching the internet in their startup scripts.

What did you expect to happen?
kube-router should recognize the existing "allow LOCAL TCP traffic to node ports" and "allow LOCAL UDP traffic to node ports" rules and not re-create them on every sync. 🙏
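
Roughly, we expected the usual idempotent check-then-append pattern, sketched here with a simplified rule spec (the real rules also carry the -m comment match shown above):

# iptables -t filter -C KUBE-ROUTER-INPUT -p tcp -m addrtype --dst-type LOCAL \
    -m multiport --dports 30000:32767 -j RETURN 2>/dev/null \
  || iptables -t filter -A KUBE-ROUTER-INPUT -p tcp -m addrtype --dst-type LOCAL \
    -m multiport --dports 30000:32767 -j RETURN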

How can we reproduce the behavior you experienced?
I'm not 100% sure about what's causing this, since some of our nodes don't have the issue. Here's my guess:

  1. Install kube-router v2.0.1 or v2.1.2 on a cluster
  2. Add a bunch of pods with node ports
  3. Let kube-router sync run a few times (I think it runs every 5 minutes?)
  4. Check what the iptables rules look like on the server (see the quick check below).
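
For step 4, counting the copies of one of the node-port rules is a quick check; on a healthy node this count should stay at 1 across sync cycles:

# iptables-save | grep -c 'allow LOCAL TCP traffic to node ports'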

Screenshots / Architecture Diagrams / Network Topologies
We have a bunch of bare metal servers with a public IP and use kilo to encrypt traffic between them.

System Information (please complete the following information):

  • Kube-Router Version (kube-router --version): 2.1.2
  • Kube-Router Parameters: --run-router=false --run-firewall=true --run-service-proxy=false
  • Kubernetes Version (kubectl version): v1.29.1
  • Cloud Type: bare metal
  • Kubernetes Deployment Type: Bare metal. Our servers run Ubuntu 22.04 LTS
  • Kube-Router Deployment Type: DaemonSet
  • Cluster Size: about 25 nodes

Logs, other output, metrics
Here are the iptables sync time metrics we have. The repeated rules are present on all the pods that take longer to sync.
[Screenshot: iptables sync time metrics, taken 2024-05-21 20:38]

Additional context
We've been using kube-router for a few years and it has served us well, thanks for maintaining it! <3

@TPXP TPXP added the bug label May 21, 2024
@TPXP TPXP changed the title from "kube-router duplicates rules" to "kube-router duplicates rules in the KUBE-ROUTER-INPUT chain" May 21, 2024
aauren (Collaborator) commented May 22, 2024

Thanks for reporting this @TPXP!

I haven't seen this in any of my clusters, but I'll be taking a look as soon as I have a free moment to see if I can figure out what might be causing this.

In the meantime, if you get any more information that might be helpful, please add more comments.

aauren (Collaborator) commented May 27, 2024

@TPXP - So after being a bit more observant, I can now say that I definitely see this happening in my own clusters as well.

Thanks for reporting it! This is a very serious problem, so I'm glad that you took the time.

From what I can tell, it looks like the Exists() function is no longer working. Behind the scenes, that calls the iptables check option (-C).

When I try to execute this even from within the kube-router pod, it looks like it is failing:

root@kube-router-vm2:~# iptables-save | grep "allow traffic to load balancer IP range: 10.243.1.0/24" | head -1
-A KUBE-ROUTER-INPUT -d 10.243.1.0/24 -m comment --comment "allow traffic to load balancer IP range: 10.243.1.0/24 - GNMUGXUVWPZFLO4Q" -j RETURN
root@kube-router-vm2:~# iptables -t filter -C KUBE-ROUTER-INPUT -d 10.243.1.0/24 -m comment --comment "allow traffic to load balancer IP range: 10.243.1.0/24 - GNMUGXUVWPZFLO4Q" -j RETURN
iptables: Bad rule (does a matching rule exist in that chain?).

This seems like something broken upstream in the netfilter code. When I revert to a base image of Alpine 3.18, which carries iptables v1.8.9 instead of v1.8.10, the problem no longer occurs:

root@kube-router-vm2:~# iptables-save | grep "allow traffic to load balancer IP range: 10.243.1.0/24" | head -1
-A KUBE-ROUTER-INPUT -d 10.243.1.0/24 -m comment --comment "allow traffic to load balancer IP range: 10.243.1.0/24 - GNMUGXUVWPZFLO4Q" -j RETURN
root@kube-router-vm2:~# iptables -t filter -C KUBE-ROUTER-INPUT -d 10.243.1.0/24 -m comment --comment "allow traffic to load balancer IP range: 10.243.1.0/24 - GNMUGXUVWPZFLO4Q" -j RETURN
root@kube-router-vm2:~# echo $?
0
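
To confirm which iptables build and backend a given image ships, checking the version string should be enough; on the Alpine 3.18 image I'd expect it to print something like "iptables v1.8.9 (nf_tables)":

root@kube-router-vm2:~# iptables --version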

I want to see if we're able to get to the bottom of this, but in the meantime I think we should revert the Alpine version bump and get a stable version of kube-router out there.

aauren (Collaborator) commented May 27, 2024

This will be fixed in v2.1.3 which should be pushed later today.
