
Custom ipset sets and entries get reverted periodically #1677

Closed
shouya opened this issue May 24, 2024 · 12 comments

@shouya

shouya commented May 24, 2024

What happened?

I'm running the k3s agent on a gateway node in my homelab. Recently I added a feature to my firewall script that uses ipset. However, I noticed that the ipset entries get reverted every few minutes. Further digging led me to the ipset utility defined in kube-router.

What did you expect to happen?

I expect kube-router to leave ipsets it did not define untouched.

How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:

  1. Run ipset create test hash:net; ipset add test 1.2.3.4
  2. Start k3s agent.
  3. Run ipset add test 5.6.7.8
  4. Wait 5-10 minutes.
  5. Run ipset list test and observe that the output only includes 1.2.3.4

System Information:

  • Kube-Router Version (kube-router --version): probably v2.1.0
  • Kube-Router Parameters: not sure, it's not printed in the log
  • Kubernetes Version (kubectl version) : k3s v1.29.4+k3s1
  • Cloud Type: my own box
  • Kube-Router Deployment Type: bundled in k3s
  • Cluster Size: my personal home cluster

Logs:

The exec trace captured during the ipset reversion (execsnoop-bpfcc -T -t):

09:25:28 256.865 ip6tables        1239148 3902986   0 /usr/sbin/ip6tables -w 5 -W 100000 -S KUBE-PROXY-CANARY -t mangle
09:25:28 256.866 ip6tables        1239147 3902986   0 /usr/sbin/ip6tables -w 5 -W 100000 -S KUBE-KUBELET-CANARY -t mangle
09:25:28 256.872 iptables         1239149 3902986   0 /usr/sbin/iptables -w 5 -W 100000 -S KUBE-KUBELET-CANARY -t mangle
09:25:29 256.874 iptables         1239150 3902986   0 /usr/sbin/iptables -w 5 -W 100000 -S KUBE-PROXY-CANARY -t mangle
09:25:29 257.590 iptables         1239151 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-INPUT 1 --wait
09:25:29 257.596 iptables         1239152 3902986   0 /usr/sbin/iptables -t filter -C INPUT -m comment --comment kube-router netpol - 4IA2OSFRMVNDXBVV -j KUBE-ROUTER-INPUT --wait
09:25:29 257.603 iptables         1239153 3902986   0 /usr/sbin/iptables -t filter -S INPUT --wait
09:25:29 257.610 iptables         1239154 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait
09:25:29 257.617 iptables         1239155 3902986   0 /usr/sbin/iptables -t filter -C FORWARD -m comment --comment kube-router netpol - TEMCG2JMHZYE7H7T -j KUBE-ROUTER-FORWARD --wait
09:25:29 257.622 iptables         1239156 3902986   0 /usr/sbin/iptables -t filter -S FORWARD --wait
09:25:29 257.626 iptables         1239157 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-OUTPUT 1 --wait
09:25:29 257.630 iptables         1239158 3902986   0 /usr/sbin/iptables -t filter -C OUTPUT -m comment --comment kube-router netpol - VEAAIY32XVBHCSCY -j KUBE-ROUTER-OUTPUT --wait
09:25:29 257.634 iptables         1239159 3902986   0 /usr/sbin/iptables -t filter -S OUTPUT --wait
09:25:29 257.639 iptables         1239160 3902986   0 /usr/sbin/iptables -t filter -C KUBE-ROUTER-INPUT -m comment --comment allow traffic to primary/secondary cluster IP range - EKROEGTNIJ3AP3LC -d 10.43.0.0/16 -j RETURN --wait
09:25:29 257.642 iptables         1239161 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-INPUT --wait
09:25:29 257.646 iptables         1239162 3902986   0 /usr/sbin/iptables -t filter -C KUBE-ROUTER-INPUT -p tcp -m comment --comment allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j
09:25:29 257.650 iptables         1239163 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-INPUT --wait
09:25:29 257.655 iptables         1239164 3902986   0 /usr/sbin/iptables -t filter -C KUBE-ROUTER-INPUT -p udp -m comment --comment allow LOCAL UDP traffic to node ports - 76UCBPIZNGJNWNUZ -m addrtype --dst-type LOCAL -m multiport --dports 30000:32767 -j
09:25:29 257.659 iptables         1239165 3902986   0 /usr/sbin/iptables -t filter -S KUBE-ROUTER-INPUT --wait
09:25:29 257.664 iptables         1239166 3902986   0 /usr/sbin/iptables -t filter -S KUBE-NWPLCY-DEFAULT 1 --wait
09:25:29 257.668 iptables         1239167 3902986   0 /usr/sbin/iptables -t filter -C KUBE-NWPLCY-DEFAULT -j MARK -m comment --comment rule to mark traffic matching a network policy --set-xmark 0x10000/0x10000 --wait
09:25:29 257.672 iptables-save    1239168 3902986   0 /usr/sbin/iptables-save -t filter
09:25:29 257.680 ipset            1239169 3902986   0 /var/lib/rancher/k3s/data/b159f6e26663d8c92285e7bc4a6881d85bd8c81efc55eb2cf191c54100387fbb/bin/ipset restore -exist
09:25:29 257.715 iptables         1239170 3902986   0 /usr/sbin/iptables -t filter -S --wait
09:25:29 257.727 iptables-restor  1239171 3902986   0 /usr/sbin/iptables-restore --wait -T filter
09:25:29 257.738 ipset            1239172 3902986   0 /var/lib/rancher/k3s/data/b159f6e26663d8c92285e7bc4a6881d85bd8c81efc55eb2cf191c54100387fbb/bin/ipset save

Based on my analysis, the ipset state is saved on k3s agent startup: https://github.com/k3s-io/k3s/blob/master/pkg/agent/netpol/netpol.go#L55. Then, each time the ipset Restore() runs, the existing sets are replaced by that saved state, regardless of whether a third party has modified the entries since. This means any modifications made after k3s agent startup are lost after some time. Please correct me if I'm wrong.

Possible fix:

My suggested fix is to add a filter in ParseIPSetSave so that it only handles sets created by kube-router itself.
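
For illustration, here is a minimal sketch of what such a filter could look like, operating on raw ipset save output. The prefix list and helper names are hypothetical, not kube-router's actual API; a real implementation would need kube-router's own notion of set ownership.

package ipsetfilter

import (
	"bufio"
	"strings"
)

// kubeRouterSetPrefixes is a hypothetical list of name prefixes identifying
// sets owned by kube-router; the real prefixes would come from kube-router's
// set-naming code.
var kubeRouterSetPrefixes = []string{"KUBE-SRC-", "KUBE-DST-", "KUBE-ROUTER-"}

func isKubeRouterSet(name string) bool {
	for _, prefix := range kubeRouterSetPrefixes {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

// filterIPSetSave keeps only the "create" and "add" lines of ipset save output
// that refer to kube-router-owned sets, so a later restore leaves foreign sets alone.
func filterIPSetSave(saveOutput string) string {
	var filtered strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(saveOutput))
	for scanner.Scan() {
		line := scanner.Text()
		fields := strings.Fields(line)
		// Both "create <set> ..." and "add <set> <entry>" carry the set name
		// in the second field.
		if len(fields) >= 2 && (fields[0] == "create" || fields[0] == "add") && isKubeRouterSet(fields[1]) {
			filtered.WriteString(line)
			filtered.WriteByte('\n')
		}
	}
	return filtered.String()
}

Filtering at parse time would keep the saved snapshot limited to kube-router's own sets, so the later restore cannot touch sets it never created.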

Workaround:

My current workaround for this problem is to restart the k3s agent after changing an ipset, so that its internally saved ipset state reflects the latest state. Unfortunately this is not a perfect solution: there are obvious openings for race conditions, and restarting the k3s agent just for this simple goal seems gratuitous.

shouya added the bug label May 24, 2024
@aauren
Collaborator

aauren commented May 24, 2024

kube-router only touches its own ipsets. You can see this from the delete logic here: https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/netpol/network_policy_controller.go#L709-L714

The only way I could see this happening is if you got incredibly lucky and somehow consistently managed to add your ipset in the microseconds between when kube-router saves the ipsets and then restores them (see logic above). But for this to happen consistently seems very, very unlikely unless something else is also going wrong on the node and causing a severe performance regression.

Additionally, I tested this on my own cluster just now by creating an ipset, forcing a network policy sync, and confirming that the ipset survived.

Maybe this is something that k3s is doing? If you want to be sure, you can increase the logging verbosity to level 3 and see the entire output of the ipset restore that kube-router performs (https://github.com/cloudnativelabs/kube-router/blob/master/pkg/utils/ipset.go#L595).

@shouya
Author

shouya commented May 24, 2024

I'd like to increase the log verbosity and inspect the full ipset restore string. However, I couldn't find a way to pass -v3 to kube-router's launch options via the k3s agent. Is there an environment variable or another means by which I can modify kube-router's launch parameters?

@aauren
Collaborator

aauren commented May 24, 2024

To be honest, I'm not really familiar with k3s's options as I don't use it myself.

@shouya
Author

shouya commented May 24, 2024

I just dialed up the verbosity of the k3s agent and found that works too. Here's the captured ipset restore string:

May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.534687 1291507 ipset.go:564] ipset restore looks like: create TMP-XRYGEMNBG5CVZI2Y hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: add TMP-XRYGEMNBG5CVZI2Y 1.2.3.4
May 24 23:12:01 scout k3s[1291507]: create DIRECT_SERVER hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: swap TMP-XRYGEMNBG5CVZI2Y DIRECT_SERVER
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: add TMP-XRYGEMNBG5CVZI2Y 1.2.3.4
May 24 23:12:01 scout k3s[1291507]: create PROXIED_SERVER_12345 hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: swap TMP-XRYGEMNBG5CVZI2Y PROXIED_SERVER_12345
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: destroy TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.561481 1291507 policy.go:177] Restoring IPv4 ipset took 26.9803ms

I see it uses a temporary ipset, TMP-XRYGEMNBG5CVZI2Y, to reset the contents of my two ipsets to the state prior to the netpol controller's launch. Neither of these set names (DIRECT_SERVER and PROXIED_SERVER_12345) is owned by kube-router as specified here. As a result, the entry 5.6.7.8 that I added later is gone from these ipsets.
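
For context on why the later entry disappears, the temp-set-and-swap pattern in that restore text can be sketched roughly as below (a simplified illustration, not kube-router's actual code). Because swap exchanges the two sets' contents atomically, after the swap the target set contains exactly the saved entries, and anything added to it out-of-band is discarded.

package ipsetdemo

import (
	"fmt"
	"strings"
)

// buildSwapRestore illustrates the pattern from the log: fill a temporary set
// with the saved entries, atomically swap it with the target set, then flush
// the temporary set. After the swap, the target set contains exactly `entries`.
func buildSwapRestore(tmpSet, targetSet, setType string, entries []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "create %s %s family inet hashsize 1024 maxelem 65536\n", tmpSet, setType)
	fmt.Fprintf(&b, "flush %s\n", tmpSet)
	for _, e := range entries {
		fmt.Fprintf(&b, "add %s %s\n", tmpSet, e)
	}
	fmt.Fprintf(&b, "create %s %s family inet hashsize 1024 maxelem 65536\n", targetSet, setType)
	fmt.Fprintf(&b, "swap %s %s\n", tmpSet, targetSet)
	fmt.Fprintf(&b, "flush %s\n", tmpSet)
	return b.String()
}

// Example: buildSwapRestore("TMP-XRYGEMNBG5CVZI2Y", "DIRECT_SERVER", "hash:net",
// []string{"1.2.3.4"}) reproduces the shape of the logged restore text, which
// is why 5.6.7.8 vanishes from DIRECT_SERVER once the restore runs.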

@shouya
Author

shouya commented May 24, 2024

The k3s netpol code looks suspicious to me in its Run function. Can you confirm whether the call to set.Save() prior to controller initialization may be causing this behavior?

@aauren
Collaborator

aauren commented May 25, 2024

I'm glad that you were able to figure out how to increase the logging verbosity for kube-router within k3s, that is definitely helpful.

So yes, I can confirm that it is expected that you would see your ipset in the output of kube-router's ipset restore function. This is because it reads all ipsets during the save and then only modifies its own.

What is unexpected is that it would revert the value to a previous value.

Following the logic through, we can look at how kube-router encounters and mutates ipsets in the network policy controller:

So I can't see any obvious place where it would be reverting user chain information.

One thing I did think of when reviewing the code: if the code block at https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/netpol/network_policy_controller.go#L229-L243 were excluded from your build of k3s, it might cause issues for you, because saved data would have been added into the ipset object during one run and then possibly restored in the next at https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/netpol/policy.go#L149-L151.

I can see that upstream they have this block in their code because they link against kube-router v2.1.2, which has this fix in it. But maybe the version of k3s that you're running uses an older version of kube-router?

@shouya
Author

shouya commented May 25, 2024

Thank you for the detailed explanation. I think the initial call to ipset.Save() originates from k3s code, not from inside kube-router. That logic on its own can explain how the ipset state captured at startup gets restored over and over.


Another confusing thing from the execsnoop report is that the last step does seem to involve an ipset save execution from somewhere, which seems odd given that we found no place calling ipset.Save() within either k3s's or kube-router's normal operating logic.

Let me include the full log that contains hints about the calling lines from the source files:

May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.435020 1291507 health_controller.go:91] Received heartbeat from MC
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.446371 1291507 network_policy_controller.go:201] Requesting periodic sync of iptables to reflect network policies
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.446463 1291507 network_policy_controller.go:216] Full sync request queue was empty so a full sync request was successfully sent
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.446504 1291507 network_policy_controller.go:193] Received request for a full sync, processing
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.446542 1291507 network_policy_controller.go:241] Starting sync of iptables with version: 1716592321446527320
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.447598 1291507 health_controller.go:91] Received heartbeat from NPC
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.451486 1291507 network_policy_controller.go:449] Ensuring jump to chain KUBE-ROUTER-FORWARD from FORWARD at position 1
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.465615 1291507 network_policy_controller.go:449] Ensuring jump to chain KUBE-ROUTER-OUTPUT from OUTPUT at position 1
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.478754 1291507 network_policy_controller.go:449] Ensuring jump to chain KUBE-ROUTER-INPUT from INPUT at position 1
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.486890 1291507 network_policy_controller.go:463] Allow traffic to ingress towards Cluster IP Range: 10.43.0.0/16 for family: IPv4
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.496057 1291507 network_policy_controller.go:481] Allow TCP traffic to ingress towards node port range: 30000:32767 for family: IPv4
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.504866 1291507 network_policy_controller.go:494] Allow UDP traffic to ingress towards node port range: 30000:32767 for family: IPv4
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.534369 1291507 network_policy_controller.go:270] Saving IPv4 iptables rules took 8.56443ms
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.534433 1291507 policy.go:83] Attempting to attain ipset mutex lock
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.534453 1291507 policy.go:85] Attained ipset mutex lock, continuing...
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.534687 1291507 ipset.go:564] ipset restore looks like: create TMP-XRYGEMNBG5CVZI2Y hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: add TMP-XRYGEMNBG5CVZI2Y 1.2.3.4
May 24 23:12:01 scout k3s[1291507]: create DIRECT_SERVER hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: swap TMP-XRYGEMNBG5CVZI2Y DIRECT_SERVER
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: add TMP-XRYGEMNBG5CVZI2Y 1.2.3.4
May 24 23:12:01 scout k3s[1291507]: create PROXIED_SERVER_12345 hash:net family inet hashsize 1024 maxelem 65536 bucketsize 12
May 24 23:12:01 scout k3s[1291507]: swap TMP-XRYGEMNBG5CVZI2Y PROXIED_SERVER_12345
May 24 23:12:01 scout k3s[1291507]: flush TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: destroy TMP-XRYGEMNBG5CVZI2Y
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.561481 1291507 policy.go:177] Restoring IPv4 ipset took 26.9803ms
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.561544 1291507 policy.go:184] Iptables chains in the filter table are synchronized with the network policies.
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.561564 1291507 policy.go:88] Returned ipset mutex lock
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.561579 1291507 policy.go:80] Syncing network policy chains took 27.151674ms
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.593673 1291507 network_policy_controller.go:310] Restoring IPv4 iptables rules took 15.060754ms
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.593736 1291507 network_policy_controller.go:699] Attempting to attain ipset mutex lock
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.593767 1291507 network_policy_controller.go:701] Attained ipset mutex lock, continuing...
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.597505 1291507 network_policy_controller.go:704] Returned ipset mutex lock
May 24 23:12:01 scout k3s[1291507]: I0524 23:12:01.597569 1291507 network_policy_controller.go:238] sync iptables took 151.020789ms

I'm inclined to believe the k3s version I run doesn't actually link against kube-router v2.1: if you look at my log, it shows "ipset restore looks like: ...", which matches the log format in v2.0, whereas in v2.1 the format was updated to contain "ipv6?". Would v2.0 explain what we're seeing here?

@aauren
Collaborator

aauren commented May 25, 2024

I haven't gone in depth on what changed between versions for this one, but I believe that v2.0 could explain it. Maybe try building k3s from HEAD and see if the problem goes away?

@shouya
Author

shouya commented May 25, 2024

If my understanding is correct, the problem won't go away even if I rebuild k3s from HEAD, because the initial call to ipset.Save() here on startup will still save the ipset state at that time. Any subsequent ipset.Restore() will then essentially revert any external changes made after startup.
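
To illustrate the failure mode I'm describing, here is a simplified sketch with made-up types (not the actual k3s or kube-router structures): state captured once by Save() at startup is replayed by every later Restore(), so external changes made in between are reverted.

package ipsetflow

// savedState stands in for the snapshot that a Save() call captures into the
// ipset handler struct at k3s agent startup: set name -> entries.
type savedState map[string][]string

type ipsetHandler struct {
	saved savedState
}

// Save snapshots the current ipset state into the handler.
func (h *ipsetHandler) Save(current savedState) {
	h.saved = current
}

// Restore replays the snapshot taken at Save() time. If Save() only ran once
// at startup, every Restore() resets the sets to that startup snapshot and
// discards entries added externally in the meantime.
func (h *ipsetHandler) Restore(apply func(setName string, entries []string)) {
	for name, entries := range h.saved {
		apply(name, entries)
	}
}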

To stop ipset.Restore() from interfering with sets not created by kube-router, I'm hoping for one of two changes:

  • First, we can eliminate any calls to ipset.Save(), both at startup and in the periodic sync subroutine. This is what we have been hunting for so far. If it is confirmed that there is no place ipset.Save() can be invoked during sync, I can file a bug report against k3s so the startup call can also be removed.
  • Alternatively, we can add a filter in ipset.Save() so that it only handles sets related to kube-router, probably in the ParseIPSetSave function.

Either option would be sufficient to solve the problem I encountered. I'm personally inclined toward the second, because it seems logically more correct for kube-router to only save sets it creates itself.

@aauren
Collaborator

aauren commented May 25, 2024

I don't believe that first save does anything. Saving the ipset only puts info in the struct, but that struct doesn't get used again. I'm nearly positive that the problem you're experiencing is due to an old version of kube-router being used.

@shouya
Author

shouya commented May 25, 2024

Good. Let me try upgrading my k3s and report back to you.

@shouya
Author

shouya commented May 25, 2024

I tried upgrading to the latest k3s version (v1.30.0+k3s1) and still see the same problem. The ipset restore log line is also still in the old format. Upon closer inspection, I found that their go.sum specifies their own fork of kube-router. Let me file a bug report with the k3s developers. Thank you for your time helping me look into this issue.
