
Fix Cleanup Code for v2.x #1650

Merged
merged 3 commits into master from fix_cleanup_code_for_v2.x on Apr 26, 2024
Conversation

aauren (Collaborator) commented Apr 21, 2024

kube-router v2.x introduced the idea of iptables and ipset handlers that allow kube-router to be dual-stack capable. However, the cleanup logic for the various controllers was not properly ported when this happened. When the cleanup functions run, their controllers have often not been fully initialized, since cleanup should not depend on kube-router being able to reach a kube-apiserver. As a result, the cleanup functions were missing these handlers, and they either silently ended up as no-ops or, worse, ran into nil pointer failures.

This corrects that, so that kube-router no longer fails this way and cleans up as it did in v1.x.
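For context, a minimal sketch of the shape of the fix, using github.com/coreos/go-iptables (which kube-router builds on). The function and variable names here are illustrative, not the actual kube-router code, and the sketch covers only the iptables side (the real change also covers ipset):

```go
// Sketch: cleanup builds its own per-family iptables handlers instead of
// assuming the controller populated them, which is what previously led to
// nil pointer dereferences when no kube-apiserver was reachable.
package main

import (
	"fmt"
	"log"

	"github.com/coreos/go-iptables/iptables"
)

// newCleanupHandlers is a hypothetical helper: one handler per IP family.
func newCleanupHandlers() (map[iptables.Protocol]*iptables.IPTables, error) {
	handlers := make(map[iptables.Protocol]*iptables.IPTables)
	for _, proto := range []iptables.Protocol{iptables.ProtocolIPv4, iptables.ProtocolIPv6} {
		h, err := iptables.NewWithProtocol(proto)
		if err != nil {
			return nil, fmt.Errorf("failed to create iptables handler: %w", err)
		}
		handlers[proto] = h
	}
	return handlers, nil
}

func main() {
	handlers, err := newCleanupHandlers()
	if err != nil {
		log.Fatal(err)
	}
	// Cleanup now iterates over both families rather than silently
	// no-op'ing when a handler was never initialized.
	for proto, h := range handlers {
		if err := h.ClearChain("filter", "KUBE-ROUTER-SERVICES"); err != nil {
			log.Printf("proto %v: %v", proto, err)
		}
	}
}
```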

@vladimirtiukhtin - I tested this in my environment and it worked correctly, but if you wouldn't mind checking as well, that would help ensure there isn't something I'm missing. A kube-router Docker container is built as part of this PR; you can find it in the GitHub Actions section if you don't want to build kube-router yourself.

Fixes: #1649

vladimirtiukhtin commented Apr 22, 2024

@aauren it works, though not without errors:

# kube-router --cleanup-config
I0422 08:55:24.774859       7 network_policy_controller.go:713] Cleaning up NetworkPolicyController configurations...
I0422 08:55:25.432990       7 network_policy_controller.go:766] Successfully cleaned the NetworkPolicyController configurations done by kube-router
I0422 08:55:25.433718       7 network_services_controller.go:1755] Cleaning up NetworkServiceController configurations...
WARN[0000] Running modprobe ip_vs failed with message: `modprobe: can't change directory to '/lib/modules': No such file or directory`, error: exit status 1 
I0422 08:55:25.437538       7 network_services_controller.go:1762] ipvs definitions don't have names associated with them for checking, during cleanup we assume that we own all of them and delete all ipvs definitions
I0422 08:55:25.446883       7 network_services_controller.go:1790] Processing IPv4 hairpin rule cleanup
I0422 08:55:25.449415       7 network_services_controller.go:1798] Processing IPv6 hairpin rule cleanup
E0422 08:55:25.470290       7 network_services_controller.go:566] Failed to run iptables command: running [/sbin/iptables -t filter -X KUBE-ROUTER-SERVICES --wait]: exit status 4: iptables v1.8.10 (nf_tables):  CHAIN_DEL failed (Resource busy): chain KUBE-ROUTER-SERVICES
I0422 08:55:26.553989       7 network_services_controller.go:1822] Successfully cleaned the NetworkServiceController configuration done by kube-router
I0422 08:55:26.554088       7 network_routes_controller.go:897] Cleaning up NetworkRoutesController configurations
I0422 08:55:26.561566       7 pod_egress.go:73] Deleted iptables rule to masquerade outbound traffic from pods.
W0422 08:55:27.626388       7 network_routes_controller.go:947] Error deleting ipset: ipset v7.19: Set cannot be destroyed: it is in use by a kernel component
W0422 08:55:27.646544       7 network_routes_controller.go:947] Error deleting ipset: ipset v7.19: The set with the given name does not exist
I0422 08:55:27.646589       7 network_routes_controller.go:951] Successfully cleaned the NetworkRoutesController configuration done by kube-router

aauren (Collaborator, Author) commented Apr 22, 2024

@vladimirtiukhtin - Thanks for testing!

Can you show me how you ran the cleanup? When I tested, I ran it from within the kube-router container and I didn't get any of the errors that you got. But we could probably make it a bit more robust.

vladimirtiukhtin commented Apr 22, 2024

This is what I did on one of the nodes:

# ctr run --privileged -t --net-host docker.io/cloudnativelabs/kube-router-git:PR-1650@sha256:4109d544ae89cb08a694baf325bc54f15f449a3f36ae58bfe1be537a3b5f7e2d test sh
~ # 
~ # kube-router --cleanup-config
(same output as in the previous comment)

aauren (Collaborator, Author) commented Apr 22, 2024

On my cluster, I was able to run without issues when I added a bind mount for the /lib/modules directory (which is what the k8s deployments do for kube-router):

sudo ctr run --privileged -t --mount type=bind,src=/lib/modules,dst=/lib/modules,options=rbind:ro --net-host docker.io/cloudnativelabs/kube-router-git:PR-1650@sha256:4109d544ae89cb08a694baf325bc54f15f449a3f36ae58bfe1be537a3b5f7e2d test /usr/local/bin/kube-router --cleanup-config

The ipset errors are likely a symptom of the referencing iptables rules not being removed. Those errors are caused by this log line:

E0422 08:55:25.470290       7 network_services_controller.go:566] Failed to run iptables command: running [/sbin/iptables -t filter -X KUBE-ROUTER-SERVICES --wait]: exit status 4: iptables v1.8.10 (nf_tables):  CHAIN_DEL failed (Resource busy): chain KUBE-ROUTER-SERVICES

I can't reproduce that on my node, but that's probably because I don't have the same contention over iptables that your cluster likely has. You mentioned in another post that you have some jobs configured to run every minute. If those are the same nodes where you're testing this, then you likely have a lot more going on with iptables than I do in my cluster.
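For what it's worth, CHAIN_DEL failing with "Resource busy" usually means something still references the chain. A minimal sketch of the ordering that deletion requires, again using github.com/coreos/go-iptables, with an illustrative jump rule (the real referencing rule may differ):

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	// A chain can't be deleted (-X) while a rule still jumps to it, so
	// first drop the referencing rule (rule spec is illustrative)...
	if err := ipt.DeleteIfExists("filter", "FORWARD", "-j", "KUBE-ROUTER-SERVICES"); err != nil {
		log.Printf("deleting jump rule: %v", err)
	}
	// ...then flush the chain's own rules (-F)...
	if err := ipt.ClearChain("filter", "KUBE-ROUTER-SERVICES"); err != nil {
		log.Printf("flushing chain: %v", err)
	}
	// ...and only then delete the chain itself (-X), the step that failed
	// with "Resource busy" in the log above.
	if err := ipt.DeleteChain("filter", "KUBE-ROUTER-SERVICES"); err != nil {
		log.Printf("deleting chain: %v", err)
	}
}
```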

My guess is that --wait isn't fully taking effect because the xtables.lock isn't being mounted into the container. I would try running with that mounted as well, as the daemonset does. So something like this:

sudo ctr run --privileged -t --mount type=bind,src=/lib/modules,dst=/lib/modules,options=rbind:ro --mount type=bind,src=/run/xtables.lock,dst=/run/xtables.lock,options=rbind:rw --net-host docker.io/cloudnativelabs/kube-router-git:PR-1650@sha256:4109d544ae89cb08a694baf325bc54f15f449a3f36ae58bfe1be537a3b5f7e2d test /usr/local/bin/kube-router --cleanup-config

See if that works for you?

aauren added 3 commits

Before, the logic for resolving the node ran in the following order of preference:

1. Prefer the environment variable NODE_NAME
2. Use `os.Hostname()`
3. Fall back to `--hostname-override` passed by the user

This didn't make a whole lot of sense, as `--hostname-override` is directly, and presumably intentionally, set by the user; therefore it should be the MOST preferred, not the least preferred. Additionally, none of the errors encountered were passed back to the user so that future conditions could be considered, so if there was an error at the API level, that error was swallowed. Now the logic looks like this (see the sketch after the list):

1. Prefer `--hostname-override` if it is set. If it is set and we weren't able to resolve it to a node object, return the error
2. Use the environment variable NODE_NAME if it is set. If it is set and we weren't able to resolve it to a node object, return the error
3. Fall back to `os.Hostname()`. If we weren't able to resolve it to a node object, return the error and give the user options
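A minimal sketch of that resolution order in Go, assuming a client-go clientset; the findNode helper and the error messages are illustrative, not the exact kube-router implementation:

```go
package nodeutil

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findNode resolves the local node object using the preference order
// described above, returning lookup errors instead of swallowing them.
func findNode(client kubernetes.Interface, hostnameOverride string) (*corev1.Node, error) {
	get := func(name string) (*corev1.Node, error) {
		return client.CoreV1().Nodes().Get(context.TODO(), name, metav1.GetOptions{})
	}
	// 1. --hostname-override is explicit user intent, so it is checked
	//    first and its failure is returned rather than swallowed.
	if hostnameOverride != "" {
		node, err := get(hostnameOverride)
		if err != nil {
			return nil, fmt.Errorf("unable to find node %q set via --hostname-override: %w", hostnameOverride, err)
		}
		return node, nil
	}
	// 2. Next, the NODE_NAME environment variable.
	if name := os.Getenv("NODE_NAME"); name != "" {
		node, err := get(name)
		if err != nil {
			return nil, fmt.Errorf("unable to find node %q set via NODE_NAME: %w", name, err)
		}
		return node, nil
	}
	// 3. Finally, fall back to os.Hostname(), giving the user options in
	//    the error if the lookup still fails.
	hostname, err := os.Hostname()
	if err != nil {
		return nil, fmt.Errorf("unable to get hostname: %w", err)
	}
	node, err := get(hostname)
	if err != nil {
		return nil, fmt.Errorf("unable to find node %q; consider setting --hostname-override or NODE_NAME: %w", hostname, err)
	}
	return node, nil
}
```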
aauren (Collaborator, Author) commented Apr 26, 2024

I have updated the user-guide to show a better version of the cleanup command which should resolve any errors for most users.

aauren merged commit e40f46e into master on Apr 26, 2024
6 checks passed
aauren deleted the fix_cleanup_code_for_v2.x branch on Apr 26, 2024