Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install fib rules and routes with proto kernel to avoid systemd messing with them #24288

Merged
merged 14 commits into from Mar 25, 2023

Conversation

NikAleksandrov
Copy link

@NikAleksandrov NikAleksandrov commented Mar 10, 2023

In order to workaround systemd's bad recent changes where they decided to manage "foreign" rules and to flush them on certain events (e.g. device flap), we should add our rules as "proto kernel" so systemd will just skip them and leave them in place in such events instead of wiping them from underneath us if a networkd policy like below is not installed (https://docs.cilium.io/en/stable/operations/system_requirements/#systemd-based-distributions):

ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no

Normal rules with Cilium deployed look like:
$ ip ru
9: from all fwmark 0x200/0xf00 lookup 2004
10: from all fwmark 0xa00/0xf00 lookup 2005
100: from all lookup local
32766: from all lookup main
32767: from all lookup default

After a network event we see systemd flushing all unspec rules (9, 10 and 100, the last one being critical):
$ ip rule list
32766: from all lookup main
32767: from all lookup default

With this change the rules remain in place and everything continues working as expected.

The change also affects routes installed by Cilium, they are now with proto kernel.
For example:

$ ip -d r sh
unicast 10.0.0.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 mtu 1373 
unicast 10.0.1.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 mtu 1373 
unicast 10.0.2.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 
unicast 10.0.2.245 dev cilium_host proto kernel scope link

$ ip -d rule sh
1:      from all fwmark 0xd00/0xf00 lookup 200 proto kernel
1:      from all fwmark 0xe00/0xf00 lookup 200 proto kernel
9:      from all fwmark 0x200/0xf00 lookup 2004 proto kernel
10:     from all fwmark 0xa00/0xf00 lookup 2005 proto kernel
100:    from all lookup local proto kernel
32766:  from all lookup main proto kernel
32767:  from all lookup default proto kernel

$ ip -d rou sh table 200
unicast 10.0.0.0/24 dev cilium_host proto kernel scope global mtu 1450 
unicast 10.0.1.0/24 dev cilium_host proto kernel scope global mtu 1450 
local 10.0.2.0/24 dev cilium_vxlan proto kernel scope host 

$ ip -d rou sh table 2004
local default dev lo proto kernel scope host 

$ ip -d rou sh table 2005
unicast default via 10.0.2.245 dev cilium_host proto kernel scope global 
unicast 10.0.2.245 dev cilium_host proto kernel scope link 

Note that we rely on the fact that route deletes with proto == 0 cause the kernel to not check if
the route protocol matches (it is ignored). Note also that there is one exception for the conversion done
which is about node-to-node encryption routes because those overload the route protocol and use it
to recognize routes installed by Cilium for node-to-node encryption. That can be revisited additionally.

Patches overview:
Patch 01 - adds a new RTProto constant in linux_defaults to be used for fib rules and routes
Patch 02 - adds a new RulePriorityLocalLookup constant in linux_defaults to be used for fib rules and routes
Patch 03 - changes rules and routes in bpf/init.sh (also default rt proto is taken as an argument)
Patch 04 - changes init.sh to take the local lookup rule priority as an argument
Patch 05 - updates the netlink library to get RTA_PROTOCOL support for fib rules
Patch 06 - adds support for rule protocol to datapath/linux/route
Patch 07 - updates datapath/linux/route
Patch 08 - updates egress gateway
Patch 09 - updates datapath/loader
Patch 10 - updates datapath/linux/routing (skips migration code, more in commit msg)
Patch 11 - updates datapath/linux/node (the patch doesn't modify node-to-node encryption routes)
Patch 12 - removes RouteProtocolIPSec and uses proto kernel instead
Patch 13 - adds "-d" to bugtool when dumping fib rules so we can log their protocol
Patch 14 - ensures that local lookup IP rules are installed on agent init and also cleans up old local lookup rules with a
different rt proto only if they're before the new ones with proto kernel

I have tested locally IPv4, IPv6 with and without tunnels, with and without ipsec. Routes and rules are installed
and cleaned up properly. This set also fixes another bug when we restore ip rules on init.sh failure, we'd
install rule #0 with proto unspec instead of kernel as the original rule.

Closes: #18706

Thanks,
Nik

@NikAleksandrov NikAleksandrov self-assigned this Mar 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 10, 2023
@brandshaide
Copy link
Contributor

brandshaide commented Mar 14, 2023

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@NikAleksandrov
Copy link
Author

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@brandshaide it is incomplete, to fix the issue fully we need to change all routes and rules to use proto kernel
but we're currently blocked by merging support for the proto attribute in the netlink library (PR).
Once that is merged I can push the additional changes that will complete the fix.

@brandshaide
Copy link
Contributor

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@brandshaide it is incomplete, to fix the issue fully we need to change all routes and rules to use proto kernel but we're currently blocked by merging support for the proto attribute in the netlink library (PR). Once that is merged I can push the additional changes that will complete the fix.

@NikAleksandrov aweseome! and thanks a lot for your efforts 👍

@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch 3 times, most recently from 49efd48 to bbf0bc3 Compare March 17, 2023 14:36
@NikAleksandrov NikAleksandrov marked this pull request as ready for review March 17, 2023 15:09
@NikAleksandrov NikAleksandrov requested review from a team as code owners March 17, 2023 15:09
@NikAleksandrov NikAleksandrov changed the title init.sh: install ip rules with proto kernel Install fib rules and routes with proto kernel to avoid systemd messing with them Mar 18, 2023
@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch 2 times, most recently from c43f8e1 to 0943194 Compare March 20, 2023 09:09
@NikAleksandrov
Copy link
Author

/test

Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks nice to me. A non-blocking suggestion. I think it's better to define Go constant like DefaultRTProto (and the equivalent shell-script variable) and use that instead of hard-coding kernel protocol. So that when this kernel protocol comes at a problem, we can easily change that instead of finding all the places we set protocol.

@NikAleksandrov
Copy link
Author

Overall, looks nice to me. A non-blocking suggestion. I think it's better to define Go constant like DefaultRTProto (and the equivalent shell-script variable) and use that instead of hard-coding kernel protocol. So that when this kernel protocol comes at a problem, we can easily change that instead of finding all the places we set protocol.

Yep, good point. Coming right up.

@NikAleksandrov
Copy link
Author

@YutaroHayakawa I've requested a new review, the code is basically the same just using a constant as you suggested.

@pchaigno
Copy link
Member

This PR introduced a CI-only regression that was fixed by #24577. We'll thus need to backport that second PR wherever we backport this one.

@gandro
Copy link
Member

gandro commented Mar 27, 2023

This PR introduced a CI-only regression that was fixed by #24577. We'll thus need to backport that second PR wherever we backport this one.

Why did CI not catch the regression in this PR?

@pchaigno
Copy link
Member

@gandro CI wasn't run after the rebase. I don't know if the test passed before the last rebase.

@tklauser tklauser mentioned this pull request Mar 28, 2023
9 tasks
@tklauser tklauser added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.13 in 1.13.2 Mar 28, 2023
@tklauser tklauser added affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Mar 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Needs backport from master in 1.13.2 Mar 30, 2023
@tklauser tklauser added affects/v1.13 This issue affects v1.13 branch backport/author The backport will be carried out by the author of the PR. labels Mar 30, 2023
@jibi jibi removed affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch affects/v1.13 This issue affects v1.13 branch labels Mar 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from master in 1.13.2 Mar 31, 2023
@jibi
Copy link
Member

jibi commented Mar 31, 2023

Dropped the needs-backport labels as this PR is likely to be reverted

@vethastanley
Copy link

I see the fix is reverted. Do we know whether this issue will be fixed anytime recently?

@borkmann borkmann added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Sep 29, 2023
@julianwiedmann julianwiedmann removed the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Sep 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Host network broken after one of the underlying interfaces of a bond goes down