New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
routing: Fix route collisions in AWS ENI #14269
routing: Fix route collisions in AWS ENI #14269
Conversation
9cb50cc
to
93e8218
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
93e8218
to
3dfb7f3
Compare
[ upstream commit 332a3fd; forward-ported from v1.7 tree ] This commit fixes a potential route collision in AWS ENI IPAM modes, where the ifindex could equal the main routing table ID (from 253-255) [1], causing traffic to be subject to these routes incorrectly. This is admittedly rare, but we've seen this from a user report. The impact is that most traffic on the node is suddenly blackholed. To fix this, we say that each device or interface (ENI) will have their own dedicated routing table. The table ID will start with an offset of 10 because it is highly unlikely to collide with the main routing table ID (from 253-255). We grab the number associated with the ENI device (`Number`) and add the offset. For example, if we have an ENI device "eni-0" which has a `Number` of 5, then the table ID will be 10 + 5 = 15. Another important piece to note is that only the egress rule will reside inside the per-device tables, whereas the ingress rule will stay in the main routing table. This is because we want the main routing table to hold the routes to the endpoint. Moving forward, the ENI datapath will now create rules under a new egress priority value (RulePriorityEgressv2), as long as the egress-multi-home-ip-rule-compat flag is false. If it's true, then the datapath will create rules under the original egress priority value (RulePriorityEgress). This helps disambiguate when running with the older or newer ENI datapath. See cilium#14336. [1]: See ip-route(8) Reported-by: Vlad Ungureanu <vladu@palantir.com> Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
ac7b671
to
9efaa11
Compare
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the original PR and was happy with it, and I've now reviewed the commits that were not in the original PR. Those LGTM.
Let me know if you think I should pay closer attention to any of the remaining commits.
See commit msgs.
This PR has been "forward-ported" from the following direct backport PR to the v1.7 branch: #14337.
Most commits have been forward-ported from the aforementioned PR, but a few commits are only meant for master (with intention to be backported to v1.9 & 1.8) because of code that didn't exist in v1.7 and because of the initial assumption that this issue impacts both ENI and Azure modes. In reality, only ENI mode has been impacted so far. While it is possible for the issue to occur on Azure, it is very unlikely and a separate PR will be made to address that (see #14705).
Fixes: #14336