New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align IP selection behavior with kernel #22866
Conversation
Thanks for your PR. I am not familiar with this code path, so I will ask for other agent member to review the changes 🙏. |
Would appreciate a review on this. Anything I can do to help? |
@akhilles I believe the comments on the caller to this function
I believe the addresses will be returned in creation order, as that's how the kernel returns them on my machine. |
pkg/node/address_linux.go
Outdated
// If the IP is the same as the preferredIP, that | ||
// means that maybe it is restored from node_config.h, | ||
// so if it is present we prefer this one. | ||
if a.IP.Equal(preferredIP) { | ||
hasPreferred = true | ||
} else if a.Flags&unix.IFA_F_SECONDARY > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to filter our secondary addresses, I think we should do it all of the time. Hoisting this to the first or second statement in for loop maybe best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I think preferredIP
should take precedence over primary/secondary because it's being explicitly provided as an input.
I don't have a strong preference, but respecting preferredIP
seems less surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could refactor the code a bit to make this clearer (note the comment update as well):
// If the IP is the same as the preferredIP, that
// means that maybe it is restored from node_config.h,
// so if it is present we prefer this one, even if it
// is a secondary address.
if a.IP.Equal(preferredIP) {
hasPreferred = true
continue
}
if a.Flags&unix.IFA_F_SECONDARY > 0 {
// Skip secondary addresses
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this be a bit problematic in the case where you may have a preferred IP on an interface, but also a secondary IP which overlaps it? You'd potentially select the secondary IP if it was created first I believe.
Maybe these are edge cases we aren't concerned with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misunderstanding the scenario you described, but I think the preferred IP would still be selected since it has precedence over the ordering and the filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm trying to get it is, we must explain this function like this, currently, "We filter secondary IP addresses, unless it is preferred. If two preferred IP addresses exist across interfaces, we return the one created first".
If we always filter secondary, the outcome is more predictable. "We filter secondary IP addresses. IPs are returned in creation order across interfaces".
Question tho, given your statement here: #22866 (comment) why do you propose we filter these secondary addresses? It seems like simply sorting by creation order fixes your issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering secondary addresses isn't needed, I added it only to be consistent with the kernel's inet_select_addr
function. Maybe it's better to remove it for now? I can't think of a use-case where creation order isn't sufficient for selecting a primary IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think being consistent with the kernel is good (as long as it's in a separate commit). That avoids surprises for users when they switch from kube-proxy to KPR for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled the filtering into a separate commit and added this comment:
All secondary IPs, except the preferredIP, are filtered out.
IMO, this is clear enough since currently, there's only one preferred IP that's provided as an input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the IPs were sorted and the first value was chosen. This could cause a floating IP to be chosen for SNAT use instead of the primary IP address.
This patch stably sorts addresses by link index, which preserves the ordering of IPs for any given interface.
Could you give a bit more detail here? IIUC, it's the sort that is fixing your issue with floating IP addresses. How come? Is it because your floating IPs are always assigned to an interface that is created after the interface with the primary address?
Prefer primary IPs for masquerade, NodePort address selection
Maybe reword to:
Align selection of IP address for masquerading on Linux's logic, by preferring older interfaces and filtering out secondary addresses.
This comment was marked as resolved.
This comment was marked as resolved.
1 similar comment
This comment was marked as resolved.
This comment was marked as resolved.
Sorting IPs by `bytes.Compare` gets rid of creation order, which can lead to secondary IPs being chosed for masquerading and NodePort SNAT purposes. Instead, stably sort by ifindex to preserve creation order and give preference to primary IPs. This also more closely aligns with the Linux kernel's IP selection behavior. Signed-off-by: Akhil Velagapudi <4@4khil.com>
This further aligns IP selection with the Linux kernel, which results in less surprising behavior when switching from kube-proxy to KPR. Signed-off-by: Akhil Velagapudi <4@4khil.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
/test Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
I think the test failure is unrelated or a flake. I noticed an unrelated PR (22768) failed with the similar error (on a different k8s version):
Would it be possible to rerun the tests? |
Previous failure was known flake #15455. |
Looks like Cluster mesh w/ WireGuard test failure is also known flake: #23044. |
Yep, that one was disabled in the meantime so fine to ignore. Everything else is green and we have reviews so I'll mark ready to merge. |
Thanks! |
Sorting IPs by
bytes.Compare
gets rid of creation order, which canlead to secondary IPs being chosed for masquerading and NodePort SNAT
purposes. Instead, stably sort by ifindex to preserve creation order and
give preference to primary IPs. Also, filter out secondary IPs.
This also more closely aligns with the Linux kernel's IP selection
behavior.
Fixes: #22853