Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align IP selection behavior with kernel #22866

Merged
merged 2 commits into from Jan 18, 2023
Merged

Align IP selection behavior with kernel #22866

merged 2 commits into from Jan 18, 2023

Conversation

akhilles
Copy link
Contributor

@akhilles akhilles commented Dec 24, 2022

Sorting IPs by bytes.Compare gets rid of creation order, which can
lead to secondary IPs being chosed for masquerading and NodePort SNAT
purposes. Instead, stably sort by ifindex to preserve creation order and
give preference to primary IPs. Also, filter out secondary IPs.

This also more closely aligns with the Linux kernel's IP selection
behavior.

Fixes: #22853

Align selection of IP addresses used for masquerading and NodePort SNAT with Linux kernel behavior, by preferring addresses assigned to the interface earlier and filtering out secondary addresses.

@akhilles akhilles requested a review from a team as a code owner December 24, 2022 00:00
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 24, 2022
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Dec 24, 2022
@christarazi christarazi requested review from a team, NikAleksandrov and pchaigno and removed request for a team December 27, 2022 20:39
@sayboras
Copy link
Member

Thanks for your PR. I am not familiar with this code path, so I will ask for other agent member to review the changes 🙏.

@sayboras sayboras removed their request for review January 3, 2023 03:11
@akhilles
Copy link
Contributor Author

akhilles commented Jan 5, 2023

Would appreciate a review on this. Anything I can do to help?

pkg/node/address_linux.go Outdated Show resolved Hide resolved
@ldelossa ldelossa added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jan 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 5, 2023
@ldelossa
Copy link
Contributor

ldelossa commented Jan 5, 2023

@akhilles I believe the comments on the caller to this function firstGlobalV4Addr will need updating. Most notably, this statement is no longer true, correct?

// firstGlobalV4Addr returns the first IPv4 global IP of an interface,
// where the IPs are sorted in ascending order.

I believe the addresses will be returned in creation order, as that's how the kernel returns them on my machine.

// If the IP is the same as the preferredIP, that
// means that maybe it is restored from node_config.h,
// so if it is present we prefer this one.
if a.IP.Equal(preferredIP) {
hasPreferred = true
} else if a.Flags&unix.IFA_F_SECONDARY > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to filter our secondary addresses, I think we should do it all of the time. Hoisting this to the first or second statement in for loop maybe best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think preferredIP should take precedence over primary/secondary because it's being explicitly provided as an input.

I don't have a strong preference, but respecting preferredIP seems less surprising.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could refactor the code a bit to make this clearer (note the comment update as well):

// If the IP is the same as the preferredIP, that
// means that maybe it is restored from node_config.h,
// so if it is present we prefer this one, even if it
// is a secondary address.
if a.IP.Equal(preferredIP) {
	hasPreferred = true
	continue
}

if a.Flags&unix.IFA_F_SECONDARY > 0 {
	// Skip secondary addresses
	continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this be a bit problematic in the case where you may have a preferred IP on an interface, but also a secondary IP which overlaps it? You'd potentially select the secondary IP if it was created first I believe.

Maybe these are edge cases we aren't concerned with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding the scenario you described, but I think the preferred IP would still be selected since it has precedence over the ordering and the filtering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm trying to get it is, we must explain this function like this, currently, "We filter secondary IP addresses, unless it is preferred. If two preferred IP addresses exist across interfaces, we return the one created first".

If we always filter secondary, the outcome is more predictable. "We filter secondary IP addresses. IPs are returned in creation order across interfaces".

Question tho, given your statement here: #22866 (comment) why do you propose we filter these secondary addresses? It seems like simply sorting by creation order fixes your issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering secondary addresses isn't needed, I added it only to be consistent with the kernel's inet_select_addr function. Maybe it's better to remove it for now? I can't think of a use-case where creation order isn't sufficient for selecting a primary IP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think being consistent with the kernel is good (as long as it's in a separate commit). That avoids surprises for users when they switch from kube-proxy to KPR for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled the filtering into a separate commit and added this comment:

All secondary IPs, except the preferredIP, are filtered out.

IMO, this is clear enough since currently, there's only one preferred IP that's provided as an input.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the IPs were sorted and the first value was chosen. This could cause a floating IP to be chosen for SNAT use instead of the primary IP address.

This patch stably sorts addresses by link index, which preserves the ordering of IPs for any given interface.

Could you give a bit more detail here? IIUC, it's the sort that is fixing your issue with floating IP addresses. How come? Is it because your floating IPs are always assigned to an interface that is created after the interface with the primary address?

Prefer primary IPs for masquerade, NodePort address selection

Maybe reword to:

Align selection of IP address for masquerading on Linux's logic, by preferring older interfaces and filtering out secondary addresses.

pkg/node/address_linux.go Outdated Show resolved Hide resolved
pkg/node/address_linux.go Outdated Show resolved Hide resolved
@akhilles akhilles closed this Jan 11, 2023
@akhilles akhilles reopened this Jan 11, 2023
@maintainer-s-little-helper

This comment was marked as resolved.

1 similar comment
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 11, 2023
Sorting IPs by `bytes.Compare` gets rid of creation order, which can
lead to secondary IPs being chosed for masquerading and NodePort SNAT
purposes. Instead, stably sort by ifindex to preserve creation order and
give preference to primary IPs.

This also more closely aligns with the Linux kernel's IP selection
behavior.

Signed-off-by: Akhil Velagapudi <4@4khil.com>
This further aligns IP selection with the Linux kernel, which results in
less surprising behavior when switching from kube-proxy to KPR.

Signed-off-by: Akhil Velagapudi <4@4khil.com>
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 11, 2023
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pchaigno pchaigno requested a review from brb January 11, 2023 11:44
Copy link
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@pchaigno
Copy link
Member

pchaigno commented Jan 11, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN and endpoint routes

Failure Output

FAIL: Failed to reach 192.168.56.11:80 from testclient-ghm72

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

@akhilles
Copy link
Contributor Author

I think the test failure is unrelated or a flake. I noticed an unrelated PR (22768) failed with the similar error (on a different k8s version):

[unable to upgrade connection: Authorization error](error: unable to upgrade connection: Authorization error (user=kube-apiserver-kubelet-client, verb=create, resource=nodes, subresource=proxy))

Would it be possible to rerun the tests?

@pchaigno
Copy link
Member

pchaigno commented Jan 13, 2023

Previous failure was known flake #15455.
/test-1.25-4.19

@akhilles
Copy link
Contributor Author

Looks like Cluster mesh w/ WireGuard test failure is also known flake: #23044.

@pchaigno
Copy link
Member

Looks like Cluster mesh w/ WireGuard test failure is also known flake: #23044.

Yep, that one was disabled in the meantime so fine to ignore.

Everything else is green and we have reviews so I'll mark ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 13, 2023
@akhilles
Copy link
Contributor Author

Thanks!

@joestringer joestringer merged commit d19645e into cilium:master Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFP: Improve firstGlobalAddr IP selection
5 participants