Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Linux slave interface detection #17189

Merged
merged 1 commit into from Aug 23, 2021

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Aug 19, 2021

Using method netlink.Slave() doesn't always work. In particular, it doesn't work on AKS, maybe because there's no master
bond interface in that case. We should instead rely on the flags passed by Linux's netlink API.

Tested with a small reproduction commit on top of this pull request:

Fixes: #17169.

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.9 labels Aug 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.10 Aug 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.4 Aug 19, 2021
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-linux-slave-interface-detection branch from 1c21415 to fc7c39d Compare August 19, 2021 12:47
@pchaigno pchaigno added release-blocker/1.10 integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. and removed release-blocker/1.9 labels Aug 19, 2021
Using method Slave() exposed by the netlink package doesn't always work.
In particular, it doesn't work on AKS, maybe because there's no master
bond interface in that case. We should instead rely on the flags passed
by Linux's netlink API.

Fixes: 3e24551 ("routing: Fix incorrect interface selection for pod routes")
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-linux-slave-interface-detection branch from fc7c39d to 07a443e Compare August 19, 2021 13:29
@pchaigno pchaigno marked this pull request as ready for review August 19, 2021 13:29
@pchaigno pchaigno requested review from a team and kkourt August 19, 2021 13:29
@pchaigno
Copy link
Member Author

pchaigno commented Aug 19, 2021

test-me-please

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sLRPTests Checks local redirect policy LRP connectivity

Failure Output

FAIL: Expected

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.16-net-next' hit: #17060 (85.54% similarity)

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sFQDNTest Restart Cilium validate that FQDN is still working

Failure Output

FAIL: Testapp is not ready after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sBookInfoDemoTest Bookinfo Demo Tests bookinfo demo

Failure Output

FAIL: Pods are not ready after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@pchaigno
Copy link
Member Author

pchaigno commented Aug 20, 2021

/mlh new-flake Cilium-PR-K8s-GKE

👍 created #17204 #17205

Seems very unlikely to be introduced by this PR because it only touches ENI/Azure logic which we don't cover in GKE.

@pchaigno
Copy link
Member Author

pchaigno commented Aug 20, 2021

Restarting to be sure:
test-gke

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.10 Aug 20, 2021
@pchaigno
Copy link
Member Author

The same sort of issue happened two other times on GKE with different tests (cf. #17204 (comment)). On the timespan since I triggered the test the first time here, the GKE test suite was executed once on master. So it's possible the flake is in master but just didn't happen yet. We had a second look with Joe and can't see how this ENI/Azure PR could cause that flake on GKE.
Given other tests are passing and review requests are covered, I'm marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 20, 2021
@tklauser tklauser merged commit 37d6c8d into master Aug 23, 2021
@tklauser tklauser deleted the pr/pchaigno/fix-linux-slave-interface-detection branch August 23, 2021 07:11
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.4 Aug 23, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.4 Aug 23, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.4 Aug 24, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.4 Aug 24, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.10 Sep 1, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.10 Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.10.4
Backport done to v1.10
1.9.10
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

None yet

6 participants