Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: Revert incorrect initialization of endpoints in chaining mode #16227

Merged
merged 4 commits into from May 27, 2021

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented May 19, 2021

This pull request is a partial revert of #15228.

Commit 72e6238 started removing existing endpoint routes when enable-endpoint-routes is disabled in the agent. In chaining mode however, if Cilium isn't the primary CNI, it isn't responsible for the endpoint's networking. In that case, the primary CNI may install and rely on those endpoint routes and we shouldn't remove them. Commit 0875453, from the same PR, overwrote the RequireEgressProg datapath attribute, breaking reverse NAT in chaining mode.

This pull request reverts incorrect changes. We'll provide a proper solution to remove only endpoint routes Cilium "owns" in a subsequent pull request, after more end-to-end tests have been added for chaining mode.

See commit descriptions for details.

Fixes: #16007.
Fixes: #15228.

@pchaigno pchaigno added priority/high This is considered vital to an upcoming release. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.9 labels May 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.8 May 19, 2021
@pchaigno
Copy link
Member Author

All test failures happen after tests running with endpoint-routes enabled. So I think I also need to revert a9ecab1 😞

@rewiko
Copy link

rewiko commented May 20, 2021

We might try on our cluster with AWS-cni chaining.

@rewiko
Copy link

rewiko commented May 20, 2021

Drop Screenshot from 2021-05-20 14-41-27

We have been using this PR cherrypicked on v1.9.7 branch, it seems it does generate drop when we restart cilium-agent while the load-injector is sending traffic to the echo target.

Keeping ep.DatapathConfiguration = NewDatapathConfiguration() do not generate any drop.

Using kubernetes 1.14, kernel 5.4, Ubuntu, Aws CNI 1.7.10.

Actually the drop happen even with the ep.DatapathConfiguration = NewDatapathConfiguration() commented.

@pchaigno pchaigno force-pushed the revert-ep-route-removal branch 3 times, most recently from 61837de to a1773ce Compare May 21, 2021 12:27
This commit is a partial revert of 72e6238 ("loader: Remove program and
route when disable endpoint routes").

Commit 72e6238 started removing existing endpoint routes when
enable-endpoint-routes is disabled in the agent. In chaining mode
however, if Cilium isn't the primary CNI, it isn't responsible for
the endpoint's networking. In that case, the primary CNI may install
and rely on those endpoint routes and we shouldn't remove them.

This commit reverts the removal of endpoint routes. We'll provide a
proper solution to remove only endpoint routes Cilium "owns" in a
subsequent commit.

Fixes: 72e6238 ("loader: Remove program and route when disable endpoint routes")
Signed-off-by: Paul Chaignon <paul@cilium.io>
This commit partially reverts commit a9ecab1.

Disabling endpoint routes in an existing cluster is not supported for
now. We first need to find a way to properly remove the endpoint routes
(see previous commit) before we can support this.

We keep the override of endpoint datapath config. for the host endpoint
as otherwise host firewall test will error due to a failure to load
bpf_host.

Signed-off-by: Paul Chaignon <paul@cilium.io>
We need to deploy pods after Cilium is installed or they may receive the
datapath corresponding to a previous Cilium installation.

Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Commit 0875453 ("endpoint: Refactor init of
EndpointDatapathConfiguration") leads to .RequireEgressProg being
overwritten on endpoint creation. That in turns breaks reverse NAT when
running in chaining mode [1].

This commit is a partial revert of commit
0875453, keeping only a helper function.

1 - https://github.com/cilium/cilium/blob/v1.10.0/plugins/cilium-cni/chaining/generic-veth/generic-veth.go#L165
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno
Copy link
Member Author

@rewiko Hey 👋 Thanks for testing! The pull request wasn't ready before; I think it should be fine now. I'll leave draft mode once end-to-end tests are passing. If you want to give it a second try, I'm happy to make the v1.9 backport and provide an image to test.

@yorg1st
Copy link

yorg1st commented May 25, 2021

@rewiko Hey 👋 Thanks for testing! The pull request wasn't ready before; I think it should be fine now. I'll leave draft mode once end-to-end tests are passing. If you want to give it a second try, I'm happy to make the v1.9 backport and provide an image to test.

@pchaigno , i will retest with the latest commit and provide the results tommorow.

@pchaigno pchaigno marked this pull request as ready for review May 26, 2021 08:38
@pchaigno pchaigno requested review from a team as code owners May 26, 2021 08:38
@pchaigno pchaigno requested a review from a team May 26, 2021 08:38
@pchaigno pchaigno requested a review from a team as a code owner May 26, 2021 08:38
@pchaigno
Copy link
Member Author

Thanks for testing @yorg1st! We also tested an upgrade to this image from v1.9.5 and the master image in aws-cni clusters.

@pchaigno
Copy link
Member Author

All required tests are passing, several manual tests were performed with aws-cni, and reviews are in. Marking as ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 26, 2021
@kaworu kaworu merged commit 8da8b88 into cilium:master May 27, 2021
@pchaigno pchaigno deleted the revert-ep-route-removal branch May 27, 2021 13:39
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.8 May 27, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.8 May 27, 2021
@qmonnet qmonnet mentioned this pull request Jun 1, 2021
23 tasks
pchaigno pushed a commit that referenced this pull request Jul 2, 2021
This reverts commit 437e2bb.
The original issue has been fixed, and hence this can be removed (c.f.
#16227).

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
kkourt pushed a commit that referenced this pull request Jul 8, 2021
[ upstream commit 8c94f11 ]

This reverts commit 437e2bb.
The original issue has been fixed, and hence this can be removed (c.f.
#16227).

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Signed-off-by: Tom Payne <tom@isovalent.com>
aditighag pushed a commit that referenced this pull request Jul 8, 2021
[ upstream commit 8c94f11 ]

This reverts commit 437e2bb.
The original issue has been fixed, and hence this can be removed (c.f.
#16227).

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Signed-off-by: Tom Payne <tom@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/high This is considered vital to an upcoming release. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.9.8
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

Connectivity broken in AWS CNI chaining mode
9 participants