loader: Revert incorrect initialization of endpoints in chaining mode #16227
All test failures happen after the tests that run with endpoint routes enabled, so I think I also need to revert a9ecab1 😞
We might try this on our cluster with aws-cni chaining.
We have been using this PR cherry-picked onto the v1.9.7 branch, and it does seem to generate drops when we restart cilium-agent while the load injector is sending traffic to the echo target. Keeping Using Kubernetes 1.14, kernel 5.4, Ubuntu, AWS CNI 1.7.10. Actually, the drops happen even with the
61837de to a1773ce
This commit is a partial revert of 72e6238 ("loader: Remove program and route when disable endpoint routes").

Commit 72e6238 started removing existing endpoint routes when enable-endpoint-routes is disabled in the agent. In chaining mode, however, if Cilium isn't the primary CNI, it isn't responsible for the endpoint's networking. In that case, the primary CNI may install and rely on those endpoint routes and we shouldn't remove them.

This commit reverts the removal of endpoint routes. We'll provide a proper solution to remove only endpoint routes Cilium "owns" in a subsequent commit.

Fixes: 72e6238 ("loader: Remove program and route when disable endpoint routes")
Signed-off-by: Paul Chaignon <paul@cilium.io>
This commit partially reverts commit a9ecab1. Disabling endpoint routes in an existing cluster is not supported for now. We first need to find a way to properly remove the endpoint routes (see previous commit) before we can support this.

We keep the override of the endpoint datapath configuration for the host endpoint, as otherwise the host firewall test will error due to a failure to load bpf_host.

Signed-off-by: Paul Chaignon <paul@cilium.io>
We need to deploy pods after Cilium is installed or they may receive the datapath corresponding to a previous Cilium installation.

Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Commit 0875453 ("endpoint: Refactor init of EndpointDatapathConfiguration") leads to .RequireEgressProg being overwritten on endpoint creation. That in turn breaks reverse NAT when running in chaining mode [1].

This commit is a partial revert of commit 0875453, keeping only a helper function.

[1] https://github.com/cilium/cilium/blob/v1.10.0/plugins/cilium-cni/chaining/generic-veth/generic-veth.go#L165

Signed-off-by: Paul Chaignon <paul@cilium.io>
a1773ce to 0118a4e
@rewiko Hey 👋 Thanks for testing! The pull request wasn't ready before; I think it should be fine now. I'll leave draft mode once end-to-end tests are passing. If you want to give it a second try, I'm happy to make the v1.9 backport and provide an image to test.
@pchaigno, I will retest with the latest commit and provide the results tomorrow.
Thanks for testing @yorg1st! We also tested an upgrade to this image from v1.9.5 and the master image in aws-cni clusters.
All required tests are passing, several manual tests were performed with aws-cni, and reviews are in. Marking as ready to merge.
This pull request is a partial revert of #15228.
Commit 72e6238 started removing existing endpoint routes when enable-endpoint-routes is disabled in the agent. In chaining mode, however, if Cilium isn't the primary CNI, it isn't responsible for the endpoint's networking. In that case, the primary CNI may install and rely on those endpoint routes and we shouldn't remove them. Commit 0875453, from the same PR, overwrote the RequireEgressProg datapath attribute, breaking reverse NAT in chaining mode.

This pull request reverts incorrect changes. We'll provide a proper solution to remove only endpoint routes Cilium "owns" in a subsequent pull request, after more end-to-end tests have been added for chaining mode.
See commit descriptions for details.
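For readers unfamiliar with the RequireEgressProg regression, here is a minimal sketch of the failure mode. It is not the actual Cilium code: DatapathConfig, Endpoint, initOverwriting, initPreserving, and newChainedEndpoint are hypothetical, simplified names, and the InstallEndpointRoute field is assumed for illustration. The idea is that an initializer which replaces the whole configuration struct silently resets a flag that the chaining plugin set earlier, whereas one that only touches its own field does not.

```go
package main

import "fmt"

// DatapathConfig is a simplified, hypothetical stand-in for the endpoint
// datapath configuration. Only RequireEgressProg is named in this PR;
// InstallEndpointRoute is an assumed field used for illustration.
type DatapathConfig struct {
	RequireEgressProg    bool
	InstallEndpointRoute bool
}

// Endpoint is a hypothetical, minimal endpoint holding its datapath config.
type Endpoint struct {
	Config DatapathConfig
}

// newChainedEndpoint models an endpoint created by a chaining plugin that
// requested an egress program.
func newChainedEndpoint() *Endpoint {
	return &Endpoint{Config: DatapathConfig{RequireEgressProg: true}}
}

// initOverwriting models the regression: assigning a fresh struct resets
// every field the caller did not pass in, including RequireEgressProg.
func initOverwriting(ep *Endpoint, endpointRoutes bool) {
	ep.Config = DatapathConfig{InstallEndpointRoute: endpointRoutes}
}

// initPreserving only sets the field it is responsible for and leaves the
// values chosen by the chaining plugin untouched.
func initPreserving(ep *Endpoint, endpointRoutes bool) {
	ep.Config.InstallEndpointRoute = endpointRoutes
}

func main() {
	a := newChainedEndpoint()
	initOverwriting(a, false)
	fmt.Println("overwriting init, RequireEgressProg:", a.Config.RequireEgressProg)

	b := newChainedEndpoint()
	initPreserving(b, false)
	fmt.Println("preserving init, RequireEgressProg:", b.Config.RequireEgressProg)
}
```

With the overwriting initializer the flag ends up false, which in this simplified model corresponds to the egress program no longer being required for the chained endpoint.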
Fixes: #16007.
Fixes: #15228.