Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cilium-dbg: New encrypt flush --stale flag #31159

Merged
merged 3 commits into from Mar 6, 2024

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Mar 5, 2024

First commit changes how we handle errors. Second commit refactors the confirmation logic. Last commit introduces the new flag.

Introduce `cilium-dbg encrypt flush --stale` flag to remove XFRM states and policies with stale node IDs.

@pchaigno pchaigno added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/cli Impacts the command line interface of any command in the repository. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. feature/ipsec Relates to Cilium's IPsec feature labels Mar 5, 2024
This commit slightly changes the behavior of the "encrypt flush"
command in case of errors when trying to delete XFRM rules. The tool
currently lists rules, filters them based on user-given arguments, and
then deletes them. If an XFRM rule is deleted by the agent or the user
while we're filtering, the deletion will fail.

The current behavior in that case is to fatal. On busy clusters, that
might mean that we always fatal because XFRM states and policies are
constently added and removed.

This commit changes the behavior to proceed with subsequent deletions in
case one fails.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit refactors the code a bit simplify a latter commit. No
functional changes.

This may be a bit excessive in commit splitting, but at least I can
claim my last commit is free of any refactoring 😅

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This new flag will allow users to clean stale XFRM states and policies
based on the node ID map contents. If XFRM states or policies are found
with a node ID that is not in the BPF map, then we probably have a leak
somewhere.

Such leaks can lead in extreme cases to performance degradation when the
number of XFRM states and policies grows large (and if using ENI or
Azure IPAM). Having a tool to cleanup these XFRM states and policies
until the leak is fixed can therefore be critical.

The new flag is incompatible with the --spi and --node-id filter flags.

We first dump the XFRM rules and then dump the map content. In that way,
if a new node ID is allocated while we're running the tool, we will
simply ignore the corresponding XFRM rules. If a node ID is removed
while running the tool, we will fail to remove the corresponding XFRM
rules and continue with the others.

Tested on a GKE cluster by adding fake XFRM states and policies that the
tool was able to remove.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno
Copy link
Member Author

pchaigno commented Mar 5, 2024

/test

@pchaigno pchaigno marked this pull request as ready for review March 5, 2024 11:45
@pchaigno pchaigno requested review from a team as code owners March 5, 2024 11:45
@pchaigno pchaigno enabled auto-merge March 5, 2024 21:45
@pchaigno pchaigno added this pull request to the merge queue Mar 6, 2024
Merged via the queue into cilium:main with commit 5eb27e2 Mar 6, 2024
62 checks passed
@pchaigno pchaigno deleted the ipsec-cli-clean-stale-xfrm branch March 6, 2024 05:57
@pchaigno pchaigno added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.15.2 Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.8 Mar 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.13 Mar 6, 2024
@jibi jibi mentioned this pull request Mar 11, 2024
5 tasks
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 11, 2024
@jibi jibi added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Needs backport from main in 1.13.13 Mar 13, 2024
@jrajahalme jrajahalme added this to Needs backport from main in 1.15.3 Mar 13, 2024
@jrajahalme jrajahalme removed this from Needs backport from main in 1.15.2 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.14 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@jibi jibi added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 13, 2024
@thorn3r thorn3r added this to Backport pending to v1.14 in 1.14.9 Mar 13, 2024
@thorn3r thorn3r removed this from Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@github-actions github-actions bot added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Mar 16, 2024
@jrajahalme jrajahalme moved this from Needs backport from main to Backport done to v1.15 in 1.15.3 Mar 26, 2024
@thorn3r thorn3r moved this from Needs backport from main to Backport done to v1.13 in 1.13.14 Mar 26, 2024
@jrajahalme jrajahalme moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.9 Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cli Impacts the command line interface of any command in the repository. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. feature/ipsec Relates to Cilium's IPsec feature release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
No open projects
1.13.14
Backport done to v1.13
1.14.9
Backport done to v1.14
1.15.3
Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

None yet

3 participants