bugtool: Collect XFRM error counters twice #28790

Merged
merged 1 commit into from Oct 26, 2023

pchaigno
Member

This pull request changes the bugtool report to collect the XFRM error counters (i.e., /proc/net/xfrm_stat) twice instead of once: at the beginning and at the end of the bugtool collection. The two snapshots are then around 5-6 seconds apart, so comparing them shows whether any counter is currently increasing.

$ diff cilium-bugtool-cilium-7d54p-20231025-115151/cmd/cat*--proc-net-xfrm_stat.md
5c5
< XfrmInStateProtoError   	4
---
> XfrmInStateProtoError   	6

In this example, the XfrmInStateProtoError counter went from 4 to 6 between the two collections, so it is clearly increasing. That suggests a key rotation issue is ongoing (cf. IPsec troubleshooting docs).
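For readers who would rather compare the two snapshots programmatically than with `diff`, here is a minimal Go sketch. It is not part of this PR or of the bugtool: `parseXfrmStat`, `increasedCounters`, and `counterDelta` are hypothetical names, and the code assumes each counter line has the shape `Name<whitespace>Value` as in the diff above. Any line that does not fit that shape (such as headers in the report file) is skipped. The `XfrmInError` value in the sample data is illustrative only.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseXfrmStat parses "Name<whitespace>Value" lines from one snapshot of
// /proc/net/xfrm_stat into a map. Lines that do not have exactly two fields,
// or whose second field is not an unsigned integer, are skipped.
func parseXfrmStat(snapshot string) map[string]uint64 {
	counters := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(snapshot))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		counters[fields[0]] = value
	}
	return counters
}

// counterDelta records how much one counter grew between two snapshots.
type counterDelta struct {
	Name  string
	Delta uint64
}

// increasedCounters returns the counters present in both snapshots whose
// value is larger in after than in before, sorted by name.
func increasedCounters(before, after map[string]uint64) []counterDelta {
	var deltas []counterDelta
	for name, newValue := range after {
		if oldValue, ok := before[name]; ok && newValue > oldValue {
			deltas = append(deltas, counterDelta{Name: name, Delta: newValue - oldValue})
		}
	}
	sort.Slice(deltas, func(i, j int) bool { return deltas[i].Name < deltas[j].Name })
	return deltas
}

func main() {
	// Illustrative snapshots: XfrmInStateProtoError mirrors the example above,
	// the XfrmInError value is made up for the sketch.
	first := "XfrmInError\t0\nXfrmInStateProtoError   \t4\n"
	second := "XfrmInError\t0\nXfrmInStateProtoError   \t6\n"

	for _, d := range increasedCounters(parseXfrmStat(first), parseXfrmStat(second)) {
		fmt.Printf("%s increased by %d\n", d.Name, d.Delta)
	}
}
```

With the sample data, this prints `XfrmInStateProtoError increased by 2`. Counters that stay flat are omitted, which keeps the output focused on errors that are still occurring.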

I tried other approaches to collect over a longer timespan, which would let us see slower increases. They all ended up being more complex or messier (we'd need to collect at the beginning and end of the sysdump instead). In the end, considering this is already a fallback plan for when customers don't collect Prometheus metrics, I think the current, simpler approach is good enough.

Fixes: #16538.

@pchaigno pchaigno added area/bugtool Impacts gathering of data for debugging purposes. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.9 Oct 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Oct 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.16 Oct 25, 2023
@pchaigno pchaigno marked this pull request as ready for review October 25, 2023 14:45
@pchaigno pchaigno requested a review from a team as a code owner October 25, 2023 14:45
Member

@tklauser tklauser left a comment

LGTM, one question inline to help my own understanding.

bugtool/cmd/configuration.go
This commit changes the bugtool report to collect the XFRM error
counters (i.e., /proc/net/xfrm_stat) twice instead of only once. We will
collect at the beginning and end of the bugtool collection. In that way,
there will be around 5-6 seconds between the two collections and we may
see if any counter is currently increasing.

    $ diff cilium-bugtool-cilium-7d54p-20231025-115151/cmd/cat*--proc-net-xfrm_stat.md
    5c5
    < XfrmInStateProtoError   	4
    ---
    > XfrmInStateProtoError   	6

In this example, we can easily see that the XfrmInStateProtoError is
increasing. That suggests a key rotation issue is currently ongoing (cf.
IPsec troubleshooting docs).

I tried other approaches to collect over a longer timespan. That may
allow us to see slower increases. They all end up being more complex or
messier (we'd need to collect at beginning and end of the sysdump
instead). In the end, considering this is already a fallback plan for
when customers don't collect Prometheus metrics, I think the current,
simpler approach is good enough.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 26, 2023
@dylandreimerink dylandreimerink merged commit c1803ba into cilium:main Oct 26, 2023
62 checks passed
@pchaigno pchaigno deleted the collect-xfrm-stats-twice branch October 26, 2023 10:08
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
9 tasks
@pippolo84 pippolo84 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.4 Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
6 tasks
@pippolo84 pippolo84 added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Oct 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.9 Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 31, 2023
4 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.16 Oct 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Backport pending to v1.12 in 1.12.16 Nov 2, 2023
@pippolo84 pippolo84 added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Nov 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport done to v1.12 in 1.12.16 Nov 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.9 Nov 2, 2023
@jibi jibi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.4 Nov 7, 2023
Development

Successfully merging this pull request may close these issues.

Collect /proc/net/xfrm_stat twice in bugtool
5 participants