bugtool: Collect XFRM error counters twice #28790

Merged 1 commit on Oct 26, 2023

Commits on Oct 25, 2023

  1. bugtool: Collect XFRM error counters twice

    This commit changes the bugtool report to collect the XFRM error
    counters (i.e., /proc/net/xfrm_stat) twice instead of only once: at the
    beginning and at the end of the bugtool collection. That leaves around
    5-6 seconds between the two collections, so comparing them may show
    whether any counter is currently increasing.
    
        $ diff cilium-bugtool-cilium-7d54p-20231025-115151/cmd/cat*--proc-net-xfrm_stat.md
        5c5
        < XfrmInStateProtoError   	4
        ---
        > XfrmInStateProtoError   	6
    
    In this example, we can easily see that XfrmInStateProtoError is
    increasing: it went from 4 to 6 between the two collections. That
    suggests a key rotation issue is currently ongoing (cf. IPsec
    troubleshooting docs).
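
The comparison shown by the diff above can also be done programmatically. The following is a minimal sketch, not the bugtool's actual code: `parseXfrmStat` and `increasedCounters` are hypothetical helper names, and the input is assumed to follow the `Name<whitespace>Value` layout of /proc/net/xfrm_stat seen in the diff output.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseXfrmStat (hypothetical helper) parses text in the
// "Name<whitespace>Value" layout of /proc/net/xfrm_stat.
// Lines that do not match that layout are skipped.
func parseXfrmStat(content string) map[string]uint64 {
	counters := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		counters[fields[0]] = value
	}
	return counters
}

// increasedCounters (hypothetical helper) returns one sorted
// "name: before -> after" entry per counter that grew between snapshots.
func increasedCounters(before, after map[string]uint64) []string {
	var increased []string
	for name, newValue := range after {
		if oldValue, ok := before[name]; ok && newValue > oldValue {
			increased = append(increased,
				fmt.Sprintf("%s: %d -> %d", name, oldValue, newValue))
		}
	}
	sort.Strings(increased)
	return increased
}

func main() {
	// Sample snapshots modeled on the diff in the commit message.
	first := "XfrmInError             \t0\nXfrmInStateProtoError   \t4\n"
	second := "XfrmInError             \t0\nXfrmInStateProtoError   \t6\n"

	for _, line := range increasedCounters(parseXfrmStat(first), parseXfrmStat(second)) {
		fmt.Println(line) // prints "XfrmInStateProtoError: 4 -> 6"
	}
}
```

With only a few seconds between the two snapshots, a counter that does not change here may still be increasing slowly, so an empty result is not proof that everything is healthy.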
    
    I tried other approaches that collect over a longer timespan, which
    might reveal slower increases. They all ended up being more complex or
    messier (we'd need to collect at the beginning and end of the sysdump
    instead). In the end, considering this is already a fallback plan for
    when customers don't collect Prometheus metrics, I think the current,
    simpler approach is good enough.
    
    Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
    pchaigno committed Oct 25, 2023
    2c1a62f