ipsec: Safely delete Xfrm state #32450

Merged

Commits on May 10, 2024

  1. ipsec: Minor refactor for ipsecDeleteXfrmState

    No functional change, just to make further revision easier.
    
    Signed-off-by: gray <gray.liang@isovalent.com>
    jschwinger233 committed May 10, 2024
    ab12313

Commits on May 15, 2024

  1. ipsec: Safely delete xfrm state

    This patch introduces a workaround for a kernel issue that occurs
    when deleting xfrm states.
    
    Let's start with the kernel issue.
    
    Install two xfrm states on the same host using the commands below
    (note the differences in mark, mask and src):
    
    ```
    ip x s a src 10.244.1.43 dst 10.244.3.114 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x2a450d00 mask 0xffff0f00 output-mark 0xd00 mask 0xffffff00 aead 'rfc4106(gcm(aes))' 0x42a89014074c49243219a20cc87cadd7b9c0d7d1 128 sel src 0.0.0.0/0 dst 0.0.0.0/0
    ip x s a src 0.0.0.0 dst 10.244.3.114 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0xd00 mask 0xf00 output-mark 0xd00 mask 0xffffff00 aead 'rfc4106(gcm(aes))' 0x42a89014074c49243219a20cc87cadd7b9c0d7d1 128 sel src 0.0.0.0/0 dst 0.0.0.0/0
    ```
    
    When trying to delete the first xfrm state using the following command,
    the Linux kernel instead removes the second xfrm state and keeps the
    first:
    
    ```
    ip x s d src 10.244.1.43 dst 10.244.3.114 proto esp spi 0x00000003 mark 0x2a450d00 mask 0xffff0f00
    ```
    
    This causes trouble during cilium upgrades.
    
    A real-world scenario for a cilium upgrade from 1.13.12 to 1.13.14
    could look like this:
    1. Before the upgrade, the node has an "old-style" xfrm state that
       catches mark "0xd00/0xf00" for ingress traffic; the old bpf programs
       also set the "0xd00" mark on ingress skbs;
    2. The upgrade begins and the bpf programs are reloaded to the new
       version; from then on ingress skbs are marked with "0xXXXX0d00";
    3. After a short while, cilium-agent installs new xfrm states that
       catch traffic with the specific mark "0xXXXX0d00";
    
    During the window between steps 2 and 3, cilium relies on the
    "old-style" xfrm state "0xd00/0xf00" to catch traffic carrying the
    specific mark "0xXXXX0d00".
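
To make the overlap concrete, here is a minimal Go sketch of the mark matching described above. It assumes the usual xfrm rule that a state selects a packet when `packet mark & mask == value`; the type `xfrmMark` and method `catches` are hypothetical names for illustration and are not taken from the patch.

```go
package main

import "fmt"

// xfrmMark mirrors the "mark VALUE mask MASK" pair from the ip-xfrm commands.
// The type and its method are illustrative only, not Cilium or kernel code.
type xfrmMark struct {
	Value uint32
	Mask  uint32
}

// catches reports whether a packet mark is selected by this mark/mask pair,
// assuming the rule: (packet mark & mask) == value.
func (m xfrmMark) catches(pktMark uint32) bool {
	return pktMark&m.Mask == m.Value
}

func main() {
	oldStyle := xfrmMark{Value: 0xd00, Mask: 0xf00}           // general mark
	newStyle := xfrmMark{Value: 0x2a450d00, Mask: 0xffff0f00} // specific mark

	// 0xd00 is what the old bpf programs set; 0x2a450d00 is one example of
	// the "0xXXXX0d00" marks set by the new bpf programs.
	for _, pkt := range []uint32{0xd00, 0x2a450d00} {
		fmt.Printf("mark %#x: old-style=%t new-style=%t\n",
			pkt, oldStyle.catches(pkt), newStyle.catches(pkt))
	}
}
```

Under that assumption, a new-style mark such as 0x2a450d00 is accepted by both the old-style and the new-style pair, which is why the old-style state can cover the window between steps 2 and 3.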
    
    So far so good.
    
    However, in a large-scale cluster, node churn makes it inevitable
    that NodeDeletion events arrive during the upgrade. On seeing a
    NodeDeletion event, cilium-agent removes the xfrm state for the
    departed remote node.
    
    Now we hit the aforementioned kernel issue: cilium-agent tries to delete
    the xfrm state catching the more specific mark, but the kernel wrongly
    removes the one catching the general mark.
    
    This causes traffic disruption until the upgrade completes with all
    new xfrm states installed.
    
    This patch provides a low-cost solution: if cilium-agent wants to
    remove an xfrm state catching a specific mark, it has to temporarily
    remove the xfrm state catching the general mark first and add it back
    afterwards:
    1. Temporarily remove the xfrm states catching the general mark;
    2. Remove the xfrm state we really care about;
    3. Add back the states temporarily removed in step 1;
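
The three steps above can be sketched against a toy in-memory table. This is not the patch's code and not the kernel's lookup logic: the `table` type, its `del` method (which imitates the misbehaviour reported in this commit message by matching on dst, SPI and the stored state's own mark/mask only) and `safeDelete` are all hypothetical, written under the assumption that the most recently added state is found first.

```go
package main

import "fmt"

// state is a simplified xfrm state: only the fields relevant to the issue.
type state struct {
	Src, Dst    string
	SPI         uint32
	Value, Mask uint32 // "mark VALUE mask MASK"
}

// table is a toy stand-in for the kernel's state list; newest entries first.
type table struct{ states []state }

func (t *table) add(s state) { t.states = append([]state{s}, t.states...) }

// del imitates the reported misbehaviour (an assumption, not kernel code):
// it removes the first state with the same dst and SPI whose own mark/mask
// accepts the requested mark value, ignoring src and the requested mask.
func (t *table) del(want state) bool {
	for i, s := range t.states {
		if s.Dst == want.Dst && s.SPI == want.SPI && want.Value&s.Mask == s.Value {
			t.states = append(t.states[:i], t.states[i+1:]...)
			return true
		}
	}
	return false
}

// safeDelete follows the three steps: park the general states that would
// shadow the target, delete the target, then re-add the parked states.
func (t *table) safeDelete(target state) {
	var parked []state
	for _, s := range t.states {
		if s != target && s.Dst == target.Dst && s.SPI == target.SPI &&
			target.Value&s.Mask == s.Value {
			parked = append(parked, s)
		}
	}
	for _, s := range parked { // step 1
		t.del(s)
	}
	t.del(target) // step 2
	for _, s := range parked { // step 3
		t.add(s)
	}
}

var (
	specific = state{"10.244.1.43", "10.244.3.114", 3, 0x2a450d00, 0xffff0f00}
	general  = state{"0.0.0.0", "10.244.3.114", 3, 0xd00, 0xf00}
)

func newTable() *table {
	t := &table{}
	t.add(specific)
	t.add(general)
	return t
}

func main() {
	naive := newTable()
	naive.del(specific)
	fmt.Printf("naive delete leaves: %+v\n", naive.states)

	safe := newTable()
	safe.safeDelete(specific)
	fmt.Printf("safe delete leaves:  %+v\n", safe.states)
}
```

In this model the naive delete removes the general-mark state and leaves the specific one behind, while `safeDelete` leaves only the general-mark state in place. Deleting the general state in step 1 is safe here because its mark value does not satisfy the specific state's mask.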
    
    There is indeed a small window between the temporary removal and the
    re-add, but our past tests show that it lasts only 200-900µs, short
    enough that we shouldn't see many drops.
    
    Suggested-by: Julian Wiedmann <jwi@isovalent.com>
    Signed-off-by: gray <gray.liang@isovalent.com>
    jschwinger233 committed May 15, 2024
    af478a8