docs/ipsec: Extend troubleshooting for long key rotations #26809

Merged
merged 1 commit into from
Jul 31, 2023
16 changes: 13 additions & 3 deletions Documentation/security/network/encryption-ipsec.rst
@@ -211,15 +211,25 @@ Troubleshooting

 * All XFRM errors correspond to a packet drop in the kernel. Except for
   ``XfrmFwdHdrError`` and ``XfrmInError``, all XFRM errors indicate a bug in
-  Cilium or an operational mistake. ``XfrmOutStateSeqError`` and
-  ``XfrmInStateProtoError`` may be caused by operational mistakes, as detailed
-  in the following points.
+  Cilium or an operational mistake. ``XfrmOutStateSeqError``,
+  ``XfrmInStateProtoError``, and ``XfrmInNoStates`` may be caused by
+  operational mistakes, as detailed in the following points.

 * If the sequence number reaches its maximum value for any XFRM OUT state, it
   will result in packet drops and XFRM errors of type
   ``XfrmOutStateSeqError``. A key rotation resets all sequence numbers.
   Rotate keys frequently to avoid this issue.

+* After a key rotation, if the old key is cleaned up before the
+  configuration of the new key is installed on all nodes, it results in
+  ``XfrmInNoStates`` errors. By default, the old key is removed from nodes
+  after 5 minutes. Agents watch for key updates and, by default, update their
+  configuration within 1 minute after the key is changed, leaving plenty of
+  time before the old key is removed. If you expect the key rotation to take
+  longer (for example, with Cluster Mesh, when several clusters need to be
+  updated), you can increase the delay before cleanup with the agent flag
+  ``ipsec-key-rotation-duration``.
+
 * ``XfrmInStateProtoError`` errors can happen if the key is updated without
   incrementing the SPI (also called ``KEYID`` in :ref:`ipsec_key_rotation`
   instructions above). It can be fixed by performing a new key rotation,
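The two timing constraints in the troubleshooting notes above (sequence-number exhaustion, and key cleanup racing ahead of key propagation) can be sanity-checked with some quick arithmetic. The sketch below is illustrative only and is not part of Cilium. The function names are hypothetical, the 32-bit sequence-number width assumes ESP without extended sequence numbers, and `rollout_s` is an assumed extra delay (for example, the time needed to update the key in every cluster of a Cluster Mesh). The 1-minute and 5-minute defaults are taken from the text above.

```python
SEQ_BITS = 32  # assumption: 32-bit ESP sequence numbers (no extended sequence numbers)


def seconds_until_seq_exhaustion(packets_per_second: float,
                                 seq_bits: int = SEQ_BITS) -> float:
    """Estimate how long one XFRM OUT state lasts before its sequence
    number reaches the maximum, at a constant packet rate."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return (2 ** seq_bits) / packets_per_second


def cleanup_delay_is_safe(cleanup_delay_s: float,
                          agent_update_s: float = 60.0,
                          rollout_s: float = 0.0) -> bool:
    """Return True if every agent is expected to have the new key before
    the old key is removed.

    cleanup_delay_s: delay before the old key is cleaned up
                     (5 minutes by default, per the text above).
    agent_update_s:  time for agents to pick up a key change
                     (1 minute by default, per the text above).
    rollout_s:       hypothetical extra time to push the key everywhere,
                     e.g. across several clusters.
    """
    return rollout_s + agent_update_s < cleanup_delay_s


if __name__ == "__main__":
    hours = seconds_until_seq_exhaustion(100_000) / 3600
    print(f"At 100k packets/s, rotate keys at least every {hours:.1f} hours")
    print("Default 5m cleanup safe:", cleanup_delay_is_safe(300.0))
    print("5m cleanup with 10m rollout safe:",
          cleanup_delay_is_safe(300.0, rollout_s=600.0))
```

Under these assumptions, a single state sending 100,000 packets per second would exhaust its sequence numbers in roughly 12 hours, and a rollout that takes 10 minutes does not fit within the default 5-minute cleanup delay, which is the case where raising ``ipsec-key-rotation-duration`` would be worth considering.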