Fix deletion of tunnel map entries when node has non-zero cluster ID. #27353
Given that tunnel map entries are currently inserted without specifying the cluster ID, we should do the same during the removal phase. Otherwise, we fail to remove the entries if the node is associated with a non-zero cluster ID. Let's additionally update the test so that it would catch future regressions.

Fixes: cda8767 ("bpf: Make tunnel map APIs aware of ClusterID")
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
/test
I wonder why we didn't see this in the logs earlier. From my reading, under the hood, deleteTunnelMapping() invokes BPF_MAP_DELETE_ELEM, which should return an error if the entry doesn't exist.
My guess is that this is because we don't remove nodes during the clustermesh tests, and in all the other cases we are likely not setting a ClusterID different from zero (as, for instance, in the unit test that I modified). I tested this bug in a local kind environment, and the error was indeed logged after deleting a CiliumNode resource.
👍 Do you see any risk with changing the approach to propagating the CID to the tunnel map when adding/deleting instead of passing it as zero?
My understanding is that the datapath is currently doing the lookups using CID=0, so that would not work.
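A rough sketch of why propagating the cluster ID on insertion would not work on its own, under the assumption stated above that lookups always use CID=0. The names `lookupKey` and `datapathLookup` are hypothetical, and the Go map only models the key matching, not the real datapath.

```go
package main

import "fmt"

// lookupKey is a hypothetical model of a tunnel map key.
type lookupKey struct {
	prefix    string
	clusterID uint8
}

// datapathLookup models a lookup that always uses cluster ID 0,
// as described in the comment above (assumption).
func datapathLookup(m map[lookupKey]string, prefix string) (string, bool) {
	ep, ok := m[lookupKey{prefix: prefix, clusterID: 0}]
	return ep, ok
}

func main() {
	m := map[lookupKey]string{}

	// Current behavior: entry inserted with cluster ID 0.
	m[lookupKey{prefix: "10.0.1.0/24", clusterID: 0}] = "192.168.0.2"

	// Alternative approach: entry inserted with the node's real cluster ID.
	m[lookupKey{prefix: "10.0.2.0/24", clusterID: 5}] = "192.168.0.3"

	_, ok := datapathLookup(m, "10.0.1.0/24")
	fmt.Println(ok) // true

	_, ok = datapathLookup(m, "10.0.2.0/24")
	fmt.Println(ok) // false: the entry exists but under a different key
}
```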