neigh: Clean up stale/untracked non-GC'ed neighbors #17918
Since it's not only about ARP anymore. Also remove 'neigh' abbreviation while at it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NodeNeighDiscoveryEnabled() is called after we have completed a resync of all nodes with the kube-apiserver. (See initRestore() -> SyncWithK8sFinished() and the agent's wait in runDaemon() for the restoreComplete channel to finish in non-dry mode.) Given that, the neighLastPingByNextHop map is already populated at that time, so when migrating neighbor entries in NodeCleanNeighbors() we can remove irrelevant ones at the same time. This keeps garbage neighbor entries from piling up in the neighbor hash table and, for NTF_EXT_MANAGED entries, avoids the kernel having to do periodic work for them. neighLastPingByNextHop holds both the v4 and v6 entries, keyed by the address as a string, so it is enough to check only that map. Fixes: #17905 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Thanks! 🚀
For older Cilium versions, I suppose we would need to backport the changes that populate the neighNextHopByNode{4,6} map along with invoking NodeCleanNeighbors from syncWithK8sfinished?
Yeah, true, we need a custom one given this depends on earlier changes (not for v6, though, only for v4).
Add a variant which does not depend on the config dir but where we can pass a link directly to it. This is needed for the upcoming runtime tests, given that subsequent calls to NodeCleanNeighbors() won't have any effect as the link's config file gets removed upon the first successful cleanup. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a test case which simulates an agent resync where the agent cleans up stale neighbor entries that we no longer track because their nodes are no longer in the cluster. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similar to what we do for IPv6, add a test case which simulates an agent resync where the agent cleans up stale neighbor entries that we no longer track because their nodes are no longer in the cluster. Make sure that we also have IPv4 entries covered. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
/test
Manual backport pending in #17974 (I almost started a backport for this as part of my tophat duty)
See commit messages. We might need a custom backport for <=1.10.
Fixes: #17905