Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neigh: Clean up stale/untracked non-GC'ed neighbors #17918

Merged
merged 5 commits into from
Nov 18, 2021
Merged

Conversation

borkmann
Copy link
Member

@borkmann borkmann commented Nov 17, 2021

See commit msg. We might need custom backport for <=1.10.

Fixes: #17905

Since it's not only about ARP anymore. Also remove 'neigh' abbreviation
while at it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The NodeNeighDiscoveryEnabled() is called after we completed a resync of all
nodes with kubeapi-server. (See initRestore() -> SyncWithK8sFinished() and the
agent's wait in runDaemon() for restoreComplete channel to finish in non-dry
mode.)

Given that, neighLastPingByNextHop map is populated at that time, so when doing
migration of neighbor entries in NodeCleanNeighbors() we can remove unrelevant
ones at the same time to not let garbage neighbor entries pile up in the neighbor
hash table and/or avoid that the kernel needs to do periodic work for them in
case of NTF_EXT_MANAGED ones.

neighLastPingByNextHop holds both the v4 and v6 entries as a string for the key,
so it's enough to just check it there.

Fixes: #17905
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann added kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact. labels Nov 17, 2021
@borkmann borkmann requested review from brb, aditighag and a team November 17, 2021 13:05
Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🚀
For older cilium versions, I suppose we would need to backport changes that populate the neighNextHopByNode{4,6} map along with invoking NodeCleanNeighbors from syncWithK8sfinished?

@borkmann
Copy link
Member Author

borkmann commented Nov 17, 2021

I suppose we would need to backport changes that populate the neighNextHopByNode{4,6} map along with invoking NodeCleanNeighbors from syncWithK8sfinished?

Yeah, true, we need some custom one given this depends on earlier changes (v6 not however, only v4).

@aditighag
Copy link
Member

aditighag commented Nov 17, 2021

Fixes: #17905

Do we have a tracking issue for the backports?

Filed - #17923.

Add a variant which does not depend on the config dir but where we can pass
a link directly to it. This is needed for the runtime test given subsequent
calls to NodeCleanNeighbors() won't have any effect as the link's config file
gets removed upon first successful cleanup. Needed for upcoming runtime tests.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a test case which simulates agent resync where the agent cleans up stale
neighbor entries that we no longer track due to nodes not being in cluster.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similar as we do for IPv6, add a test case which simulates agent resync where
the agent cleans up stale neighbor entries that we no longer track due to nodes
not being in cluster. Make sure that we also have IPv4 entries covered.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Copy link
Member Author

/test

@borkmann borkmann merged commit 4f63603 into master Nov 18, 2021
@borkmann borkmann deleted the pr/neigh-cleaner branch November 18, 2021 13:29
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.6 Nov 23, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.6 Nov 23, 2021
@gandro
Copy link
Member

gandro commented Nov 23, 2021

Manual backport pending in #17974

(I almost started a backport for this as part of my tophat duty)

@joestringer joestringer added this to Backport pending to v1.10 in 1.10.7 Dec 10, 2021
@joestringer joestringer removed this from Backport pending to v1.10 in 1.10.6 Dec 10, 2021
@joestringer joestringer added this to Backport pending to v1.10 in 1.10.8 Jan 18, 2022
@joestringer joestringer removed this from Backport pending to v1.10 in 1.10.7 Jan 18, 2022
@joestringer
Copy link
Member

@borkmann @gandro it looks like this was never backported to v1.10. Dropping from the project.

@joestringer joestringer removed this from Backport pending to v1.10 in 1.10.8 Feb 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Neighbor discovery handler leaves stale ARP entries in certain cases
6 participants