New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Downgrade L2 Neighbor Discovery failure log to Debug #31179
Merged
YutaroHayakawa
merged 1 commit into
cilium:main
from
YutaroHayakawa:neighbor-discovery-debug
Mar 7, 2024
Merged
Downgrade L2 Neighbor Discovery failure log to Debug #31179
YutaroHayakawa
merged 1 commit into
cilium:main
from
YutaroHayakawa:neighbor-discovery-debug
Mar 7, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
maintainer-s-little-helper
bot
added
the
dont-merge/needs-release-note-label
The author needs to describe the release impact of these changes.
label
Mar 6, 2024
YutaroHayakawa
added
the
release-note/misc
This PR makes changes that have no direct user impact.
label
Mar 6, 2024
maintainer-s-little-helper
bot
removed
the
dont-merge/needs-release-note-label
The author needs to describe the release impact of these changes.
label
Mar 6, 2024
YutaroHayakawa
added
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
needs-backport/1.15
This PR / issue needs backporting to the v1.15 branch
labels
Mar 6, 2024
/test |
Suppress the "Unable to determine next hop address" logs. While it shows the L2 neighbor resolution failure, it does not always indicate a datapath connectivity issue. For example, when "devices=eth+" is specified and the device naming/purposing is not consistent across the nodes in the cluster, in some nodes "eth1" is a device reachable to other nodes, but in some nodes, it is not. As a result, L2 Discovery generates an "Unable to determine next hop address". Another example is ENI mode with automatic device detection. When secondary interfaces are added, they are used for L2 Neighbor Discovery as well. However, nodes can only be reached via the primary interface through the default route in the main routing table. Thus, L2 Neighbor Discovery generates "Unable to determine next hop address" for secondary interfaces. In both cases, it does not always mean the datapath has an issue for KPR functionality. However, the logs appear repeatedly, are noisy, and the message is error-ish, causing confusion. This log started to appear for some users who did not see it before from v1.14.3 (cilium#28858) and v1.15.0 (in theory). For v1.14.3, it affects KPR + Tunnel users because of f2dcc86. Before the commit, we did not perform L2 Neighbor Discovery in tunnel mode, so even if users had an interface unreachable to other nodes, the log did not appear. For v1.15.0, it affects to the users who used to have the unreachable interface. 2058ed6 made it visible. Before the commit, some kind of the errors like EHOSTUNREACH and ENETUNREACH were not caught because FIBMatch option didn't specified. After v1.15.0, users started to see the log. Fixes: cilium#28858 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
YutaroHayakawa
force-pushed
the
neighbor-discovery-debug
branch
from
March 6, 2024 06:56
8e45d3a
to
3d9c6c9
Compare
aspsk
approved these changes
Mar 6, 2024
/test |
nickolaev
approved these changes
Mar 6, 2024
Conformance Ginkgo: #31040 |
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Mar 7, 2024
jibi
added
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
and removed
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
labels
Mar 12, 2024
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.14
in 1.14.8
Mar 12, 2024
jibi
added
backport-pending/1.15
The backport for Cilium 1.15.x for this PR is in progress.
and removed
needs-backport/1.15
This PR / issue needs backporting to the v1.15 branch
labels
Mar 12, 2024
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.15
in 1.15.2
Mar 12, 2024
jibi
added
needs-backport/1.15
This PR / issue needs backporting to the v1.15 branch
and removed
backport-pending/1.15
The backport for Cilium 1.15.x for this PR is in progress.
labels
Mar 13, 2024
maintainer-s-little-helper
bot
moved this from Backport pending to v1.15
to Needs backport from main
in 1.15.2
Mar 13, 2024
jibi
added
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
and removed
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
labels
Mar 13, 2024
maintainer-s-little-helper
bot
moved this from Backport pending to v1.14
to Needs backport from main
in 1.14.8
Mar 13, 2024
jibi
added
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
and removed
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
labels
Mar 13, 2024
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.14
in 1.14.8
Mar 13, 2024
jibi
added
backport-pending/1.15
The backport for Cilium 1.15.x for this PR is in progress.
and removed
needs-backport/1.15
This PR / issue needs backporting to the v1.15 branch
labels
Mar 13, 2024
github-actions
bot
added
backport-done/1.14
The backport for Cilium 1.14.x for this PR is done.
backport-done/1.15
The backport for Cilium 1.15.x for this PR is done.
and removed
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
backport-pending/1.15
The backport for Cilium 1.15.x for this PR is in progress.
labels
Mar 16, 2024
jrajahalme
moved this from Needs backport from main
to Backport done to v1.15
in 1.15.3
Mar 26, 2024
jrajahalme
moved this from Backport pending to v1.14
to Backport done to v1.14
in 1.14.9
Mar 26, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport-done/1.14
The backport for Cilium 1.14.x for this PR is done.
backport-done/1.15
The backport for Cilium 1.15.x for this PR is done.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/misc
This PR makes changes that have no direct user impact.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Suppress the "Unable to determine next hop address" logs. While it shows the L2 neighbor resolution failure, it does not always indicate a datapath connectivity issue.
For example, when "devices=eth+" is specified and the device naming/purposing is not consistent across the nodes in the cluster, in some nodes "eth1" is a device reachable to other nodes, but in some nodes, it is not. As a result, L2 Discovery generates an "Unable to determine next hop address".
Another example is ENI mode with automatic device detection. When secondary interfaces are added, they are used for L2 Neighbor Discovery as well. However, nodes can only be reached via the primary interface through the default route in the main routing table. Thus, L2 Neighbor Discovery generates "Unable to determine next hop address" for secondary interfaces.
In both cases, it does not always mean the datapath has an issue for KPR functionality. However, the logs appear repeatedly, are noisy, and the message is error-ish, causing confusion.
This log started to appear for some users who did not see it before from v1.14.3 (#28858) and v1.15.0 (in theory). For v1.14.3, it affects KPR + Tunnel users because of f2dcc86. Before the commit, we did not perform L2 Neighbor Discovery in tunnel mode, so even if users had an interface unreachable to other nodes, the log did not appear.
For v1.15.0, it affects to the users who used to have the unreachable interface. 2058ed6 made it visible. Before the commit, some kind of the errors like EHOSTUNREACH and ENETUNREACH were not caught because FIBMatch option didn't specified. After v1.15.0, users started to see the log.
Fixes: #28858