node-neigh: Reduce arping related log msg's level #15261

Merged
merged 1 commit into from Apr 9, 2021

Conversation

brb
Member

@brb brb commented Mar 9, 2021

It is not the end of the world if any arping related operation fails (e.g. frequent connections between nodes ensure the presence of relevant L2 entries in the neigh table). So, decrease the log level of the log msgs.

@brb brb added area/daemon Impacts operation of the Cilium daemon. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.8 labels Mar 9, 2021
@brb brb requested review from a team and jrfastab March 9, 2021 09:39
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.5 Mar 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.8 Mar 9, 2021
@brb brb force-pushed the pr/brb/reduce-arping-log-level branch from 82ea199 to 6aaa80e Compare March 9, 2021 09:39
Member

@aanm aanm left a comment

I'm fine with reducing the log msg to level "Info", but printing an "Info" message with the word "Fail" in it might be confusing to users. If they are not that important, maybe consider setting them to debug or adding a new metric?

@brb
Member Author

brb commented Mar 9, 2021

> If they are not that important, maybe consider setting them to debug or adding a new metric?

@aanm I still want to see what exactly failed, so keeping them as log msg in info lvl instead of adding a metric.

@aanm
Member

aanm commented Mar 9, 2021

> > If they are not that important, maybe consider setting them to debug or adding a new metric?
>
> @aanm I still want to see what exactly failed, so keeping them as log msg in info lvl instead of adding a metric.

@brb you, a developer, yes, but users will see this and find it confusing.

@brb
Member Author

brb commented Mar 9, 2021

@aanm Recently we had some big changes in the arp handling code. So I'd like to observe any possible discrepancies at least for a while. Later on, we can mute the messages, and expose errors via metrics.

@aanm
Member

aanm commented Mar 9, 2021

> @aanm Recently we had some big changes in the arp handling code. So I'd like to observe any possible discrepancies at least for a while. Later on, we can mute the messages, and expose errors via metrics.

Then let's prefix [DEBUG] in these messages

@joestringer joestringer added this to Needs backport from master in 1.8.9 Mar 9, 2021
@joestringer joestringer removed this from Needs backport from master in 1.8.8 Mar 9, 2021
@christarazi christarazi added this to Needs backport from master in 1.9.6 Mar 10, 2021
@christarazi christarazi removed this from Needs backport from master in 1.9.5 Mar 10, 2021
@brb brb force-pushed the pr/brb/reduce-arping-log-level branch from 6aaa80e to 5240b52 Compare March 15, 2021 08:20
@brb
Member Author

brb commented Mar 15, 2021

> Then let's prefix [DEBUG] in these messages

@aanm I've changed the logging subsystem to node-neigh-debug. PTAL.

Contributor

@jrfastab jrfastab left a comment

LGTM. Could change "Failed" -> "Unable" so that those messages read as less severe, e.g. "Failed to remove neighbor entry" becomes "Unable to remove neighbor entry".

@@ -695,7 +695,7 @@ func (n *linuxNodeHandler) insertNeighbor(ctx context.Context, newNode *nodeType
 logfields.IPAddr: neigh.IP,
 logfields.HardwareAddr: neigh.HardwareAddr,
 logfields.LinkIndex: neigh.LinkIndex,
-}).WithError(err).Warn("Failed to remove neighbor entry")
+}).WithError(err).Info("Failed to remove neighbor entry")
Contributor

Why would this happen and not be an error? Is it possible for the entry to be removed asynchronously to this operation, so that it no longer exists?

Member Author

I was thinking about the case when a network operator removes an entry manually.

@jrfastab
Contributor

Beyond the scope of this patch, but I took a look at the callers of insertNeighbor: shouldn't we return an error here and kick the retry logic? Otherwise we are waiting for the refresh logic from neighbor-table-refresh to kick in. I would expect that a failed arp could be retried almost immediately and then backed off from there if it keeps failing.

@aanm aanm added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Mar 29, 2021
@brb
Member Author

brb commented Apr 9, 2021

> Beyond the scope of this patch, but I took a look at the callers of insertNeighbor: shouldn't we return an error here and kick the retry logic

@jrfastab I've added retry logic to both arping libraries we use. They will retry 3 times. If all attempts fail, then it is up to the periodic refresh to fix any outstanding issues.

@brb brb force-pushed the pr/brb/reduce-arping-log-level branch from 204fe6c to eca1bed Compare April 9, 2021 07:12
@brb brb added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 9, 2021
@pchaigno pchaigno merged commit 2de65d0 into master Apr 9, 2021
1.10.0 automation moved this from In progress to Done Apr 9, 2021
@pchaigno pchaigno deleted the pr/brb/reduce-arping-log-level branch April 9, 2021 08:51
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.9 Apr 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.6 Apr 9, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.6 Apr 14, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.9 Apr 17, 2021
Labels
area/daemon Impacts operation of the Cilium daemon. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.8.9
Backport done to v1.8
1.9.6
Backport done to v1.9
6 participants