New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.7 backports 2020-06-11 #12031
Merged
Merged
v1.7 backports 2020-06-11 #12031
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 2ed8851 ] Include Envoy-supplied error detail in NACK warning logs to facilitate debugging. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit b2c73e7 ] When Reinitialize()-ing the datapath, transient iptables rules are set up to avoid dropping packets while Cilium's rules are not in place. In rare occasions, a failure to add those rules has been observed (see issue #11276), leading to an early exit from Reinitialize() and a failure to set up the daemon. But those transient rules are just used to lend a hand and keep packets going for a very small window of time: it does not actually matter much if we fail to install them, and it should not stop the reinitializing of the daemon. Let's simply log a warning and carry on if we fail to add those rules. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 66aa41d ] We do the same thing for IPv4 and IPv6, with just the name of the program changing. Then we do nearly the same thing for flushing and deleting a chain. Let's refactor (no functional change). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 052fa31 ] When Cilium reinitialises its daemon, transient iptables rules are set up to avoid dropping packets while the regular rules are not in place. On rare occasions, setting up those transient rules has been found to fail, for an unknown reason (see issue #11276). The error message states that the "Chain already exists", even though we try to flush and remove any leftover from previous transient rules before adding the new ones. It sounds likely that removing the leftovers is failing, but we were not able to understand why, because we quieten the function to avoid spurious warnings the first time we try to remove them (since none is existing). It would be helpful to get more information to understand what happens in those rare occasions where setting up transient rules fails. Let's find a way to get more logs, without making too much noise. We cannot warn unconditionally in remove() since we want removal in the normal case to remain quiet. What we can do is logging when the "quiet" flag is not passed, _or_ when the error is different from the chain being not found, i.e. when the error is different from the one we want to silence on start up. This means matching on the error message returned by ip(6)tables. It looks fragile, but at least this message has not changed since 2009, so it should be relatively stable and pretty much the same on all supported systems. Since remove() is used for chains other than for transient rules too, we also match on chain name to make sure we are dealing with transient rules if ignoring the "quiet" flag. This additional logging could be removed once we reproduce and fix the issue. Alternative approaches could be: - Uncoupling the remove() function for transient rules and regular rules, to avoid matching on chain name (but it sounds worse). - Logging on failure for all rules even when the "quiet" flag is passed, but on "info" level instead of "warning". This would still require a modified version of runProg(), with also a modified version of CombinedOutput() in package "exec". Here I chose to limit the number of logs and keep the changes local. - Listing the chain first before trying to remove it, so we only try to remove if it exists, but this would likely add unnecessary complexity and latency. Should help with (but does not solve): #11276 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit e78405a ] [ Backporter's notes: Minor conflicts against PR #10639 and CI failure log messages. ] In an attempt to catch #11276 in CI, let's add any message related to a failure to flush or delete the chain related to iptables transient rules to the list of badLogMessages we want to catch. We need to filter on the name of the chain for transient rules to avoid false positives, which requires exporting that name. We also need to modify the log message error, to avoid adding four disting logs to the list (combinations for iptables/ip6tables, flush/delete). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 477c487 ] The backport PRs shouldn't add the label requires-janitor-review since the review is automatically requested by the CODEOWNERS file. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>
maintainer-s-little-helper
bot
added
backport/1.7
kind/backports
This PR provides functionality previously merged into master.
labels
Jun 11, 2020
joestringer
added
backport/1.7
kind/backports
This PR provides functionality previously merged into master.
labels
Jun 11, 2020
test-backport-1.7 |
qmonnet
approved these changes
Jun 12, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- iptables: carry on and log on failure to set up transient rules #12006 -- iptables: carry on and log on failure to set up transient rules (@qmonnet)
- Minor conflicts because PR daemon: add option to disable some feeder tables #10639 was not present on v1.7 branch.
- Also conflict in the messages we fail out the CI on. I only added the specific message that the PR adds, not all of the others in latest branch.
Looks good to me, thanks!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
specific message that the PR adds, not all of the others in latest branch.
Once this PR is merged, you can update the PR labels via: