Skip to content

Commit

Permalink
iptables: log unexpected failures to delete transient rules
Browse files Browse the repository at this point in the history
[ upstream commit 052fa31 ]

When Cilium reinitialises its daemon, transient iptables rules are set
up to avoid dropping packets while the regular rules are not in place.

On rare occasions, setting up those transient rules has been found to
fail, for an unknown reason (see issue #11276). The error message states
that the "Chain already exists", even though we try to flush and remove
any leftover from previous transient rules before adding the new ones.
It sounds likely that removing the leftovers is failing, but we were not
able to understand why, because we quieten the function to avoid
spurious warnings the first time we try to remove them (since none is
existing).

It would be helpful to get more information to understand what happens
in those rare occasions where setting up transient rules fails. Let's
find a way to get more logs, without making too much noise.

We cannot warn unconditionally in remove() since we want removal in the
normal case to remain quiet. What we can do is logging when the "quiet"
flag is not passed, _or_ when the error is different from the chain
being not found, i.e. when the error is different from the one we want
to silence on start up. This means matching on the error message
returned by ip(6)tables. It looks fragile, but at least this message has
not changed since 2009, so it should be relatively stable and pretty
much the same on all supported systems. Since remove() is used for
chains other than for transient rules too, we also match on chain name
to make sure we are dealing with transient rules if ignoring the "quiet"
flag.

This additional logging could be removed once we reproduce and fix the
issue.

Alternative approaches could be:

- Uncoupling the remove() function for transient rules and regular
  rules, to avoid matching on chain name (but it sounds worse).
- Logging on failure for all rules even when the "quiet" flag is passed,
  but on "info" level instead of "warning". This would still require a
  modified version of runProg(), with also a modified version of
  CombinedOutput() in package "exec". Here I chose to limit the number
  of logs and keep the changes local.
- Listing the chain first before trying to remove it, so we only try to
  remove if it exists, but this would likely add unnecessary complexity
  and latency.

Should help with (but does not solve): #11276
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
  • Loading branch information
qmonnet authored and aanm committed Jun 11, 2020
1 parent c9aed20 commit 39805f7
Showing 1 changed file with 19 additions and 4 deletions.
23 changes: 19 additions & 4 deletions pkg/datapath/iptables/iptables.go
Expand Up @@ -86,8 +86,13 @@ func getVersion(prog string) (go_version.Version, error) {
return versioncheck.Version(vString[1])
}

func runProgCombinedOutput(prog string, args []string, quiet bool) ([]byte, error) {
out, err := exec.WithTimeout(defaults.ExecTimeout, prog, args...).CombinedOutput(log, !quiet)
return out, err
}

func runProg(prog string, args []string, quiet bool) error {
_, err := exec.WithTimeout(defaults.ExecTimeout, prog, args...).CombinedOutput(log, !quiet)
_, err := runProgCombinedOutput(prog, args, quiet)
return err
}

Expand Down Expand Up @@ -205,9 +210,19 @@ func (m *IptablesManager) removeCiliumRules(table, prog, match string) {

func (c *customChain) remove(waitArgs []string, quiet bool) {
doProcess := func(c *customChain, prog string, args []string, operation string, quiet bool) {
err := runProg(prog, args, true)
if err != nil && !quiet {
log.WithError(err).WithField(logfields.Object, args).Warnf("Unable to %s Cilium %s chain", operation, prog)
combinedOutput, err := runProgCombinedOutput(prog, args, true)
if err != nil {
// If the chain is for transient rules and deletion
// fails for a reason other than the chain not being
// present, log the error.
// This is to help debug #11276.
msgChainNotFound := ": No chain/target/match by that name.\n"
debugTransientRules := c.name == ciliumTransientForwardChain &&
string(combinedOutput) != prog+msgChainNotFound
if !quiet || debugTransientRules {
log.Warnf(string(combinedOutput))
log.WithError(err).WithField(logfields.Object, args).Warnf("Unable to %s Cilium %s chain", operation, prog)
}
}
}
doRemove := func(c *customChain, prog string, waitArgs []string, quiet bool) {
Expand Down

0 comments on commit 39805f7

Please sign in to comment.