Ensure that the iptables masquerade rule is always installed #808

Merged
1 commit merged into coreos:master on Sep 22, 2017

Conversation

3 participants
julia-stripe (Contributor) commented Sep 7, 2017

We ran into an issue where the iptables masquerade rules that flannel creates sometimes get deleted. Ideally this wouldn't happen, but when it does it's really bad -- containers can't send packets to any hosts outside the cluster.

This PR makes flannel automatically recover from this situation by recreating the iptables rules, so that the cluster will remain healthy:

  • added a goroutine that automatically recreates the masquerade rules every second
  • added an alternate version of AppendUnique which returns whether or not the rule was appended. This allows flannel to only log when a rule was actually successfully added, instead of always logging (which, now that we're retrying every second, would add too much noise to the logs)

We're running this in our production cluster now.
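For illustration, here is a minimal sketch of this approach using the coreos/go-iptables library; the helper name appendUnique, the rule list, and the surrounding main loop are illustrative rather than the exact PR code:

```go
package main

import (
	"log"
	"time"

	"github.com/coreos/go-iptables/iptables"
)

// appendUnique appends the rule only if it is not already present, and
// reports whether it actually appended anything, so the caller can log
// only when a rule was really (re)added.
func appendUnique(ipt *iptables.IPTables, table, chain string, rule []string) (bool, error) {
	exists, err := ipt.Exists(table, chain, rule...)
	if err != nil || exists {
		return false, err
	}
	return true, ipt.Append(table, chain, rule...)
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	// Illustrative rules only, not the exact set flannel installs.
	rules := [][]string{
		{"-s", "10.0.0.0/8", "-d", "10.0.0.0/8", "-j", "RETURN"},
		{"-s", "10.0.0.0/8", "!", "-d", "224.0.0.0/4", "-j", "MASQUERADE"},
	}

	// Goroutine that re-checks the masquerade rules every second and
	// re-appends any that have gone missing.
	go func() {
		for range time.Tick(time.Second) {
			for _, r := range rules {
				added, err := appendUnique(ipt, "nat", "POSTROUTING", r)
				if err != nil {
					log.Printf("failed to ensure iptables rule %v: %v", r, err)
				} else if added {
					log.Printf("re-added missing iptables rule %v", r)
				}
			}
		}
	}()

	select {} // stand-in for the rest of flannel's main loop
}
```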

tomdee (Member) commented Sep 9, 2017

Hi @julia-stripe, thanks for another well-put-together PR!

I see one key problem with this PR and then a couple of minor concerns (below). The main concern is that if a single rule (of the four that flannel installs) is removed, flannel just restores that one missing rule, which means the rules can end up out of order. Since the ordering of the rules matters, I think this defeats the purpose of this PR a bit.

e.g. if I remove the first flannel rule with `iptables -t nat -D POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j RETURN`, then I can see your code restores it, but it restores it last in the list, which means it won't get hit.

I'd be interested to hear if you have any thoughts on automated testing in this area? I know there aren't any existing tests but it would be great to have some.

Otherwise, the code looks good to me, though I'd like to leave it up for at least a few days before merging to give others a chance to comment.

The slight areas of concern I have about this are:

  1. The additional iptables churn that this could create. But I'm not too concerned, since this should just be 4 attempted appends (one for each rule) per second. I assume this is a constant-cost operation and isn't going to cause a massive increase in load for people with huge sets of iptables rules.
  2. The potential for multiple processes to "fight" over keeping these rules updated. If there is another process making changes to the NAT table, that could potentially cause additional churn.

@fasaxc I'd love to hear your thoughts on this.

fasaxc commented Sep 12, 2017

On (1), it looks like the code checks for the existence of the rule first, so it's 4 reads per second. That could be moderately expensive if there are a lot of rules (since most iptables operations do a full table load from the kernel to userspace, even to check one rule).

On (2), as long as the other process isn't actively deleting these rules then we should be OK (and if it were actively trying to delete those rules, there's not much we can do).

Longer term, you may want to migrate to having only a single rule inserted into the root chain and then jumping from there to a flannel-owned chain. That's the approach we have in Calico; it reduces clobbering since there's only one rule, and it avoids reordering issues. (And, since we own the Calico chain, we can just flush and rewrite the whole chain at will if anyone ever tampers with it.)
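A rough sketch of that single-jump pattern with coreos/go-iptables follows; the chain name FLANNEL-POSTRTG and the rules shown are hypothetical, not something flannel installs today:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

// ensureOwnedChain sketches the "single jump into an owned chain" pattern:
// one jump rule in POSTROUTING, all real rules in a chain we own, so
// recovery is simply "flush our chain and rewrite it in order".
func ensureOwnedChain(ipt *iptables.IPTables, rules [][]string) error {
	const table, chain = "nat", "FLANNEL-POSTRTG" // hypothetical chain name

	// ClearChain creates the chain if needed, or flushes it if it exists.
	if err := ipt.ClearChain(table, chain); err != nil {
		return err
	}
	// A single jump from the built-in chain into the owned chain.
	if err := ipt.AppendUnique(table, "POSTROUTING", "-j", chain); err != nil {
		return err
	}
	// Rewrite the owned chain in order; nothing else appends here, so the
	// relative ordering of these rules can't be disturbed.
	for _, r := range rules {
		if err := ipt.Append(table, chain, r...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	// Illustrative rules only.
	rules := [][]string{
		{"-s", "10.0.0.0/8", "-d", "10.0.0.0/8", "-j", "RETURN"},
		{"-s", "10.0.0.0/8", "-j", "MASQUERADE"},
	}
	if err := ensureOwnedChain(ipt, rules); err != nil {
		log.Fatal(err)
	}
}
```

This is only a possible future direction; the PR under discussion keeps flannel's rules in POSTROUTING itself.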

julia-stripe (Contributor) commented Sep 13, 2017

> I see one key problem with this PR and then a couple of minor concerns (below). The main concern is that if a single rule (of the four that flannel installs) is removed, flannel just restores that one missing rule, which means the rules can end up out of order. Since the ordering of the rules matters, I think this defeats the purpose of this PR a bit.

This is a great point. I've changed the PR so that it instead:

  1. checks if any rules are missing
  2. if so, deletes all the rules and recreates them in order.

I also added a mock IPTables struct & some tests to give us a little more confidence that this code actually works as intended.

This should be reviewed commit-by-commit -- the first commit just adds tests, and the second actually changes functionality.
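To make the revised mechanism concrete, here is a minimal sketch of the check-then-recreate logic using the coreos/go-iptables API; the function name ensureRules and the rule list are illustrative, not the exact code from the PR:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

// ensureRules sketches the revised recovery logic: if any expected rule is
// missing, delete whatever is left and re-append the full set in order, so
// the relative ordering of flannel's rules is always preserved.
func ensureRules(ipt *iptables.IPTables, table, chain string, rules [][]string) error {
	missing := false
	for _, r := range rules {
		exists, err := ipt.Exists(table, chain, r...)
		if err != nil {
			return err
		}
		if !exists {
			missing = true
			break
		}
	}
	if !missing {
		return nil // everything is present and in order; nothing to do or log
	}

	log.Println("some iptables rules are missing; recreating all flannel rules")

	// Remove whichever rules are still installed (ignoring errors for the
	// ones that are already gone), then re-append the full set in order.
	for _, r := range rules {
		_ = ipt.Delete(table, chain, r...)
	}
	for _, r := range rules {
		if err := ipt.Append(table, chain, r...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	rules := [][]string{ // illustrative rules only
		{"-s", "10.0.0.0/8", "-d", "10.0.0.0/8", "-j", "RETURN"},
		{"-s", "10.0.0.0/8", "-j", "MASQUERADE"},
	}
	if err := ensureRules(ipt, "nat", "POSTROUTING", rules); err != nil {
		log.Fatal(err)
	}
}
```

In the PR itself this check runs on the same one-second timer as the earlier sketch, and the added mock IPTables struct lets the tests exercise it without touching the real NAT table.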

tomdee (Member) commented Sep 18, 2017

@fasaxc thanks for the comments. I agree that longer term, a single jump to a flannel-owned chain would be better.

@julia-stripe That's looking much better, thanks. This code doesn't actually delete the iptables rules when flannel stops, but after a bit of testing I see that the old code didn't either (oops). I also think it's desirable that the rules aren't cleaned up (so the dataplane keeps working during an upgrade). It's up to you whether you want to clean up the code or just leave it as is. I think in future it would be good to have a "--cleanup" command which removes all traces of flannel from a system (removes the subnet.env file, any iptables rules, any flannel routes, and any flannel-created devices).

Could you squash the two commits? Then I can merge it.

julia-stripe (Contributor) commented Sep 19, 2017

Squashed.

> I also think it's desirable that the rules aren't cleaned up (so the dataplane keeps working during an upgrade).

This makes sense to me.


tomdee approved these changes Sep 22, 2017

tomdee merged commit 5787c1f into coreos:master on Sep 22, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed