Iptables forwarding chain (filter) rules created for a bridge network are ordered incorrectly. #1643

Closed
djlwilder opened this issue Feb 9, 2017 · 4 comments

Comments

@djlwilder

djlwilder commented Feb 9, 2017

The iptables forwarding chain (filter) rules created for a docker bridge network are ordered incorrectly. This causes elevated kernel si (softirq) time when scaling to a large number of containers. As an example, I created one bridge network in addition to the default docker0 bridge. Here is the iptables filter table that was created:

iptables -t filter -L -v -n

Chain INPUT (policy ACCEPT 34M packets, 1792M bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
178K 177M DOCKER-ISOLATION all -- * * 0.0.0.0/0 0.0.0.0/0
118K 174M DOCKER all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
59588 3100K ACCEPT all -- br-538966114ea6 !br-538966114ea6 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- br-538966114ea6 br-538966114ea6 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT 36M packets, 327G bytes)
pkts bytes target prot opt in out source destination

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
118K 174M ACCEPT tcp -- !br-538966114ea6 br-538966114ea6 0.0.0.0/0 172.18.0.2 tcp dpt:22001
6 1648 ACCEPT tcp -- !br-538966114ea6 br-538966114ea6 0.0.0.0/0 172.18.0.2 tcp dpt:12001
0 0 ACCEPT tcp -- !br-538966114ea6 br-538966114ea6 0.0.0.0/0 172.18.0.3 tcp dpt:22002
0 0 ACCEPT tcp -- !br-538966114ea6 br-538966114ea6 0.0.0.0/0 172.18.0.3 tcp dpt:12002
0 0 ACCEPT tcp -- !br-538966114ea6 br-538966114ea6 0.0.0.0/0 172.18.0.4 tcp dpt:22003

-------------

Four rules are created for each bridge network (including the docker0 network):
DOCKER all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
ACCEPT all -- br-538966114ea6 !br-538966114ea6 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- br-538966114ea6 br-538966114ea6 0.0.0.0/0 0.0.0.0/0

The DOCKER chain contains one entry for each exposed address/port pair.
Take the rules one at a time.
ACCEPT all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
This rule accepts all packets belonging to an established connection.

DOCKER all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0
This rule jumps to the DOCKER chain, where each packet is checked against the list of exposed ports. Unfortunately, because this rule is listed first, every packet is passed to the DOCKER chain before the ctstate rule is consulted, and in the listing above no packets ever reach that second rule. This can be observed in the pkts counts.

If the rules are switched, packets belonging to an established connection are accepted without needing to traverse the DOCKER chain. Only packets involved in connection setup should be passed to the DOCKER chain. The idea is that new flows are validated by the DOCKER chain; once a connection is established, no further validation is needed.
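
To make the ordering argument concrete, here is a minimal Python sketch that models first-match traversal of the two FORWARD rules and a DOCKER chain. It is a toy model, not how the kernel or Docker implements anything: the `Packet`, `Rule`, `build_chains`, and `traverse` names are hypothetical, the port numbering is made up, and only the outgoing interface, destination port, and connection state are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

BRIDGE = "br-538966114ea6"  # bridge name taken from the listing in this issue


@dataclass(frozen=True)
class Packet:
    out_iface: str
    dst_port: int
    established: bool  # True if conntrack state is RELATED/ESTABLISHED


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Packet], bool]
    target: str  # "ACCEPT", or the name of a chain to jump to


def build_chains(num_ports: int, conntrack_first: bool) -> Dict[str, List[Rule]]:
    """Build a toy FORWARD/DOCKER pair with num_ports exposed ports (hypothetical ports)."""
    docker_chain = [
        Rule(f"port-{22001 + i}",
             lambda p, port=22001 + i: p.out_iface == BRIDGE and p.dst_port == port,
             "ACCEPT")
        for i in range(num_ports)
    ]
    jump = Rule("jump-DOCKER", lambda p: p.out_iface == BRIDGE, "DOCKER")
    ct = Rule("ctstate", lambda p: p.out_iface == BRIDGE and p.established, "ACCEPT")
    forward = [ct, jump] if conntrack_first else [jump, ct]
    return {"FORWARD": forward, "DOCKER": docker_chain}


def traverse(chains: Dict[str, List[Rule]], chain: str,
             pkt: Packet) -> Tuple[Optional[str], int]:
    """Return (verdict, rules_evaluated); verdict is None if no rule accepted."""
    evaluated = 0
    for rule in chains[chain]:
        evaluated += 1
        if not rule.matches(pkt):
            continue
        if rule.target == "ACCEPT":
            return "ACCEPT", evaluated
        # Jump to a user chain; if it returns without a verdict, keep going.
        verdict, n = traverse(chains, rule.target, pkt)
        evaluated += n
        if verdict is not None:
            return verdict, evaluated
    return None, evaluated


if __name__ == "__main__":
    # An established packet whose port matches the last DOCKER entry.
    pkt = Packet(BRIDGE, 22001 + 999, established=True)
    for conntrack_first in (False, True):
        chains = build_chains(1000, conntrack_first)
        print(conntrack_first, traverse(chains, "FORWARD", pkt))
```

In this model, an established packet costs one rule evaluation when the ctstate rule comes first, while the original ordering scans the DOCKER chain for every packet.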

I discovered this problem while doing scaling tests. I have 6000 containers spread across 6 bridges, each with 2 exposed ports, so my DOCKER chain has 12,000 entries! As I scaled my tests to 3000 flows, si time (softirq time) became very large because every packet must pass through the DOCKER chain. Reversing the two rules reduced si time by over 20%!
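
As a rough back-of-the-envelope check on those numbers, the sketch below computes the DOCKER chain size and the mean number of rules a linear scan would check per packet. The uniform-match assumption and the `expected_rules_scanned` helper are illustrative assumptions, not measurements from the test.

```python
containers = 6000
ports_per_container = 2

# One DOCKER chain entry per exposed port, as described above.
docker_chain_rules = containers * ports_per_container


def expected_rules_scanned(n_rules: int) -> float:
    """Mean rules checked by a linear scan, ASSUMING the matching rule
    is equally likely to be at any position in the chain."""
    return (n_rules + 1) / 2


if __name__ == "__main__":
    print(docker_chain_rules)
    print(expected_rules_scanned(docker_chain_rules))
```

A packet that matches no entry at all would scan the full chain, so the real per-packet cost depends on the traffic mix.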

In this example I have switched the two rules; note the change in the pkts counts.

iptables -t filter -L -v -n

Chain INPUT (policy ACCEPT 139 packets, 44772 bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1985K 1982M DOCKER-ISOLATION all -- * * 0.0.0.0/0 0.0.0.0/0
1322K 1948M ACCEPT all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
2 152 DOCKER all -- * br-538966114ea6 0.0.0.0/0 0.0.0.0/0
663K 35M ACCEPT all -- br-538966114ea6 !br-538966114ea6 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- br-538966114ea6 br-538966114ea6 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0

@djlwilder
Author

$ docker -v
Docker version 1.14.0-dev, build f538c4b

djlwilder reopened this Feb 9, 2017

@aboch
Contributor

aboch commented Feb 9, 2017

It could be a dupe of moby/moby#18911, for which a fix was pushed in #961.

Can you please check whether it also solves what you are reporting?
Thanks!

@djlwilder
Author

Thanks for the comment. After reviewing #961, I agree this is the same issue and that it should solve my problem. I will give it a good test. As far as I can tell, the change has not been merged yet. Is that correct? When do you think it will be merged?

@aboch
Contributor

aboch commented Feb 10, 2017

Thanks.
It is not merged yet.
ping @sanimej @mavenugo
