HAProxy does not respond after update to 1409.2.0 #2022

Closed
liquid-sky opened this Issue Jun 24, 2017 · 8 comments

liquid-sky commented Jun 24, 2017

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1409.2.0
VERSION_ID=1409.2.0
BUILD_ID=2017-06-19-2321
PRETTY_NAME="Container Linux by CoreOS 1409.2.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

AWS t1.small instance HVM

Expected Behavior

Marathon-LB HAProxy should respond on port 9090

Actual Behavior

This is from inside the Docker container, but the same applies when accessing the port from outside.

Marathon-LB had been running happily on this host, but today, after the update to 1409.2.0, it stopped responding on the port it binds to: curl, telnet, and all other requests simply hang. The service runs in a privileged container with host network mode.

root@marathon-lb:/marathon-lb# telnet localhost 9091
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@marathon-lb:/marathon-lb# telnet localhost 9090
Trying ::1...
Trying 127.0.0.1...
^C
root@ip-10-10-5-77:/marathon-lb# netstat -antp | grep 90
tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      84/haproxy
tcp        0      0 0.0.0.0:9091            0.0.0.0:*               LISTEN      84/haproxy

Reproduction Steps

  1. Start the marathon-lb:latest Docker container with the following arguments:
  "args": [
    "sse",
    "--group",
    "external",
    "--marathon",
    "http://master.mesos:8080"
  ]

--net="host" --privileged

  2. Try telnetting to port 9090 or hitting the health endpoint: curl localhost:9090/_haproxy_health_check.

Other Information

All requests to the port hang as if they were being firewalled, while all other ports are accessible. Changing to a different port number doesn't help; it still hangs. This works fine on previous OS versions and started happening immediately after the update.
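When a listening port hangs like this, a quick way to tell a firewall drop apart from an application problem is to check the INPUT chain for the port before suspecting HAProxy itself. A minimal diagnostic sketch (port 9090 is this report's; run on the host, as root):

```shell
#!/bin/sh
# Look for netfilter rules mentioning the stuck port. A DROP rule on
# tcp dpt:9090 with nonzero packet counters means SYNs never reach
# HAProxy, which matches "connections never complete" symptoms.
PORT=9090
if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    iptables -v -n -L INPUT | grep "dpt:$PORT" \
        || echo "no INPUT rules mention port $PORT"
    iptables-save | grep -- "--dport $PORT" || true
else
    echo "needs root and iptables; skipping"
fi
```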

Contributor

euank commented Jun 27, 2017
I'm able to reproduce this with the following and comparing v1353.8.0 to v1409.5.0

$ docker run -p 2181:2181  --restart always -d zookeeper@sha256:6308fff92245ff7232e90046976d2c17ffb363ae88c0d6208866ae0ab5a4b886

$ docker run --privileged -e MESOS_WORK_DIR=/tmp --net=host -d mesosphere/marathon@sha256:55a0d07ab9182e0908d3256435679eede6158b6a0ac956d048c151ffcd8eee32 --master=local --zk zk://127.0.0.1:2181/marathon

$ docker run -d --net=host --privileged -it -e PORTS=9090 mesosphere/marathon-lb@sha256:563d84e8d1444f68d13f03be48d39ec0eb7d5bbaab0b4c26ba9b905de4009900 sse --group external --marathon http://127.0.0.1:8080

$ sleep 5

$ curl localhost:9090
# Either hangs or errors depending on the version

Notably, the iptables rules post-update include the following entries, which aren't present on the older version (1353.8.0):

 $ sudo iptables -v -L INPUT
Chain INPUT (policy ACCEPT 1068 packets, 70857 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    2   120 DROP       tcp  --  any    any     anywhere             anywhere             tcp dpt:9090 flags:FIN,SYN,RST,ACK/SYN
    1    60 DROP       tcp  --  any    any     anywhere             anywhere             tcp dpt:9090 flags:FIN,SYN,RST,ACK/SYN
Contributor

euank commented Jun 27, 2017
These rules, I think, are being added by marathon-lb's startup script. See https://github.com/mesosphere/marathon-lb/blob/b950d727be15be1e467fd56c458a015e907d861e/service/haproxy/run#L5-L17

There might be some sort of kernel change/regression causing this.
From here, it should at least be easier to make a simpler repro.
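The linked run script's insert/delete pattern is roughly the following (a paraphrased sketch, not the script itself; the port and the reload step are illustrative): drop new connections to the service port during an haproxy reload, then remove the rule afterwards. If the -D fails to match the rule that -I inserted, the DROP rule stays behind and the port appears firewalled, which is the symptom in this issue.

```shell
#!/bin/sh
# Paraphrased sketch of marathon-lb's firewall-during-reload pattern.
PORT=9090
if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    # Block new connections (SYN packets) to the port during reload.
    iptables -w -I INPUT -p tcp --dport "$PORT" --syn -j DROP
    # ... reload haproxy here ...
    # Remove the rule again. On an affected kernel/iptables combination
    # this delete fails to match and the DROP rule is left behind.
    iptables -w -D INPUT -p tcp --dport "$PORT" --syn -j DROP \
        || echo "delete failed: DROP rule left behind"
else
    echo "needs root and iptables; skipping"
fi
```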


metral commented Jun 27, 2017
@euank I'm seeing odd network-related behavior in 1409.x.0, where hitting the Tectonic console returns ERR_EMPTY_RESPONSE, but 1353.8.0 always works; it's being tracked in coreos/tectonic-installer#1171.

Could you shed some light on whether there is a possible kernel change/regression?


Contributor

euank commented Jun 27, 2017
Note that with the host's iptables version (v1.4.21) I can't reproduce this, but with a more recent iptables (e.g. the one in fedora:25) I can reproduce it reliably with something like:

$ docker run --net=host --privileged fedora:25 \
sh -c 'dnf install -y iptables; iptables -w -I INPUT -p tcp --dport 9090 --syn -j DROP; iptables -w -D INPUT -p tcp --dport 9090 --syn -j DROP'
iptables: Bad rule (does a matching rule exist in that chain?). # on 1409.5.0

$ iptables-save | grep 9090
-A INPUT -p tcp -m tcp --dport 9090 --tcp-flags FIN,SYN,RST,ACK SYN -j DROP # On 1409.5.0

I believe this is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1459676, which references a kernel patch that, it looks like, is already included in an upstream point release.

As one fortunate data point, some tools (such as kubernetes/kube-proxy) are packaged with old enough iptables versions that this doesn't impact them at the moment, and I think things on the host (e.g. Docker itself) which exec iptables directly aren't impacted either.
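Since the host's own iptables binary appears unaffected, affected hosts can likely be cleaned up by deleting the leftover rules with the host binary, using the fully expanded match that iptables-save shows. A hedged workaround sketch, not an official fix (port 9090 assumed; --tcp-flags FIN,SYN,RST,ACK SYN is what the --syn shorthand expands to):

```shell
#!/bin/sh
# Delete leftover SYN DROP rules for the port, looping in case the rule
# was left behind more than once (as in the INPUT chain listing above).
PORT=9090
if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    while iptables -w -D INPUT -p tcp --dport "$PORT" \
            --tcp-flags FIN,SYN,RST,ACK SYN -j DROP 2>/dev/null; do
        echo "removed a leftover DROP rule for port $PORT"
    done
else
    echo "needs root and iptables; skipping"
fi
```

If the flags match doesn't hit, deleting by rule number (iptables -L INPUT --line-numbers, then iptables -D INPUT <num>) is an alternative.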


Member

bgilbert commented Jun 27, 2017
That patch is not in 4.11.7 and isn't currently in the 4.11 stable queue either.


Contributor

euank commented Jun 27, 2017
@metral if it is this specific issue, then looking at the iptables rules on the host would show evidence in the form of rules that were left behind and should have been cleaned up.

@bgilbert my bad, you're right


Member

bgilbert commented Jun 29, 2017
This should be fixed in 4.11.8.


Member

bgilbert commented Jul 6, 2017
This should now be fixed in the beta channel, and will be fixed in the next alpha and stable releases, due shortly.


@bgilbert bgilbert closed this Jul 6, 2017
