
Idle connections over overlay network end up in a broken state after 15 minutes #31208

Closed
christopherobin opened this issue Feb 21, 2017 · 18 comments

Comments

@christopherobin

Description

In a swarm setup using overlay networks, idle connections between 2 services will end up in a broken state after 15 minutes.

The issue is related to the way the Docker overlay network routes packets: iptables first marks them, and IPVS then forwards them to the right host. However, the default expiration for established connections in IPVS is 900 seconds (ipvsadm -l --timeout), after which IPVS stops forwarding packets even though the TCP connection still exists. If this happens, any new packet on the connection is directed at the virtual IP for that service, which has no valid resolution, leaving the connection stuck in limbo while the kernel tries forever to resolve that virtual IP.
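
A minimal way to see this state on a node (a sketch only; the namespace file name under /var/run/docker/netns varies per host, the id below is just the one from the reproduction steps):

# default IPVS timeouts; the first value (900 s) is the established-connection timeout
nsenter --net=/var/run/docker/netns/2cc18e502f81 ipvsadm -l --timeout
# Timeout (tcp tcpfin udp): 900 120 300

# per-connection entries with their remaining expire time
nsenter --net=/var/run/docker/netns/2cc18e502f81 ipvsadm -lnc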

Steps to reproduce the issue:

  1. Start 2 services on the same network (on different hosts, though it should be reproducible even on a single host?)
  2. docker exec into both of them; in one, start nc in listen mode, and in the other, connect to that nc server using the service name DNS.
  3. Send a packet from the client to the server; everything is fine.
  4. Find your netns and locate your connection with nsenter --net=2cc18e502f81 ipvsadm -lnc
  5. Wait for the connection to expire and be removed from the list.
  6. Send another packet: nothing ever gets there, the connection doesn't time out, and tcpdump shows lots of ARP packets going out. (A consolidated command sketch follows this list.)
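
A consolidated command sketch of the steps above (a sketch only; service, network, and image names are illustrative, and the netns id differs per host):

# two dummy services on the same overlay network
docker network create -d overlay test-net
docker service create --name server --network test-net alpine sleep 86400
docker service create --name client --network test-net alpine sleep 86400

# on the node running the server task: listen with netcat
docker exec -it <server-container> nc -l -p 9898

# on the node running the client task: connect through the service VIP via DNS
docker exec -it <client-container> nc server 9898

# watch the IPVS connection table in the namespace that holds the rules
nsenter --net=/var/run/docker/netns/<netns-id> ipvsadm -lnc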

Describe the results you received:

The packet never reaches the target; the kernel is stuck doing ARP requests over and over.

Describe the results you expected:

Either have the connection time out properly, or find a way to restore the routing in IPVS.

Additional information you deem important (e.g. issue happens only occasionally):

This can currently be worked around by setting net.ipv4.tcp_keepalive_time to less than 900 seconds, so the TCP connection never sits idle long enough for the IPVS entry to expire, but I'm not sure that is a valid way to deal with this. At the very least this behavior should be documented.
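
For a quick, non-persistent test of the workaround on a node (a sketch; 600 is just a value comfortably below the 900 second IPVS timeout):

sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl net.ipv4.tcp_keepalive_time   # verify the new value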

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:38:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:38:28 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.13.1
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: fluentd
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: l3e2evjei4cvcdgjqavtrztgo
 Is Manager: false
 Node Address: 172.24.0.100
 Manager Addresses:
  172.24.0.200:2377
  172.24.0.50:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.796 GiB
Name: worker-1
ID: DR4G:LZEQ:YSQ7:CYTR:FAXW:ZNVJ:E4AZ:BX5L:QYYG:ZDY5:SO7U:TFZW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Labels:
 dawn.node.type=worker
 dawn.node.subtype=app
Experimental: false
Insecure Registries:
 172.24.0.50:5000
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

My current test setup is 5 vagrant boxes (2 managers + 3 workers), but it should happen in any environment.

@sanimej

sanimej commented Feb 21, 2017

@christopherobin With IPVS, or any other man-in-the-middle NAT/firewall, the TCP keep-alive timer has to be tuned when you have "silent" long-lived sessions. I will add a note about this in the documentation.

The connection would have been terminated if the TCP packet had been delivered to a different backend and resulted in a RST from that backend. But I guess what's happening here is that after the initial session expires, when IPVS gets a TCP packet that is not a SYN, it drops it instead of sending it to the backend. This makes sense because, for IPVS, it is a new TCP session and the SYN bit isn't set.

@GabKlein

@christopherobin How did you manage to work around this issue? I'm having the problem between my app, which creates a connection pool, and my db. After being idle for 15 minutes the app is not able to reconnect.
I tried adding a sysctl file (echo "net.ipv4.tcp_keepalive_time = 60" > /etc/sysctl.d/60-keepalive.conf) without success. My app still hangs after being idle for 15 minutes :/

@christopherobin
Author

@GabKlein My current setup uses the following:

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10

I took the values from https://access.redhat.com/solutions/23874 and tweaked them slightly for our setup. Haven't run into the issue since.

To check that it's working, you can use nsenter and ipvsadm to look at your connections and verify they are being kept alive properly (see this article for details on how to do that; a rough sketch follows).
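
Roughly, that check looks like this (a sketch; the namespace that actually holds the IPVS rules depends on your setup, so pick the one where ipvsadm -lnc shows your connection):

# list the network namespaces Docker created on this node
ls /var/run/docker/netns

# the expire column should keep resetting while keepalive probes flow
nsenter --net=/var/run/docker/netns/<netns-id> ipvsadm -lnc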

@GabKlein

Thank you @christopherobin, I'm going to give it a shot. Is adding these settings as a sysctl file the best way? Do you have to reboot nodes or restart services to apply them?

@christopherobin
Author

I'm using Ansible to provision my servers and it stores the variables in a file in /etc/sysctl.d.

If you are not rebooting, you can create the files and run sysctl --system to reload all configuration files; it will also tell you what was loaded in which order, so you can see if anything else might be overriding your config.
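
A minimal sketch of that, reusing the values above (the file name is just an example):

cat <<'EOF' > /etc/sysctl.d/60-keepalive.conf
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF

# reload every sysctl configuration file and print the load order
sysctl --system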

@mavenugo
Contributor

Please check if this comment is applicable here.

@sixcounts

@mavenugo @christopherobin @GabKlein @sanimej Hey team, this Moby GitHub issue has no assignee; can someone give us an overview of where this is at?

Thank you

@bm-skutzke

@christopherobin Thanks a lot!

Tweaking net.ipv4.tcp_keepalive_time solved my issues with long-running curl queries to a REST API of a reporting service. This service consists of a Tomcat and a MySQL container running on different Docker Swarm nodes.
MySQL errors like "Aborted connection XXX to db ... (Got an error reading communication packets)" disappeared as well.

@BenoitNorrin

BenoitNorrin commented Nov 23, 2017

We are facing this issue too, and tweaking net.ipv4.tcp_keepalive_* didn't help either.

docker version:

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

docker info:

Containers: 23
 Running: 13
 Paused: 0
 Stopped: 10
Images: 15
Server Version: 17.09.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: c0yf55uumm2j1sr81l0bsuskn
 Is Manager: true
 ClusterID: wwv9hujonsqhlwpakwjfqacbt
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
  External CAs:
    cfssl: https://10.211.164.217:12381/api/v1/cfssl/sign
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.211.164.217
 Manager Addresses:
  10.211.164.217:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-98-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 15.67GiB
Name: vm-swm-overlay-1
ID: KFCC:IGCA:OSVX:62BR:7S6Z:LC3G:H6DT:IRPL:DHTG:DF7A:QSL4:2YUH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Here is a simple way to reproduce this issue with netcat.

server.yml

version: '3.2'
services:
  server:
    image: multicloud/netcat
    ports:
      - 9898
    command: -lp 9898
    deploy:
      mode: replicated
      replicas: 1
    stdin_open: true
    tty: true
    networks:
      - test-timeout
networks:
  test-timeout:
    external: true

client.yml

version: '3.2'
services:
  client:
    image: multicloud/netcat
    command: server 9898
    deploy:
      mode: replicated
      replicas: 1
    stdin_open: true
    tty: true
    networks:
      - test-timeout

networks:
  test-timeout:
    external: true

  • Launch the server container, then the client:
    docker stack deploy -c server.yml netcat
    docker stack deploy -c client.yml netcat

  • Attach a terminal to both containers:
    docker attach $(docker ps -q -f name="netcat_server")
    docker attach $(docker ps -q -f name="netcat_client")

  • Write something in one of the terminals; you will see the result in the other.

  • Wait at least 900 seconds.

  • Write something... boom! The connection is broken and the container will crash.

Tested with Ubuntu and CentOS.

I think this problem is related to the default service discovery mode of swarm (vip), because it does not occur in dnsrr mode.

@harry75369

harry75369 commented Dec 5, 2017

This problem is due to the kernel module IPVS. Look at this line: https://github.com/torvalds/linux/blob/master/net/netfilter/ipvs/ip_vs_proto_tcp.c#L366

I changed the IP_VS_TCP_S_ESTABLISHED timeout from 900 to a larger value, recompiled the module, and reloaded the ip_vs and ip_vs_rr kernel modules, and the problem is gone. (Maybe reloading just ip_vs is also fine; not tested.)
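
For anyone who would rather not recompile the module, ipvsadm can also change these timers at runtime with ipvsadm --set tcp tcpfin udp. A hedged sketch below: it has to run inside the namespace that holds the service's IPVS rules, and whether the values survive the namespace being recreated is not verified here.

# raise the established-connection timeout from 900 to 3600 seconds
nsenter --net=/var/run/docker/netns/<netns-id> ipvsadm --set 3600 120 300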

Compared with the following default kernel parameters, the IP_VS_TCP_S_ESTABLISHED value of IPVS is obviously too small!

net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300

On the other hand, tuning kernel parameters like net.ipv4.tcp_keepalive_time does not work for me. Even with the default values, I cannot capture TCP keepalive packets when they should be sent, and thus the connection is always eventually dropped/reset by IPVS. I think it is due to my application: even though the kernel supports TCP keepalive, the application has to enable it on its sockets. See http://www.tldp.org/HOWTO/TCP-Keepalive-HOWTO/programming.html
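
One way to check whether an application really enabled keepalive on its connection (a sketch; run it where the client socket lives, and the port filter is only an example):

# established sockets with SO_KEEPALIVE set show a timer:(keepalive,...) field
ss -tno state established '( dport = :9898 )'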

@vassilvk

vassilvk commented Jan 12, 2018

@christopherobin, @bm-skutzke, how were you able to set net.ipv4.tcp_keepalive_time in service-task containers running in Docker swarm mode?

This is a namespaced kernel parameter and it looks like tuning sysctl parameters is not yet possible in Docker swarm mode: #25209, #33649.

I tried to bake net.ipv4.tcp_keepalive_time = 600 into the image through a new /etc/sysctl.d/* file as well as by directly modifying /etc/sysctl.conf, but those changes didn't take. Running that image as a service in Docker swarm through docker stack deploy and then shelling into it and probing net.ipv4.tcp_keepalive_time gives back the default value of 7200.

I am asking because, based on your comments, it seems like both of you managed to pull that off in a Docker swarm mode setup somehow?

(Running Docker 17.12.0-ce, build c97c6d6 on Win10).

@christopherobin
Author

@vassilvk I have been running my own VMs and bare-metal servers, so I didn't run into your issue. I'm not entirely sure what the best way to do it is for Docker on Windows.

Baking the parameters into the Docker images themselves won't work (since the init in your container won't apply anything from those files and, like you said, they are namespaced), so you'll need to do it at the host level.

I'd recommend opening an issue on https://github.com/linuxkit/linuxkit to have it baked in the default image and maybe try to make your own image in the meantime.

It might also be possible to set it by abusing nsenter from a privileged container, something like nsenter -t 1 -a sysctl -w net.ipv4.tcp_keepalive_time=600 maybe? Not sure if it works, and the change will be lost every time Docker or your machine restarts.
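
A variation on that idea that avoids nsenter: a privileged container sharing the host's network namespace can write the value directly (a sketch; the alpine image is illustrative, this sets the host-level value only, and it is lost on reboot):

docker run --rm --privileged --net=host alpine \
  sysctl -w net.ipv4.tcp_keepalive_time=600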

@vassilvk

Thanks @christopherobin - makes sense.
I didn't realize that you were making those changes on the host.

@ju-la-berger

We recently ran into this issue using Docker CE 18.03.1 on CentOS 7.

Using Swarm overlay networking with endpoint mode virtual IP (vip) on the server side (i.e. a database Swarm service or another microservice Swarm service) causes TCP connections to break after being idle for 15 minutes. (This applies to our JDBC connection pool as well as the Netty HTTP client connection pool used for inter-service communication.)

Our workaround is to set the database service to endpoint mode dnsrr and to disable the Netty HTTP connection pooling. (The database will not be a Swarm service in production anyway.)
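
For reference, dnsrr can also be requested when creating a service from the CLI (a sketch; the service and network names are illustrative, and dnsrr cannot be combined with ingress-mode published ports):

docker service create --name db --network my-overlay --endpoint-mode dnsrr <image>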

My question is: is anyone working on this issue, or do you have any other recommendations regarding workarounds? (Other than switching to Kubernetes.)

Thanks in advance!

@vassilvk

vassilvk commented Jun 15, 2018

@ju-la-berger - I solved the issue on my end by using keep-alive for the application-level connection. This is protocol specific (I am using gRPC). If Netty HTTP supports keep-alive, maybe you can try that.

@fcrisciani
Contributor

Please refer to: #37466 (comment) and https://success.docker.com/article/ipvs-connection-timeout-issue

@thaJeztah
Member

Let me close this issue, with the comments above referring to solutions and how to configure this.

@thaJeztah
Member

thaJeztah commented Aug 23, 2018

WIP Pull request for setting sysctl for swarm services: #37701 / moby/swarmkit#2729

dyrnq added a commit to dyrnq/kubeadm-vagrant that referenced this issue Sep 27, 2021