I've narrowed down an issue where DNS resolution in my Docker containers appears to be causing a delay of around 5 seconds when sending HTTP POST calls out to an API.
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:57 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:16:44 2018
OS/Arch: linux/amd64
Experimental: false
My services are brought up with docker-compose (file format 3.7) and a user-defined bridge network.
The container in question is attached to all 3 networks and runs "FROM node:11.10.1-stretch", so a Debian-based image, whilst the hosts are Ubuntu 18.04.
Through testing, I have the following outputs from a simple ping within a container. As you can see, the ping time itself is fine; it's the DNS resolution that takes up the time.
time docker run busybox ping -c 1 api.telegram.org
WARNING: Error loading config file: /home/xxx/.docker/config.json: stat /home/xxx/.docker/config.json: permission denied
PING api.telegram.org (149.154.167.220): 56 data bytes
64 bytes from 149.154.167.220: seq=0 ttl=49 time=122.568 ms
--- api.telegram.org ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 122.568/122.568/122.568 ms
real 0m9.498s
user 0m0.124s
sys 0m0.340s
Outside the container, on the host, it's near instant.
time ping -c 1 api.telegram.org
PING api.telegram.org (149.154.167.220) 56(84) bytes of data.
64 bytes from 149.154.167.220: icmp_seq=1 ttl=50 time=109 ms
--- api.telegram.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 109.608/109.608/109.608/0.000 ms
real 0m0.136s
user 0m0.000s
sys 0m0.008s
Whilst the above outputs are just a ping test, the same issue shows up with nslookup.
I had previously been using the node-alpine image, but after some research I found that musl libc has a history of poor DNS-resolution performance. I've switched to node-slim, which is closer to the full Debian base image. This hasn't provided the improvements I was hoping for.
Today I spent some time narrowing down my API calls in the code to get some timings.
I noticed the following behaviour:

- On the first call to the API I hit around a 4-second delay (not good).
- On an immediate second call I see around 60 ms (perfect!).
- If I leave the application alone for a few minutes and then hit the API again, I get the same 4-second delay.
- If I keep hitting the API after that initial 4-second delay, I consistently get the 60 ms responses I would expect.
The initial delay will be the DNS resolution; once that has been done, I believe the agent's keep-alive connection (which appears to default to 10 s in the underlying libraries) ensures the subsequent hits respond in good time.
I'm thinking there are a couple of things here:

- DNS resolution is just very slow to start with. I suspect this is down to my user-defined bridge network in Docker and some form of IPv6 failure.
- The cache isn't lasting long (10 seconds at the application level), but that would be fine if resolution were quicker.
systemd-resolved is still listening on port 53, so I went the route of the symlink fix as per #2187 (comment), but this did not change anything.
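For reference, the symlink route commonly suggested for the systemd-resolved stub is along these lines (a sketch of the workaround, assuming a stock Ubuntu 18.04 layout):

```shell
# Point /etc/resolv.conf at systemd-resolved's list of real upstream
# servers instead of the 127.0.0.53 stub, then restart Docker so new
# containers inherit usable nameservers.
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
sudo systemctl restart docker
```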
I've checked traffic on port 53 with tcpdump:
sudo tcpdump -vvv -s 0 -l -n port 53
tcpdump: listening on wlp2s0, link-type EN10MB (Ethernet), capture size 262144 bytes
05:35:12.442852 IP (tos 0x0, ttl 64, id 48895, offset 0, flags [DF], proto UDP (17), length 60)
192.168.1.116.59581 > 192.168.1.141.53: [bad udp cksum 0x848b -> 0x24d4!] 57480+ A? localtunnel.me. (32)
05:35:12.442904 IP (tos 0x0, ttl 64, id 48896, offset 0, flags [DF], proto UDP (17), length 60)
192.168.1.116.59581 > 192.168.1.141.53: [bad udp cksum 0x848b -> 0xa6a5!] 24220+ AAAA? localtunnel.me. (32)
05:35:12.455301 IP (tos 0x0, ttl 64, id 47819, offset 0, flags [DF], proto UDP (17), length 76)
192.168.1.141.53 > 192.168.1.116.59581: [udp sum ok] 57480 q: A? localtunnel.me. 1/0/0 localtunnel.me. [13m50s] A 138.197.63.247 (48)
05:35:12.465921 IP (tos 0x0, ttl 64, id 47820, offset 0, flags [DF], proto UDP (17), length 60)
192.168.1.141.53 > 192.168.1.116.59581: [udp sum ok] 24220 q: AAAA? localtunnel.me. 0/0/0 (32)
05:35:12.522916 IP (tos 0x0, ttl 63, id 62222, offset 0, flags [DF], proto UDP (17), length 62)
192.168.1.116.39734 > 10.0.0.2.53: [bad udp cksum 0xcc59 -> 0x9375!] 1710+ A? api.telegram.org. (34)
05:35:16.523303 IP (tos 0x0, ttl 63, id 16594, offset 0, flags [DF], proto UDP (17), length 62)
192.168.1.116.53416 > 8.8.8.8.53: [bad udp cksum 0xd267 -> 0x57f5!] 1710+ A? api.telegram.org. (34)
05:35:16.543163 IP (tos 0x0, ttl 119, id 49810, offset 0, flags [none], proto UDP (17), length 78)
8.8.8.8.53 > 192.168.1.116.53416: [udp sum ok] 1710 q: A? api.telegram.org. 1/0/0 api.telegram.org. [2m29s] A 149.154.167.220 (50)
Here I can see the AAAA (IPv6) request, and then the A (IPv4) query to 10.0.0.2 going unanswered; the retry to 8.8.8.8 goes out exactly 4 seconds later, which matches the delay I'm seeing.
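A resolver-side mitigation for this parallel A/AAAA pattern is the glibc `options` line in the container's /etc/resolv.conf; a sketch with illustrative values (musl, as used by Alpine, ignores some of these options):

```
# Keep Docker's embedded DNS, but serialize A/AAAA queries and shorten
# the retry timeout (glibc defaults: parallel queries, 5 s timeout).
nameserver 127.0.0.11
options single-request timeout:1 attempts:2
```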
Has anyone come across this before, and any ideas on how to fix it? I'm pretty stuck at this point, with lots of conflicting info when searching.
Updates:
So I messed around with this some more today. I edited the container's resolv.conf to:
nameserver 8.8.4.4
options timeout:1
This allowed me to ping google.com and it looks to be pretty much instant. However, because the nameserver is no longer pointing at Docker's embedded DNS for my bridge network, I can't get any connections to my other containers.
I've then made a few changes whilst the container was running to get the following resolv.conf.
This results in a faster response (1.1 seconds instead of 5), but it's clear that this is due to the timeout I've added and to forcing lookups through Google's DNS over IPv4 only.
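Rather than editing resolv.conf inside the container (which, as above, breaks resolution of other containers' names), the upstream servers that Docker's embedded DNS forwards to can be set engine-wide. A sketch of /etc/docker/daemon.json, assuming no other daemon settings are present:

```json
{
  "dns": ["8.8.8.8", "8.8.4.4"],
  "dns-opts": ["timeout:1", "attempts:2"]
}
```

With a user-defined bridge network the containers keep 127.0.0.11 as their nameserver, so container-name lookups still work while external names are forwarded to the configured servers. docker-compose also offers a per-service `dns:` key for the same purpose.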
Metrics screenshot: https://i.stack.imgur.com/UkB03.png