coredns doesn't perform better despite having more cores #5595
Comments
Perhaps you are saturating the NIC throughput.
|
I guess with localhost that shouldn’t be the case.
|
Hello, if you could collect the profiling data (CPU profile) exposed by the pprof plugin, that would greatly benefit the investigation. @gpl |
Attaching profiles for gomaxprocs 1,2,4,8,16. |
Could be something like that. Generally, if giving it more CPU doesn't help, it is because you are hitting other bottlenecks. The question is whether those are in the CoreDNS code (for example, some mutex contention or something), or in the underlying OS or hardware. In this case it looks like writing to the UDP socket. Look into tuning UDP performance on your kernel. You may want to look at your UDP write buffer sizes, for example. |
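As a rough illustration of the buffer-size angle (not CoreDNS code; the port and sizes here are made up), the per-socket buffers can also be requested from the application side, while the kernel caps them with the net.core.rmem_max / net.core.wmem_max sysctls:

package main

import (
	"log"
	"net"
)

func main() {
	// Bind a UDP socket the way a DNS server would (port 5553 is arbitrary).
	conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 5553})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Request 4 MiB socket buffers; the effective sizes are capped by the
	// net.core.rmem_max and net.core.wmem_max sysctls.
	if err := conn.SetReadBuffer(4 << 20); err != nil {
		log.Printf("SetReadBuffer: %v", err)
	}
	if err := conn.SetWriteBuffer(4 << 20); err != nil {
		log.Printf("SetWriteBuffer: %v", err)
	}
}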
Hmm, I don't believe either of those is the issue here -- we had previously tried adjusting a number of kernel parameters and didn't see any significant deviation in performance; additionally, from our telemetry I don't believe we're seeing any issues on that front. Notably, the following values were adjusted on all hosts involved in this test:
We've also adjusted additional settings. The tests were also run from various combinations of hosts, and we observed the same results when the test client and server were on different hosts (identical hardware). |
Does the same CPU usage mean CoreDNS uses up all 8-64 cores? If so, have you checked whether that CPU usage was all from CoreDNS? Other processes or system services can steal some of that CPU time, for instance. Another idea is to measure CPU time in different categories (user, system, softirq, etc.). That can be helpful for finding the bottleneck. |
I tried off-CPU analysis; the off-CPU flame graph looks similar to perf's. With more than 4 CPUs assigned to CoreDNS, time spent in serveUDP increased significantly. Haven't got any clue though. |
I did more digging after that; it seems the bottleneck is the network I/O pattern. CoreDNS starts 1 listener goroutine for each server instance, and creates 1 new goroutine for each new request. So we have a single-producer (reads request packets), multi-consumer (handles requests and writes response packets) workflow. With more CPUs assigned to the CoreDNS process, the consumers' processing speed can scale correspondingly, but the producer's cannot. And when the Corefile only uses some light plugins, the consumers' job is relatively simple, so request handling doesn't need much CPU time. We hit the producer's limit under high load because it has only 1 goroutine, so it cannot utilize more than 1 core of CPU (a minimal sketch of this pattern follows this comment). I ran some tests with the following Corefile on my laptop:
tests:
|
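A minimal sketch of the read path described in the previous comment. This is not CoreDNS's actual code, and handlePacket is a made-up stand-in for the plugin chain; it only shows the single-reader, goroutine-per-request shape:

package main

import (
	"log"
	"net"
)

// handlePacket is a hypothetical stand-in for the plugin chain: it would
// parse the query, resolve it, and write the response back to the client.
func handlePacket(conn net.PacketConn, addr net.Addr, pkt []byte) {
	conn.WriteTo(pkt, addr) // placeholder: echo the packet back
}

func serve(conn net.PacketConn) {
	buf := make([]byte, 65535)
	for {
		// Single producer: only this goroutine reads from the socket, so
		// packet intake cannot use more than one core no matter how many
		// CPUs the process has.
		n, addr, err := conn.ReadFrom(buf)
		if err != nil {
			log.Println(err)
			return
		}
		pkt := make([]byte, n)
		copy(pkt, buf[:n])
		// Many consumers: a new goroutine per request, which does scale
		// with extra cores.
		go handlePacket(conn, addr, pkt)
	}
}

func main() {
	conn, err := net.ListenPacket("udp", ":5553") // arbitrary port
	if err != nil {
		log.Fatal(err)
	}
	serve(conn)
}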
Interesting. Any proposal for improvement? |
I could try to find a way. But I do agree with the idea of the Redis team: PS: @Lobshunter86 is me, too. |
I'm also interested in this issue; I haven't contributed, but this sounds fun to work on. Could I claim this? |
Please go ahead and have fun😉. I have been occupied at work recently. |
I threw the UDP message read within miekg/dns into a goroutine pool. The results are OK.
With my change:
So notably the CPU utilization went up, but QPS went up ~50%, and average latency went down noticeably. |
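The patch itself isn't shown above; as a rough illustration of the idea (the pool size and names are mine, and it reuses handlePacket from the earlier sketch), several goroutines can read from the same socket, since a net.PacketConn is safe for concurrent use:

// servePooled replaces the single-reader serve loop from the earlier sketch:
// a small pool of goroutines shares one socket, so packet intake is no longer
// pinned to a single core. readers is an arbitrary tuning knob.
func servePooled(conn net.PacketConn, readers int) {
	for i := 0; i < readers; i++ {
		go func() {
			buf := make([]byte, 65535)
			for {
				n, addr, err := conn.ReadFrom(buf)
				if err != nil {
					return
				}
				pkt := make([]byte, n)
				copy(pkt, buf[:n])
				go handlePacket(conn, addr, pkt) // per-request goroutine, as before
			}
		}()
	}
}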
In my understanding, golang/go#45886 should improve the performance of long-lived UDP connections (i.e. reading a bunch of data from the same UDP socket, like QUIC). Would it help improve a DNS workload, since every DNS request-response belongs to a different socket? |
This Cloudflare blog post seems keenly relevant to this issue: "Go, don't collect my garbage". The author describes a performance puzzle very similar to the one in the first post - namely, 1-4 cores work well, with quickly diminishing returns at higher concurrency. He achieved vastly improved performance by experimenting with Go garbage collection tuning via the GOGC setting.
One caveat is that his benchmark only ran for 10 seconds, which may have skewed the results in unexpected ways. The challenge with this is that it's highly hardware dependent, so there's no one "right answer" for the setting. @gpl perhaps performing some tuning of the GOGC value is worth trying here. P.S. |
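For reference, a minimal sketch of the kind of GC tuning the blog post experiments with; the value 800 is only an illustration, not a recommendation:

package main

import "runtime/debug"

func main() {
	// Equivalent to starting the process with GOGC=800: the next collection
	// is triggered when the heap grows to roughly 9x the live set, so the GC
	// runs far less often at the cost of extra memory. Tune per workload and
	// hardware; an overly large value can hurt tail latency elsewhere.
	debug.SetGCPercent(800)

	// ... rest of the server setup would go here ...
}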
A memo: I found an interesting approach that uses SO_REUSEPORT. I shall give it a try when I get time. |
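Assuming the socket option in question is SO_REUSEPORT (which is what allows several sockets to bind the same ip:port, as the later ss output shows), a minimal Linux-only sketch using golang.org/x/sys/unix; the port and the pool of four sockets are arbitrary:

package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort opens a UDP socket with SO_REUSEPORT set, so several
// sockets can bind the same address and the kernel spreads incoming
// packets across them.
func listenReusePort(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(context.Background(), "udp", addr)
}

func main() {
	// Four independent sockets on the same port, each with its own read loop,
	// so reads can run on several cores in parallel.
	for i := 0; i < 4; i++ {
		conn, err := listenReusePort(":5553")
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.PacketConn) {
			buf := make([]byte, 65535)
			for {
				n, addr, err := c.ReadFrom(buf)
				if err != nil {
					return
				}
				c.WriteTo(buf[:n], addr) // placeholder handler: echo
			}
		}(conn)
	}
	select {}
}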
Yes @lobshunter, that is correct. I think the LWN article explains the improvements and a few caveats (especially with TCP) of using SO_REUSEPORT.
diff --git a/core/dnsserver/register.go b/core/dnsserver/register.go
index 8de55906..ac581eca 100644
--- a/core/dnsserver/register.go
+++ b/core/dnsserver/register.go
@@ -3,6 +3,8 @@ package dnsserver
import (
"fmt"
"net"
+ "os"
+ "strconv"
"time"
"github.com/coredns/caddy"
@@ -157,36 +159,43 @@ func (h *dnsContext) MakeServers() ([]caddy.Server, error) {
}
// then we create a server for each group
var servers []caddy.Server
- for addr, group := range groups {
- // switch on addr
- switch tr, _ := parse.Transport(addr); tr {
- case transport.DNS:
- s, err := NewServer(addr, group)
- if err != nil {
- return nil, err
- }
- servers = append(servers, s)
- case transport.TLS:
- s, err := NewServerTLS(addr, group)
- if err != nil {
- return nil, err
- }
- servers = append(servers, s)
+ numSock, err := strconv.ParseInt(os.Getenv("NUM_SOCK"), 10, 64)
+ if err != nil {
+ numSock = 1
+ }
+ for i := 0; i < int(numSock); i++ {
+ for addr, group := range groups {
+ // switch on addr
+ switch tr, _ := parse.Transport(addr); tr {
+ case transport.DNS:
+ s, err := NewServer(addr, group)
+ if err != nil {
+ return nil, err
+ }
+ servers = append(servers, s)
- case transport.GRPC:
- s, err := NewServergRPC(addr, group)
- if err != nil {
- return nil, err
- }
- servers = append(servers, s)
+ case transport.TLS:
+ s, err := NewServerTLS(addr, group)
+ if err != nil {
+ return nil, err
+ }
+ servers = append(servers, s)
- case transport.HTTPS:
- s, err := NewServerHTTPS(addr, group)
- if err != nil {
- return nil, err
+ case transport.GRPC:
+ s, err := NewServergRPC(addr, group)
+ if err != nil {
+ return nil, err
+ }
+ servers = append(servers, s)
+
+ case transport.HTTPS:
+ s, err := NewServerHTTPS(addr, group)
+ if err != nil {
+ return nil, err
+ }
+ servers = append(servers, s)
}
- servers = append(servers, s)
}
}
}
Essentially, I've just exposed an env var NUM_SOCK that controls how many listen sockets (i.e. server instances) are created per address.
1. With a single listen socket, I'm able to achieve ~130K qps throughput from dnsperf on a private cloud instance.
$ NUM_SOCK=1 taskset -c 2-35 ./coredns-fix
.:55
CoreDNS-1.10.1
linux/amd64, go1.19.3
$ taskset -c 38-71 dnsperf -d test.txt -s 127.0.0.1 -p 55 -c 1000 -l 100000 -S .1 -T 16
Queries sent: 5919568
Queries completed: 5919470 (100.00%)
Queries lost: 0 (0.00%)
Queries interrupted: 98 (0.00%)
Response codes: NOERROR 5919470 (100.00%)
Average packet size: request 33, response 103
Run time (s): 45.693927
Queries per second: 129546.099200
Average Latency (s): 0.000756 (min 0.000016, max 0.006743)
Latency StdDev (s): 0.000400
2. With two listen sockets, I'm able to achieve ~235K qps throughput from dnsperf.
$ NUM_SOCK=2 taskset -c 2-35 ./coredns-fix
.:55
.:55
CoreDNS-1.10.1
linux/amd64, go1.19.3
$ ss -u -a | grep 55
UNCONN 0 0 *:55 *:*
UNCONN 0 0 *:55 *:*
$ taskset -c 38-71 dnsperf -d test.txt -s 127.0.0.1 -p 55 -c 1000 -l 100000 -S .1 -T 16
Queries sent: 17760093
Queries completed: 17759997 (100.00%)
Queries lost: 0 (0.00%)
Queries interrupted: 96 (0.00%)
Response codes: NOERROR 17759997 (100.00%)
Average packet size: request 33, response 103
Run time (s): 75.404526
Queries per second: 235529.588768
Average Latency (s): 0.000411 (min 0.000018, max 0.006754)
Latency StdDev (s): 0.000379
3. With 4 listen sockets, I'm able to achieve ~400K qps throughput from dnsperf.
$ NUM_SOCK=4 taskset -c 2-35 ./coredns-fix
.:55
.:55
.:55
.:55
CoreDNS-1.10.1
linux/amd64, go1.19.3
$ ss -u -a | grep 55
UNCONN 0 0 *:55 *:*
UNCONN 0 0 *:55 *:*
UNCONN 0 0 *:55 *:*
UNCONN 0 0 *:55 *:*
$ taskset -c 38-71 dnsperf -d test.txt -s 127.0.0.1 -p 55 -c 1000 -l 100000 -S .1 -T 16
Queries sent: 20535534
Queries completed: 20535443 (100.00%)
Queries lost: 0 (0.00%)
Queries interrupted: 91 (0.00%)
Response codes: NOERROR 20535443 (100.00%)
Average packet size: request 33, response 103
Run time (s): 51.342591
Queries per second: 399968.965337
Average Latency (s): 0.000235 (min 0.000020, max 0.003655)
Latency StdDev (s): 0.000197
So, I think the bottleneck was indeed the throughput limitation of a single socket, and we are able to scale throughput almost linearly as we increase the number of listen sockets. I'll create a pull request after validating TCP traffic (non-TLS) as I get some more time. Thanks. |
@iyashu Excellent productivity 👍. |
@iyashu Really looking forward to this PR |
We are running CoreDNS 1.9.3 (retrieved from the official releases on GitHub), and have been having difficulty with increasing performance of a single instance of coredns.
With GOMAXPROCS set to 1, we observe ~60k qps and full utilization of one core.
With GOMAXPROCS set to 2, we seem to hit a performance limit of ~90-100k qps, but it consumes almost entirely two cores.
With GOMAXPROCS set to 4, we observe that coredns will use all 4 cores - but throughput does not increase, and latency seems to be the same.
With GOMAXPROCS set to 8-64, we observe the same CPU usage and throughput.
We have the following Corefile:
.:55 {
    file db.example.org example.org
    cache 100
    whoami
}
db.example.org:
$ORIGIN example.org.
@ 3600 IN SOA sns.dns.icann.org. noc.dns.icann.org. 2017042745 7200 3600 1209600 3600
    3600 IN NS a.iana-servers.net.
    3600 IN NS b.iana-servers.net.
www IN A 127.0.0.1
    IN AAAA ::1
We are using dnsperf: https://github.com/DNS-OARC/dnsperf
And the following command:
dnsperf -d test.txt -s 127.0.0.1 -p 55 -Q 10000000 -c 1 -l 10000000 -S .1 -t 8
test.txt:
www.example.com AAAA
Is there anything we could be missing?
Thanks!