
request spike creates memory spike #2593

Open
szuecs opened this Issue Feb 21, 2019 · 67 comments


szuecs commented Feb 21, 2019

To show the numbers from our tests, generated with https://github.com/mikkeloscar/go-dnsperf:

RPS (graph)

Memory (graph)

During an outage we had 18k RPS and 800MB memory consumption per CoreDNS instance. 800MB is what Grafana showed, but I expect the peak was much higher, because we had to increase the memory limit from 1GB to 2GB to survive. Before the outage we had 3.5k RPS and 64MB memory consumption per CoreDNS instance.

The usage pattern in both the test and the outage is to resolve a couple of external (not cluster.local) DNS names.
CoreDNS configuration
CoreDNS deployment
/etc/resolv.conf is the kubernetes default with ndots 5 and search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal. A call to www.example.org will therefore result in 5 x 2 DNS queries.
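
For illustration, each of the five names below is tried for both A and AAAA (the search list is tried first because www.example.org has fewer dots than ndots):

    www.example.org.default.svc.cluster.local.
    www.example.org.svc.cluster.local.
    www.example.org.cluster.local.
    www.example.org.eu-central-1.compute.internal.
    www.example.org.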

More careful and detailed load tests, with CoreDNS as a daemonset and dnsmasq in front of that daemonset, show these numbers:

  • CoreDNS with 100Mi could handle ~5-6k RPS (beyond that CoreDNS crashes)
  • CoreDNS with 1000Mi could handle ~10-11k RPS (beyond that CoreDNS crashes)
  • with dnsmasq in front, 100Mi could handle 35k RPS without crashing

old setup

We used this config in our old setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  labels:
    application: coredns
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
    }

If you need the version from the outage and start params of the deployment: https://github.com/zalando-incubator/kubernetes-on-aws/blob/dc008aa07ae480d9ba25dc9f6ca8d9d56aa813f4/cluster/manifests/coredns/deployment-coredns.yaml

new setup

Tests were running with CoreDNS 1.2, daemonset:
https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/manifests/coredns-local/daemonset-coredns.yaml
configmap: https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/cluster/manifests/coredns-local/configmap-local.yaml


fturib commented Feb 21, 2019

@szuecs : Thank you for isolating this problem in its own issue.

QUESTION: I guess the outage you experienced is the one reported in issue #2554.

I read that post-mortem outage description but did not realize it was the description of THIS outage.

Let me verify I understand the different scenarios you went through:
1- you experienced an outage in production

> During an outage we had 18k RPS and 800MB memory consumption per CoreDNS instance. 800MB is what Grafana showed, but I expect the peak was much higher, because we had to increase the memory limit from 1GB to 2GB to survive. Before the outage we had 3.5k RPS and 64MB memory consumption per CoreDNS instance.

During this outage, you think you queried at most a hundred or so different domains.

2- you created a setup to show the problem using the "old setup", and the result is what is visible in the graphs here

The test is just always querying the same upstream domain (www.example.org).

3- you ran 3 use cases of a similar test, just changing the config of CoreDNS and dnsmasq, using the "new setup", and modifying the maximum memory allocation.

> More careful and detailed results by load tests with CoreDNS as daemonset and dnsmasq in front of that daemonset show these numbers:

QUESTION: in this latter case (3), are you still using the same tool for sending load to CoreDNS?

I guess you modified the options of the test until finding the crashing point of CoreDNS. Am I correct?
I mean, modifying the deployment-xxxx.yaml file.

       -names=example.org
       -rps=100   <= here going up to 10000
       -timeout=10s
       -enable-logging=true

Without running the test yet .. I came to the same conclusions as here.

@rajansandeep is proposing to reproduce the same configuration locally so we can investigate what is really happening (and validate the above hypothesis or not).
In progress ....

rajansandeep self-assigned this Feb 21, 2019


szuecs commented Feb 22, 2019

> @szuecs : Thank you for isolating this problem in its own issue.
>
> QUESTION: I guess the outage you experienced is the one reported in issue #2554.
>
> I read that post-mortem outage description but did not realize it was the description of THIS outage.
>
> Let me verify I understand the different scenarios you went through:
> 1- you experienced an outage in production
>
> > During an outage we had 18k RPS and 800MB memory consumption per CoreDNS instance. 800MB is what Grafana showed, but I expect the peak was much higher, because we had to increase the memory limit from 1GB to 2GB to survive. Before the outage we had 3.5k RPS and 64MB memory consumption per CoreDNS instance.
>
> During this outage, you think you queried at most a hundred or so different domains.

Yes.

> 2- you created a setup to show the problem using the "old setup", and the result is what is visible in the graphs here
>
> The test is just always querying the same upstream domain (www.example.org).

No, it was requesting not one but multiple names (up to 100).

> 3- you ran 3 use cases of a similar test, just changing the config of CoreDNS and dnsmasq, using the "new setup", and modifying the maximum memory allocation.
>
> > More careful and detailed results by load tests with CoreDNS as daemonset and dnsmasq in front of that daemonset show these numbers:
>
> QUESTION: in this latter case (3), are you still using the same tool for sending load to CoreDNS?

Yes.

> I guess you modified the options of the test until finding the crashing point of CoreDNS. Am I correct?
> I mean, modifying the deployment-xxxx.yaml file.
>
>        -names=example.org
>        -rps=100   <= here going up to 10000
>        -timeout=10s
>        -enable-logging=true

No, we increased the replicas and set the same 100 names on all of them to make it more similar to one nodejs application with 150 replicas hitting the DNS setup.

> Without running the test yet .. I came to the same conclusions as here.

I think the best would be to use perf and maybe pprof to identify the memory peak.

> @rajansandeep is proposing to reproduce the same configuration locally so we can investigate what is really happening (and validate the above hypothesis or not).
> In progress ....

+1 for trying to reproduce it locally; sometimes that's hard, but if you succeed it will be much easier to pinpoint the cause.
If you are not able to do this locally, it might make sense to create a cluster setup, run the test in that isolated environment, and use cssh to run pprof/perf to get the data you need.
If you see nothing in pprof (Go runtime inspection) that shows it, then check perf (kernel view, dump to file, check later: socket, tcp, udp send/recv queues).
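
For reference, a couple of standard Go commands that can pull those profiles if the pprof plugin is enabled (pprof :6060 in the Corefile); the pod address below is a placeholder:

    go tool pprof http://<coredns-pod>:6060/debug/pprof/heap
    go tool pprof "http://<coredns-pod>:6060/debug/pprof/profile?seconds=30"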


miekg commented Feb 24, 2019

I'm (once again) coming from the other side, with a bare minimal setup and going from there. This is running on Packet, with a dst and a src machine that query via the network. Both dst and src run CoreDNS, and the one on src forwards to dst.

A bare forward clause (not testing with proxy because I will announce that it will be removed in the next-next release), tested with dnsperf (https://github.com/DNS-OARC/dnsperf), does about 22K qps for forwarded traffic on these machines:

  Response codes:       NOERROR 224089 (100.00%)
  Average packet size:  request 29, response 91
  Run time (s):         10.005022
  Queries per second:   22397.651899

Adding more plugins, e.g. with prometheus and errors enabled, then drops this to 20K qps; depending on the outcome of this issue we may also want to look into that; there is still a defer (IIRC) in errors that can be removed.

The following Prometheus metrics are from a half-hour run, with this outcome:

Statistics:

  Queries sent:         19745884
  Queries completed:    19745884 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 19745884 (100.00%)
  Average packet size:  request 29, response 93
  Run time (s):         1000.003402
  Queries per second:   19745.816825

  Average Latency (s):  0.004773 (min 0.000158, max 0.040260)
  Latency StdDev (s):   0.002272

screenshot from 2019-02-24 17-06-02
screenshot from 2019-02-24 17-06-19


miekg commented Feb 24, 2019

Adding all plugins except kubernetes (because that's too hard to test outside k8s, see #2575) drops a few qps. Memory according to Prometheus (right-most graph).
screenshot from 2019-02-24 17-21-33

If it's easy to swap out, can you try a new CoreDNS and change proxy to forward?


miekg commented Feb 24, 2019

OK, now the dst CoreDNS forwards to 1.1.1.1, 8.8.8.8 and 8.8.4.4, and I'm running the Alexa top-1k list. Running this again for 65 seconds (which expunges the cache at least once); and because of the second forwarding hop this is actually internet data.

DNS Performance Testing Tool
Version 2.2.1

[Status] Command line: dnsperf -s 127.0.0.1 -p 1053 -l 65 -d ./top-1k.dnsperf
[Status] Sending queries (to 127.0.0.1)
[Status] Started at: Sun Feb 24 18:08:38 2019
[Status] Stopping after 65.000000 seconds
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         1808564
  Queries completed:    1808564 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 1804946 (99.80%), SERVFAIL 3618 (0.20%)
  Average packet size:  request 29, response 101
  Run time (s):         65.071978
  Queries per second:   27793.284538

  Average Latency (s):  0.003284 (min 0.000057, max 2.007104)
  Latency StdDev (s):   0.010106

process_resident_memory_bytes also hovers in the 40/60 MB range.

So this rules out everything except the k8s plugin.


miekg commented Feb 25, 2019

Note that this Corefile

.:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
    }

is inefficient: the entire reverse trees are tunneled through k8s, and only if they are NXDOMAIN (fallthrough) are they resolved on the internet. In the original configmap there was also a rewrite, meaning you apply a regexp on every request as well.

Much better would be to split this up into multiple servers and specify a more specific reverse for k8s:

cluster.local 10.x/16 ::1/16 {  # or whatever the reverse v6 is
    errors
    health
    kubernetes {
        pods insecure
        upstream
    }
    cache 30  # even this is borderline, because of internal k8s caching
}

. {
    errors
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    reload
}

miekg commented Feb 25, 2019

I think what we need to do is perf just the k8s plugin and check where memory is being used.


szuecs commented Feb 25, 2019

@miekg the problem with all dnsperf tools is that they do not create load that makes sense for the general case. https://github.com/mikkeloscar/go-dnsperf uses the /etc/resolv.conf settings to generate queries. We got completely different results when we used other tools to create load.



chrisohaver commented Feb 25, 2019

What @szuecs is saying, if I'm not mistaken, is that most DNS performance tools (for good reason) send requests directly (as a single request), whereas go-dnsperf appends search domains from /etc/resolv.conf. With the k8s "search-path/ndots:5" situation, this multiplies the actual number of queries being sent by a large amount (4-5x). So while from the client POV it's only making 1000 RPS, the server sees 4000-5000 RPS.


szuecs commented Feb 25, 2019

@chrisohaver exactly, but it's (1 + number of search domains) * 2, since A and AAAA records are requested separately.
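
Worked out for the search path in the issue description (four search domains), purely as an illustration:

    (1 + 4 search domains) * 2 record types = 10 DNS queries per application lookup
    e.g. 1,000 lookups/s from clients  ->  ~10,000 queries/s arriving at CoreDNS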

@miekg while you are correct, it's not, because the TLD nameservers are different, caching might be different, .... Details really matter in this case.



chrisohaver commented Feb 25, 2019

@szuecs, are there a large number of reverse lookups in your load?


chrisohaver commented Feb 25, 2019

> any reverse lookup will give atrocious performance.

Actually, any reverse lookup of an IP outside the cluster would have bad performance. For reverse lookups inside the cluster, there would be no performance penalty.


miekg commented Feb 25, 2019

Re: mem usage
There is one obvious candidate for unbounded memory growth, and that's the kube-cache that caches things. Figuring out what exactly requires navigating the client-go libs again (*sigh*, as these are complex and opaque).


rajansandeep commented Feb 25, 2019

> The usage pattern in both the test and the outage is to resolve a couple of external (not cluster.local) DNS names.

@szuecs So the queries in the test are all external names or are there internal name queries as well?


chrisohaver commented Feb 25, 2019

@szuecs, can you share the list of names you tested with?


szuecs commented Feb 25, 2019

We don’t use PTR records and the last time I saw something like a reverse lookup was when Apache did a reverse lookup for every access log. :D

The host name pattern looks like this:

svcname.clustername.example.com, and we often also have cross-cluster calls. If you take 5 different clusters and 10 different services and multiply, that should be fine for the test. These hostnames were all external names. They had just started to move workloads into this cluster.

The idea we are currently testing to make it even better is ndots:2 plus caching with dnsmasq in front of CoreDNS.


chrisohaver commented Feb 25, 2019

OK, thanks. Due to the ndots/search path thing, svcname.clustername.example.com results in about 60% of the query load being destined for the kubernetes plugin, the rest being forwarded upstream. The queries that go to the k8s plugin, though, get rejected pretty early, mostly during qname parsing (in parseRequest()), before diving into the k8s go-client cache.

Regarding the go-client, there is the k8s API watch (asynchronous from queries), but the resource usage there should not be correlated with the RPS load, i.e. we should not expect the client-go watch to start taking up more resources when the RPS ramps up.

Of course there is also the response cache (cache plugin), which means that at these high RPS loads practically 100% of queries are actually being served from cache; and with a small set of distinct query names (5 or so), the cache should not be write-locked very much at all during the test.


miekg commented Feb 28, 2019

@szuecs do you have a graph of the number of goroutines and CPU?

Also, I'm not sure anymore that this is kubernetes plugin related.


rajansandeep commented Feb 28, 2019

I think I have reproduced the memory issue.
TL;DR: CoreDNS gets OOMKilled at high RPS.

Setup

I used the perf-tool used by @szuecs from https://github.com/mikkeloscar/go-dnsperf to check performance of CoreDNS in Kubernetes.

  • 4-node Kubernetes v1.13.3 (1 master and 3 worker nodes)
  • CoreDNS v1.3.1 with 2 replicas deployed on the master node, with the following default ConfigMap Corefile:
 Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

  • My /etc/resolv.conf is as follows:
cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Case 1

Number of client replicas deployed: 50
RPS of each replica: 100

CoreDNS was able to handle the requests, with memory consumption peaking at 220 MiB and going down slightly as time went by, stabilizing at around 157Mi.

image

Case 2

Number of client replicas deployed: 90
RPS of each replica: 100

CoreDNS gets OOMKilled constantly and is not able to handle all the requests from the clients.

Logs from one of the client replicas:

2019/02/28 18:04:23 [ERROR] lookup kubernetes.io on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout
2019/02/28 18:04:23 [ERROR] lookup example.org on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout
2019/02/28 18:04:23 [ERROR] lookup google.com on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout

Looking at the memory consumption of CoreDNS, it seems to take up around 1.2 GiB of memory before getting OOMKilled and restarting.
I do not understand yet why it gets OOMKilled, since the Memory Limit is set at 1.66GiB.

image

CPU Usage:

image

Requests handled: CoreDNS is unable to keep up.

image

Cache Hitrate: Looks like we are hitting the cache as expected.

image
Cache Size:

image

I will be continuing my investigation further.



rajansandeep commented Mar 5, 2019

Continuing my investigation, on the same setup as #2593 (comment),

  • I have 1 instance of CoreDNS on the Master node.
  • RPS of each DNS client replica: 100
  • All queries to CoreDNS were external queries.
  • Initially, the number of DNS client replicas was kept at 25 and was increased until I observed OOMKills in the CoreDNS pod (which happened at 70 DNS client replicas)

Observations made during the test:

  • The maximum incoming request rate the CoreDNS pod could handle was ~21.5k pps, at 25 DNS client replicas.

  • When the client replicas were increased beyond 25, CoreDNS continued to serve a maximum of ~21.5k pps.

  • For every step that I increased the client replicas, the memory used by CoreDNS kept increasing (possibly due to the number of goroutines increasing).

  • This continued until I increased the client replicas to 70, after which CoreDNS started to get OOMKilled repeatedly and couldn't recover. This is because as the pod restarts, it is flooded with requests from all 70 replicas at the same time. CoreDNS handles requests better when the load increases incrementally rather than arriving as one burst.

  • When I decreased the replicas to 60, CoreDNS was able to recover, serving the same ~21.5k pps at considerably higher memory.

  • After the recovery, the number of goroutines was constant at ~30k.

Further test analysis:

  • At 25 replicas, when CoreDNS is able to process all requested queries, memory is stable at around 200MiB, with goroutines at around ~4k.
  • The goroutines (the server's workers) go up until CoreDNS is able to process that quantity.
  • Throughout the test, it seems there are always ~5k pps processed in < 25ms.
  • How long the extra queries take to be processed depends on the client QPS: if pressure is high, these queries are processed more slowly, and as the number of goroutines increases, memory increases too.
  • When we reach the limit of 70 client replicas, CoreDNS starts to crash; the goroutine count goes up to 75k-85k and memory blows up.

I have attached the metrics (Can be zoomed in for better readability) in the following order:

  • Total requests processed by CoreDNS
  • Goroutines
  • Memory
  • CPU
  • Query response time

(stitched metrics graphs)

Also attaching pprof:

pprof.coredns.samples.cpu.007.pb.gz
pprof.coredns.alloc_objects.alloc_space.inuse_objects.inuse_space.007.pb.gz


szuecs commented Mar 5, 2019

Very interesting observations and data!

You reproduced the same thing we saw in our production outage.
The memory spikes happen when a CoreDNS instance crashes, similar to our outage, and this is why we had to set the memory limit super high to make sure it survives the start (maybe caused by the first flood of requests?).
Are the pprof files from the time of the spikes?


fturib commented Mar 7, 2019

> you can easily check out a version before 98a1ef45 (of miekg/dns) and see how that goes; although you miss a whole bunch of optimization that happened

The change proposed by @tmthrgd removes the worker code entirely and keeps all the optimizations that happened after ... the result is the test @rajansandeep commented on here.



fturib commented Mar 8, 2019

> The health plugin exports coredns_health_request_duration_seconds that should
> indicate how busy the process is.

Great idea! I did not think of it. I guess that can help us understand why we get a "liveness probe issue".

@rajansandeep : I am surprised the last test (no workers) was finally much better than what @szuecs experienced initially and also than what we saw in early investigations. I am wondering if there is a difference between adding the QPS load gradually and adding it in one shot.

Can you run a test going from 1 directly to 70 client replicas, to simulate a "one shot" big burst, and stay at 70 until some stabilization? (for both images: no-worker and capped-goroutines)


fturib commented Mar 8, 2019

NOTE: There is also a proposal for a pool of workers here: miekg/dns#924
I will build an image so we can compare too.


fturib commented Mar 9, 2019

@rajansandeep : for the test with no workers here, did you collect the pprof for mem and cpu when under pressure from 90 client replicas?

For information:
1 client replica generates 900 req/sec for the DNS server
90 clients could generate up to 81k qps (if all of these requests reach the CoreDNS pod)


rajansandeep commented Mar 11, 2019

@fturib, yes, I did collect.
I've updated the comment to include the pprof data.


rajansandeep commented Mar 12, 2019

Since changes have been made, I tested CoreDNS on the master branch, which uses miekg/dns 1.1.6.

The test conducted is the same as the previous cases where the DNS client replicas are increased from 25->30->50->60->70.

Test results:

  • CoreDNS gets OOMKilled when there are 70 replicas running, due to CoreDNS taking up very high memory.
State:          Running
      Started:      Tue, 12 Mar 2019 11:19:46 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 12 Mar 2019 11:18:39 -0400
      Finished:     Tue, 12 Mar 2019 11:19:31 -0400
    Ready:          True
    Restart Count:  2

  • The CPU usage is very high.
  • The number of goroutines and the memory consumed by CoreDNS are proportional.
  • I had to reduce the replicas to 60 to prevent CoreDNS from crashing continuously.

In short, CoreDNS performance on current master is the same as CoreDNS v1.3.1.

Metrics to back up the test results:

(stitched metrics graphs)

Pprof data when there were 60 client replicas (Couldn't take pprof at 70 replicas as CoreDNS kept crashing):

pprof.coredns.alloc_objects.alloc_space.inuse_objects.inuse_space.004.pb.gz
pprof.coredns.samples.cpu.004.pb.gz


szuecs commented Mar 12, 2019

Goroutines are cheap, but they don't come for free: as far as I know it's ~4kB per goroutine, which is 320MB for 80k goroutines. There must be some other data being created that adds up to the 1.5GB, and we need to find it.




fturib commented Mar 13, 2019

@rajansandeep : can you try with the limitation implemented in the CoreDNS plugin chain?
The image is: ftur/coredns:1.4.0-ftur-throttle
It will just drop the query if the number of concurrent requests is higher than a limit.

You will need to add this stanza to the Corefile (to stay in the same use case as the test above), so we have a basis for comparison:

throttle 1000 10ms

You get a new metric about the limitation: coredns_throttle_queries
It provides the number of queries accepted and the number of queries dropped.
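
The actual plugin code isn't shown in this thread, but the general idea (rejecting work once too many requests are in flight, instead of letting goroutines pile up) can be sketched as a CoreDNS plugin that uses a buffered channel as a semaphore. Names and the refuse behaviour below are illustrative only, not the real throttle implementation, and the 10ms wait argument is left out:

// throttle.go: illustrative sketch only, not the real plugin.
package throttle

import (
	"context"

	"github.com/coredns/coredns/plugin"
	"github.com/miekg/dns"
)

// Throttle refuses queries once more than cap(slots) requests are in flight.
type Throttle struct {
	Next  plugin.Handler
	slots chan struct{} // buffered channel used as a semaphore
}

func New(next plugin.Handler, limit int) *Throttle {
	return &Throttle{Next: next, slots: make(chan struct{}, limit)}
}

func (t *Throttle) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
	select {
	case t.slots <- struct{}{}: // got a slot, process normally
		defer func() { <-t.slots }()
		return plugin.NextOrFailure(t.Name(), t.Next, ctx, w, r)
	default: // over the limit: answer REFUSED (or simply drop) instead of queueing
		m := new(dns.Msg)
		m.SetRcode(r, dns.RcodeRefused)
		w.WriteMsg(m)
		return dns.RcodeRefused, nil
	}
}

func (t *Throttle) Name() string { return "throttle" }

This does not stop the server from spawning a goroutine per request; it only makes the over-limit ones return almost immediately.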


fturib commented Mar 13, 2019

The panic counter is panic_count_total.
I understand it counts the number of times we recover, in ServeDNS, from a panic happening in CoreDNS (panic as in an interruption of the Go program).



rajansandeep commented Mar 14, 2019

Continuing the testing as @miekg and @fturib suggested, I split my Corefile into a Kubernetes section and all other zones.

Since the DNS query is now specific, e.g. example.com., the number of DNS client replicas has been increased 4x to keep the incoming requests the same as in the previous tests.

The number of DNS client replicas is increased from 0->240, giving a burst of queries to CoreDNS.

Case 1, on CoreDNS master:

Corefile:

cluster.local {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    pprof :6060
    cache 30
    proxy . /etc/resolv.conf
    reload
}

.:53 {
    prometheus :9153
}

CoreDNS is unable to handle the requests and frequently gets OOMKilled, unable to recover unless I scale down the number of replicas back to 0.

Note: I tested the same thing with the prometheus plugin removed and it results in the same OOMKills.

(stitched metrics graphs)

Case 2, with @fturib's throttle plugin:

The Corefile has an additional throttle plugin set up as per @fturib's description:

> it will just drop the query if the number of concurrent requests is higher than a limit.

Corefile:

cluster.local {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    pprof :6060
    cache 30
    proxy . /etc/resolv.conf
    reload
}

.:53 {
    prometheus :9153
    throttle 1000 10ms
}

CoreDNS does not crash, memory consumption stays at ~50MiB, and the number of goroutines is also around ~1k.

(stitched metrics graphs)

The throttle plugin gives an additional metric of coredns_throttle_queries as below:

(coredns_throttle_queries graph)

I am continuing testing on this.


fturib commented Mar 14, 2019

Some comments on the results provided above by @rajansandeep:

  1. There is no visible change whether the limitation on goroutines is implemented at creation of the goroutines (in miekg/dns) or as a plugin in CoreDNS.

  2. If there is no throttle/limitation on the goroutines then, whatever plugin is used, or no plugin at all, we can have a huge number of goroutines (more than 5k) and the memory consumed increases more or less in proportion.

  3. The creation of goroutines per received msg is not always the same. I mean that if the distribution of incoming DNS msgs is well balanced, then fewer than 100 goroutines are enough to manage the workload. If the distribution is unbalanced, then it can go up to 100k goroutines .. and the memory blows up.

  4. Whatever the number of goroutines created to process the incoming DNS requests, the output rate of processed DNS requests reaches a limit (in our tests, this limit is between 22k and 28k qps).

  5. For the purpose of avoiding the OOM crash, it is OK to have the goroutine limitation implemented as a plugin (the first plugin in the chain). It means that the goroutines are created anyway, but dropping them in the throttle plugin is fast enough to avoid the goroutines piling up.

My feeling, after all the tests done over the last week, is that when the goroutines that process the DNS msgs are piling up, it is only because they are waiting to write the response. It does not depend on the plugins used; what seems slow is the throughput of the pod.

NOTE: We still do not know what memory is stuck in the goroutines; my guess is that it is the basic 2k per goroutine + the stack + at least the DNS msgs (query/response).




fturib commented Mar 15, 2019

> Wouldn't dropping open requests after 6 seconds help here as well?

It seems to be all about reacting quickly enough to stop the goroutines from piling up.
I cannot say whether 6 seconds is good enough or not; we will give it a try.

> I also want to figure out why forward can't be faster.

Maybe ... but this is not tied to any plugin. Whatever the speed of forward, this "burst" sensitivity will be there. We did tests with cache, erratic, and only prometheus.
We need to try with no plugin at all .. but in that case it will only be a validation of whether the pod crashes or not, as we cannot see any metrics.


fturib commented Mar 15, 2019

> It's not RRL that is still an open issue, ...

@johnbelamaric: FYI, Chris implemented a first version of RRL here. But I do not think it is applicable to this issue right now.


fturib commented Mar 15, 2019

> I wonder though - is there a way to autotune it? If we measure throughput
> per go-routine, and we see it declining, we could start throttling.

Maybe by monitoring the time of the write itself (not taking into account the time spent collecting or computing the response).
That means the time spent in the ResponseWriter .. not sure how feasible that is.
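
A rough sketch of how that measurement could be wired up, using only the dns.ResponseWriter interface; the histogram name here is made up for illustration:

package timing

import (
	"time"

	"github.com/miekg/dns"
	"github.com/prometheus/client_golang/prometheus"
)

// writeDuration is a hypothetical metric; a real plugin would register it
// through the prometheus plugin.
var writeDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "coredns_dns_response_write_duration_seconds",
	Help: "Time spent only in ResponseWriter.WriteMsg.",
})

// timedWriter wraps a dns.ResponseWriter and measures just the write,
// independent of how long the plugin chain took to build the response.
type timedWriter struct {
	dns.ResponseWriter
}

func (t *timedWriter) WriteMsg(m *dns.Msg) error {
	start := time.Now()
	err := t.ResponseWriter.WriteMsg(m)
	writeDuration.Observe(time.Since(start).Seconds())
	return err
}

A plugin would then pass &timedWriter{ResponseWriter: w} down the chain from its ServeDNS.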


fturib commented Mar 15, 2019

@johnbelamaric :

> One question - you put the throttle in the .:53 stanza. So, we are not throttling requests if they are for cluster.local?

Exactly. If it is a plugin, it works per plugin chain, which means per server key.

Let's say you write:

domainA, domainB: {
    throttle 1000
}
. {
  throttle 2000
}

That means you could serve a total of 1000+1000+2000 simultaneous queries if the incoming domains are properly distributed.

I do not see that as a benefit ... because what we want to avoid is the memory peak when a burst comes in .. and memory is shared by the whole CoreDNS process.


chrisohaver commented Mar 15, 2019

Regarding RRL (response rate limiting): RRL is a very specialized tool that exclusively aims to mitigate DNS reflection attacks. It's made specifically for that purpose. In a nutshell, for each packet received, RRL receives and processes the request fully and analyzes the result before sending a response to the client or dropping it. This is done because response accounting is done separately per response code (and we need to know what the response code is before we can do the accounting).

For that reason, I think it's the wrong tool to try to prevent too many packets in flight.



miekg commented Mar 22, 2019

Are we getting closer to a root cause here?
To throw in (another) theory: proxy can't process queries fast enough, hence the buildup.


chrisohaver commented Mar 22, 2019

I understand that the issue is also reproducible with only the erratic plugin, no cache or forwarding.


miekg commented Mar 22, 2019

That would be really nice, because then we can actually write a unit test against it.


chrisohaver commented Mar 22, 2019

In theory - I think it would be reproducible with no plugins at all - the client just receiving REFUSED for every request. But I don't think this has actually been tested, whereas solo erratic has.


fturib commented Mar 22, 2019

Yes, it was reproduced with a configuration with no plugins at all.
However, in order to show results, we need to add the prometheus plugin (to get request_total .. and now the info provided by throttle).

I will just finalize the PR on throttle.
