
[EKS] [request]: Automatic autoscaling for CoreDNS addon #1458

Closed
gjtempleton opened this issue Jul 29, 2021 · 19 comments
Labels
EKS Add-Ons · EKS Networking · EKS

Comments

@gjtempleton

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Currently, the AWS-managed add-on for CoreDNS simply deploys a Deployment with 2 pods, with no autoscaling of the Deployment as the cluster scales. This results in users encountering problems when their cluster scales to the point where these two pods can no longer cope with the level of DNS traffic. Given that kOps-deployed clusters provide this autoscaling by default, this is likely to confuse a number of users migrating from kOps to EKS.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Provide DNS in our clusters that scales as our clusters do by default.

Are you currently working around this issue?
By deleting the AWS-managed deployment of CoreDNS and deploying our own, with an autoscaler, via a Helm chart at cluster creation.
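
For anyone wanting to replicate this, a minimal sketch of that workaround, assuming the community CoreDNS Helm chart (whose autoscaler.enabled value bundles a cluster-proportional-autoscaler) and the default EKS kube-dns ClusterIP; verify both for your cluster:

# Remove the EKS-managed CoreDNS Deployment, then install the
# community chart with its bundled cluster-proportional-autoscaler.
# Check the real ClusterIP first: kubectl -n kube-system get svc kube-dns
kubectl -n kube-system delete deployment coredns

helm repo add coredns https://coredns.github.io/helm
helm upgrade --install coredns coredns/coredns \
  --namespace kube-system \
  --set service.clusterIP=10.100.0.10 \
  --set autoscaler.enabled=true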

Additional context
N/A
Attachments
N/A

@gjtempleton gjtempleton added the Proposed label Jul 29, 2021
@sanusatyadarshi

sanusatyadarshi commented Jul 29, 2021

We do some configuration in the CoreDNS ConfigMap to avoid forwarding intra-cluster queries upstream and to do some caching.

While migrating to EKS and using the EKS-managed CoreDNS add-on, we would also like to configure these options, preferably using eksctl.

    .:53 {
        errors
        health {
          lameduck 5s
        }
        kubernetes cluster.local. in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        cache 30 {
            success 10000
            denial  5000
            serve_stale 8s
        }
        forward . /etc/resolv.conf {
            except cluster.local cluster.local.ap-south-1.compute.internal myorg.xyz.ap-south-1.compute.internal
        }
        prometheus :9153
        loop
        loadbalance
        reload
    }
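
For reference, a sketch of how to inspect what the managed add-on accepts as configuration (the add-on version string is a placeholder); anything not present in the returned schema cannot be set through the add-on:

# List available CoreDNS add-on versions, then dump the JSON schema
# of configuration values that a given version accepts.
aws eks describe-addon-versions --addon-name coredns \
  --query 'addons[].addonVersions[].addonVersion'

aws eks describe-addon-configuration --addon-name coredns \
  --addon-version v1.10.1-eksbuild.1 \
  --query 'configurationSchema' --output text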

@gjtempleton
Author

@sanusatyadarshi There's an existing issue (#1275) for supporting customisation of the CoreDNS configuration.

@mikestef9 mikestef9 added the EKS Add-Ons and EKS labels and removed the Proposed label Jul 29, 2021
@mikestef9 mikestef9 added this to Researching in containers-roadmap via automation Jul 29, 2021
@matti

matti commented Mar 18, 2022

related #1679

@ryanisnan

We ran into an issue today where CoreDNS was not auto-scaled properly and was the cause of a partial cluster outage.

@doryer

doryer commented Mar 13, 2023

Hey, any update on this one? Advanced configuration for CoreDNS is now supported, which is very nice, but autoscaling (with CPA, for example) is not supported as a flag in the advanced configuration the way kOps enables it.

@beatrizdemiguelperez

In our project we have also had an incident, and for now we change it manually, but this should only be temporary. Please could you give us an update?

@bryantbiggs
Member

> In our project we have also had an incident, and for now we change it manually, but this should only be temporary. Please could you give us an update?

You do not need to scale it manually; you can use https://github.com/kubernetes-sigs/cluster-proportional-autoscaler, which is commonly used to autoscale CoreDNS.
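
A minimal sketch of the linear-mode parameters it consumes (the ConfigMap name and values here are illustrative; the CPA deployment itself must point at this ConfigMap and at the CoreDNS Deployment via its --configmap and --target flags):

kubectl -n kube-system apply -f - <<'EOF'
# Illustrative ConfigMap for cluster-proportional-autoscaler's
# linear mode: replicas grow with cluster core and node counts.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "preventSinglePointFailure": true
    }
EOF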

@jwenz723

It would be very nice to have cluster-proportional-autoscaler available as an EKS addon to solve this problem.

@sjastis sjastis added the EKS Networking label Aug 26, 2023
@alex0z1

alex0z1 commented Sep 1, 2023

Make sure to set resolve-conflicts to PRESERVE if you use HPA or CPA with the CoreDNS add-on in the update-addon API call; if you use OVERWRITE and upgrade/downgrade your add-on, it will reset your replica count to 2.
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/eks/update-addon.html

It is resolve_conflicts_on_update = "PRESERVE" in the AWS Terraform provider:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_addon.html#example-update-add-on-usage-with-resolve_conflicts_on_update-and-preserve
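
For the CLI path, a sketch of the call (cluster name and add-on version are placeholders):

# Upgrade the add-on without clobbering fields managed outside the
# add-on, such as a replica count controlled by HPA/CPA.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.10.1-eksbuild.1 \
  --resolve-conflicts PRESERVE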

@mdrobny

mdrobny commented Oct 26, 2023

My team also experienced a significant cluster outage because CoreDNS was overloaded.
Autoscaling of such a crucial application should be delivered by default, or at least it should be possible to turn it on easily.

@matti

matti commented Oct 26, 2023

If you don't need Kubernetes-internal DNS in your containers (some workloads don't), you can add a DNS server as a sidecar. This way every container has its own DNS server which proxies queries upstream.

By doing this you can skip CoreDNS completely - DNS queries just work at all times.

I understand that this is not something that everybody can do, but just a thought.

I wrote a special DNS proxy for this purpose: https://github.com/matti/harderdns. It would not be too hard to also add CoreDNS as an upstream - that way harderdns would proxy the request to CoreDNS and retry it without causing the client to fail.

It's also possible to run this in the same container like this:

COPY --from=mattipaksula/harderdns:sha-90d790b /* /usr/local/bin/
# allow binding :53 without root
RUN setcap CAP_NET_BIND_SERVICE=+eip /usr/local/bin/harderdns

and then launch it in the background in your entrypoint.
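
For example, an entrypoint along these lines (the bare harderdns invocation is illustrative; check its README for the real flags):

#!/bin/sh
# Start the DNS proxy in the background, then hand off to the
# container's main process so signals are delivered correctly.
harderdns &
exec "$@"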

@JohnDzialo

JohnDzialo commented Dec 8, 2023

Piggybacking on @jwenz723.

I have seen CoreDNS get overwhelmed with requests under the default settings a number of times as well.

It seems Kubernetes has a documented solution using the cluster-proportional-autoscaler: the Kubernetes dns-autoscaler guide.

Could there be a configuration option on the EKS CoreDNS add-on to toggle the autoscaler on or off? It would be off by default.

Using the CoreDNS suggestions here, you could come up with default settings and allow customers to override them through the same configuration values.

@sjastis sjastis moved this from Researching to We're Working On It in containers-roadmap Dec 14, 2023
@mikestef9 mikestef9 moved this from We're Working On It to Coming Soon in containers-roadmap Apr 11, 2024
@sjastis

sjastis commented May 15, 2024

Thank you all for patiently waiting.
Good news! We have enabled native support for CoreDNS autoscaling in the EKS add-on. For details on the getting-started experience and the supported EKS and CoreDNS version matrix, refer to the user guide [1].

[1] https://docs.aws.amazon.com/eks/latest/userguide/coredns-autoscaling.html

@alex0z1

alex0z1 commented May 15, 2024

@sjastis from the doc you shared:

> This CoreDNS autoscaler continuously monitors the cluster state, including the number of nodes and CPU cores. Based on that information, the controller will dynamically adapt the number of replicas of the CoreDNS deployment in an EKS cluster.

How many nodes or cores does it need to trigger a CoreDNS scale-up?

@sjastis

sjastis commented May 15, 2024

Our default algorithm scales per "coresPerReplica": 256, "nodesPerReplica": 16. However, the idea is to abstract this away from users and evolve it based on internal heuristics.
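
For anyone wondering how those parameters translate into a replica count: assuming the same linear formula as the upstream cluster-proportional-autoscaler, the replica count is the larger of the two ratios, rounded up. A worked example for a hypothetical 100-node cluster with 4 vCPUs per node:

# replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))
nodes=100; cores=400                      # 100 nodes x 4 vCPUs
by_cores=$(( (cores + 256 - 1) / 256 ))   # ceil(400/256) = 2
by_nodes=$(( (nodes + 16 - 1) / 16 ))     # ceil(100/16)  = 7
echo $(( by_cores > by_nodes ? by_cores : by_nodes ))   # -> 7 replicas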

@sjastis sjastis closed this as completed May 15, 2024
containers-roadmap automation moved this from Coming Soon to Shipped May 15, 2024
@kenny-monster

@sjastis I've gone ahead and tried the feature out in one of our smaller EKS clusters (EKS 1.28 with addon version v1.10.1-eksbuild.11).

My configuration values look like:

autoScaling:
  enabled: true
  minReplicas: 4
  maxReplicas: 10
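
Roughly, I'm applying them like this (the cluster name is a placeholder; the values are passed as a JSON string here, which the API accepts as far as I can tell):

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.10.1-eksbuild.11 \
  --configuration-values '{"autoScaling":{"enabled":true,"minReplicas":4,"maxReplicas":10}}' \
  --resolve-conflicts OVERWRITE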

I've set conflict resolution to use overwrite. The changes seem to be made successfully. After the changes have been applied, I don't see any new resources related to the autoscaler created. My replica count also doesn't scale up from 2. What should I be seeing in the cluster (if anything)?

Also, everything I can see related to this implementation suggests that a cluster proportional autoscaler is being used to do the autoscaling. As per coredns/coredns#5915 (comment), can we have the option to do the autoscaling using an HPA?
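
For anyone who wants HPA-style behaviour today, a minimal sketch (it requires Metrics Server, and the CPU target is illustrative):

kubectl apply -f - <<'EOF'
# Illustrative HPA for the kube-system/coredns Deployment. Pair it
# with resolve-conflicts PRESERVE so add-on updates don't reset the
# replica count (see the earlier comment above).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF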

@M00nF1sh

M00nF1sh commented May 22, 2024

@kenny-monster
You need to be on the latest EKS platform version to use this feature; the required platform version is listed in https://docs.aws.amazon.com/eks/latest/userguide/coredns-autoscaling.html.
Currently only newly created clusters have this platform version; we are working on upgrading existing EKS clusters to it (this should complete in the coming weeks).
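
A quick way to check which platform version a cluster is on (the cluster name is a placeholder):

# Platform versions look like "eks.N"; compare against the minimum
# listed in the autoscaling user guide for your Kubernetes version.
aws eks describe-cluster --name my-cluster \
  --query 'cluster.platformVersion' --output text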

On EKS, the primary scaling limit of CoreDNS is the 1024 PPS limit from CoreDNS to the Amazon-provided DNS servers; that's why we chose to horizontally scale CoreDNS based on overall cluster size. We chose not to use HPA at this time due to its requirement of Metrics Server (which is not enabled in EKS clusters by default).

Our implementation is separate from the upstream CPA, and we have designed the API to hide the implementation details, so based on usage and feedback we'll evolve our implementation to consider more metrics in the future (e.g., if Metrics Server is installed, leverage additional metrics such as CPU/memory as a feedback loop).

@kenny-monster

Ah. Yep, looks like that's the reason. Sorry for not reading more carefully.

I'll be keeping an eye on this feature.

@mogopz

mogopz commented May 26, 2024

@M00nF1sh FYI - 1.30 is missing from the compatibility table.
