
[EKS] [request]: Automatic autoscaling for CoreDNS addon #1458

Closed
gjtempleton opened this issue Jul 29, 2021 · 19 comments
Labels
EKS Add-Ons · EKS Networking · EKS

Comments

@gjtempleton

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Currently, the AWS-managed add-on for CoreDNS simply deploys a Deployment with 2 pods, with no autoscaling of the Deployment as the cluster scales. This results in users encountering problems when their cluster scales to the point where these two pods can no longer cope with the level of DNS traffic. Given that kOps-deployed clusters provide this autoscaling by default, this is likely to confuse a number of users migrating from kOps to EKS.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Provide DNS in our clusters that scales as our clusters do by default.

Are you currently working around this issue?
By deleting the AWS-managed deployment of CoreDNS and deploying our own, with an autoscaler, via a Helm chart at cluster creation.
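
For anyone wanting to replicate this, a minimal sketch of that workaround, assuming the community CoreDNS Helm chart (whose autoscaler.enabled value bundles a cluster-proportional-autoscaler) and the default EKS kube-dns ClusterIP; verify both for your cluster:

# Remove the EKS-managed CoreDNS Deployment, then install the
# community chart with its bundled cluster-proportional-autoscaler.
# Check the real ClusterIP first: kubectl -n kube-system get svc kube-dns
kubectl -n kube-system delete deployment coredns

helm repo add coredns https://coredns.github.io/helm
helm upgrade --install coredns coredns/coredns \
  --namespace kube-system \
  --set service.clusterIP=10.100.0.10 \
  --set autoscaler.enabled=true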

Additional context
N/A
Attachments
N/A

@gjtempleton gjtempleton added the Proposed label Jul 29, 2021
@sanusatyadarshi

sanusatyadarshi commented Jul 29, 2021

We do some configuration in the CoreDNS ConfigMap to avoid forwarding intra-cluster queries upstream and to do some caching.

While migrating to EKS and using the EKS-managed CoreDNS add-on, we would also like to configure these options, preferably using eksctl.

    .:53 {
        errors
        health {
          lameduck 5s
        }
        kubernetes cluster.local. in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        cache 30 {
            success 10000
            denial  5000
            serve_stale 8s
        }
        forward . /etc/resolv.conf {
            except cluster.local cluster.local.ap-south-1.compute.internal myorg.xyz.ap-south-1.compute.internal
        }
        prometheus :9153
        loop
        loadbalance
        reload
    }
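
For reference, a sketch of how to inspect what the managed add-on accepts as configuration (the add-on version string is a placeholder); anything not present in the returned schema cannot be set through the add-on:

# List available CoreDNS add-on versions, then dump the JSON schema
# of configuration values that a given version accepts.
aws eks describe-addon-versions --addon-name coredns \
  --query 'addons[].addonVersions[].addonVersion'

aws eks describe-addon-configuration --addon-name coredns \
  --addon-version v1.10.1-eksbuild.1 \
  --query 'configurationSchema' --output text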

@gjtempleton
Author

@sanusatyadarshi There's an existing issue (#1275) for supporting customisation of the CoreDNS configuration.

@mikestef9 mikestef9 added the EKS Add-Ons and EKS labels and removed the Proposed label Jul 29, 2021
@mikestef9 mikestef9 added this to Researching in containers-roadmap via automation Jul 29, 2021
@matti

matti commented Mar 18, 2022

related #1679

@ryanisnan

We ran into an issue today where CoreDNS was not auto-scaled properly and was the cause of a partial cluster outage.

@doryer

doryer commented Mar 13, 2023

Hey, any update on this one? Advanced configuration for CoreDNS is now supported, which is very nice, but autoscaling (with CPA, for example) is not supported as a flag in the advanced configuration the way kOps enables it.

@beatrizdemiguelperez

In our project we have also had an incident, and for now we change it manually, but this should only be temporary. Please could you give us an update?

@bryantbiggs
Member

> In our project we have also had an incident, and for now we change it manually, but this should only be temporary. Please could you give us an update?

You do not need to scale it manually; you can use https://github.com/kubernetes-sigs/cluster-proportional-autoscaler, which is commonly used to autoscale CoreDNS.
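
A minimal sketch of the linear-mode parameters it consumes (the ConfigMap name and values here are illustrative; the CPA deployment itself must point at this ConfigMap and at the CoreDNS Deployment via its --configmap and --target flags):

kubectl -n kube-system apply -f - <<'EOF'
# Illustrative ConfigMap for cluster-proportional-autoscaler's
# linear mode: replicas grow with cluster core and node counts.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "preventSinglePointFailure": true
    }
EOF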

@jwenz723

It would be very nice to have cluster-proportional-autoscaler available as an EKS addon to solve this problem.

@sjastis sjastis added the EKS Networking label Aug 26, 2023
@alex0z1

alex0z1 commented Sep 1, 2023

Make sure to set resolve-conflicts to PRESERVE if you use HPA or CPA with the CoreDNS add-on in the update-addon API call; if you use OVERWRITE and upgrade/downgrade your add-on, it will reset your replica count to 2.
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/eks/update-addon.html

It is resolve_conflicts_on_update = "PRESERVE" in the AWS Terraform provider:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_addon.html#example-update-add-on-usage-with-resolve_conflicts_on_update-and-preserve
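
For the CLI path, a sketch of the call (cluster name and add-on version are placeholders):

# Upgrade the add-on without clobbering fields managed outside the
# add-on, such as a replica count controlled by HPA/CPA.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.10.1-eksbuild.1 \
  --resolve-conflicts PRESERVE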

@mdrobny

mdrobny commented Oct 26, 2023

My team also experienced a significant cluster outage because CoreDNS was overloaded.
Autoscaling of such a crucial application should be delivered by default, or at least it should be possible to turn it on easily.

@matti

matti commented Oct 26, 2023

If you don't need Kubernetes-internal DNS in your containers (some workloads don't), you can add a DNS server as a sidecar. This way every container has its own DNS server which proxies queries upstream.

By doing this you can skip CoreDNS completely - DNS queries just work at all times.

I understand that this is not something that everybody can do, but just a thought.

I wrote a special DNS proxy for this purpose: https://github.com/matti/harderdns. It would not be too hard to also add CoreDNS as an upstream - that way harderdns would proxy the request to CoreDNS and retry it without causing the client to fail.

It's also possible to run this in the same container like this:

COPY --from=mattipaksula/harderdns:sha-90d790b /* /usr/local/bin/
# allow binding :53 without root
RUN setcap CAP_NET_BIND_SERVICE=+eip /usr/local/bin/harderdns

and then launch it in the background in your entrypoint.
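
For example, an entrypoint along these lines (the bare harderdns invocation is illustrative; check its README for the real flags):

#!/bin/sh
# Start the DNS proxy in the background, then hand off to the
# container's main process so signals are delivered correctly.
harderdns &
exec "$@"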

@JohnDzialo

JohnDzialo commented Dec 8, 2023

Piggybacking on @jwenz723.

I have seen CoreDNS get overwhelmed with requests under the default settings a number of times as well.

It seems Kubernetes has a documented solution using the cluster-proportional-autoscaler: the Kubernetes dns-autoscaler guide.

Could there be a configuration option on the EKS CoreDNS add-on to toggle the autoscaler on or off? It would be off by default.

Using the CoreDNS suggestions here, you could come up with default settings and allow customers to override them through the same configuration values.

@sjastis sjastis moved this from Researching to We're Working On It in containers-roadmap Dec 14, 2023
@mikestef9 mikestef9 moved this from We're Working On It to Coming Soon in containers-roadmap Apr 11, 2024
@sjastis

sjastis commented May 15, 2024

Thank you all for patiently waiting.
Good news! We have enabled native support for CoreDNS autoscaling in the EKS add-on. For details on the getting-started experience and the supported EKS and CoreDNS version matrix, refer to the user guide [1].

[1] https://docs.aws.amazon.com/eks/latest/userguide/coredns-autoscaling.html

@alex0z1

alex0z1 commented May 15, 2024

@sjastis from the doc you shared:

> This CoreDNS autoscaler continuously monitors the cluster state, including the number of nodes and CPU cores. Based on that information, the controller will dynamically adapt the number of replicas of the CoreDNS deployment in an EKS cluster.

How many nodes or cores does it need to trigger a CoreDNS scale-up?

@sjastis

sjastis commented May 15, 2024

Our default algorithm scales per "coresPerReplica": 256, "nodesPerReplica": 16. However, the idea is to abstract this away from users and evolve it based on internal heuristics.
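
For anyone wondering how those parameters translate into a replica count: assuming the same linear formula as the upstream cluster-proportional-autoscaler, the replica count is the larger of the two ratios, rounded up. A worked example for a hypothetical 100-node cluster with 4 vCPUs per node:

# replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))
nodes=100; cores=400                      # 100 nodes x 4 vCPUs
by_cores=$(( (cores + 256 - 1) / 256 ))   # ceil(400/256) = 2
by_nodes=$(( (nodes + 16 - 1) / 16 ))     # ceil(100/16)  = 7
echo $(( by_cores > by_nodes ? by_cores : by_nodes ))   # -> 7 replicas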

@sjastis sjastis closed this as completed May 15, 2024
containers-roadmap automation moved this from Coming Soon to Shipped May 15, 2024
@kenny-monster

@sjastis I've gone ahead and tried the feature out in one of our smaller EKS clusters (EKS 1.28 with addon version v1.10.1-eksbuild.11).

My configuration values look like:

autoScaling:
  enabled: true
  minReplicas: 4
  maxReplicas: 10
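
Roughly, I'm applying them like this (the cluster name is a placeholder; the values are passed as a JSON string here, which the API accepts as far as I can tell):

aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.10.1-eksbuild.11 \
  --configuration-values '{"autoScaling":{"enabled":true,"minReplicas":4,"maxReplicas":10}}' \
  --resolve-conflicts OVERWRITE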

I've set conflict resolution to use overwrite. The changes seem to be made successfully. After the changes have been applied, I don't see any new resources related to the autoscaler created. My replica count also doesn't scale up from 2. What should I be seeing in the cluster (if anything)?

Also, everything I can see related to this implementation suggests that a cluster proportional autoscaler is being used to do the autoscaling. As per coredns/coredns#5915 (comment), can we have the option to do the autoscaling using an HPA?
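
For anyone who wants HPA-style behaviour today, a minimal sketch (it requires Metrics Server, and the CPU target is illustrative):

kubectl apply -f - <<'EOF'
# Illustrative HPA for the kube-system/coredns Deployment. Pair it
# with resolve-conflicts PRESERVE so add-on updates don't reset the
# replica count (see the earlier comment above).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF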

@M00nF1sh

M00nF1sh commented May 22, 2024

@kenny-monster
You need to be on the latest EKS platform version to use this feature; the required platform version is listed in https://docs.aws.amazon.com/eks/latest/userguide/coredns-autoscaling.html.
Currently only newly created clusters have this platform version; we are working on upgrading existing EKS clusters to it (this should complete in the coming weeks).
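
A quick way to check which platform version a cluster is on (the cluster name is a placeholder):

# Platform versions look like "eks.N"; compare against the minimum
# listed in the autoscaling user guide for your Kubernetes version.
aws eks describe-cluster --name my-cluster \
  --query 'cluster.platformVersion' --output text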

On EKS, the primary scaling limit of CoreDNS is the 1024 PPS limit from CoreDNS to the Amazon-provided DNS servers; that's why we chose to horizontally scale CoreDNS based on overall cluster size. We chose not to use HPA at this time due to its requirement of Metrics Server (which is not enabled in EKS clusters by default).

Our implementation is separate from the upstream CPA, and we have designed the API to hide the implementation details, so based on usage and feedback we'll evolve our implementation to consider more metrics in the future (e.g., if Metrics Server is installed, leverage additional metrics such as CPU/memory as a feedback loop).

@kenny-monster

Ah. Yep, looks like that's the reason. Sorry for not reading more carefully.

I'll be keeping an eye on this feature.

@mogopz

mogopz commented May 26, 2024

@M00nF1sh FYI - 1.30 is missing from the compatibility table.
