aws-samples/k8s-cloudwatch-operator

Autoscaling Kubernetes deployments based on custom Prometheus metrics using CloudWatch Container Insights

This Git repository contains software artifacts that enable autoscaling of microservices deployed to an Amazon EKS cluster, or to a self-managed Kubernetes cluster on AWS, based on custom Prometheus metrics collected from the workloads. It includes a custom Kubernetes controller that manages Amazon CloudWatch metric alarms, which watch the custom metrics data and trigger scaling actions. An AWS Lambda function performs the actual scaling of the microservices.

Please refer to the related blog post for details about how this works.

Architecture

The architecture used to implement this autoscaling solution comprises the following elements.

  • A Kubernetes Operator implemented using the Kubernetes Java SDK. The operator packages three things: a custom resource named K8sMetricAlarm, defined by a CustomResourceDefinition; a custom controller, deployed as a Deployment, which responds to add/update/delete events on K8sMetricAlarm custom resources in the cluster; and Role/RoleBinding resources that grant the custom controller the permissions it needs. The custom controller runs under the identity of a Kubernetes service account that is associated with an IAM role with permissions to manage resources in CloudWatch.
  • The CloudWatch agent for Prometheus metrics collection, which is installed as a Deployment with a single replica in the Amazon EKS cluster.
  • Amazon CloudWatch metric alarms, which are managed by the custom controller in conjunction with the K8sMetricAlarm custom resource.
  • An Amazon SNS topic, which is configured to receive notifications when a CloudWatch alarm breaches its specified threshold.
  • An AWS Lambda function whose execution is triggered when a notification is sent to the Amazon SNS topic. The Lambda function acts as a Kubernetes client and performs the autoscaling operation on the target resource.
  • One or more microservices deployed to the cluster that are the target of autoscaling. These services have been instrumented with a Prometheus client library to collect application-specific metrics and expose them over HTTP, to be read by the CloudWatch agent for Prometheus.
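
The last three elements form the scaling path: alarm, SNS notification, Lambda function. As a minimal sketch of that path, and not the repository's actual Lambda code, the Python below unpacks a CloudWatch alarm notification from an SNS-triggered Lambda event and decides on a new replica count. The function names, the step size, and the replica bounds are assumptions made for illustration; only the SNS event envelope (Records[].Sns.Message) and the alarm fields AlarmName and NewStateValue follow the standard CloudWatch alarm notification format.

```python
import json


def parse_alarm(event):
    """Return (alarm name, new state) from an SNS-triggered Lambda event."""
    # SNS delivers the CloudWatch alarm payload as a JSON string in Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["AlarmName"], message["NewStateValue"]


def desired_replicas(current, state, step=1, min_replicas=1, max_replicas=10):
    """Hypothetical policy: add `step` replicas while the alarm is in ALARM state."""
    if state != "ALARM":
        return current  # OK or INSUFFICIENT_DATA: leave the replica count alone
    return max(min_replicas, min(max_replicas, current + step))


if __name__ == "__main__":
    # Hand-written sample event; the alarm name is made up for this example.
    sample_event = {
        "Records": [
            {
                "Sns": {
                    "Message": json.dumps(
                        {"AlarmName": "http-rate-high", "NewStateValue": "ALARM"}
                    )
                }
            }
        ]
    }
    name, state = parse_alarm(sample_event)
    print(name, state, desired_replicas(3, state))
```

In a real function, the computed replica count would then be written to the target Deployment through the Kubernetes API; that call is omitted here because it depends on how the Lambda function authenticates to the cluster.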

Figure: Autoscaling architecture

Build & Installation Instructions

First, build the Docker image for the custom controller per the instructions here.

Next, build and deploy the Lambda Kubernetes client per the instructions here.

Execute the shell script createIRSA.sh after defining the variable CLUSTER_NAME with the name of the Kubernetes cluster. The script executes the following tasks:

  • Create an IAM role named EKS-CloudWatch-Role
  • Attach the AWS managed policy named CloudWatchFullAccess to this role
  • Create a Kubernetes service account cloudwatchalarm-controller in the kube-system namespace and associate it with the above IAM role using a Kubernetes annotation. The custom controller is configured to run under the identity of this service account
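
To make the last step concrete, here is a rough sketch of the service account that results, built as a Python dictionary and printed as JSON (which kubectl also accepts). It assumes the association uses the eks.amazonaws.com/role-arn annotation that EKS reads for IAM roles for service accounts; the account ID is a placeholder and the helper name is hypothetical. The script itself may create the resource differently.

```python
import json


def service_account_manifest(account_id,
                             role_name="EKS-CloudWatch-Role",
                             name="cloudwatchalarm-controller",
                             namespace="kube-system"):
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation that links the service account to the IAM role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder AWS account ID.
    print(json.dumps(service_account_manifest("111122223333"), indent=2))
```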

Then, deploy the operator to a Kubernetes cluster as follows:

kubectl apply -f operator.yaml

The custom controller is deployed with an image from a public repository. You may want to replace it with the image URL from your repository.

Make the following changes to the YAML manifest aws-auth-configmap.yaml:

  • Replace WORKER_NODE_ROLE_ARN with the ARN of the IAM role assigned to the worker nodes in the EKS cluster.
  • Replace LAMBDA_ROLE_ARN with the ARN of the IAM role that was used in the Environment.Variables.ASSUMED_ROLE configuration parameter for the Lambda function. This role is mapped to a Kubernetes group lambda-client.
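
For orientation, the two role mappings described above might look roughly like the entries built below. This is a sketch of the shape of the mapRoles list, not the contents of aws-auth-configmap.yaml: the worker node entry follows the usual EKS convention, while the username given to the Lambda role and the helper name are assumptions. Only the lambda-client group comes from the text above.

```python
import json


def map_roles(worker_node_role_arn, lambda_role_arn):
    """Return mapRoles-style entries for the worker node role and the Lambda role."""
    return [
        {
            # Conventional EKS mapping that lets worker nodes join the cluster.
            "rolearn": worker_node_role_arn,
            "username": "system:node:{{EC2PrivateDNSName}}",
            "groups": ["system:bootstrappers", "system:nodes"],
        },
        {
            # Lambda role mapped to the lambda-client group.
            "rolearn": lambda_role_arn,
            "username": "lambda-client",  # assumed username, for illustration only
            "groups": ["lambda-client"],
        },
    ]


if __name__ == "__main__":
    # The placeholders match the tokens to be replaced in the manifest.
    print(json.dumps(map_roles("WORKER_NODE_ROLE_ARN", "LAMBDA_ROLE_ARN"), indent=2))
```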

Update this ConfigMap as follows:

kubectl apply -f aws-auth-configmap.yaml

We will have to grant the lambda-client Kubernetes group permission to list K8sMetricAlarm custom resources as well as list/update Deployment resources. In order to do that, create a Kubernetes ClusterRole and ClusterRoleBinding as follows:

kubectl apply -f rbac-lambda-client.yaml
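
As an illustration of the permissions just described, the sketch below builds a ClusterRole and a ClusterRoleBinding for the lambda-client group as Python dictionaries. It is not a copy of rbac-lambda-client.yaml. The CRD API group (example.com), the plural resource name (k8smetricalarms), and the object name lambda-client-role are placeholders chosen for this example; check the CustomResourceDefinition in operator.yaml for the real values.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def lambda_client_rbac(crd_group="example.com",          # placeholder API group
                       crd_plural="k8smetricalarms",     # placeholder plural name
                       role_name="lambda-client-role",   # hypothetical object name
                       group_name="lambda-client"):
    """Build a ClusterRole and ClusterRoleBinding for the lambda-client group."""
    cluster_role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [
            # List K8sMetricAlarm custom resources.
            {"apiGroups": [crd_group], "resources": [crd_plural], "verbs": ["list"]},
            # List and update Deployments (the scaling targets).
            {"apiGroups": ["apps"], "resources": ["deployments"],
             "verbs": ["list", "update"]},
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{role_name}-binding"},
        "subjects": [{"kind": "Group", "name": group_name, "apiGroup": RBAC_API}],
        "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": role_name},
    }
    return cluster_role, binding


if __name__ == "__main__":
    for manifest in lambda_client_rbac():
        print(json.dumps(manifest, indent=2))
```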

Sample definitions of the K8sMetricAlarm custom resource are provided in http-rate-alarm.yaml and sqs-alarm.yaml.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
