EMR on EKS example update (#675)
* EMR on EKS example update with multiple node groups

* EMR on EKS example update with multiple node groups

* EMR on EKS example with updated config

* EMR on EKS example with updated config

* removed redundant cluster version from locals

* New add-on for cluster proportional autoscaler

* Added condition for coredns helm config
vara-bonthu committed Jun 24, 2022
1 parent fdf0da7 commit e80009e
Showing 45 changed files with 101,072 additions and 617 deletions.
81 changes: 81 additions & 0 deletions docs/add-ons/cluster-proportional-autoscaler.md
@@ -0,0 +1,81 @@
# Horizontal cluster-proportional-autoscaler container

Horizontal cluster-proportional-autoscaler watches the number of schedulable nodes and cores in the cluster and resizes the number of replicas of a target workload accordingly. This is desirable for applications that need to scale with the size of the cluster, such as CoreDNS and other services that scale with the number of nodes/pods.

The [cluster-proportional-autoscaler](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler) scales workloads managed by a Deployment, ReplicationController, or ReplicaSet, and is an alternative to the Horizontal Pod Autoscaler.
It is typically installed as a **Deployment** in your cluster.

## Usage

This add-on requires both `enable_coredns_autoscaler` and `coredns_autoscaler_helm_config` to be set.

[cluster-proportional-autoscaler](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/cluster-proportional-autoscaler) can be deployed by enabling the add-on as shown below.

The example shows how to enable `cluster-proportional-autoscaler` for the CoreDNS Deployment. CoreDNS is not configured with an HPA by default, so this add-on scales the CoreDNS Deployment according to the number of nodes and cores in the cluster.

This add-on can be used to scale any application deployed as a Deployment.

```hcl
enable_coredns_autoscaler = true
coredns_autoscaler_helm_config = {
  name       = "cluster-proportional-autoscaler"
  chart      = "cluster-proportional-autoscaler"
  repository = "https://kubernetes-sigs.github.io/cluster-proportional-autoscaler"
  version    = "1.0.0"
  namespace  = "kube-system"
  timeout    = "300"
  values = [
    <<-EOT
      nameOverride: kube-dns-autoscaler

      # Formula for controlling the replicas. Adjust according to your needs
      # replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) )
      config:
        linear:
          coresPerReplica: 256
          nodesPerReplica: 16
          min: 1
          max: 100
          preventSinglePointFailure: true
          includeUnschedulableNodes: true

      # Target to scale. In format: deployment/*, replicationcontroller/* or replicaset/* (not case sensitive).
      options:
        target: deployment/coredns # Notice the target as `deployment/coredns`

      serviceAccount:
        create: true
        name: kube-dns-autoscaler

      podSecurityContext:
        seccompProfile:
          type: RuntimeDefault
        supplementalGroups: [ 65534 ]
        fsGroup: 65534

      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi

      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
    EOT
  ]
  description = "Cluster Proportional Autoscaler for CoreDNS Service"
}
```
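As a quick sanity check after deployment, you can confirm the autoscaler is running and watch CoreDNS replicas track the cluster size. This is a minimal sketch; the exact resource name depends on the Helm release name and the `nameOverride` used above.

```sh
# Check the cluster-proportional-autoscaler Deployment created by the Helm chart
# (name assumed to follow the nameOverride `kube-dns-autoscaler` used above)
kubectl get deployment -n kube-system | grep autoscaler

# Watch CoreDNS replicas change as nodes are added or removed.
# With the linear config above: replicas = max(ceil(cores/256), ceil(nodes/16)),
# e.g. a 20-node / 160-core cluster gives max(ceil(160/256), ceil(20/16)) = max(1, 2) = 2 replicas.
kubectl get deployment coredns -n kube-system -w
```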

### GitOps Configuration

The following properties are made available for use when managing the add-on via GitOps.

```
corednsAutoscaler = {
enable = true
}
```
1 change: 1 addition & 0 deletions docs/modules/emr-on-eks.md
@@ -13,6 +13,7 @@ This module deploys the necessary resources to run EMR Spark Jobs on EKS Cluster

[EMR on EKS](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/emr-on-eks) can be deployed by enabling the module via the following.

Check out this [blog post](https://aws.amazon.com/blogs/mt/monitoring-amazon-emr-on-eks-with-amazon-managed-prometheus-and-amazon-managed-grafana/) to set up observability for EMR on EKS Spark jobs.

```hcl
#---------------------------------------
```
66 changes: 21 additions & 45 deletions examples/analytics/emr-on-eks/README.md
@@ -2,9 +2,10 @@

This example deploys the following resources:

- Creates EKS Cluster Control plane with public endpoint (for demo purpose only) with one managed node group
- Deploys Metrics server, Cluster Autoscaler, Prometheus and EMR on EKS Addon
- Creates Amazon managed Prometheus and configures Prometheus addon to remote write metrics to AMP
- Creates an EKS cluster control plane with a public endpoint (for demo purposes only) and two managed node groups
- Deploys Metrics Server (HA), Cluster Autoscaler, Prometheus Server, Vertical Pod Autoscaler (VPA), and the CoreDNS cluster-proportional-autoscaler
- Creates EMR on EKS teams and an EMR virtual cluster for `emr-data-team-a`
- Creates an Amazon Managed Prometheus workspace and configures the Prometheus server add-on to remote-write metrics to it

## Prerequisites:

@@ -14,9 +15,9 @@ Ensure that you have installed the following tools on your machine.
2. [kubectl](https://Kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
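A quick way to confirm the tools are available on your machine (output versions will vary):

```sh
aws --version         # AWS CLI
kubectl version --client
terraform -version
```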

_Note: Currently Amazon Prometheus supported only in selected regions. Please see this [userguide](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) for supported regions._
_Note: Amazon Managed Prometheus is currently supported only in select regions. Please see the [user guide](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) for supported regions._

## Step 1: Deploy EKS Clusters with EMR on EKS feature
## Deploy EKS Clusters with EMR on EKS feature

Clone the repository

@@ -46,22 +47,22 @@
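A typical sequence of commands for this example (repository URL and paths taken from the links in this document; a sketch, not the exact collapsed content) is:

```sh
git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git
cd terraform-aws-eks-blueprints/examples/analytics/emr-on-eks

terraform init
terraform plan
terraform apply
```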

Enter `yes` to apply.

## Step 3: Verify the resources
## Verify the resources

Let’s verify the resources created by the deployment.

Verify the Amazon EKS cluster and the Amazon Managed Service for Prometheus workspace:

```sh
aws eks describe-cluster --name aws001-preprod-test-eks
aws eks describe-cluster --name emr-on-eks

aws amp list-workspaces --alias amp-ws-aws001-preprod-test-eks
aws amp list-workspaces --alias amp-ws-emr-on-eks
```

Verify the EMR on EKS namespaces `emr-data-team-a` and `emr-data-team-b`, and the pod status for Prometheus, Vertical Pod Autoscaler, Metrics Server and Cluster Autoscaler.

```sh

aws eks --region <ENTER_YOUR_REGION> update-kubeconfig --name aws001-preprod-test-eks # Creates k8s config file to authenticate with EKS Cluster
aws eks --region <ENTER_YOUR_REGION> update-kubeconfig --name emr-on-eks # Creates k8s config file to authenticate with EKS Cluster

kubectl get nodes # Output shows the EKS Managed Node group nodes

@@ -76,41 +77,20 @@
kubectl get pods --namespace=kube-system | grep metrics-server # Output shows Metrics Server pod
kubectl get pods --namespace=kube-system | grep cluster-autoscaler # Output shows Cluster Autoscaler pod
```
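Since this example also deploys the Vertical Pod Autoscaler and the CoreDNS cluster-proportional-autoscaler, you may want to confirm those pods as well (a sketch; pod name prefixes are assumptions):

```sh
kubectl get pods --namespace=kube-system | grep vpa          # Output shows VPA recommender/updater/admission pods
kubectl get pods --namespace=kube-system | grep autoscaler   # Output shows Cluster Autoscaler and cluster-proportional-autoscaler pods
```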

## Step 4: Create EMR Virtual Cluster for EKS

We are using AWS CLI to create EMR on EKS Clusters. You can leverage Terraform Module once the [EMR on EKS TF provider](https://github.com/hashicorp/terraform-provider-aws/pull/20003) is available.

```sh
vi examples/analytics/emr-on-eks/examples/create_emr_virtual_cluster_for_eks.sh
```

Update the following variables.

Extract the cluster_name as **EKS_CLUSTER_ID** from Terraform Outputs (**Step 1**)
**EMR_ON_EKS_NAMESPACE** is same as what you passed from **Step 1**

EKS_CLUSTER_ID='aws001-preprod-test-eks'
EMR_ON_EKS_NAMESPACE='emr-data-team-a'

Execute the shell script to create virtual cluster

```sh
cd examples/analytics/emr-on-eks/examples/
./create_emr_virtual_cluster_for_eks.sh
```

## Step 5: Execute Spark job on EMR Virtual Cluster
## Execute Spark job on EMR Virtual Cluster

Execute the Spark job using the shell script below.

This script requires two input parameters.
This script requires three input parameters, which can be extracted from the Terraform output values (see the `terraform output` commands below).

EMR_VIRTUAL_CLUSTER_ID=$1 # EMR Cluster ID e.g., aws001-preprod-test-eks-emr-data-team-a
S3_BUCKET=$2 # S3 bucket for storing the scripts and spark output data e.g., s3://<bucket-name>
EMR_VIRTUAL_CLUSTER_ID=$1 # Terraform output variable is emrcontainers_virtual_cluster_id
S3_BUCKET=$2 # This script requires s3 bucket as input parameter e.g., s3://<bucket-name>
EMR_JOB_EXECUTION_ROLE_ARN=$3 # Terraform output variable is emr_on_eks_role_arn

```sh
cd examples/analytics/emr-on-eks/examples/spark-execute/
./5-spark-job-with-AMP-AMG.sh aws001-preprod-test-eks-emr-data-team-a <ENTER_S3_BUCKET_NAME>

./spark-job-with-AMP-AMG.sh "<ENTER_EMR_VIRTUAL_CLUSTER_ID>" "s3://<ENTER-YOUR-BUCKET-NAME>" "<EMR_JOB_EXECUTION_ROLE_ARN>"
```
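The virtual cluster ID and job execution role ARN can be read directly from the Terraform outputs named in the parameter list above, for example:

```sh
terraform output emrcontainers_virtual_cluster_id
terraform output emr_on_eks_role_arn
```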

Verify the job execution
@@ -119,7 +99,7 @@
```sh
kubectl get pods --namespace=emr-data-team-a -w
```
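You can also track the job run status through the EMR containers API; a sketch, where the virtual cluster ID is the same Terraform output used above:

```sh
aws emr-containers list-job-runs --virtual-cluster-id <ENTER_EMR_VIRTUAL_CLUSTER_ID>
```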

## Step 6: Cleanup
## Cleanup

### Delete EMR Virtual Cluster for EKS

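Deleting a virtual cluster through the AWS CLI looks roughly like this (a sketch; the ID placeholder is an assumption):

```sh
aws emr-containers delete-virtual-cluster --id <ENTER_EMR_VIRTUAL_CLUSTER_ID>
```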
@@ -150,13 +130,11 @@

To pin Spark pods to a specific availability zone or instance type, add these to `applicationConfiguration`.`properties`:

"spark.kubernetes.node.selector.topology.kubernetes.io/zone":"<availability zone>",
"spark.kubernetes.node.selector.node.kubernetes.io/instance-type":"<instance type>"
"spark.kubernetes.node.selector.topology.kubernetes.io/zone":"<availability zone>",
"spark.kubernetes.node.selector.node.kubernetes.io/instance-type":"<instance type>"

### JDBC example

In this example we connect to a MySQL database, so `mariadb-connector-java.jar` needs to be passed with the `--jars` option. See the Hive metastore integration guide for details:
https://aws.github.io/aws-emr-containers-best-practices/metastore-integrations/docs/hive-metastore/

"sparkSubmitJobDriver": {
"entryPoint": "s3://<s3 prefix>/hivejdbc.py",
@@ -172,7 +150,6 @@
}
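For reference, the connector JAR is typically passed through `sparkSubmitParameters` inside the same `sparkSubmitJobDriver` block; a minimal, assumed sketch (the S3 prefix mirrors the placeholder above):

    "sparkSubmitParameters": "--jars s3://<s3 prefix>/mariadb-connector-java.jar"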

### Storage

Spark supports using volumes to spill data during shuffles and other operations.
To use a volume as local storage, the volume’s name should start with `spark-local-dir-`,
for example:
@@ -190,15 +167,14 @@ Specifically, you can use persistent volume claims if the jobs require large shuffle space.
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
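For reference, a full set of properties for an on-demand PVC used as Spark local storage typically looks like the following (the storage class, size, and mount path are assumptions):

    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false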

## Debugging

##### Issue 1: `local-exec` provisioner error

```sh
Error: local-exec provisioner error \
with module.eks-blueprints.module.emr_on_eks["data_team_b"].null_resource.update_trust_policy,\
on .terraform/modules/eks-blueprints/modules/emr-on-eks/main.tf line 105, in resource "null_resource" \
"update_trust_policy":│ 105: provisioner "local-exec" {│ │ Error running command 'set -e│ │ aws emr-containers update-role-trust-policy \
│ --cluster-name aws001-preprod-test-eks \│ --namespace emr-data-team-b \│ --role-name aws001-preprod-test-eks-emr-eks-data-team-b
│ --cluster-name emr-on-eks \│ --namespace emr-data-team-b \│ --role-name emr-on-eks-emr-eks-data-team-b
```
##### Solution:

