EMR on EKS example update (#675)
* EMR on EKS example update with multiple node groups

* EMR on EKS example update with multiple node groups

* EMR on EKS example with updated config

* EMR on EKS example with updated config

* removed redundant cluster version from locals

* New add-on for cluster proportional autoscaler

* Added condition for coredns helm config
vara-bonthu committed Jun 24, 2022
1 parent fdf0da7 commit e80009e
Showing 45 changed files with 101,072 additions and 617 deletions.
81 changes: 81 additions & 0 deletions docs/add-ons/cluster-proportional-autoscaler.md
@@ -0,0 +1,81 @@
# Horizontal cluster-proportional-autoscaler container

Horizontal cluster-proportional-autoscaler watches the number of schedulable nodes and cores in the cluster and resizes the number of replicas of a target workload accordingly. This is desirable for applications that need to scale with the size of the cluster, such as CoreDNS and other services that scale with the number of nodes/pods.

The [cluster-proportional-autoscaler](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler) scales workloads managed by a Deployment, ReplicationController, or ReplicaSet, and is an alternative to the Horizontal Pod Autoscaler.
It is typically installed as a **Deployment** in your cluster.

## Usage

This add-on requires both `enable_coredns_autoscaler` and `coredns_autoscaler_helm_config` to be set.

[cluster-proportional-autoscaler](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/kubernetes-addons/cluster-proportional-autoscaler) can be deployed by enabling the add-on as shown below.

The example shows how to enable `cluster-proportional-autoscaler` for the CoreDNS Deployment. CoreDNS is not configured with an HPA by default, so this add-on scales the CoreDNS Deployment according to the number of nodes and cores in the cluster.

This add-on can be used to scale any application deployed as a Deployment.

```hcl
enable_coredns_autoscaler = true
coredns_autoscaler_helm_config = {
  name       = "cluster-proportional-autoscaler"
  chart      = "cluster-proportional-autoscaler"
  repository = "https://kubernetes-sigs.github.io/cluster-proportional-autoscaler"
  version    = "1.0.0"
  namespace  = "kube-system"
  timeout    = "300"
  values = [
    <<-EOT
      nameOverride: kube-dns-autoscaler

      # Formula for controlling the replicas. Adjust according to your needs
      # replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) )
      config:
        linear:
          coresPerReplica: 256
          nodesPerReplica: 16
          min: 1
          max: 100
          preventSinglePointFailure: true
          includeUnschedulableNodes: true

      # Target to scale. In format: deployment/*, replicationcontroller/* or replicaset/* (not case sensitive).
      options:
        target: deployment/coredns # Notice the target as `deployment/coredns`

      serviceAccount:
        create: true
        name: kube-dns-autoscaler

      podSecurityContext:
        seccompProfile:
          type: RuntimeDefault
        supplementalGroups: [ 65534 ]
        fsGroup: 65534

      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi

      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
    EOT
  ]
  description = "Cluster Proportional Autoscaler for CoreDNS Service"
}
```
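As a quick sanity check after deployment, you can confirm the autoscaler is running and watch CoreDNS replicas track the cluster size. This is a minimal sketch; the exact resource name depends on the Helm release name and the `nameOverride` used above.

```sh
# Check the cluster-proportional-autoscaler Deployment created by the Helm chart
# (name assumed to follow the nameOverride `kube-dns-autoscaler` used above)
kubectl get deployment -n kube-system | grep autoscaler

# Watch CoreDNS replicas change as nodes are added or removed.
# With the linear config above: replicas = max(ceil(cores/256), ceil(nodes/16)),
# e.g. a 20-node / 160-core cluster gives max(ceil(160/256), ceil(20/16)) = max(1, 2) = 2 replicas.
kubectl get deployment coredns -n kube-system -w
```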

### GitOps Configuration

The following properties are made available for use when managing the add-on via GitOps.

```
corednsAutoscaler = {
enable = true
}
```
1 change: 1 addition & 0 deletions docs/modules/emr-on-eks.md
@@ -13,6 +13,7 @@ This module deploys the necessary resources to run EMR Spark Jobs on EKS Cluster

[EMR on EKS](https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/modules/emr-on-eks) can be deployed by enabling the module via the following.

Check out this [blog post](https://aws.amazon.com/blogs/mt/monitoring-amazon-emr-on-eks-with-amazon-managed-prometheus-and-amazon-managed-grafana/) to set up observability for EMR on EKS Spark jobs.

```hcl
#---------------------------------------
```
66 changes: 21 additions & 45 deletions examples/analytics/emr-on-eks/README.md
@@ -2,9 +2,10 @@

This example deploys the following resources:

- Creates EKS Cluster Control plane with public endpoint (for demo purpose only) with one managed node group
- Deploys Metrics server, Cluster Autoscaler, Prometheus and EMR on EKS Addon
- Creates Amazon managed Prometheus and configures Prometheus addon to remote write metrics to AMP
- Creates an EKS cluster control plane with a public endpoint (for demo purposes only) and two managed node groups
- Deploys Metrics Server (HA), Cluster Autoscaler, Prometheus Server, Vertical Pod Autoscaler (VPA), and the CoreDNS cluster-proportional-autoscaler
- Creates EMR on EKS teams and an EMR virtual cluster for `emr-data-team-a`
- Creates an Amazon Managed Prometheus workspace and configures the Prometheus server add-on to remote-write metrics to it

## Prerequisites:

@@ -14,9 +15,9 @@ Ensure that you have installed the following tools on your machine.
2. [kubectl](https://Kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
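A quick way to confirm the tools are available on your machine (output versions will vary):

```sh
aws --version         # AWS CLI
kubectl version --client
terraform -version
```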

_Note: Currently Amazon Prometheus supported only in selected regions. Please see this [userguide](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) for supported regions._
_Note: Amazon Managed Prometheus is currently supported only in select regions. Please see the [user guide](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html) for supported regions._

## Step 1: Deploy EKS Clusters with EMR on EKS feature
## Deploy EKS Clusters with EMR on EKS feature

Clone the repository

@@ -46,22 +47,22 @@
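A typical sequence of commands for this example (repository URL and paths taken from the links in this document; a sketch, not the exact collapsed content) is:

```sh
git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git
cd terraform-aws-eks-blueprints/examples/analytics/emr-on-eks

terraform init
terraform plan
terraform apply
```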

Enter `yes` to apply.

## Step 3: Verify the resources
## Verify the resources

Let’s verify the resources created by the deployment.

Verify the Amazon EKS cluster and the Amazon Managed Service for Prometheus workspace:

```sh
aws eks describe-cluster --name aws001-preprod-test-eks
aws eks describe-cluster --name emr-on-eks

aws amp list-workspaces --alias amp-ws-aws001-preprod-test-eks
aws amp list-workspaces --alias amp-ws-emr-on-eks
```

Verify the EMR on EKS namespaces `emr-data-team-a` and `emr-data-team-b`, and the pod status for Prometheus, Vertical Pod Autoscaler, Metrics Server and Cluster Autoscaler.

```sh

aws eks --region <ENTER_YOUR_REGION> update-kubeconfig --name aws001-preprod-test-eks # Creates k8s config file to authenticate with EKS Cluster
aws eks --region <ENTER_YOUR_REGION> update-kubeconfig --name emr-on-eks # Creates k8s config file to authenticate with EKS Cluster

kubectl get nodes # Output shows the EKS Managed Node group nodes

@@ -76,41 +77,20 @@
kubectl get pods --namespace=kube-system | grep metrics-server # Output shows Metrics Server pod
kubectl get pods --namespace=kube-system | grep cluster-autoscaler # Output shows Cluster Autoscaler pod
```
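Since this example also deploys the Vertical Pod Autoscaler and the CoreDNS cluster-proportional-autoscaler, you may want to confirm those pods as well (a sketch; pod name prefixes are assumptions):

```sh
kubectl get pods --namespace=kube-system | grep vpa          # Output shows VPA recommender/updater/admission pods
kubectl get pods --namespace=kube-system | grep autoscaler   # Output shows Cluster Autoscaler and cluster-proportional-autoscaler pods
```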

## Step 4: Create EMR Virtual Cluster for EKS

We are using AWS CLI to create EMR on EKS Clusters. You can leverage Terraform Module once the [EMR on EKS TF provider](https://github.com/hashicorp/terraform-provider-aws/pull/20003) is available.

```sh
vi examples/analytics/emr-on-eks/examples/create_emr_virtual_cluster_for_eks.sh
```

Update the following variables.

Extract the cluster_name as **EKS_CLUSTER_ID** from Terraform Outputs (**Step 1**)
**EMR_ON_EKS_NAMESPACE** is same as what you passed from **Step 1**

EKS_CLUSTER_ID='aws001-preprod-test-eks'
EMR_ON_EKS_NAMESPACE='emr-data-team-a'

Execute the shell script to create virtual cluster

```sh
cd examples/analytics/emr-on-eks/examples/
./create_emr_virtual_cluster_for_eks.sh
```

## Step 5: Execute Spark job on EMR Virtual Cluster
## Execute Spark job on EMR Virtual Cluster

Execute the Spark job using the shell script below.

This script requires two input parameters.
This script requires three input parameters, which can be extracted from the Terraform output values (see the `terraform output` commands below).

EMR_VIRTUAL_CLUSTER_ID=$1 # EMR Cluster ID e.g., aws001-preprod-test-eks-emr-data-team-a
S3_BUCKET=$2 # S3 bucket for storing the scripts and spark output data e.g., s3://<bucket-name>
EMR_VIRTUAL_CLUSTER_ID=$1 # Terraform output variable is emrcontainers_virtual_cluster_id
S3_BUCKET=$2 # This script requires s3 bucket as input parameter e.g., s3://<bucket-name>
EMR_JOB_EXECUTION_ROLE_ARN=$3 # Terraform output variable is emr_on_eks_role_arn

```sh
cd examples/analytics/emr-on-eks/examples/spark-execute/
./5-spark-job-with-AMP-AMG.sh aws001-preprod-test-eks-emr-data-team-a <ENTER_S3_BUCKET_NAME>

./spark-job-with-AMP-AMG.sh "<ENTER_EMR_VIRTUAL_CLUSTER_ID>" "s3://<ENTER-YOUR-BUCKET-NAME>" "<EMR_JOB_EXECUTION_ROLE_ARN>"
```
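The virtual cluster ID and job execution role ARN can be read directly from the Terraform outputs named in the parameter list above, for example:

```sh
terraform output emrcontainers_virtual_cluster_id
terraform output emr_on_eks_role_arn
```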

Verify the job execution
@@ -119,7 +99,7 @@
```sh
kubectl get pods --namespace=emr-data-team-a -w
```
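You can also track the job run status through the EMR containers API; a sketch, where the virtual cluster ID is the same Terraform output used above:

```sh
aws emr-containers list-job-runs --virtual-cluster-id <ENTER_EMR_VIRTUAL_CLUSTER_ID>
```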

## Step 6: Cleanup
## Cleanup

### Delete EMR Virtual Cluster for EKS

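Deleting a virtual cluster through the AWS CLI looks roughly like this (a sketch; the ID placeholder is an assumption):

```sh
aws emr-containers delete-virtual-cluster --id <ENTER_EMR_VIRTUAL_CLUSTER_ID>
```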
@@ -150,13 +130,11 @@

To pin Spark pods to a specific availability zone or instance type, add these to `applicationConfiguration`.`properties`:

"spark.kubernetes.node.selector.topology.kubernetes.io/zone":"<availability zone>",
"spark.kubernetes.node.selector.node.kubernetes.io/instance-type":"<instance type>"
"spark.kubernetes.node.selector.topology.kubernetes.io/zone":"<availability zone>",
"spark.kubernetes.node.selector.node.kubernetes.io/instance-type":"<instance type>"

### JDBC example

In this example we connect to a MySQL database, so `mariadb-connector-java.jar` needs to be passed with the `--jars` option. See the Hive metastore integration guide for details:
https://aws.github.io/aws-emr-containers-best-practices/metastore-integrations/docs/hive-metastore/

"sparkSubmitJobDriver": {
"entryPoint": "s3://<s3 prefix>/hivejdbc.py",
@@ -172,7 +150,6 @@
}
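For reference, the connector JAR is typically passed through `sparkSubmitParameters` inside the same `sparkSubmitJobDriver` block; a minimal, assumed sketch (the S3 prefix mirrors the placeholder above):

    "sparkSubmitParameters": "--jars s3://<s3 prefix>/mariadb-connector-java.jar"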

### Storage

Spark supports using volumes to spill data during shuffles and other operations.
To use a volume as local storage, the volume’s name should start with `spark-local-dir-`,
for example:
@@ -190,15 +167,14 @@ Specifically, you can use persistent volume claims if the jobs require large shuffle space.
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
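For reference, a full set of properties for an on-demand PVC used as Spark local storage typically looks like the following (the storage class, size, and mount path are assumptions):

    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false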

## Debugging

##### Issue 1: `local-exec` provisioner error

```sh
Error: local-exec provisioner error \
with module.eks-blueprints.module.emr_on_eks["data_team_b"].null_resource.update_trust_policy,\
on .terraform/modules/eks-blueprints/modules/emr-on-eks/main.tf line 105, in resource "null_resource" \
"update_trust_policy":│ 105: provisioner "local-exec" {│ │ Error running command 'set -e│ │ aws emr-containers update-role-trust-policy \
│ --cluster-name aws001-preprod-test-eks \│ --namespace emr-data-team-b \│ --role-name aws001-preprod-test-eks-emr-eks-data-team-b
│ --cluster-name emr-on-eks \│ --namespace emr-data-team-b \│ --role-name emr-on-eks-emr-eks-data-team-b
```
##### Solution:

