This is a project developed in Python CDKv2. It includes a few Spark examples that create external Hive tables on top of a sample dataset stored in S3. These jobs run with EMR on EKS.
The infrastructure deployment includes the following:
- A new S3 bucket to store sample data and job code
- An EKS cluster in a new VPC across 2 AZs
- An RDS Aurora database (MySQL engine) in the same VPC
- A small EMR on EC2 cluster in the same VPC
- 1 master & 1 core node (m5.xlarge)
- the master node is used to query the remote Hive metastore database
- An EMR virtual cluster in the same VPC
- registered to the `emr` namespace in EKS
- EMR on EKS configuration is done
- Connect to RDS and initialize the metastore schema via schematool
- A standalone Hive metastore service (HMS) in EKS
- Helm chart `hive-metastore-chart` is provided
- runs in the same `emr` namespace
- a thrift server is provided for client connections
- doesn't initialize/upgrade metastore schemas via schematool
1. Connect remote Hive metastore via JDBC
2. Connect Hive via EMR on EC2
3. Connect Hive via EMR on EKS
4. Connect Hive via HMS sidecar
5. Hudi with HMS sidecar
6. Hudi with Glue catalog
7. Run Hive SQL with EMR on EKS
- Job source code - deployment/app_code/job.
- HMS sidecar pod template - deployment/app_code/job/sidecar_hms_pod_template.yaml.
- Standalone hive-metastore Docker image - follow the README instructions to build your own. Don't forget to update your sidecar pod template or Helm chart values file with your own ECR URL.
The provisioning takes about 30 minutes to complete. Two ways to deploy:
NOTE: The HMS helm chart requires k8s >= 1.23, i.e. the EKS version must be 1.23+.
Install the following tools:
- AWS CLI. Configure the CLI via `aws configure`.
- kubectl & jq
You can use AWS CloudShell, which includes all the necessary software, for a quick start.
| Region | Launch Template |
| --------------------------- | ----------------------- |
| US East (N. Virginia) | |
- To launch in a different AWS Region, check out the following customization section, or use the CDK deployment option.
You can customize the solution, for example deploy to a different AWS region:
export BUCKET_NAME_PREFIX=<my-bucket-name> # bucket where customized code will reside
export AWS_REGION=<your-region>
export SOLUTION_NAME=hive-emr-on-eks
export VERSION=v2.0.0 # version number for the customized code
./deployment/build-s3-dist.sh $BUCKET_NAME_PREFIX $SOLUTION_NAME $VERSION
# OPTIONAL: create the bucket where customized code will reside
aws s3 mb s3://$BUCKET_NAME_PREFIX-$AWS_REGION --region $AWS_REGION
# Upload deployment assets to the S3 bucket
aws s3 cp ./deployment/global-s3-assets/ s3://$BUCKET_NAME_PREFIX-$AWS_REGION/$SOLUTION_NAME/$VERSION/ --recursive --acl bucket-owner-full-control
aws s3 cp ./deployment/regional-s3-assets/ s3://$BUCKET_NAME_PREFIX-$AWS_REGION/$SOLUTION_NAME/$VERSION/ --recursive --acl bucket-owner-full-control
echo -e "\nIn web browser, paste the URL to launch the CFN template: https://console.aws.amazon.com/cloudformation/home?region=$AWS_REGION#/stacks/quickcreate?stackName=HiveEMRonEKS&templateURL=https://$BUCKET_NAME_PREFIX-$AWS_REGION.s3.amazonaws.com/$SOLUTION_NAME/$VERSION/HiveEMRonEKS.template\n"
Alternatively, deploy the infrastructure via CDK. This requires the following tools to be pre-installed as one-off tasks:
- Python 3.6+
- Nodejs 10.3.0+
- CDK toolkit
- Run CDK bootstrap after the 'pip install' step, as below.
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
cdk bootstrap
cdk deploy
Make sure AWS CLI, kubectl and jq are installed.
One-off setup:
- Set environment variables in .bash_profile and connect to EKS cluster.
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/post-deployment.sh | bash
source ~/.bash_profile
You can use Cloud9 or CloudShell if you don’t want to install anything on your computer or change your bash_profile.
- [OPTIONAL] Build the HMS Docker image and, if needed, replace the hive-metastore image name in hive-metastore-chart/values.yaml with the new one:
cd docker
export DOCKERHUB_USERNAME=<your_dockerhub_name_OR_ECR_URL>
docker build -t $DOCKERHUB_USERNAME/hive-metastore:3.0.0 .
docker push $DOCKERHUB_USERNAME/hive-metastore:3.0.0
- Copy sample data to your S3 bucket. NOTE: amazon-reviews-pds is no longer a public dataset. Either skip this step, copy your own review data, or use another public dataset you know of.
aws s3 cp s3://amazon-reviews-pds/parquet/product_category=Toys/ s3://$S3BUCKET/app_code/data/toy --recursive
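Since the public dataset has been retired, a stand-in can be generated locally. The sketch below (pure Python; the field values are made-up placeholders) builds rows matching the `amazonreview` table schema used by the jobs in this README; with Spark available you could then write them as Parquet to `s3://$S3BUCKET/app_code/data/toy/`.

```python
import datetime
import random

# Columns of the `demo.amazonreview` table created by the sample jobs.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "star_rating", "helpful_votes", "total_votes", "vine",
    "verified_purchase", "review_headline", "review_body", "review_date", "year",
]

def make_review(i: int) -> dict:
    """Build one synthetic review row matching the table schema."""
    date = datetime.date(2015, 1, 1) + datetime.timedelta(days=i % 365)
    return {
        "marketplace": "US",
        "customer_id": str(10000 + i),
        "review_id": f"R{i:08d}",
        "product_id": f"B{i:09d}",
        "product_parent": str(i),
        "product_title": f"Toy #{i}",
        "star_rating": random.randint(1, 5),
        "helpful_votes": random.randint(0, 50),
        "total_votes": random.randint(0, 100),
        "vine": "N",
        "verified_purchase": "Y",
        "review_headline": "A synthetic review",
        "review_body": "Placeholder text standing in for the retired dataset.",
        "review_date": date,
        "year": date.year,
    }

rows = [make_review(i) for i in range(100)]
# With a SparkSession you could then write Parquet, e.g.:
# spark.createDataFrame(rows).write.parquet("s3://<S3BUCKET>/app_code/data/toy/")
```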
hivejdbc.py:
import sys
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.config("spark.sql.warehouse.dir", sys.argv[1]+"/warehouse/" ) \
.enableHiveSupport() \
.getOrCreate()
spark.sql("SHOW DATABASES").show()
spark.sql("CREATE DATABASE IF NOT EXISTS `demo`")
spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS `demo`.`amazonreview`( `marketplace` string,`customer_id`string,`review_id` string,`product_id` string,`product_parent` string,`product_title` string,`star_rating` integer,`helpful_votes` integer,`total_votes` integer,`vine` string,`verified_purchase` string,`review_headline` string,`review_body` string,`review_date` date,`year` integer) STORED AS PARQUET LOCATION '"+sys.argv[1]+"/app_code/data/toy/'")
spark.sql("SELECT count(*) FROM demo.amazonreview").show()
spark.stop()
Run the script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/submit-job-via-jdbc.sh | bash
OR
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name spark-hive-via-jdbc \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/hivejdbc.py",
"entryPointArguments":["s3://'$S3BUCKET'"],
"sparkSubmitParameters": "--conf spark.jars.packages=mysql:mysql-connector-java:8.0.28 --conf spark.driver.cores=1 --conf spark.executor.memory=4G --conf spark.driver.memory=1G --conf spark.executor.cores=2"}}' \
--configuration-overrides '{
"applicationConfiguration": [
{
"classification": "spark-defaults",
"properties": {
"spark.dynamicAllocation.enabled":"false",
"spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
"spark.hadoop.javax.jdo.option.ConnectionUserName": "'$USER_NAME'",
"spark.hadoop.javax.jdo.option.ConnectionPassword": "'$PASSWORD'",
"spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://'$HOST_NAME':3306/'$DB_NAME'?createDatabaseIfNotExist=true"
}
}
],
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
hivethrift_emr.py:
from os import environ
import sys
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.config("spark.sql.warehouse.dir", sys.argv[1]+"/warehouse/" ) \
.config("hive.metastore.uris","thrift://"+sys.argv[2]+":9083") \
.enableHiveSupport() \
.getOrCreate()
spark.sql("SHOW DATABASES").show()
spark.sql("CREATE DATABASE IF NOT EXISTS `demo`")
spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS `demo`.`amazonreview2`( `marketplace` string,`customer_id`string,`review_id` string,`product_id` string,`product_parent` string,`product_title` string,`star_rating` integer,`helpful_votes` integer,`total_votes` integer,`vine` string,`verified_purchase` string,`review_headline` string,`review_body` string,`review_date` date,`year` integer) STORED AS PARQUET LOCATION '"+sys.argv[1]+"/app_code/data/toy/'")
spark.sql("SELECT count(*) FROM demo.amazonreview2").show()
spark.stop()
Run the script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/submit-job-via-thrift_emr.sh | bash
OR
#!/bin/bash
export STACK_NAME=HiveEMRonEKS
export EMR_MASTER_DNS_NAME=$(aws ec2 describe-instances --filter Name=tag:project,Values=HiveEMRonEKS Name=tag:aws:elasticmapreduce:instance-group-role,Values=MASTER --query Reservations[].Instances[].PrivateDnsName --output text | xargs)
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name spark-hive-via-thrift \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/hivethrift_emr.py",
"entryPointArguments":["s3://'$S3BUCKET'","'$EMR_MASTER_DNS_NAME'"],
"sparkSubmitParameters": "--conf spark.driver.cores=1 --conf spark.executor.memory=4G --conf spark.driver.memory=1G --conf spark.executor.cores=2"}}' \
--configuration-overrides '{
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
hivethrift_eks.py:
from os import environ
import sys
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.config("spark.sql.warehouse.dir", sys.argv[1]+"/warehouse/" ) \
.config("hive.metastore.uris","thrift://"+environ['HIVE_METASTORE_SERVICE_HOST']+":9083") \
.enableHiveSupport() \
.getOrCreate()
spark.sql("SHOW DATABASES").show()
spark.sql("CREATE DATABASE IF NOT EXISTS `demo`")
spark.sql("DROP TABLE IF EXISTS demo.amazonreview3")
spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS `demo`.`amazonreview3`( `marketplace` string,`customer_id`string,`review_id` string,`product_id` string,`product_parent` string,`product_title` string,`star_rating` integer,`helpful_votes` integer,`total_votes` integer,`vine` string,`verified_purchase` string,`review_headline` string,`review_body` string,`review_date` date,`year` integer) STORED AS PARQUET LOCATION '"+sys.argv[1]+"/app_code/data/toy/'")
Run the script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/submit-job-via-thrift_eks.sh | bash
OR
#!/bin/bash
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name spark-hive-via-thrift \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/hivethrift_eks.py",
"entryPointArguments":["s3://'$S3BUCKET'"],
"sparkSubmitParameters": "--conf spark.driver.cores=1 --conf spark.executor.memory=4G --conf spark.driver.memory=1G --conf spark.executor.cores=2"}}' \
--configuration-overrides '{
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
**Prerequisite**
NOTE: This repo's CFN/CDK template installs the following by default.
- Kubernetes External Secrets controller - it fetches the hive metastore DB credentials from AWS Secrets Manager, which is a recommended best practice. Alternatively, without installing the controller, simply modify the HMS sidecar pod template with hard-coded DB credentials.
# does it exist?
kubectl get pod -n kube-system
If the controller doesn't exist in your EKS cluster, replace the variable placeholders YOUR_REGION and YOUR_IAM_ROLE_ARN_TO_GET_SECRETS_FROM_SM in the command, then run the installation. Refer to the IAM permissions used by CDK to create your IAM role.
helm repo add external-secrets https://external-secrets.github.io/kubernetes-external-secrets/
helm install external-secret external-secrets/kubernetes-external-secrets -n kube-system --set AWS_REGION=YOUR_REGION --set securityContext.fsGroup=65534 --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"='YOUR_IAM_ROLE_ARN_TO_GET_SECRETS_FROM_SM' --debug
- Two sidecar config maps should be created in EKS, pointing to the metastore-site.xml and core-site.xml templates that configure the standalone HMS. The sidecar termination script is copied from the EMR documentation to work around the well-known sidecar lifecycle issue in Kubernetes.
kubectl get configmap sidecar-hms-conf-templates sidecar-terminate-script -n emr
If they don't exist, run the command to create the configs:
# get remote metastore RDS secret name
secret_name=$(aws secretsmanager list-secrets --query 'SecretList[?starts_with(Name,`RDSAuroraSecret`) == `true`].Name' --output text)
# download the config and apply to EKS
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/source/app_resources/hive-metastore-config.yaml | sed 's/{SECRET_MANAGER_NAME}/'$secret_name'/g' | kubectl apply -f -
- The HMS sidecar pod template is uploaded to an S3 bucket that your Spark job can access.
sidecar_hivethrift_eks.py:
import sys
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.config("spark.sql.warehouse.dir", sys.argv[1]+"/warehouse/" ) \
.enableHiveSupport() \
.getOrCreate()
spark.sql("SHOW DATABASES").show()
spark.sql("CREATE DATABASE IF NOT EXISTS `demo`")
spark.sql("DROP TABLE IF EXISTS demo.amazonreview4")
spark.sql("CREATE EXTERNAL TABLE `demo`.`amazonreview4`( `marketplace` string,`customer_id`string,`review_id` string,`product_id` string,`product_parent` string,`product_title` string,`star_rating` integer,`helpful_votes` integer,`total_votes` integer,`vine` string,`verified_purchase` string,`review_headline` string,`review_body` string,`review_date` date,`year` integer) STORED AS PARQUET LOCATION '"+sys.argv[1]+"/app_code/data/toy/'")
# read from files
sql_scripts=spark.read.text(sys.argv[1]+"/app_code/job/set-of-hive-queries.sql").collect()
cmd_str=' '.join([x[0] for x in sql_scripts]).split(';')
for query in cmd_str:
if (query != ""):
spark.sql(query).show()
spark.stop()
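The script above splits the SQL file on `;`, which also fires on semicolons inside quoted literals or `--` comments. A slightly more defensive splitter (a pure-Python sketch, not part of the repo) could look like this:

```python
def split_sql(text: str) -> list:
    """Split a SQL script on statement-terminating semicolons,
    ignoring ';' inside single-/double-quoted literals and '--' comments."""
    statements, buf = [], []
    in_quote = None          # current quote char, or None
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quote:
            buf.append(ch)
            if ch == in_quote:
                in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            buf.append(ch)
        elif ch == "-" and text[i:i + 2] == "--":
            # skip a line comment up to the end of the line
            j = text.find("\n", i)
            i = n if j == -1 else j
            continue
        elif ch == ";":
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements
```

In the job script, `' '.join(...).split(';')` could be swapped for `split_sql(...)`.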
Assign the sidecar pod template to the Spark driver. Run the script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/sidecar_submit-job-via-thrift_eks.sh | bash
OR
#!/bin/bash
# test HMS sidecar on EKS
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name sidecar-hms \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/sidecar_hivethrift_eks.py",
"entryPointArguments":["s3://'$S3BUCKET'"],
"sparkSubmitParameters": "--conf spark.driver.cores=1 --conf spark.executor.memory=4G --conf spark.driver.memory=1G --conf spark.executor.cores=2"}}' \
--configuration-overrides '{
"applicationConfiguration": [
{
"classification": "spark-defaults",
"properties": {
"spark.kubernetes.driver.podTemplateFile": "s3://'$S3BUCKET'/app_code/job/sidecar_hms_pod_template.yaml",
"spark.hive.metastore.uris": "thrift://localhost:9083"
}
}
],
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
- Sample job - HudiEMRonEKS.py
- Job submission script - sidecar_submit-hudi-hms.sh. The sidecar HMS container inside your Spark driver provides the connection to the remote hive metastore DB in RDS.
Note: the latest Hudi-spark3-bundle jar is needed to support the HMS hive sync mode. The jar will be included from EMR 6.5+.
Run the submission script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/sidecar_submit-hudi-hms.sh | bash
OR
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name hudi-test1 \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/HudiEMRonEKS.py",
"entryPointArguments":["s3://'$S3BUCKET'"],
"sparkSubmitParameters": "--jars https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3-bundle_2.12/0.9.0/hudi-spark3-bundle_2.12-0.9.0.jar --conf spark.executor.cores=1 --conf spark.executor.instances=2"}}' \
--configuration-overrides '{
"applicationConfiguration": [
{
"classification": "spark-defaults",
"properties": {
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.sql.hive.convertMetastoreParquet": "false",
"spark.hive.metastore.uris": "thrift://localhost:9083",
"spark.kubernetes.driver.podTemplateFile": "s3://'$S3BUCKET'/app_code/job/sidecar_hms_pod_template.yaml"
}}
],
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
Note: make sure the database **default** exists in your Glue catalog.
- Same Hudi job - HudiEMRonEKS.py
- Job submission with Glue catalog - submit-hudi-glue.sh
Run the submission script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/submit-hudi-glue.sh | bash
OR
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name hudi-test1 \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.3.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/HudiEMRonEKS.py",
"entryPointArguments":["s3://'$S3BUCKET'"],
"sparkSubmitParameters": "--jars https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3-bundle_2.12/0.9.0/hudi-spark3-bundle_2.12-0.9.0.jar --conf spark.executor.cores=1 --conf spark.executor.instances=2"}}' \
--configuration-overrides '{
"applicationConfiguration": [
{
"classification": "spark-defaults",
"properties": {
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.sql.hive.convertMetastoreParquet": "false",
"spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}}
],
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
We can run a Hive SQL script with multiple lines using the Spark execution engine. From EMR 6.7, EMR on EKS supports running Spark SQL using a .sql file as the entrypoint script in the StartJobRun API. Make sure your AWS CLI version is 2.7.31+ or 1.25.70+.
See the full version of the sample Hive SQL script. Code snippet:
DROP DATABASE IF EXISTS hiveonspark CASCADE;
CREATE DATABASE hiveonspark;
USE hiveonspark;
--create hive managed table
CREATE TABLE IF NOT EXISTS testtable (`key` INT, `value` STRING) using hive;
LOAD DATA LOCAL INPATH '/usr/lib/spark/examples/src/main/resources/kv1.txt' OVERWRITE INTO TABLE testtable;
SELECT * FROM testtable WHERE key=238;
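The `-hivevar` parameters passed at submission time (see the StartJobRun command below) substitute `${hivevar:NAME}` placeholders inside the .sql file. A rough pure-Python illustration of that substitution (the function itself is hypothetical, not part of Hive or this repo):

```python
import re

def substitute_hivevars(sql: str, hivevars: dict) -> str:
    """Replace ${hivevar:NAME} placeholders the way -hivevar values
    are applied to the entrypoint .sql script."""
    def repl(m):
        name = m.group(1)
        if name not in hivevars:
            raise KeyError(f"undefined hivevar: {name}")
        return str(hivevars[name])
    return re.sub(r"\$\{hivevar:(\w+)\}", repl, sql)
```

For example, submitting with `-hivevar Key_ID=238` turns `WHERE key=${hivevar:Key_ID}` into `WHERE key=238`.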
Run the submission script:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/job/submit-sparksql.sh | bash
OR run the following:
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name sparksql-test \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.8.0-latest \
--job-driver '{
"sparkSqlJobDriver": {
"entryPoint": "s3://'$S3BUCKET'/app_code/job/set-of-hive-queries.sql",
"sparkSqlParameters": "-hivevar S3Bucket='$S3BUCKET' -hivevar Key_ID=238"}}' \
--configuration-overrides '{
"applicationConfiguration": [
{
"classification": "spark-defaults",
"properties": {
"spark.sql.warehouse.dir": "s3://'$S3BUCKET'/warehouse/",
"spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
}
],
"monitoringConfiguration": {
"s3MonitoringConfiguration": {"logUri": "s3://'$S3BUCKET'/elasticmapreduce/emr-containers"}}}'
In the spark-defaults config, we use the Glue catalog as the hive metastore for a serverless design, so the table can be queried in Athena. Alternatively, we can replace that config with a standalone HMS setting, "spark.hive.metastore.uris": "thrift://hive-metastore:9083", which runs as a k8s pod in the namespace emr and points to the remote RDS hive metastore database created by this project.
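The two metastore options differ only in the spark-defaults properties passed at submission time. A small helper (a sketch; the property keys and values come from this README, the function itself is hypothetical) makes the choice explicit:

```python
def metastore_properties(backend: str, warehouse_dir: str) -> dict:
    """Return spark-defaults properties for the chosen metastore backend:
    'glue' for the serverless Glue catalog, 'hms' for the standalone
    Hive metastore service running in the `emr` namespace."""
    common = {"spark.sql.warehouse.dir": warehouse_dir}
    if backend == "glue":
        return {
            **common,
            "spark.hadoop.hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        }
    if backend == "hms":
        return {**common, "spark.hive.metastore.uris": "thrift://hive-metastore:9083"}
    raise ValueError(f"unknown backend: {backend}")
```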
NOTE: to directly submit Hive scripts to EMR on EKS, replace the following 2 attributes in the job submission script:
- change from sparkSubmitJobDriver to sparkSqlJobDriver
- change from sparkSubmitParameters to sparkSqlParameters
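Those two attribute swaps can be sketched as a helper that builds the --job-driver payload either way (a hypothetical convenience, not part of the repo; the key names follow the StartJobRun examples above):

```python
import json

def job_driver(entry_point: str, params: str, sql: bool = False) -> str:
    """Build the --job-driver JSON for StartJobRun.
    sql=False -> sparkSubmitJobDriver / sparkSubmitParameters (PySpark entrypoint)
    sql=True  -> sparkSqlJobDriver / sparkSqlParameters (.sql entrypoint, EMR 6.7+)"""
    if sql:
        driver = {"sparkSqlJobDriver": {
            "entryPoint": entry_point, "sparkSqlParameters": params}}
    else:
        driver = {"sparkSubmitJobDriver": {
            "entryPoint": entry_point, "sparkSubmitParameters": params}}
    return json.dumps(driver)
```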
kubectl get po -n emr
kubectl logs -n emr -c spark-kubernetes-driver <YOUR-DRIVER-POD-NAME>
You will see the count result in the driver log: Total records on S3:
+--------+
|count(1)|
+--------+
| 4981601|
+--------+
- Hive metastore login info:
echo -e "\n host: $HOST_NAME\n DB: $DB_NAME\n password: $PASSWORD\n username: $USER_NAME\n"
- Find EMR master node EC2 instance:
aws ec2 describe-instances --filter Name=tag:project,Values=$stack_name Name=tag:aws:elasticmapreduce:instance-group-role,Values=MASTER --query Reservations[].Instances[].InstanceId
- Go to the EC2 console and connect to the instance via Session Manager without an SSH key.
- Check the remote hive metastore in the MySQL DB:
mysql -u admin -P 3306 -p -h <YOUR_HOST_NAME>
Enter password:<YOUR_PASSWORD>
# Query in the metastore
MySQL[(none)]> Use HiveEMRonEKS;
MySQL[HiveEMRonEKS]> select * from DBS;
MySQL[HiveEMRonEKS]> select * from TBLS;
- Query Hive tables:
sudo su
hive
hive> use demo;
hive> select count(*) from amazonreview2;
Launching Job 1 out of 1
........
OK
4981601
Time taken: 23.742 seconds, Fetched: 1 row(s)
s3://$S3BUCKET/elasticmapreduce/emr-containers/$VIRTUAL_CLUSTER_ID/jobs/<YOUR_JOB_ID>/containers/spark-<YOUR-JOB-ID>-driver/
- kubectl get pod -n emr - list running Spark jobs
- kubectl delete pod --all -n emr - delete all Spark jobs
- kubectl logs -n emr -c spark-kubernetes-driver YOUR-DRIVER-POD-NAME - view job logs in real time
- kubectl get node --label-columns=eks.amazonaws.com/capacityType,topology.kubernetes.io/zone - check EKS compute capacity types and AZ distribution
Run the clean-up script with:
curl https://raw.githubusercontent.com/aws-samples/hive-emr-on-eks/main/deployment/app_code/delete_all.sh | bash
Go to the CloudFormation console and manually delete the remaining resources if needed.