
How to set custom EMR classification #193

Closed
OElesin opened this issue Apr 19, 2020 · 7 comments

OElesin commented Apr 19, 2020

Is your idea related to a problem? Please describe.
I have already made use of the library and it was super helpful. I tried setting custom EMR classifications in order to use EMR 6.0.0, but I could not find a way to set a custom classification. Is this currently possible, or does it have to be a feature request?
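
For context, an EMR "classification" is just a configuration block passed at cluster creation. A minimal sketch of what the request amounts to when calling boto3 directly, outside of awswrangler (the cluster name, instance sizes, and property values below are illustrative only):

import boto3

emr = boto3.client("emr")

# Illustrative only: the kind of Configurations block this issue asks to
# pass through wr.emr.create_cluster as a custom classification.
response = emr.run_job_flow(
    Name="my-demo-cluster",  # hypothetical name
    ReleaseLabel="emr-6.0.0",
    Instances={
        "Ec2SubnetId": "SUBNET_ID",
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1}
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    Configurations=[
        {
            "Classification": "livy-conf",  # any EMR classification
            "Properties": {"livy.server.session.timeout": "16h"},
        }
    ],
    Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])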

OElesin added the enhancement (New feature or request) label on Apr 19, 2020
igorborgest (Contributor) commented

Hi @OElesin! Thanks for reaching out; this is a really relevant topic.

Currently there is no support for the "container-executor" and "docker" classifications, nor for custom classifications in general.

But we will definitely address all of them.

igorborgest added the major release and minor release (Will be addressed in the next minor release) labels and then removed the major release label on Apr 20, 2020
igorborgest added this to the 1.1.0 milestone on Apr 20, 2020
igorborgest added a commit that referenced this issue Apr 25, 2020
igorborgest added the WIP (Work in progress) label on Apr 25, 2020
igorborgest (Contributor) commented

Hi @OElesin!

I just added support for Docker and Custom Classification.

Docker example:

import awswrangler as wr

# Create an EMR cluster with Spark-on-Docker enabled and add a step that
# loads ECR credentials so YARN can pull the image.
cluster_id = wr.emr.create_cluster(
    subnet_id="SUBNET_ID",
    spark_docker=True,
    spark_docker_image="{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{IMAGE_NAME}:{TAG}",
    ecr_credentials_step=True,
)
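
As a side note, the {ACCOUNT_ID} and {REGION} placeholders in spark_docker_image can be resolved at runtime; a small sketch with boto3 (the image name and tag are hypothetical):

import boto3

# Resolve the current account id and region to build the ECR image URI.
# "my-spark-image:latest" is a placeholder, not a real repository.
account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.session.Session().region_name
spark_docker_image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/my-spark-image:latest"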

Custom Classification example:

# Pass an arbitrary EMR configuration classification (here, Livy settings)
# straight through to the cluster configuration.
cluster_id = wr.emr.create_cluster(
    subnet_id="SUBNET_ID",
    custom_classifications=[
        {
            "Classification": "livy-conf",
            "Properties": {
                "livy.spark.master": "yarn",
                "livy.spark.deploy-mode": "cluster",
                "livy.server.session.timeout": "16h",
            },
        }
    ],
)
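
If Docker also needs registry settings, the same hook could presumably carry the container-executor classification described in the EMR documentation, assuming custom_classifications accepts nested Configurations blocks the same way the EMR API does. A sketch (the registry values are illustrative, not from this thread):

# Sketch only: mark ECR and local registries as trusted for YARN's Docker
# runtime, following the EMR docs' container-executor classification.
cluster_id = wr.emr.create_cluster(
    subnet_id="SUBNET_ID",
    custom_classifications=[
        {
            "Classification": "container-executor",
            "Configurations": [
                {
                    "Classification": "docker",
                    "Properties": {
                        "docker.trusted.registries": "local,centos,{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com",
                        "docker.privileged-containers.registries": "local,centos,{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com",
                    },
                }
            ],
        }
    ],
)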

I also created two new tutorials about it.

To install the related branch:
pip install git+https://github.com/awslabs/aws-data-wrangler.git@emr-6

Could you please test it and give us some feedback?

OElesin (Author) commented Apr 25, 2020

This is excellent! I will give this a try.

Is there a plan to add this to the master branch?

igorborgest added a commit that referenced this issue Apr 25, 2020
igorborgest (Contributor) commented Apr 25, 2020

@OElesin The plan is to release these features in version 1.1.0 next Friday!

It would be really nice if you could help us with some feedback. Thanks!

OElesin (Author) commented Apr 26, 2020

@igorborgest, thanks for this. I tested it under the following conditions:

  • Using your example, it worked, but it only started a cluster with a master instance.
  • Tested with a master instance and core instances; see below:
cluster_id = wr.emr.create_cluster(
    cluster_name="my-demo-cluster-v2",
    logging_s3_path="s3://my-logs-bucket/emr-logs/",
    emr_release="emr-6.0.0",
    subnet_id="SUBNET_ID",
    emr_ec2_role="EMR_EC2_DefaultRole",
    emr_role="EMR_DefaultRole",
    instance_type_master="m5.2xlarge",
    instance_type_core="m5.2xlarge",
    instance_ebs_size_master=50,
    instance_ebs_size_core=50,
    instance_num_on_demand_master=0,
    instance_num_on_demand_core=0,
    instance_num_spot_master=1,
    instance_num_spot_core=2,
    spot_bid_percentage_of_on_demand_master=50,
    spot_bid_percentage_of_on_demand_core=50,
    spot_provisioning_timeout_master=5,
    spot_provisioning_timeout_core=5,
    spot_timeout_to_on_demand_master=False,
    spot_timeout_to_on_demand_core=False,
    python3=True,
    ecr_credentials_step=True,
    spark_docker=True,
    spark_docker_image=DOCKER_IMAGE,
    spark_glue_catalog=True,
    hive_glue_catalog=True,
    presto_glue_catalog=True,
    debugging=True,
    applications=["Hadoop", "Spark", "Hive", "Zeppelin", "Livy"],
    visible_to_all_users=True,
    maximize_resource_allocation=True,
    keep_cluster_alive_when_no_steps=True,
    termination_protected=False,
    spark_pyarrow=True
)

Error message:

/bin/bash: docker: command not found
Command exiting with ret '127'

igorborgest (Contributor) commented Apr 26, 2020

Hi @OElesin, thanks a lot for the quick response!

You are right. I just figured out that EMR does not have Docker installed on the master node, only on the core nodes.
Because of that, we will not be able to refresh the ECR credentials programmatically without an external file on S3.

I revisited the implementation and the tutorial; the expected usage is now:

import awswrangler as wr

# Create the cluster with Docker support enabled.
cluster_id = wr.emr.create_cluster(subnet, docker=True)

# Submit a step that keeps the ECR credentials refreshed through a file on S3.
wr.emr.submit_ecr_credentials_refresh(cluster_id, path="s3://bucket/emr/")

# Submit a Spark step that runs app.py inside the Docker image.
wr.emr.submit_spark_step(
    cluster_id,
    "s3://bucket/app.py",
    docker_image=DOCKER_IMAGE
)

What do you think?

P.S. The custom_classifications usage stays the same.
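
For completeness, one way to monitor the submitted step afterwards is to poll plain boto3, assuming submit_spark_step returns the EMR step id (this polling loop is just a sketch, not part of the proposed API):

import time

import boto3

emr = boto3.client("emr")

# Assuming submit_spark_step returns the step id, poll until the step finishes.
step_id = wr.emr.submit_spark_step(
    cluster_id,
    "s3://bucket/app.py",
    docker_image=DOCKER_IMAGE
)

while True:
    state = emr.describe_step(ClusterId=cluster_id, StepId=step_id)["Step"]["Status"]["State"]
    if state in ("COMPLETED", "FAILED", "CANCELLED", "INTERRUPTED"):
        break
    time.sleep(30)

print(state)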

igorborgest (Contributor) commented

Available in version 1.1.0.

igorborgest removed the WIP (Work in progress) label on May 5, 2020