How to set custom EMR classification #193
Hi @OElesin! Thanks for reaching out, this is a really relevant topic. Currently there is no support for "container-executor", "docker", or custom classifications, but we will definitely address all of them.
Hi @OElesin! I just added support for Docker and custom classifications.

Docker example:

```python
import awswrangler as wr

cluster_id = wr.emr.create_cluster(
    subnet_id="SUBNET_ID",
    spark_docker=True,
    spark_docker_image="{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{IMAGE_NAME}:{TAG}",
    ecr_credentials_step=True,
)
```

Custom classification example:

```python
cluster_id = wr.emr.create_cluster(
    subnet_id="SUBNET_ID",
    custom_classifications=[
        {
            "Classification": "livy-conf",
            "Properties": {
                "livy.spark.master": "yarn",
                "livy.spark.deploy-mode": "cluster",
                "livy.server.session.timeout": "16h",
            },
        }
    ],
)
```

I also created two new tutorials about it. To install the related branch:

Please, could you test it and give us feedback?
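For context, each entry in `custom_classifications` has the same shape as an item of the `Configurations` parameter that the EMR `RunJobFlow` API accepts. Below is a minimal, illustrative sketch of building and validating such a payload by hand; the `build_classification` helper is hypothetical and not part of awswrangler:

```python
import json


def build_classification(classification, properties):
    """Build one EMR configuration entry in the shape expected by the
    EMR RunJobFlow API's `Configurations` parameter.
    (Hypothetical helper for illustration; not part of awswrangler.)"""
    if not isinstance(properties, dict):
        raise TypeError("properties must be a dict of string -> string")
    return {"Classification": classification, "Properties": dict(properties)}


livy_conf = build_classification(
    "livy-conf",
    {
        "livy.spark.master": "yarn",
        "livy.spark.deploy-mode": "cluster",
        "livy.server.session.timeout": "16h",
    },
)
# The JSON form is what you would see in the EMR console or pass to the AWS CLI.
print(json.dumps([livy_conf], indent=2))
```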
This is excellent! I will give this a try. Is there a plan to add this to the master branch?
@OElesin The plan is to release these features in an upcoming version. It would be really nice if you could help us with some feedback. Thanks!
@igorborgest, thanks for this. I tested it under the following conditions:

```python
cluster_id = wr.emr.create_cluster(
    cluster_name="my-demo-cluster-v2",
    logging_s3_path="s3://my-logs-bucket/emr-logs/",
    emr_release="emr-6.0.0",
    subnet_id="SUBNET_ID",
    emr_ec2_role="EMR_EC2_DefaultRole",
    emr_role="EMR_DefaultRole",
    instance_type_master="m5.2xlarge",
    instance_type_core="m5.2xlarge",
    instance_ebs_size_master=50,
    instance_ebs_size_core=50,
    instance_num_on_demand_master=0,
    instance_num_on_demand_core=0,
    instance_num_spot_master=1,
    instance_num_spot_core=2,
    spot_bid_percentage_of_on_demand_master=50,
    spot_bid_percentage_of_on_demand_core=50,
    spot_provisioning_timeout_master=5,
    spot_provisioning_timeout_core=5,
    spot_timeout_to_on_demand_master=False,
    spot_timeout_to_on_demand_core=False,
    python3=True,
    ecr_credentials_step=True,
    spark_docker=True,
    spark_docker_image=DOCKER_IMAGE,
    spark_glue_catalog=True,
    hive_glue_catalog=True,
    presto_glue_catalog=True,
    debugging=True,
    applications=["Hadoop", "Spark", "Hive", "Zeppelin", "Livy"],
    visible_to_all_users=True,
    maximize_resource_allocation=True,
    keep_cluster_alive_when_no_steps=True,
    termination_protected=False,
    spark_pyarrow=True,
)
```

Error message:

```
/bin/bash: docker: command not found
Command exiting with ret '127'
```
Hi @OElesin, thanks a lot for the quick response! You are right; I just figured out that EMR does not have Docker installed on the master node, only on the core nodes. I revisited the implementation and the tutorial, and the expected usage is now:

```python
import awswrangler as wr

cluster_id = wr.emr.create_cluster(subnet, docker=True)
wr.emr.submit_ecr_credentials_refresh(cluster_id, path="s3://bucket/emr/")
wr.emr.submit_spark_step(
    cluster_id,
    "s3://bucket/app.py",
    docker_image=DOCKER_IMAGE,
)
```

What do you think?
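For anyone curious what happens under the hood: running Spark applications in Docker on EMR 6.x works through YARN's Docker container runtime, configured via `YARN_CONTAINER_RUNTIME_*` environment properties described in the EMR documentation. The sketch below assembles the equivalent `spark-submit` arguments by hand; it is illustrative only, since awswrangler builds these for you:

```python
def docker_spark_submit_args(app_path, docker_image):
    """Assemble spark-submit arguments that enable the YARN Docker
    container runtime, as described in the EMR 6.x 'Spark with Docker'
    documentation. (Illustrative only; not part of awswrangler.)"""
    runtime_confs = {
        # Run executors inside the given Docker image.
        "spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": docker_image,
        # Run the application master (and the driver, in cluster
        # deploy mode) inside the image as well.
        "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": docker_image,
    }
    # Cluster deploy mode matters here: the driver runs in a YARN
    # container on a core node, where Docker is actually installed.
    args = ["spark-submit", "--deploy-mode", "cluster"]
    for key, value in runtime_confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args
```

This also explains the `ret '127'` failure above: in client deploy mode the driver is launched on the master node, where the `docker` binary is missing.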
Available on version 1.1.0.
Is your idea related to a problem? Please describe.
I have already made use of the library and it was super helpful. I tried setting custom EMR classifications so as to make use of EMR 6.0.0, but I could not set a custom classification. Is it possible to do this currently, or does it have to be a feature request?