Add auto termination to emr job flow#22980
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
| class EmrAutoTerminatePolicyOperator(BaseOperator): | ||
| """ | ||
| An operator to put auto terminate policy on a given cluster/jobflow | ||
| Note: auto terminate policy is supported with Amazon EMR versions 5.30.0 and 6.1.0 and later. |
There was a problem hiding this comment.
Should / Can we generalize it to be EmrChangePolicyOperator thus allowing to change setting rather than focusing on a specific one?
For example this operator is using put_auto_termination_policy what if someone else wants put_auto_scaling_policy will we create a new dedicated operator?
There was a problem hiding this comment.
Good point, totally agree.
I did it so specific because I think this policy is the unique one that can not be set when creating the EMR cluster with boto3.
There was a problem hiding this comment.
@ferruzzi @o-nikolas @vincbeck maybe you have some idea how can we generalize it?
There was a problem hiding this comment.
I would do it that way. I would add 3 parameters:
policy_nameto indicates which policy you want to update. This value will tell which API to call through boto3policy_contentwhich contains the actual content of the policy. e.g.
{
AutoTerminationPolicy={
'IdleTimeout': 123
}
}
instance_group_id. Optional parameter needed ifpolicy_name== "auto_scaling"
The code would look like this (pseudo code, please bare with me):
def __init__(
self,
policy_name: str,
policy_content: dict,
instance_group_id: Optional[str] = None,
job_flow_id: Optional[str] = None,
job_flow_name: Optional[str] = None,
cluster_states: Optional[List[str]] = None,
aws_conn_id: str = 'aws_default',
**kwargs
):
...
def execute(self, context: 'Context') -> None:
...
if self.policy_name == "auto_termination":
response = emr.put_auto_termination_policy(
ClusterId=job_flow_id,
**self.policy_content,
)
elif self.policy_name == "auto_scaling":
response = emr.put_auto_scaling_policy(
ClusterId=job_flow_id,
InstanceGroupId=self.instance_group_id,
**self.policy_content,
)
elif self.policy_name == "managed_scaling":
response = emr.put_managed_scaling_policy(
ClusterId=job_flow_id,
**self.policy_content,
)
else:
raise ...|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
This PR's adds a new operator that allows to set the auto termination policy to an EMR cluster.
With this operator after creating an emr jobflow/cluster you can ensure it will be terminated if something unexpected happens on your DAG execution and airflow is not able to order the termination of the cluster.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.