Skip to content

Adjusted the EMRServerlessStartJobOperator to cancel failed jobs #51883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dominikhei
Copy link
Contributor


closes: #42401

I have introduced a cancel_job method to the EMRServerlessHook, which wraps the cancel_job_run method from boto3.

In cases of a non deferrable job run, if an Exception that waiter_max_attempts has been reached is thrown, cancel_job is executed. If deferrable is set to True, the cancellation logic is placed inside execute_complete, as this method evaluates the job state in this case.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 18, 2025
@dominikhei dominikhei marked this pull request as ready for review June 18, 2025 12:55
Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is a very opinionated decision. I am wondering if this is not something the user should set by using on_failure_callback and not us to take this decision.

I'd like to hear more thoughts on that from others.

@dominikhei
Copy link
Contributor Author

dominikhei commented Jun 18, 2025

I feel like this is a very opinionated decision. I am wondering if this is not something the user should set by using on_failure_callback and not us to take this decision.

I'd like to hear more thoughts on that from others.

Apologies if there is an obvious answer, but is there a use case where you would want the job to not be cancelled in EMR if a new one is created due to retries, now running / pending concurrently?

@vincbeck
Copy link
Contributor

I feel like this is a very opinionated decision. I am wondering if this is not something the user should set by using on_failure_callback and not us to take this decision.
I'd like to hear more thoughts on that from others.

Apologies if this is an obvious question, but is there a use case where you would want the job to not be cancelled in EMR if a new one is created due to retries, now running / pending concurrently?

Hard to know all the different user use cases but I think you're correct, I do not see any, so I am probably wrong in my perception :)

@dominikhei
Copy link
Contributor Author

I feel like this is a very opinionated decision. I am wondering if this is not something the user should set by using on_failure_callback and not us to take this decision.
I'd like to hear more thoughts on that from others.

Apologies if this is an obvious question, but is there a use case where you would want the job to not be cancelled in EMR if a new one is created due to retries, now running / pending concurrently?

Hard to know all the different user use cases but I think you're correct, I do not see any, so I am probably wrong in my perception :)

That’s true, there’s definetly a point in letting the user decide, albeit introducing more complexity. As you said lets wait on other opinions :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EmrServerlessStartJobOperator does not cancel EMR Serverless job when waiter_max_attempts is reached
2 participants