Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow waiter to be configured via EmrServerless Operators #27784

Merged
merged 5 commits into from
Dec 6, 2022

Conversation

syedahsn
Copy link
Contributor

Currently, the waiter function used in the wait_for_completion logic for EMR Serverless operators is not configurable. There have been user requests to allow the operators to pass parameters to the waiter. This PR adds this feature. In addition, the waiter function is generic enough to be used outside of the EmrServerlessHook, so it is being moved to a generic location.

closes: aws-samples/emr-serverless-samples#27


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

airflow/providers/amazon/aws/hooks/emr.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/emr.py Show resolved Hide resolved
@syedahsn
Copy link
Contributor Author

@eladkal can you take another look at this when you have some time? I think I've addressed the issues you brought up. Thanks!

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @eladkal ?

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +506 to +512
:param waiter_countdown: Total amount of time, in seconds, the operator will wait for
the application to start. Defaults to 25 minutes.
:param waiter_check_interval_seconds: Number of seconds between polling the state of the application.
Defaults to 60 seconds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder about

EmrServerlessCreateApplicationOperator(
...,
wait_for_completion=False,
waiter_countdown=40,
waiter_check_interval_seconds=70
)

In this case the waiter_countdown, waiter_check_interval_seconds will not be used.
I think we should notify the user by log entry that parameters passed are ignored.
Or we can do the other way around, if user passes waiter_countdown or waiter_check_interval_seconds we can force wait_for_completion to True even if user override this (We have similar case is in BaseOperator with wait_for_downstream and depends_on_past as far as I remember)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not entirely true. In both of the EMR Serverless operators, one call to the waiter function is made regardless of whether or not wait_for_completion is True. In this case, I think its better to leave it as is but I'm open to hearing other options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is diffeent behavior for created and for started states.
Started state depends also on wait_for_completion

I'm not sure if functionality change is needed. That is your call
But I do think that clarification is required (doc wise?)
Given the code I shared earlier the parameters I passed will be consider for Created but not for Started.
I find this confusing. Do I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're correct in that when EmrServerlessCreateApplicationOperator is used, waiter can be called potentially twice, but definitely once. The first time, it will wait for the application to be in CREATED state, and it will use waiter_countdown and waiter_check_interval_seconds. If wait_for_completion is True, it will call waiter again and wait for the application to be in STARTED state.
For the EmrServerlessStartJobOperator, the waiter could potentially be called twice as well, but may not be called at all depending on whether the application is in a Started state or not. I added a change in the doc string to reflect this. Let me know if it is unclear. Thanks for the review!

@potiuk potiuk merged commit 0da0048 into apache:main Dec 6, 2022
dacort added a commit to aws-samples/emr-serverless-samples that referenced this pull request Dec 16, 2022
@vandonr-amz vandonr-amz deleted the syedahsn/waiter-refactor branch May 24, 2023 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parameter countdown cannot be passed to EmrServerlessStartJobOperator
3 participants