Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deferrable mode in Redshift delete cluster #30244

Merged
merged 1 commit into from
Jun 4, 2023

Conversation

pankajastro
Copy link
Member

@pankajastro pankajastro commented Mar 22, 2023

Add the deferrable param in RedshiftDeleteClusterOperator.
This will allow running RedshiftDeleteClusterOperator in an async way
that means we only submit a job from the worker to delete a redshift cluster
then defer to the trigger for the polling and waiter for a cluster to get removed
and the worker slot won't be occupied for the whole period of
task execution.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch 5 times, most recently from 1d4814f to 0a97c7b Compare March 23, 2023 07:31
@pankajastro pankajastro marked this pull request as ready for review March 23, 2023 09:08
@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch 5 times, most recently from 000f937 to b648783 Compare April 11, 2023 15:32
Copy link
Contributor

@syedahsn syedahsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments. A more general comment on the PR is that the way it is written means that we are forced to duplicate logic in operators to Triggers if we want to have the same functionality between deferrable and non deferrable modes. This is seen in the retry logic that's implemented in the delete_cluster method in the operator, which is now being copied over to the Trigger. I think this approach will lead to more cpmplicated, error-prone code because any change made to one will have to be copied to the other. I'm currently in the middle of writing a PR for supporting deferrable operators as well, and I took a slightly different approach. I'd love to hear what your thoughts are on how I've gone about it.

airflow/providers/amazon/aws/hooks/redshift_cluster.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/triggers/redshift_cluster.py Outdated Show resolved Hide resolved
@pankajastro
Copy link
Member Author

Added a few more comments. A more general comment on the PR is that the way it is written means that we are forced to duplicate logic in operators to Triggers if we want to have the same functionality between deferrable and non deferrable modes. This is seen in the retry logic that's implemented in the delete_cluster method in the operator, which is now being copied over to the Trigger. I think this approach will lead to more cpmplicated, error-prone code because any change made to one will have to be copied to the other. I'm currently in the middle of writing a PR for supporting deferrable operators as well, and I took a slightly different approach. I'd love to hear what your thoughts are on how I've gone about it.

Rebased on top of your PR but I see you have also created a PR for the same #30870

@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch from a88789d to a20eae1 Compare May 25, 2023 10:30
@pankajastro pankajastro marked this pull request as ready for review May 25, 2023 14:00
@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch 3 times, most recently from 41e4bb4 to f19ab65 Compare May 26, 2023 09:39
@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch 2 times, most recently from 8b91cf1 to e5c4c61 Compare May 29, 2023 18:09
Copy link
Contributor

@syedahsn syedahsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few nits, and some more test cases.

@pankajastro pankajastro requested a review from syedahsn May 30, 2023 06:36
@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch from dc26d94 to b8cb678 Compare May 30, 2023 07:04
Comment on lines +396 to +398
@cached_property
def hook(self):
return RedshiftHook(aws_conn_id=self.aws_conn_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity, why are you using RedshiftHook instead of RedshiftAsyncHook? (I see it used in 3 of the other 4 triggers)

Copy link
Member Author

@pankajastro pankajastro May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In main we have RedshiftAsyncHook in one trigger RedshiftClusterTrigger which I'll clean once this PR will be merged
https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/triggers/redshift_cluster.py can you please point me to which trigger you are looking at? Also, we decided to go with boto waiter approach and not the async hook for aws async operator #30032

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it in #31610 PR

@pankajastro
Copy link
Member Author

Hi @hussein-awala / @eladkal requesting a re-review on this. thank you!

Add the deferrable param in RedshiftDeleteClusterOperator.
This will allow running RedshiftDeleteClusterOperator in an async way
that means we only submit a job from the worker to delete a redshift cluster
then defer to the trigger for the polling and waiter for a cluster to get removed
and the worker slot won't be occupied for the whole period of
task execution.
@pankajastro pankajastro force-pushed the async_redshift_delete_cluster branch from a1a1b43 to 3598464 Compare June 4, 2023 17:52
@hussein-awala hussein-awala requested a review from ashb June 4, 2023 20:13
@potiuk potiuk merged commit a247a8f into apache:main Jun 4, 2023
42 checks passed
pankajastro added a commit to astronomer/airflow that referenced this pull request Jun 5, 2023
for some reason CI was green in apache#30244 despite some issue.
Most probably because of wrong rebase. This PR remove unimported value to make to consistent with other
and fix the main.
@pankajastro pankajastro mentioned this pull request Jun 5, 2023
@pankajastro pankajastro deleted the async_redshift_delete_cluster branch June 5, 2023 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants