
Fix Redshift DAGs to catch appropriate exceptions #348

Merged
merged 3 commits into main from 279-fix-redshift-dags on May 16, 2022

Conversation

Collaborator

@pankajkoti pankajkoti commented May 13, 2022

The cluster delete and snapshot delete tasks keep waiting while the status is in the 'deleting' state. Because the status check runs inside a while loop, the final iteration, after the resource has actually been deleted, raises an exception saying the corresponding resource is not found, and that marks the task as failed. We fix this by catching the relevant error code; all other exceptions are re-raised.

Additionally, we quite often observe DAG tasks failing without any logs. @ephraimbuddy suggested that this could be due to DAG processing timeouts caused by the time spent importing heavy libraries (although we could not find the DAG processing logs in Astro cloud). Following this guess and the guidance in
https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
we delay the import of boto3 to the task execution stage instead of importing it at the top of the module.

We also rename the operators' Python reference variable names to be consistent with their task_id.

Closes: #279
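To illustrate the deferred import (a minimal sketch only, not the DAG code from this PR; the task name, decorator style, and cluster identifier are placeholders), moving the boto3 import inside the callable keeps it out of top-level DAG parsing:

```python
from airflow.decorators import task


@task
def delete_redshift_cluster(cluster_identifier: str) -> None:
    # Import boto3 at task execution time so DAG file parsing stays fast,
    # per the Airflow top-level-code best practice linked above.
    import boto3

    client = boto3.client("redshift")
    client.delete_cluster(
        ClusterIdentifier=cluster_identifier,
        SkipFinalClusterSnapshot=True,
    )
```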


codecov bot commented May 13, 2022

Codecov Report

Merging #348 (ee2216a) into main (a33125c) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage   96.78%   96.78%           
=======================================
  Files          56       56           
  Lines        2925     2925           
=======================================
  Hits         2831     2831           
  Misses         94       94           

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a33125c...ee2216a.

except ClientError as exception:
    if exception.response.get("Error", {}).get("Code", "") == "ClusterNotFound":
        # The cluster is already gone, which is exactly what the wait loop is
        # waiting for, so swallow the error instead of failing the task.
        pass
    else:
        logging.exception("Error deleting redshift cluster")
        raise exception
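For reference, the surrounding wait loop looks roughly like this (a minimal sketch, assuming boto3's describe_clusters is used for the status check; the function name and poll interval are placeholders):

```python
import logging
import time

import boto3
from botocore.exceptions import ClientError


def wait_for_cluster_deletion(cluster_identifier: str, poll_interval: int = 30) -> None:
    # Poll describe_clusters until the cluster leaves the 'deleting' state; the
    # final call raises ClusterNotFound once deletion completes, which is success.
    client = boto3.client("redshift")
    while True:
        try:
            response = client.describe_clusters(ClusterIdentifier=cluster_identifier)
            if response["Clusters"][0]["ClusterStatus"] != "deleting":
                break
        except ClientError as exception:
            if exception.response.get("Error", {}).get("Code", "") == "ClusterNotFound":
                break
            logging.exception("Error deleting redshift cluster")
            raise exception
        time.sleep(poll_interval)
```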
Collaborator
Should we add a check on L55 to see if the cluster exists or not? try..except is good too :) but just thinking out loud

Collaborator Author

@pankajkoti pankajkoti May 16, 2022

Yes, I like the idea of proactively checking rather than reacting with a try/except. We have the except mainly for the while loop on L60, where the cluster stays in the 'deleting' state for a while and then throws this error. In my opinion the check makes sense for L55, since it runs only once; but for L60 it would mean two API calls on every loop iteration before the cluster is finally deleted.
However, I could not find a relevant method to check beforehand whether the cluster exists: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift.html
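For context, a pre-check would itself have to wrap a describe call in a try/except, since boto3 exposes no dedicated existence check for Redshift clusters (the helper below is hypothetical, not part of this PR), which is why it adds an extra API call on every loop iteration:

```python
from botocore.exceptions import ClientError


def cluster_exists(client, cluster_identifier: str) -> bool:
    # There is no dedicated "exists" API, so the pre-check is itself a
    # describe_clusters call plus exception handling.
    try:
        client.describe_clusters(ClusterIdentifier=cluster_identifier)
        return True
    except ClientError as exception:
        if exception.response.get("Error", {}).get("Code", "") == "ClusterNotFound":
            return False
        raise
```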

Contributor

@bharanidharan14 bharanidharan14 left a comment

Overall looks good to me

@pankajastro
Contributor

LGTM

@phanikumv phanikumv merged commit ed4d0bb into main May 16, 2022
@phanikumv phanikumv deleted the 279-fix-redshift-dags branch May 16, 2022 10:33
Development

Successfully merging this pull request may close these issues.

Fix Redshift example DAG