Integration tests: Run Example DAGs on Cloud envs #124

Closed
phanikumv opened this issue Mar 14, 2022 · 9 comments · Fixed by #207
Labels: pri/high (High priority), testing (Unit or integration testing improvements)

Comments

@phanikumv (Collaborator) commented on Mar 14, 2022:

Have one of the following that pushes all the example DAGs to a Gen2 cloud deployment, creates the necessary connections, runs all the DAGs, and sends a Slack message with a summary of the run:
(a) Scheduled CI job
(b) A master DAG that runs on a schedule in the same deployment
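
Not a spec for the eventual implementation, just a minimal sketch of option (b): a master DAG that triggers the example DAGs on a schedule. The master DAG name, the second DAG id, the schedule, and the `wait_for_completion` usage are illustrative assumptions.

```python
# Hypothetical sketch only; DAG ids, name and schedule are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

EXAMPLE_DAG_IDS = [
    "example_async_redshift_cluster_management",  # example DAG id seen in this issue
    "example_async_redshift_sql",                 # assumed example DAG id
]

with DAG(
    dag_id="example_master_dag",        # placeholder name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",         # assumed schedule
    catchup=False,
) as dag:
    for example_dag_id in EXAMPLE_DAG_IDS:
        TriggerDagRunOperator(
            task_id=f"trigger_{example_dag_id}",
            trigger_dag_id=example_dag_id,
            wait_for_completion=True,   # block until the triggered run finishes
        )
```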

phanikumv changed the title from "Automate DAG integration tests" to "Integration tests: Run Example DAGs on Cloud envs" on Mar 14, 2022
phanikumv added the pri/high (High priority) and testing (Unit or integration testing improvements) labels on Mar 14, 2022
@pankajastro (Contributor) commented:

Update

Push all example DAGs to Cloud env => Done
Master DAG to run examples => Done
DAG run summary collection => Done
Slack message of summary => TODO

@pankajastro (Contributor) commented:

Update

Integrated the Slack alert, but it still needs to be tested.
IT ticket for the Slack token: #1595
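
For context, a minimal sketch of what the Slack summary message could look like using `slack_sdk`; the environment variable, channel name, and summary text are placeholders, not the actual integration.

```python
# Hypothetical sketch; token env var, channel and message are placeholders.
import os

from slack_sdk import WebClient


def send_run_summary(summary: str) -> None:
    """Post the example-DAG run summary to a Slack channel."""
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])  # token from the IT ticket
    client.chat_postMessage(channel="#provider-integration-tests", text=summary)
```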

@pankajastro (Contributor) commented:

Update

Integrated and tested the Slack alert.

@phanikumv (Collaborator, Author) commented:

Getting the below issue during integration testing of RedshiftPauseClusterOperatorAsync. @bharanidharan14 to check and update the status here.

[2022-03-22, 16:38:10 UTC] {taskinstance.py:1264} INFO - Executing <Task(RedshiftPauseClusterOperatorAsync): pause_redshift_cluster> on 2022-03-22 16:36:53.839205+00:00
[2022-03-22, 16:38:10 UTC] {standard_task_runner.py:52} INFO - Started process 130 to run task
[2022-03-22, 16:38:10 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'example_async_redshift_cluster_management', 'pause_redshift_cluster', 'manual__2022-03-22T16:36:53.839205+00:00', '--job-id', '1505', '--raw', '--subdir', 'DAGS_FOLDER/example_redshift_cluster_management.py', '--cfg-path', '/tmp/tmp8xg8ofms', '--error-file', '/tmp/tmpmv5t_s78']
[2022-03-22, 16:38:10 UTC] {standard_task_runner.py:77} INFO - Job 1505: Subtask pause_redshift_cluster
[2022-03-22, 16:38:10 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: example_async_redshift_cluster_management.pause_redshift_cluster manual__2022-03-22T16:36:53.839205+00:00 [running]> on host cometary-asterism-3668-worker-c95d7f59-7gvjf
[2022-03-22, 16:38:10 UTC] {taskinstance.py:1429} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=example_async_redshift_cluster_management AIRFLOW_CTX_TASK_ID=pause_redshift_cluster AIRFLOW_CTX_EXECUTION_DATE=2022-03-22T16:36:53.839205+00:00 AIRFLOW_CTX_DAG_RUN_ID=manual__2022-03-22T16:36:53.839205+00:00
[2022-03-22, 16:38:10 UTC] {taskinstance.py:1718} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1460, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1489, in _execute_task
    raise TaskDeferralError(next_kwargs.get("error", "Unknown"))

@bharanidharan14 (Contributor) commented:

@pankajastro @phanikumv On this issue the execution_timeout parameter is fixed at 60 seconds, but the Redshift cluster management operators normally take more than 60 seconds, so we need to increase the execution_timeout.
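
For illustration, a hedged sketch of raising the timeout on the pause task; the 300-second value is an assumption, not necessarily what the eventual fix uses, and the DAG wrapper is only there to make the snippet self-contained.

```python
# Illustrative only: give the pause task more than the current 60 seconds,
# since pausing a Redshift cluster usually takes longer than a minute.
from datetime import datetime, timedelta

from airflow import DAG
from astronomer.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftPauseClusterOperatorAsync,
)

with DAG(
    dag_id="example_async_redshift_cluster_management",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    pause_redshift_cluster = RedshiftPauseClusterOperatorAsync(
        task_id="pause_redshift_cluster",
        cluster_identifier="redshift-cluster-1",
        aws_conn_id="aws_default",
        execution_timeout=timedelta(seconds=300),  # assumed value, tune as needed
    )
```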

Another finding:

  • We deleted the cluster redshift-cluster-1 without taking a snapshot backup, then created a new cluster with the same name redshift-cluster-1. When we triggered the DAG, it ran and we got this exception:

03/23 13:33:36[2022-03-23 08:03:36,805] {triggerer_job.py:359} INFO - Trigger <astronomer.providers.amazon.aws.triggers.redshift_cluster.RedshiftClusterTrigger task_id=pause_redshift_cluster, polling_period_seconds=5, aws_conn_id=aws_default, cluster_identifier=redshift-cluster-1, operation_type=pause_cluster> (ID 477) fired: TriggerEvent<{'status': 'error', 'message': "An error occurred (InvalidClusterState) when calling the PauseCluster operation: You can't pause cluster redshift-cluster-1 because no recently available backup was found. Create a manual snapshot or wait for an automated snapshot, then retry."}>

So, to fix this, either:

  • While creating the cluster, make sure it has a unique name and a snapshot is created, (or)
  • If it is created with a previously used name, make sure the cluster has a snapshot (see the sketch after this list), (or)
  • Delete the cluster with a final snapshot (not suggested, because the snapshot storage is chargeable)
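
A minimal sketch of the second option, ensuring a recent backup exists for the reused cluster name before pausing; the region and snapshot name are placeholders, not values from the actual environment.

```python
# Illustrative only: create a manual snapshot so the PauseCluster call does not
# fail with InvalidClusterState ("no recently available backup was found").
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")  # assumed region
redshift.create_cluster_snapshot(
    SnapshotIdentifier="redshift-cluster-1-pre-pause",  # placeholder snapshot name
    ClusterIdentifier="redshift-cluster-1",
)
# Snapshot creation is asynchronous; wait until it is available before pausing.
waiter = redshift.get_waiter("snapshot_available")
waiter.wait(SnapshotIdentifier="redshift-cluster-1-pre-pause")
```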

@bharanidharan14 (Contributor) commented:

The RedshiftSQLOperatorAsync example DAG was not properly aligned with its tasks.

@bharanidharan14 (Contributor) commented:

@pankajastro There is a dependency between the DAGs: the RedshiftSQLOperatorAsync and Redshift cluster management DAGs use the same cluster details, so when the pause-cluster task is executing and the RedshiftSQLOperatorAsync tries to run, it will throw an error saying the cluster isn't available.

@pankajastro (Contributor) commented:

Fixed the timeout issue for RedshiftPauseClusterOperatorAsync in PR #154.

@pankajastro (Contributor) commented:

@bharanidharan14 The Amazon-related DAGs will now run in sequence, see PR #136.
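
Not the actual change from PR #136, only a sketch of one way to sequence the Amazon example DAGs inside the master DAG by chaining the trigger tasks; the DAG ids are the same illustrative ones used earlier in this thread.

```python
# Hypothetical sketch: the Redshift SQL example DAG only starts after the
# cluster-management example DAG (which pauses/resumes the shared cluster) finishes.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="example_master_dag",        # placeholder name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",         # assumed schedule
    catchup=False,
) as dag:
    trigger_cluster_mgmt = TriggerDagRunOperator(
        task_id="trigger_example_async_redshift_cluster_management",
        trigger_dag_id="example_async_redshift_cluster_management",
        wait_for_completion=True,
    )
    trigger_redshift_sql = TriggerDagRunOperator(
        task_id="trigger_example_async_redshift_sql",
        trigger_dag_id="example_async_redshift_sql",  # assumed DAG id
        wait_for_completion=True,
    )
    trigger_cluster_mgmt >> trigger_redshift_sql
```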
