Starting this discussion to share our use case and current issues, and to get any ideas or inspiration on how best to proceed.
Background
We are currently using MWAA 2.10.3 to orchestrate, among other things, Glue jobs. For this we use the GlueJobOperator to trigger runs of already defined jobs, with minimal arguments provided.
A key detail is that we are using deferrable=True. The main reason is that we have longer-running jobs and sensors, and we do not want to reserve workers for them over long periods.
Issue
We are using on_failure_callback with a custom function that extracts the error message from the context of a failed task (exception = context.get('exception')) and posts it as a card to our Teams channel.
When a Glue job fails while the task is in the deferred state, the callback only sees that the trigger has failed, so it extracts nothing more informative than "Trigger failure".
This is an issue because we want our Teams error notifications to immediately show the high-level cause of failure. Currently we would need to go to the Glue logs, either directly or via the Airflow logs.
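For reference, the extraction step looks roughly like the following. This is a minimal sketch with illustrative names, not our exact callback, and the Teams posting itself is omitted:

```python
# Minimal sketch of the extraction step in our on_failure_callback
# (illustrative names; the actual Teams card posting is omitted).
def build_failure_message(context):
    """Pull a human-readable cause from the Airflow failure-callback context."""
    exception = context.get("exception")
    # For a task that failed while deferred, this yields just "Trigger failure",
    # with no detail about what went wrong on the Glue side.
    return str(exception) if exception else "Unknown failure"
```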
Possible solutions
We have considered the following solutions and workarounds:
verbose=True
While this should include all detailed Glue logs in our Airflow task logs, we are not confident it will actually solve our issue, as the status check on the final attempt will still fail. We are also hesitant to enable it because it would further duplicate our existing logs 1:1.
Wrap GlueJobOperator and execute_complete function
This could possibly be a good way to modify the behaviour of that final status check, but we are hesitant to wrap the original operator as it would complicate future MWAA version upgrades for us.
Enhance custom callback to include additional get_job_run call based on job_run_id from context
This is currently our preferred approach, with the caveat that the final error message of the Glue job will not appear in the task logs; it will, however, be included in our Teams error notification.
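As a sketch, the extra lookup in the callback could look like the following. The names are illustrative: in the real callback, glue_client would come from boto3.client("glue"), and job_name / run_id would be recovered from the task context or XCom.

```python
# Sketch of the extra Glue lookup for the failure callback (illustrative
# names, not our actual implementation). In the real callback, glue_client
# would be boto3.client("glue"), and job_name / run_id would be recovered
# from the failed task's context or XCom.
def glue_error_from_run(glue_client, job_name, run_id):
    """Fetch the Glue-side error message for a failed job run."""
    job_run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
    # "ErrorMessage" is only present on failed runs.
    return job_run.get("ErrorMessage", "No error message available")
```

Injecting the client as a parameter also makes the lookup easy to unit-test with a stub in place of the real Glue client.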
Summary
Happy to receive any thoughts or input on the described issue. Let me know if I have missed any essential details.
I am also interested to know whether this type of behaviour would be a welcome addition to the GlueJobOperator, or whether it was a conscious decision not to include it.