AWS Glue Hook to implement enhanced logging #23832
Comments
@ak-arun - maybe you would like to add a Pull Request fixing it? It seems like an easy thing to do that would not require deep knowledge of Airflow itself, and you could test it locally with your setup. Can I assign it to you? This is the easiest and fastest way to get it implemented :). Otherwise I will mark it as "good-first-issue". |
Hi, I would like to work on this feature. Can you please guide me a little on how to proceed with it? I looked into airflow/airflow/providers/amazon/aws/hooks/glue.py Lines 161 to 164 in 1d53bec
|
@Dark-Knight11 I would like to warn about a side effect of fetching AWS Glue logs into Airflow task logs (already mentioned in almost the same issue, #23900 (comment)). If your Glue job uses 40 DPU, that means it spawns at minimum 1 driver and 39 workers/executors, and all of them will create error and output logs, as a result
|
Ok, Thanks. I'll look into it |
Closed by #25142 |
Hi Andrey,
Point taken. I am thinking we could publish the logs only when the job state
changes to Error/Failed.
Also, instead of mining all executor logs plus the driver, I think we can read
the 2 recently added job insights streams:
https://docs.aws.amazon.com/glue/latest/dg/monitor-job-insights.html
For sure, this is a new feature in Glue, and older jobs may not have these
streams enabled. We could mention that it works only if job insights are
enabled? Thoughts?
--
Thanks & Regards
*Arun A K*
On Wed, May 25, 2022 at 1:10 PM Andrey Anshin ***@***.***> wrote:
@Dark-Knight11 <https://github.com/Dark-Knight11> I would like to warn
about a side effect of fetching AWS Glue logs into Airflow task logs (already
mentioned in almost the same issue, #23900 (comment)).
If your Glue job uses 40 DPU, that means it spawns at minimum 1 driver and 39
workers/executors, and all of them will create error and output logs. As a
result:
1. You need to find all CloudWatch Logs prefixes that start with job_id
in the correct log group. By default Glue uses
- /aws-glue/jobs/output - for output logs
- /aws-glue/jobs/error - for error logs
- /aws-glue/jobs/logs-v2 - (optional) for continuous logging
2. You need to be sure that you fetch from all prefixes, and these
prefixes are not created at the same time
3. Be sure that the log-fetching processes/threads do not use all
CPU/Memory/IO of the Airflow Worker
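A rough sketch of what points 1 to 3 could look like in code, not the provider's actual implementation. It assumes the job_id above is the Glue job run ID used as the log stream name prefix, and that `logs_client` behaves like a boto3 CloudWatch Logs client (only its `filter_log_events` call is used). The function name and the `max_events` cap are made up for illustration; the cap is one simple way to keep the Airflow worker from being flooded.

```python
from typing import Any, Iterator, Tuple

# Default log groups listed in the comment above.
DEFAULT_LOG_GROUPS = ("/aws-glue/jobs/output", "/aws-glue/jobs/error")


def iter_glue_log_messages(
    logs_client: Any,
    run_id: str,
    log_groups: Tuple[str, ...] = DEFAULT_LOG_GROUPS,
    max_events: int = 1000,  # hypothetical cap to limit load on the worker
) -> Iterator[str]:
    """Yield at most max_events messages from streams prefixed with run_id."""
    emitted = 0
    for group in log_groups:
        # logStreamNamePrefix matches the driver stream and every executor stream.
        kwargs = {"logGroupName": group, "logStreamNamePrefix": run_id}
        while True:
            page = logs_client.filter_log_events(**kwargs)
            for event in page.get("events", []):
                if emitted >= max_events:
                    return
                emitted += 1
                yield f"[{group}] {event['message'].rstrip()}"
            token = page.get("nextToken")
            if not token:
                break
            # Follow pagination until this log group is exhausted.
            kwargs["nextToken"] = token
```

With real AWS access, `logs_client` would be `boto3.client("logs")`. Because streams are not all created at the same time (point 2), a caller would have to repeat the call while the job is running, and a log group that does not exist yet raises an error that the caller needs to handle.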
|
Description
Currently, when submitting AWS Glue jobs using Airflow, we see only the following in the log:
Polling for AWS Glue Job <job_name> current run state with status <job_run_state>
This request is to enhance this by providing more detailed information about the Glue job: show the same continuous logs from Glue/CloudWatch in the Airflow logs. This saves Airflow users from having to open the Glue logs to see detailed information about their jobs. Also, AWS admins can limit users' access to just Airflow and need not open up access to the Glue console and CloudWatch just for viewing logs. Airflow could then be used as the single pane of glass for job orchestration and job health management.
From a design perspective, for customers that may not be interested in detailed logs, this could be configuration controlled.
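To make the configuration idea concrete, here is a minimal sketch under stated assumptions; it is not how the Amazon provider implements it. `glue_client` is assumed to behave like a boto3 Glue client (only `get_job_run` is used), and `wait_for_job_run`, `log_mode` and `fetch_logs` are hypothetical names. The set of states treated as failures is also an assumption.

```python
import time
from typing import Any, Callable

# Assumed grouping of Glue job run states for this sketch.
FAILED_STATES = {"FAILED", "ERROR", "TIMEOUT"}
FINAL_STATES = FAILED_STATES | {"SUCCEEDED", "STOPPED"}


def wait_for_job_run(
    glue_client: Any,
    job_name: str,
    run_id: str,
    fetch_logs: Callable[[str], None],
    log_mode: str = "on_failure",  # hypothetical setting: "off", "on_failure", "always"
    poll_interval: float = 6.0,
) -> str:
    """Poll the run until it reaches a final state; fetch logs per log_mode."""
    while True:
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in FINAL_STATES:
            failed = state in FAILED_STATES
            if log_mode == "always" or (log_mode == "on_failure" and failed):
                fetch_logs(run_id)
            return state
        time.sleep(poll_interval)
```

With `log_mode="off"` the behaviour stays as it is today (status polling only), so users who do not want detailed logs pay no extra CloudWatch cost.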
Use case/motivation
This will provide a single place to track the job status rather than having to hop screens. Airflow could then be used as the single pane of glass on job orchestration and job health management.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct