AWS Glue Hook to implement enhanced logging #23832

Closed
1 of 2 tasks
ak-arun opened this issue May 20, 2022 · 6 comments
ak-arun commented May 20, 2022

Description

Currently, when submitting AWS Glue jobs using Airflow, we see only the following in the log:
Polling for AWS Glue Job <job_name> current run state with status <job_run_state>
This request is to enhance this by providing more detailed information about the Glue job, showing the same continuous logs from Glue/CloudWatch in the Airflow logs. This saves Airflow users from having to open the Glue logs to see detailed information about their jobs. AWS admins can also limit users' access to Airflow alone and need not open up access to the Glue console and CloudWatch just for viewing logs. Airflow could then be used as the single pane of glass for job orchestration and job health management.

From a design perspective, this could be controlled by configuration, so that customers who are not interested in detailed logs can leave it off.
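A minimal sketch of what "configuration controlled" could look like, assuming a hypothetical `verbose` flag on the polling loop. The names `wait_for_job`, `get_state`, and `fetch_logs` are illustrative only and are not taken from the Airflow code base; the set of terminal states is likewise an assumption for this example.

```python
import time
from typing import Callable, Iterable

# Assumption: run states treated as terminal in this sketch.
FINISHED_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job(
    get_state: Callable[[], str],
    fetch_logs: Callable[[], Iterable[str]],
    log: Callable[[str], None] = print,
    verbose: bool = False,
    poll_interval: float = 6.0,
) -> str:
    """Poll a job until it finishes; forward job logs only when verbose is set.

    get_state and fetch_logs are hypothetical callables supplied by the caller,
    so this sketch stays independent of any real AWS client.
    """
    while True:
        state = get_state()
        if verbose:
            # Detailed output is opt-in, so the default log stays short.
            for line in fetch_logs():
                log(line)
        if state in FINISHED_STATES:
            log(f"Job finished with state {state}")
            return state
        log(f"Polling for AWS Glue Job current run state with status {state}")
        time.sleep(poll_interval)
```

With `verbose=False` the output is the same single polling line as today; with `verbose=True` every poll cycle also forwards whatever `fetch_logs` returns.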

Use case/motivation

This will provide a single place to track the job status rather than having to hop screens. Airflow could then be used as the single pane of glass on job orchestration and job health management.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

ak-arun added the kind:feature (Feature Requests) label May 20, 2022
ak-arun changed the title from "AWS Glue Hook to implement more verbos logging" to "AWS Glue Hook to implement enhanced logging" May 20, 2022
potiuk (Member) commented May 22, 2022

@ak-arun - maybe you would like to add a Pull Request fixing it? It seems like an easy thing to do that would not require deep knowledge of Airflow itself, and you could test it locally with your setup. Can I assign it to you? This is the easiest and fastest way to get it implemented :). Otherwise I will mark it as "good-first-issue".

Dark-Knight11 (Contributor) commented May 25, 2022

Hi, I would like to work on this feature. Can you please guide me a little on how to proceed with it?

I looked into the glue.py file and I think this is where I have to make changes.
How can I add the logs from CloudWatch/Glue here?
Do I need to use AwsLogsHook?

else:
    self.log.info(
        "Polling for AWS Glue Job %s current run state with status %s", job_name, job_run_state
    )
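One possible way to pull log lines at that point in the polling loop is sketched below. It assumes a client object exposing the CloudWatch Logs `filter_log_events` call (for example the one returned by `boto3.client("logs")`); the helper name `fetch_new_events` and the timestamp cursor are illustrative and not necessarily how Airflow implements it.

```python
from typing import Any, Dict, List, Tuple


def fetch_new_events(
    logs_client: Any,
    log_group: str,
    stream_prefix: str,
    start_time: int = 0,
) -> Tuple[List[str], int]:
    """Return log messages newer than start_time plus the next cursor.

    logs_client is assumed to expose filter_log_events with the same
    keyword arguments as the boto3 CloudWatch Logs client.
    """
    messages: List[str] = []
    cursor = start_time
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group,
        "logStreamNamePrefix": stream_prefix,
        "startTime": start_time,  # milliseconds since the epoch
    }
    while True:
        response = logs_client.filter_log_events(**kwargs)
        for event in response.get("events", []):
            messages.append(f"[{event['logStreamName']}] {event['message'].rstrip()}")
            # Move the cursor just past the newest event seen so far.
            cursor = max(cursor, event["timestamp"] + 1)
        token = response.get("nextToken")
        if not token:
            break  # no more pages for this call
        kwargs["nextToken"] = token
    return messages, cursor
```

Each poll would pass the cursor returned by the previous call and write the returned messages with `self.log.info`. A timestamp cursor is a simplification: events that arrive late with an older timestamp would be skipped.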

Taragolis (Contributor) commented

@Dark-Knight11 I would like to warn about a side effect of fetching AWS Glue logs into Airflow task logs (already mentioned in the almost identical issue #23900 (comment)).

If your Glue job uses 40 DPUs, it spawns at least 1 driver and 39 workers/executors, and all of them will write error and output logs. As a result:

  1. You need to find all CloudWatch Logs prefixes that start with job_id in the correct log group. By default Glue uses
    • /aws-glue/jobs/output - for output logs
    • /aws-glue/jobs/error - for error logs
    • /aws-glue/jobs/logs-v2 - (optional) for continuous logging
  2. You need to be sure that you fetch from all prefixes, and these prefixes are not all created at the same time
  3. Be sure that the log-fetching processes/threads do not use all the CPU/memory/IO of the Airflow worker
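The three points above could be approached roughly as follows: rescan every log group on each poll so that streams created later are still picked up, and cap the work done per poll. This is a sketch under assumptions, not the merged implementation. It assumes a client exposing `describe_log_streams` as the boto3 CloudWatch Logs client does; `discover_new_streams`, `seen`, and `max_new` are hypothetical names.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Default Glue log groups as listed in the comment above.
DEFAULT_GLUE_LOG_GROUPS = (
    "/aws-glue/jobs/output",
    "/aws-glue/jobs/error",
    "/aws-glue/jobs/logs-v2",
)


def discover_new_streams(
    logs_client: Any,
    stream_prefix: str,
    seen: Set[Tuple[str, str]],
    log_groups: Iterable[str] = DEFAULT_GLUE_LOG_GROUPS,
    max_new: int = 50,
) -> List[Tuple[str, str]]:
    """Return (group, stream) pairs not seen before, at most max_new per call.

    Call this on every poll: driver and worker streams are not all created
    at the same time, so a single scan at job start would miss some.
    Note: a real client raises an error for a log group that does not exist
    (logs-v2 is optional); this sketch does not handle that case.
    """
    new_streams: List[Tuple[str, str]] = []
    for group in log_groups:
        kwargs: Dict[str, Any] = {
            "logGroupName": group,
            "logStreamNamePrefix": stream_prefix,
        }
        # The cap bounds the work per poll so the worker is not starved.
        while len(new_streams) < max_new:
            response = logs_client.describe_log_streams(**kwargs)
            for stream in response.get("logStreams", []):
                key = (group, stream["logStreamName"])
                if key not in seen and len(new_streams) < max_new:
                    seen.add(key)
                    new_streams.append(key)
            token = response.get("nextToken")
            if not token:
                break
            kwargs["nextToken"] = token
    return new_streams
```

Streams left out because of the cap are not added to `seen`, so the next poll picks them up. Keeping the scan in the existing polling loop, rather than one thread per stream, is one way to limit the CPU/memory/IO pressure on the worker.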

Dark-Knight11 (Contributor) commented

Ok, Thanks. I'll look into it

ferruzzi (Contributor) commented

Closed by #25142

eladkal closed this as completed Jul 19, 2022

ak-arun (Author) commented Oct 11, 2022 via email


7 participants