[AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow #3504
OElesin wants to merge 26 commits into apache:master
…; Also added Glue Job Operator and Glue Job Sensor
…; Updated ASF headers
[AIRFLOW-2310]: Added AWS Glue Job Compatibility to Airflow
Codecov Report
@@           Coverage Diff           @@
##           master    #3504   +/-   ##
=======================================
  Coverage   77.14%   77.14%
=======================================
  Files         203      203
  Lines       15123    15123
=======================================
  Hits        11667    11667
  Misses       3456     3456
Thanks @OElesin
…ubator-airflow into aws-glue-integration # Conflicts: # airflow/contrib/hooks/aws_glue_job_hook.py # docs/code.rst # docs/integration.rst # tests/contrib/hooks/test_aws_glue_job_hook.py
# Conflicts: # airflow/contrib/hooks/aws_glue_job_hook.py # docs/code.rst # tests/contrib/hooks/test_aws_glue_job_hook.py
@Fokko, kindly review. Thanks.
@OElesin something must have gone wrong with the rebasing/merging; I see a lot of unrelated changes.
@Fokko, this is the same problem I raised on the previous PR. Can you please point out the unrelated changes?
It isn't clean. If you look at the Files tab, you see a lot of changed files related to GCP. That doesn't make sense.
@OElesin Do you plan to resolve the merge issues soon? Looking forward to using the Glue operator, thanks!
@OElesin I'm also really interested in this submission.
completed = job_run_state == 'SUCCEEDED'

while True:
    if failed or stopped or completed:
Just "accidentally" got here while looking at something else but I thought it was worth commenting on this loop.
Isn't this a potentially infinite loop? Once in the loop, how are these 3 variables updated? I don't see the call to get the up-to-date status after the time.sleep().
Well, it's not an infinite loop as it exits the loop once the job status is in any of these states:
FAILED, STOPPED or SUCCEEDED.
Sorry @oelesinsc24, I just saw this comment. I'm probably missing something, but I can't see where the failed, stopped, and completed variables change value inside the while loop. The else branch only sleeps; doesn't this mean that the if condition will always be false? 🤷‍♂️
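For reference, the concern in this thread can be illustrated with a minimal sketch of a polling loop that re-reads the job state on every pass. This is not the code from the PR: `wait_for_job_run`, `get_state`, `max_attempts`, and the injected `sleep` are hypothetical names used only for illustration, and `get_state` stands in for whatever call fetches the current run state (for example, via boto3's Glue `get_job_run`).

```python
import time

# States after which a job run will not change any more
# (the three states named in this review thread).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job_run(get_state, poll_interval=6, max_attempts=None, sleep=time.sleep):
    """Poll until the job run reaches a terminal state and return that state.

    get_state    -- hypothetical zero-argument callable returning the current state
    max_attempts -- optional cap on polls, so the loop cannot spin forever
    sleep        -- injectable so the loop can be exercised without real waiting
    """
    attempts = 0
    while True:
        # Refresh the state inside the loop; evaluating it once before the
        # loop would leave the exit condition unchanged on every iteration.
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(
                "job run still in state %r after %d polls" % (state, attempts)
            )
        sleep(poll_interval)
```

The optional `max_attempts` cap is an extra safeguard that nobody asked for in the thread; it is included here only because a job stuck in a non-terminal state would otherwise block the task indefinitely.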
@OElesin AWS Glue Operator will be a great addition! Let's fix this PR and get this out?
@OElesin Have you been able to make any progress on this? [edit] From looking at the PR it seems it will both create a job (using boto3's FYI All the unrelated changes are because you merged
This is absolutely correct, and it also makes the implementation way easier. I will make the necessary changes and add the commit.
Everyone, I have moved this PR to #4068. I will close this one later and ask that we track changes on the new PR. Apologies for the inconvenience.
Moved to #4068
Make sure you have checked all steps below.
JIRA
Description
Currently, there is no integration with AWS Glue Jobs. This pull request adds support for running AWS Glue jobs from Airflow.
Tests
Added tests.contrib.test_aws_glue_job_hook.py.
However, moto, the mocking library for boto3, does not currently support mocking AWS Glue.
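One possible workaround while moto lacks Glue support is to stub the boto3 client with `unittest.mock`. The sketch below is an assumption about how such a test could look, not the test in this PR: `get_job_run_state`, `example-job`, and `jr_example` are hypothetical names, and only the `get_job_run` call shape (`JobName`, `RunId`, and a response containing `JobRun.JobRunState`) comes from boto3's Glue client.

```python
from unittest import mock


def get_job_run_state(glue_client, job_name, run_id):
    """Hypothetical helper: return the current state of a Glue job run."""
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    return response["JobRun"]["JobRunState"]


# Stand-in for a boto3 Glue client; no AWS access or moto support is needed.
fake_client = mock.MagicMock()
fake_client.get_job_run.return_value = {"JobRun": {"JobRunState": "SUCCEEDED"}}

state = get_job_run_state(fake_client, "example-job", "jr_example")
```

A stub like this only checks that the hook calls the client as expected; it does not validate behaviour against the real Glue service.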
Commits
My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
Passes
git diff upstream/master -u -- "*.py" | flake8 --diff