Skip to content

[AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow#3504

Closed
OElesin wants to merge 26 commits intoapache:masterfrom
OElesin:master
Closed

[AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow#3504
OElesin wants to merge 26 commits intoapache:masterfrom
OElesin:master

Conversation

@OElesin
Copy link
Contributor

@OElesin OElesin commented Jun 14, 2018

Make sure you have checked all steps below.

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Currently, there is no integration with AWS Glue Jobs. My pull request is basically an improvement to integrate running AWS Glue jobs with Airflow.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Added tests.contrib.test_aws_glue_job_hook.py.
    However, moto the test class for boto3 does not support AWS Glue mock currently

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":

    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

OElesin added 22 commits April 23, 2018 21:17
…; Also added Glue Job Operator and Glue Job Sensor
…; Also added Glue Job Operator and Glue Job Sensor
[AIRFLOW-2310]: Added AWS Glue Job Compatibility to Airflow
@codecov-io
Copy link

codecov-io commented Jun 14, 2018

Codecov Report

Merging #3504 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3504   +/-   ##
=======================================
  Coverage   77.14%   77.14%           
=======================================
  Files         203      203           
  Lines       15123    15123           
=======================================
  Hits        11667    11667           
  Misses       3456     3456

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f4d681...253171f. Read the comment docs.

@Fokko
Copy link
Contributor

Fokko commented Jun 15, 2018

Thanks @OElesin
Can you squash your commits into one single commit?

OElesin added 4 commits June 27, 2018 21:16
…ubator-airflow into aws-glue-integration

# Conflicts:
#	airflow/contrib/hooks/aws_glue_job_hook.py
#	docs/code.rst
#	docs/integration.rst
#	tests/contrib/hooks/test_aws_glue_job_hook.py
# Conflicts:
#	airflow/contrib/hooks/aws_glue_job_hook.py
#	docs/code.rst
#	tests/contrib/hooks/test_aws_glue_job_hook.py
@OElesin
Copy link
Contributor Author

OElesin commented Jun 29, 2018

@Fokko, kindly review.

Thanks

@Fokko
Copy link
Contributor

Fokko commented Jul 1, 2018

@OElesin something must have gone wrong with the rebasing/merging, I see a lot of unrelated changes.

@OElesin
Copy link
Contributor Author

OElesin commented Jul 1, 2018

@Fokko, this was the same problem I emphasized with the previous PR. Can you please point out the unrelated changes?

@Fokko
Copy link
Contributor

Fokko commented Jul 6, 2018

It isn't clean. If you look at the Files tab, you see a lot of files changed related to GCP. That doesn't make sense.

@suma-ps
Copy link

suma-ps commented Jul 31, 2018

@OElesin Do you plan to resolve the merge issues soon? Looking forward to using the Glue operator soon, thanks!

@cnidus
Copy link

cnidus commented Aug 6, 2018

@OElesin I'm also really interested in this submission.

@OElesin
Copy link
Contributor Author

OElesin commented Aug 6, 2018

@suma-ps and @cnidus ,

I have plans of working fixing this. But I kinda get squashing commits mixed up at times. I would try to resolve this by weekend and resubmit the PR.

Thanks @ALL

completed = job_run_state == 'SUCCEEDED'

while True:
if failed or stopped or completed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just "accidentally" got here while looking at something else but I thought it was worth commenting on this loop.

Isn't this a potentially infinite loop? Once in the loop, how are these 3 variables updated? I don't see the call to get the up-to-date status after the time.sleep().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not an infinite loop as it exits the loop once the job status is in any of these states:
FAILED, STOPPED or SUCCEEDED.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @oelesinsc24 just saw this comment. I'm probably missing something but I can't see where the failed, stopped, completed variables change value inside the while loop. The else branch only sleeps, doesn't this mean that the if condition will always be false? 🤷‍♂

@sid88in
Copy link
Contributor

sid88in commented Sep 25, 2018

@OElesin AWS Glue Operator will be a great addition! Let's fix this PR and get this out?

@simonvanderveldt
Copy link

simonvanderveldt commented Oct 3, 2018

@OElesin Have you been able to make any progress on this?
There's one duplicate file in there airflow/contrib/aws_glue_job_hook.py vs airflow/contrib/hook/aws_glue_job_hook.py.

[edit] From looking at the PR it seems it will both create a job (using boto3's create_job()) as well as run the job (using boto3's start_job_run()). IMHO at the very least these should be separate things so one can run jobs without creating them.
To expand on this, we create the jobs using a normal development workflow/CI and we just want to be able to run them from Airflow. IMHO it's not Airflow's responsibility to create resources, it should only invoke them.

FYI All the unrelated changes are because you merged master into your branch instead of rebasing your branch on top of master

@oelesinsc24
Copy link

@OElesin Have you been able to make any progress on this?
There's one duplicate file in there airflow/contrib/aws_glue_job_hook.py vs airflow/contrib/hook/aws_glue_job_hook.py.

[edit] From looking at the PR it seems it will both create a job (using boto3's create_job()) as well as run the job (using boto3's start_job_run()). IMHO at the very least these should be separate things so one can run jobs without creating them.
To expand on this, we create the jobs using a normal development workflow/CI and we just want to be able to run them from Airflow. IMHO it's not Airflow's responsibility to create resources, it should only invoke them.

FYI All the unrelated changes are because you merged master into your branch instead of rebasing your branch on top of master

This is absolutely correct. And also makes the implementation way easier. I would make the necessary changes and add the commit.

@oelesinsc24
Copy link

Everyone, I have moved this PR to: #4068. I will close this later and ask that we track changes on the new PR.

Apologies for the inconveniences.

@Fokko
Copy link
Contributor

Fokko commented Oct 20, 2018

Moved to #4068

@Fokko Fokko closed this Oct 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants