Skip to content

Glue DataBrew operator#34807

Merged
vincbeck merged 21 commits intoapache:mainfrom
ellisms:main
Oct 16, 2023
Merged

Glue DataBrew operator#34807
vincbeck merged 21 commits intoapache:mainfrom
ellisms:main

Conversation

@ellisms
Copy link
Contributor

@ellisms ellisms commented Oct 6, 2023

closes: #22037

Created Glue DataBrew start job Operator.

Comment on lines 296 to 305
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen when both wait_for_completion and deferrable are set to True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The AMPP trigger README suggests placing the deferrable check before wait_for_completion, indicating the preference is to defer. In this case, both options accomplish the same goal - wait for the DataBrew job to reach a terminal status. So, should setting both to True make a difference?

Comment on lines 426 to 427
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure should we store it into the glue.py or move it into the databrew.py. It somehow related to the Glue, but internally in boto3 it is separate client, and I guess it might have separate AWS API. So in theory both approach could be applicable, that is just question for clarification

cc: @vincbeck @ferruzzi @o-nikolas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. I guess both are fine and it is just a question of opinion but I would move it too to another file named databrew.py

Copy link
Contributor Author

@ellisms ellisms Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure which way it should really go, it is shown in the console under Glue. The waiter, however, needs to be named databrew, so it probably makes more sense to split it out into its own set of files. I will make the changes, if this is the direction we should go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's in an awkward position and I guess either would work, but I'd lean towards separate in this case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this appear in the provider.yaml file? Should it have its own entry named 'Amazon Glue DataBrew', or be listed under the existing 'Glue' (like glue_crawler)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a simple answer to where to put it.

If I am a user of databrew what is the first place that I will look to find databrew related code? (This question is regardless of Airflow itself.. it's the same for doc search or any other thing related to the service)
obviously I will go under Amazon/aws but what after?
would it be under glue / databrew / glue_databrew ?
If most users are going naturally to A then code should be in A.
This is the only question that matters.

Comment on lines 426 to 427
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. I guess both are fine and it is just a question of opinion but I would move it too to another file named databrew.py

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments and suggestions

Comment on lines 426 to 427
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's in an awkward position and I guess either would work, but I'd lean towards separate in this case as well.

@vincbeck
Copy link
Contributor

Static check failures and unit tests. Could you please take a look?

@ellisms
Copy link
Contributor Author

ellisms commented Oct 12, 2023

Static check failures and unit tests. Could you please take a look?

I don't think all changes were committed/pushed. Just rebased and pushed all changes. Local tests passed.

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my comments have been addressed and the CI is green 👍 Congrats

ellisms and others added 13 commits October 13, 2023 09:51
Co-authored-by: Hussein Awala <hussein@awala.fr>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
@vincbeck vincbeck merged commit 4a37777 into apache:main Oct 16, 2023
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers area:system-tests changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Airflow Operator for Starting Amazon Glue DataBrew Job

7 participants

Comments