Provide GCP credentials in Bash/Python operators #8432
Conversation
5c0e0d8 to 235382f
Codecov Report
```diff
@@            Coverage Diff            @@
##           master    #8432     +/-  ##
=========================================
- Coverage    6.23%    6.22%    -0.01%
=========================================
  Files         946      950       +4
  Lines       45661    45723      +62
=========================================
  Hits         2846     2846
- Misses      42815    42877      +62
```
Continue to review full report at Codecov.
I think it should be a gcp_bash_operator.py deriving from BashOperator, and it should live in providers/google.
@potiuk This contradicts the whole idea of, and the need for, this operator. BashOperator and PythonOperator are very useful because they are universal. Bash and Python are also built by composition: new applications are installed on the system and can be used by any tool. If we inherit and make a GCP-specific customization, we will limit its functionality. It will no longer be a universal operator; you will only be able to use it with one provider. I think this is a similar problem. I hope that in the future new parameters will be added for other cloud providers, e.g. AWS:

```python
cross_platform_task = BashOperator(
    task_id='gcloud',
    bash_command=(
        'gsutil cp gs://bucket/a.txt a.txt && aws s3 cp test.txt s3://mybucket/test2.txt'
    ),
    gcp_conn_id=GCP_PROJECT_ID,
    aws_conn_id=AWS_PROJECT_ID,
)
```

Then it will still be a universal operator, and we will not build vendor lock-in for a single provider. From an architectural point of view, inheritance would be a bad fit here; we should use composition. Inheritance will limit these operators too much. I will only cite one fragment.
If we replace some words, we have our problem.
Having thought about it for a while, I think we can create a …
I disagree. It's not a matter of inheritance vs. composition. I think we can find a good solution for that; it's a matter of bringing GCP-specific code into the shared BashOperator. Not everyone uses GCP, and having GCP-specific code in a general-purpose operator hurts my eyes. BashOperator should know nothing about GCP. Maybe indeed this should be done via plugins or a similar solution. I wonder what others think about it @turbaszek @kaxil @ashb? Should BashOperator contain GCP-specific code? WDYT?
Yeah, I agree this feels very leaky. Being able to pass any Airflow connection securely into the Bash (or SSH) operator feels like a useful ability. GCP, or any special case, less so.
If we did this, we'd need an AWS one, plus one for most databases, then Spark plus all of them, etc.
I agree with Jarek here; any cloud-specific code in core operators would hurt my eyes. Each cloud provider also provides a way to authenticate using an environment variable. In the case of GCP it is GOOGLE_APPLICATION_CREDENTIALS, which can be used with the Bash and Python operators. And if the virtual machine runs on that cloud provider itself, you don't need to authenticate at all, as you can use the VM's credentials. We already allow injecting env vars into BashOperator. Users have enough flexibility in Bash and Python; that is my main point. I would let them take care of this rather than having us maintain this code.
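The environment-variable route described above can be sketched outside Airflow. The snippet below (the key path is hypothetical) mirrors what passing `env=` to `BashOperator` effectively does: inject `GOOGLE_APPLICATION_CREDENTIALS` into the child process so that CLI tools such as `gsutil` pick it up.

```python
import os
import subprocess

# Hypothetical service-account key location -- substitute your own path.
KEY_PATH = "/secrets/service-account.json"

def run_with_gcp_env(bash_command: str) -> str:
    """Run a shell command with GOOGLE_APPLICATION_CREDENTIALS injected,
    mirroring BashOperator's `env` parameter."""
    env = {**os.environ, "GOOGLE_APPLICATION_CREDENTIALS": KEY_PATH}
    result = subprocess.run(
        bash_command, shell=True, env=env, capture_output=True, text=True
    )
    return result.stdout.strip()

# The child process sees the injected variable:
print(run_with_gcp_env("echo $GOOGLE_APPLICATION_CREDENTIALS"))
# prints: /secrets/service-account.json
```

The same effect is available today with `BashOperator(..., env={...})`, without any provider-specific code in core.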
I agree that having provider-specific code in Bash / Python ops doesn't sound good. However, I think it would be nice to somehow help users authorize in those ops. Another idea I have is to pass an authorization context manager to operators:

```python
class BashOperator:
    def __init__(self, ..., authctx=None):
        self.authctx = authctx

    def pre_execute(self):
        self.authctx.enter()

    def post_execute(self):
        self.authctx.close()
```

and then users can do something like this:
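One way such an `authctx` could be implemented (a hypothetical sketch, not Airflow code, using Python's context-manager protocol instead of explicit `enter()`/`close()` calls) is a helper that exports the credentials variable for the duration of the task and restores the previous environment afterwards:

```python
import os
from contextlib import contextmanager

@contextmanager
def gcp_auth_ctx(key_path):
    """Hypothetical authctx: export GOOGLE_APPLICATION_CREDENTIALS while
    the task runs, then restore the previous value on exit."""
    previous = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("GOOGLE_APPLICATION_CREDENTIALS", None)
        else:
            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = previous

# Inside the block the variable is set; afterwards it is restored.
with gcp_auth_ctx("/secrets/key.json"):
    inside = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
print(inside)  # /secrets/key.json
```

Because the operator would only call the context manager's hooks, the same mechanism could carry AWS or any other provider's credentials, keeping the core operators provider-agnostic.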
Yeah, I am fine if we can create something "generic".
I will present something generic in a moment, which will not require passing a complex object. It will be as easy to use as it is now, but more generic. It will be something similar to the code below.
New services will continue to be added by composition rather than inheritance. Thank you very much for the comments. This shows that this feature can be useful.
That looks much better indeed. Let's see how this will look eventually :)
I think we can have a nice solution together with #8651; all operators (or only those selected by function) will benefit from the change.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I wanted to provide the ability to use GCP credentials in the BashOperator and PythonOperator. Unfortunately, some users must use these operators in their workflows. I care about two things the most.
`system_site_packages=False`.
Make sure to mark the boxes below before creating PR: [x]
- In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
- In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- In case of backwards incompatible changes, please leave a note in UPDATING.md.

Read the Pull Request Guidelines for more information.