-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-538] Add support for JSON key dict in GCP base hook #1816
Conversation
Add tests for GCP base hook. Remove unused import.
Current coverage is 65.51% (diff: 100%)@@ master #1816 diff @@
==========================================
Files 127 127
Lines 9650 9650
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 6322 6322
Misses 3328 3328
Partials 0 0
|
My biggest question is: should it not be encrypted... this JSON key will be exactly the same as the JSON file on disk (but at least this can be safe guarded). This is an ideal vehicle to steel secret keys... @jlowin @criccomini what is your idea around this? |
@alexvanboxel that's a very good point, however I'm not sure if we can really prevent people from passing secrets to any of their hooks/operators... even Connections, which are where all credentials should really be stored, aren't perfectly secure, though they at least obfuscate passwords. |
IIRC, the keydict field would be encrypted. I agree it's a little odd, but I think it would be as secure as the other passwords in the connection list (such as DB passwords). I'm OK with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a quick look. One nit on comments. Also, curious, I think you might need to update this file:
airflow/www/views.py
To include something similar to:
'extra__google_cloud_platform__key_path',
'extra__google_cloud_platform__key_path': StringField('Keyfile Path'),
|
||
Default credentials: Only specify 'Project Id'. Then you need to have executed | ||
``gcloud auth`` on the Airflow worker machine. | ||
|
||
JSON key file: Specify 'Project Id', 'Key Path' and 'Scope'. | ||
|
||
JSON key dict: Specify 'Project Id' and 'Scope'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be:
JSON key dict: Specify 'Project Id', 'Key Dict' and 'Scope'.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review. Addressed all comments, and add JSON encoding to the field, which seems like the sane way to do it.
|
||
Default credentials: Only specify 'Project Id'. Then you need to have executed | ||
``gcloud auth`` on the Airflow worker machine. | ||
|
||
JSON key file: Specify 'Project Id', 'Key Path' and 'Scope'. | ||
|
||
JSON key dict: Specify 'Project Id' and 'Scope'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I have a new concern: for the DataFlow operator I need to fix something so I could propagate the credentials defined in the connection to the DataFlow process. With a key on disk it's easy, just set the GOOGLE_APPLICATION_CREDENTIALS, point it to the key. What would I do with the JSON key? Write it do disk on a temp location and set the env variable every time the DataFlow process is set? see: https://developers.google.com/identity/protocols/application-default-credentials |
I'm inclined to pass on this PR. We have been taking a stance on being opinionated about things in the GCP area (i.e. which Google cloud library to use). IMO, we should be opinionated on this, and just say the file should be on disk. |
I buy that. Would (most of) the test still be useful? |
Ya, most of the test would be great, if you want to strip out the other changes. |
@sammcveety please update this PR within the next couple of days else it will be closed as inactive. Once you update it and update the commit and jira to be more reflective of your change, we can merge it. |
Closed this. Please open a new PR with just the tests. |
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
-https://issues.apache.org/jira/browse/AIRFLOW-538
Some system deployments use JSON dictionaries rather than a keyfile to store GCP credentials. This PR adds support for those, as well as unit tests for the overall hook.
@alexvanboxel FYI as a major contributor to this code