
[AIRFLOW-538] Add support for JSON key dict in GCP base hook #1816

Status: Closed
sammcveety wants to merge 2 commits

Conversation

sammcveety

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-538

Some system deployments use JSON dictionaries rather than a keyfile to store GCP credentials. This PR adds support for those, as well as unit tests for the overall hook.

@alexvanboxel FYI as a major contributor to this code
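The credential-resolution logic described above can be sketched roughly as follows. This is a minimal illustration only, not the actual hook code from the PR; the class and parameter names here are hypothetical:

```python
import json


class GcpBaseHookSketch:
    """Hypothetical sketch of credential resolution in a GCP base hook:
    accept either a path to a JSON keyfile on disk or an inline JSON
    key dict from the connection extras. Names are illustrative only."""

    def __init__(self, key_path=None, keyfile_dict=None):
        self.key_path = key_path
        self.keyfile_dict = keyfile_dict

    def _get_key_info(self):
        # Prefer the on-disk keyfile when both are provided.
        if self.key_path:
            with open(self.key_path) as f:
                return json.load(f)
        if self.keyfile_dict is not None:
            # The connection form may hand us a JSON-encoded string.
            if isinstance(self.keyfile_dict, str):
                return json.loads(self.keyfile_dict)
            return self.keyfile_dict
        # Neither given: fall back to application default credentials.
        return None
```

Either way, the hook ends up with the same parsed key material, which is what makes the inline-dict variant a small change on top of the existing keyfile path.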

Add tests for GCP base hook.

Remove unused import.
@codecov-io

codecov-io commented Oct 1, 2016

Current coverage is 65.51% (diff: 100%)

Merging #1816 into master will not change coverage

@@             master      #1816   diff @@
==========================================
  Files           127        127          
  Lines          9650       9650          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           6322       6322          
  Misses         3328       3328          
  Partials          0          0          

Powered by Codecov. Last update c02425d...8e4ee10

@alexvanboxel
Contributor

My biggest question is: shouldn't it be encrypted? This JSON key will be exactly the same as the JSON file on disk (but at least the file can be safeguarded). This is an ideal vehicle to steal secret keys... @jlowin @criccomini what is your idea around this?

@jlowin
Member

jlowin commented Oct 3, 2016

@alexvanboxel that's a very good point; however, I'm not sure we can really prevent people from passing secrets to any of their hooks/operators... even Connections, which are where all credentials should really be stored, aren't perfectly secure, though they at least obfuscate passwords.

@criccomini
Contributor

IIRC, the keydict field would be encrypted. I agree it's a little odd, but I think it would be as secure as the other passwords in the connection list (such as DB passwords). I'm OK with this.


@criccomini criccomini left a comment


Did a quick look. One nit on comments. Also, curious, I think you might need to update this file:

airflow/www/views.py

To include something similar to:

    'extra__google_cloud_platform__key_path',
    'extra__google_cloud_platform__key_path': StringField('Keyfile Path'),


Default credentials: Only specify 'Project Id'. Then you need to have executed
``gcloud auth`` on the Airflow worker machine.

JSON key file: Specify 'Project Id', 'Key Path' and 'Scope'.

JSON key dict: Specify 'Project Id' and 'Scope'.


Should this be:

JSON key dict: Specify 'Project Id', 'Key Dict' and 'Scope'.

?

Author


Done


@sammcveety sammcveety left a comment


Thank you for the review. Addressed all comments, and added JSON encoding to the field, which seems like the sane way to do it.


Default credentials: Only specify 'Project Id'. Then you need to have executed
``gcloud auth`` on the Airflow worker machine.

JSON key file: Specify 'Project Id', 'Key Path' and 'Scope'.

JSON key dict: Specify 'Project Id' and 'Scope'.
Author


Done

@alexvanboxel
Contributor

alexvanboxel commented Oct 23, 2016

I have a new concern: for the DataFlow operator I need to fix something so I can propagate the credentials defined in the connection to the DataFlow process. With a key on disk it's easy: just set GOOGLE_APPLICATION_CREDENTIALS and point it to the key. What would I do with the JSON key? Write it to disk in a temp location and set the env variable every time the DataFlow process is launched?

see: https://developers.google.com/identity/protocols/application-default-credentials
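One way to bridge that gap (a hypothetical helper, not anything implemented in this PR) would be to write the in-memory key to a temp file and point GOOGLE_APPLICATION_CREDENTIALS at it before launching the subprocess:

```python
import json
import os
import tempfile


def export_key_dict_as_adc(key_dict):
    """Hypothetical helper: write an in-memory JSON service-account key
    to a temp file and set GOOGLE_APPLICATION_CREDENTIALS to its path,
    so a child process (e.g. a Dataflow launcher) can discover it via
    the standard application-default-credentials lookup."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(key_dict, f)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return path
```

Note that the caller would be responsible for cleaning up the temp file, and this puts the secret on disk anyway, which is exactly the trade-off being debated in this thread.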

@criccomini
Contributor

I'm inclined to pass on this PR. We have been taking a stance on being opinionated about things in the GCP area (i.e. which Google cloud library to use). IMO, we should be opinionated on this, and just say the file should be on disk.

@sammcveety
Author

I buy that. Would (most of) the test still be useful?

@criccomini
Contributor

Ya, most of the test would be great, if you want to strip out the other changes.

@r39132
Contributor

r39132 commented Nov 14, 2016

@sammcveety please update this PR within the next couple of days else it will be closed as inactive. Once you update it and update the commit and jira to be more reflective of your change, we can merge it.

@criccomini
Contributor

Closed this. Please open a new PR with just the tests.

@asfgit asfgit closed this in 523f2dc Nov 17, 2016
alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017