Skip to content

[AIRFLOW-2512][AIRFLOW-2522] Use google-auth instead of oauth2client#3488

Closed
tswast wants to merge 1 commit intoapache:masterfrom
tswast:google-auth
Closed

[AIRFLOW-2512][AIRFLOW-2522] Use google-auth instead of oauth2client#3488
tswast wants to merge 1 commit intoapache:masterfrom
tswast:google-auth

Conversation

@tswast
Copy link
Copy Markdown
Contributor

@tswast tswast commented Jun 11, 2018

Make sure you have checked all steps below.

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:
  • Updates the GCP hooks to use the google-auth library and removes
    dependencies on the deprecated oauth2client package.
  • Removes inconsistent handling of the scope parameter for different
    auth methods.

Note: using google-auth for credentials requires a newer version of the
google-api-python-client package, so this commit also updates the
minimum version for that.

To avoid some annoying warnings about the discovery cache not being
supported, so disable the discovery cache explicitly as recommend here:
https://stackoverflow.com/a/44518587/101923

No UI changes.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Added test_gcp_api_base_hook.py to test fix for [AIRFLOW-2522] (allowing scopes for default credentials).

Tested by running:

nosetests tests/contrib/operators/test_dataflow_operator.py \
    tests/contrib/operators/test_gcs*.py \
    tests/contrib/operators/test_mlengine_*.py \
    tests/contrib/operators/test_pubsub_operator.py \
    tests/contrib/hooks/test_gcp*.py \
    tests/contrib/hooks/test_gcs_hook.py \
    tests/contrib/hooks/test_bigquery_hook.py

and also tested by running some GCP-related DAGs locally, such as the
Dataproc DAG example at
https://cloud.google.com/composer/docs/quickstart

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Tweaked explanation of default credentials since they can be many different kinds of credentials besides those from the gcloud command.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@tswast tswast force-pushed the google-auth branch 3 times, most recently from 93e2afe to 4cc1bef Compare June 11, 2018 23:56
@kaxil
Copy link
Copy Markdown
Member

kaxil commented Jun 11, 2018

Thank You for this PR. Can you please have a look at the failing travis build?

@tswast
Copy link
Copy Markdown
Contributor Author

tswast commented Jun 11, 2018

Yeah, just pushed what I hope will be a fix.

Copy link
Copy Markdown
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's wait for the results from Travis CI now.

@kaxil
Copy link
Copy Markdown
Member

kaxil commented Jun 12, 2018

@tswast Can you please rebase against the master? The CI was failing for unrelated tests which are now fixed.

@tswast
Copy link
Copy Markdown
Contributor Author

tswast commented Jun 12, 2018

Rebased. 🤞

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jun 12, 2018

Codecov Report

Merging #3488 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3488      +/-   ##
==========================================
+ Coverage   77.16%   77.16%   +<.01%     
==========================================
  Files         203      203              
  Lines       15123    15123              
==========================================
+ Hits        11669    11670       +1     
+ Misses       3454     3453       -1
Impacted Files Coverage Δ
airflow/models.py 88.07% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bb547a...bc12c80. Read the comment docs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work with users that don't want to use a service account? Developers often times want to oauth using their Google account rather than using a service account. Will this continue to work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will work. google-auth does read credentials from the Google Cloud SDK.

Note: those developers will get an annoying warning message. See: googleapis/google-auth-library-python#271

If that's a concern we could possibly add a user-auth option as is done in pandas-gbq, but that's quite a bit more code to do. I'd prefer to do that in a separate PR, as it would be additional functionality as opposed to this which is a 1:1 replacement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I'm fine with a 1:1 replacement. Just want to make sure that we don't lose functionality that we rely on. :)

Copy link
Copy Markdown
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update the docs?

MacBook-Pro-van-Fokko:incubator-airflow fokkodriesprong$ grep -R oauth2client .
./docs/conf.py:    'oauth2client.service_account',
Binary file ./airflow/contrib/hooks/gcp_api_base_hook.pyc matches
grep: ./airflow/www/static/docs: No such file or directory
grep: ./airflow/www_rbac/static/docs: No such file or directory
./scripts/ci/requirements.txt:oauth2client>=2.0.2,<2.1.0

@tswast
Copy link
Copy Markdown
Contributor Author

tswast commented Jun 12, 2018

I'm a bit curious why they are mocking out so many modules in ./docs/conf.py instead of using intersphinx, but I'll update according to the existing pattern for now.

@Fokko
Copy link
Copy Markdown
Contributor

Fokko commented Jun 12, 2018

I'm not an expert on intersphinx, but feel free to PR some of your knowledge to Airflow ;)

* Updates the GCP hooks to use the google-auth library and removes
  dependencies on the deprecated oauth2client package.
* Removes inconsistent handling of the scope parameter for different
  auth methods.

Note: using google-auth for credentials requires a newer version of the
google-api-python-client package, so this commit also updates the
minimum version for that.

To avoid some annoying warnings about the discovery cache not being
supported, so disable the discovery cache explicitly as recommend here:
https://stackoverflow.com/a/44518587/101923

Tested by running:

    nosetests tests/contrib/operators/test_dataflow_operator.py \
        tests/contrib/operators/test_gcs*.py \
        tests/contrib/operators/test_mlengine_*.py \
        tests/contrib/operators/test_pubsub_operator.py \
        tests/contrib/hooks/test_gcp*.py \
        tests/contrib/hooks/test_gcs_hook.py \
        tests/contrib/hooks/test_bigquery_hook.py

and also tested by running some GCP-related DAGs locally, such as the
Dataproc DAG example at
https://cloud.google.com/composer/docs/quickstart
@tswast
Copy link
Copy Markdown
Contributor Author

tswast commented Jun 12, 2018

@Fokko Done.

$ ag oauth2client .

$ ag google-auth .
docs/howto/manage-connections.rst
80:   <https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_,

setup.py
149:    'google-auth>=1.0.0, <2.0.0dev',
150:    'google-auth-httplib2>=0.0.1',

scripts/ci/requirements.txt
45:google-auth>=1.0.0,<2.0.0dev
46:google-auth-httplib2

Regarding Intersphinx, I've filed https://issues.apache.org/jira/browse/AIRFLOW-2603 to get to in a separate PR.

@asfgit asfgit closed this in 0f4d681 Jun 12, 2018
@kaxil
Copy link
Copy Markdown
Member

kaxil commented Jun 12, 2018

Great work @tswast . Thanks 👍

@tswast tswast deleted the google-auth branch June 12, 2018 23:01
asfgit pushed a commit that referenced this pull request Jun 27, 2018
* Updates the GCP hooks to use the google-auth
library and removes
  dependencies on the deprecated oauth2client
package.
* Removes inconsistent handling of the scope
parameter for different
  auth methods.

Note: using google-auth for credentials requires a
newer version of the
google-api-python-client package, so this commit
also updates the
minimum version for that.

To avoid some annoying warnings about the
discovery cache not being
supported, so disable the discovery cache
explicitly as recommend here:
https://stackoverflow.com/a/44518587/101923

Tested by running:

    nosetests
tests/contrib/operators/test_dataflow_operator.py
\
        tests/contrib/operators/test_gcs*.py \
        tests/contrib/operators/test_mlengine_*.py \
        tests/contrib/operators/test_pubsub_operator.py \
        tests/contrib/hooks/test_gcp*.py \
        tests/contrib/hooks/test_gcs_hook.py \
        tests/contrib/hooks/test_bigquery_hook.py

and also tested by running some GCP-related DAGs
locally, such as the
Dataproc DAG example at
https://cloud.google.com/composer/docs/quickstart

Closes #3488 from tswast/google-auth

(cherry picked from commit 0f4d681)
Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
* Updates the GCP hooks to use the google-auth
library and removes
  dependencies on the deprecated oauth2client
package.
* Removes inconsistent handling of the scope
parameter for different
  auth methods.

Note: using google-auth for credentials requires a
newer version of the
google-api-python-client package, so this commit
also updates the
minimum version for that.

To avoid some annoying warnings about the
discovery cache not being
supported, so disable the discovery cache
explicitly as recommend here:
https://stackoverflow.com/a/44518587/101923

Tested by running:

    nosetests
tests/contrib/operators/test_dataflow_operator.py
\
        tests/contrib/operators/test_gcs*.py \
        tests/contrib/operators/test_mlengine_*.py \
        tests/contrib/operators/test_pubsub_operator.py \
        tests/contrib/hooks/test_gcp*.py \
        tests/contrib/hooks/test_gcs_hook.py \
        tests/contrib/hooks/test_bigquery_hook.py

and also tested by running some GCP-related DAGs
locally, such as the
Dataproc DAG example at
https://cloud.google.com/composer/docs/quickstart

Closes apache#3488 from tswast/google-auth
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants