Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12641] Use google-auth instead of oauth2client for GCP auth #15004

Merged
merged 19 commits into from Mar 28, 2022

Conversation

chunyang
Copy link
Contributor

@chunyang chunyang commented Jun 11, 2021

This change switches the GCP auth library from oauth2client to google-auth. google-auth-httplib2 library is used to provide authorized HTTP clients that work with the existing vendored libraries for BigQuery, GCS, etc.

I'm interested in this migration because of the need to use custom token URIs for issuing service account tokens--it's supported by google-auth but not oauth2client.

dev list discussion thread: https://lists.apache.org/x/thread.html/r4345315bfa80c6181138d556317a45d7692cd2b37d1d341238e5440a@%3Cdev.beam.apache.org%3E


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Jun 11, 2021

Codecov Report

Merging #15004 (e1eadbe) into master (eeccbb0) will increase coverage by 9.68%.
The diff coverage is 70.83%.

@@            Coverage Diff             @@
##           master   #15004      +/-   ##
==========================================
+ Coverage   73.94%   83.63%   +9.68%     
==========================================
  Files         667      453     -214     
  Lines       88071    63264   -24807     
==========================================
- Hits        65128    52912   -12216     
+ Misses      21832    10352   -11480     
+ Partials     1111        0    -1111     
Flag Coverage Δ
go ?
python 83.63% <70.83%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/internal/gcp/auth.py 81.03% <70.83%> (-0.45%) ⬇️
...ks/python/apache_beam/runners/worker/data_plane.py 87.50% <0.00%> (-1.71%) ⬇️
...thon/apache_beam/runners/worker/sdk_worker_main.py 74.12% <0.00%> (-1.40%) ⬇️
...n/apache_beam/runners/interactive/cache_manager.py 87.58% <0.00%> (-1.01%) ⬇️
sdks/python/apache_beam/internal/metrics/metric.py 90.00% <0.00%> (-1.00%) ⬇️
sdks/python/apache_beam/coders/coder_impl.py 93.97% <0.00%> (-0.31%) ⬇️
...s/interactive/dataproc/dataproc_cluster_manager.py 88.32% <0.00%> (-0.26%) ⬇️
...eam/runners/interactive/interactive_environment.py 90.51% <0.00%> (-0.23%) ⬇️
...apache_beam/runners/dataflow/internal/apiclient.py 77.27% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/coders/coders.py 87.98% <0.00%> (-0.12%) ⬇️
... and 225 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eeccbb0...e1eadbe. Read the comment docs.

@aaltay
Copy link
Member

aaltay commented Jun 14, 2021

As a reference, I was referred to: https://github.com/googleapis/google-auth-library-python-httplib2 as how some other libraries went through a similar transition.

@chunyang
Copy link
Contributor Author

Thanks! I wasn't aware that library existed, will see if I can incorporate it.

:sdks:python:test-suites:tox:py38:testPy38CloudCoverage fails with
```
ImportError: cannot import name 'enquote_executable' from
'distlib.scripts'
(/home/chuck.yang/beam/build/gradleenv/-1227304282/lib/python3.8/site-packages/distlib/scripts.py)
```
when I don't bump this version. It's odd because 0.3.1 should be enough
to solve the problem...
Use google-auth-httplib2 library instead of implementing our own shims.
@aaltay
Copy link
Member

aaltay commented Jun 29, 2021

R: @tvalentyn - Could you review this? Do you see any issues with Dataflow internals related to this?

@chunyang chunyang changed the title Use google-auth instead of oauth2client for GCP auth [BEAM-12641] Use google-auth instead of oauth2client for GCP auth Jul 20, 2021
@pabloem
Copy link
Member

pabloem commented Aug 12, 2021

What's the status of this PR? @tvalentyn @chunyang

@pabloem pabloem requested a review from tvalentyn August 12, 2021 18:01
@chunyang
Copy link
Contributor Author

@pabloem I'm waiting for a review, I heard that this change may not be compatible with the internal repo at Google, but hope @tvalentyn can confirm/deny. On my side I need to look at the Python PreCommit failure. It seems unrelated to me at first glance.

@tvalentyn
Copy link
Contributor

Thanks @chunyang for working on this and your patience with this change. Adding a dependency on google-auth-httplib2 should be possible, looking closer at the change.

@@ -115,29 +131,32 @@ def get_service_credentials(cls):

@staticmethod
def _get_service_credentials():
if is_running_in_gce:
# We are currently running as a GCE taskrunner worker.
return _GceAssertionCredentials(user_agent='beam-python-sdk/1.0')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this path was used to authenticate requests originating from GCE VMs, and we had it so that when Beam SDK is running Dataflow workers, the requests to GCP were authenticated just because the SDK is running on GCE VM.
I wonder how this will work with the new dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @tvalentyn. My understanding is that the google.auth.default() call in line 151 will attempt to find credentials on GCE VMs using the instance Metadata Server so we don't need a special case within the Beam code. Is this something we can check via the existing integration tests?

https://github.com/googleapis/google-auth-library-python/blob/08c987d0215c9d3e230efe5b7c13e6b8197267bc/google/auth/_default.py#L386-L389

@tvalentyn
Copy link
Contributor

Run Python 3.7 PostCommit

@tvalentyn
Copy link
Contributor

Yes, we should be able to check the behavior in postcommit integration tests.

@tvalentyn
Copy link
Contributor

Run Python 3.7 PostCommit

@tvalentyn
Copy link
Contributor

Run Python 3.7 PostCommit

@chunyang
Copy link
Contributor Author

Oops sorry, I broke the postcommit. Fix pending in #15584

@chunyang
Copy link
Contributor Author

Run Python 3.7 PostCommit

@tvalentyn
Copy link
Contributor

That's good to know, thanks for sharing. We have a plan to complete this migration as well.

@yeandy
Copy link
Contributor

yeandy commented Mar 9, 2022

Hi @chunyang, I will be doing internal verification for these changes. It seems that I don't have permission to push to your fork/branch. Please rebase or merge master. Can you give me access to this PR? In the meantime, I'll create temp PR to test internally.

@chunyang
Copy link
Contributor Author

chunyang commented Mar 9, 2022

Hi @yeandy, I merged master and added you as a collaborator on my fork. Is that enough permissions?

@yeandy
Copy link
Contributor

yeandy commented Mar 10, 2022

Thanks, that should work.

Copy link
Contributor

@yeandy yeandy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the internal verification, and everything seems to work fine.

Before final approval/merge, I have a few questions @tvalentyn.

  • Do we want to also remove oauth2client from the four base_image_requirements.txt files under the sdks/python/container/py3* directories too?
  • And do we still need oath2client in deps_urls_py.yaml?

@@ -24,4 +24,4 @@ grpcio-tools==1.37.0
mypy-protobuf==1.18

# Avoid https://github.com/pypa/virtualenv/issues/2006
distlib==0.3.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this been bumped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into an error running tests locally when I didn't bump the version (see commit 0bd2c24). However, I don't think Jenkins had the same problem. We can probably revert this bump to minimize the change set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can revert this back.

@tvalentyn
Copy link
Contributor

Yes, we should regenerate the container dependencies, can be done in separate commit/pr, or you can follow https://s.apache.org/beam-python-requirements-generate and push more commits to this branch.

we can remove the entry in deps_urls_py.yaml in the same change.

@tvalentyn
Copy link
Contributor

tvalentyn commented Mar 18, 2022

Running s.apache.org/beam-python-requirements-generate currently requires a linux machine. some deps are not installable on macos.

@yeandy
Copy link
Contributor

yeandy commented Mar 21, 2022

Thanks for the info! I'll start regenerating them.

@yeandy
Copy link
Contributor

yeandy commented Mar 21, 2022

R: @tvalentyn PTAL. Regenerated requirements, removed oauth2client from dep_urls_py.yaml, and bumped distlib version back down.

@tvalentyn
Copy link
Contributor

Happy to merge when tests pass, thanks all.

@yeandy
Copy link
Contributor

yeandy commented Mar 21, 2022

Was just reviewing again, and not sure why, but oauth2client is still in the base_image_requirements.txt files, even though I generated them with oauth2client>=2.0.1,<5 deleted from setup.py. Maybe I did the generation incorrectly?

@tvalentyn
Copy link
Contributor

is it a transitive dep of some other dep? you can install pipdeptree to check.

@yeandy
Copy link
Contributor

yeandy commented Mar 22, 2022

Seems that it's a dependency of google-apitools

google-apitools==0.5.31
  - fasteners [required: >=0.14, installed: 0.16.3]
    - six [required: Any, installed: 1.16.0]
  - httplib2 [required: >=0.8, installed: 0.19.1]
    - pyparsing [required: >=2.4.2,<3, installed: 2.4.7]
  - oauth2client [required: >=1.4.12, installed: 4.1.3]
    - httplib2 [required: >=0.9.1, installed: 0.19.1]
      - pyparsing [required: >=2.4.2,<3, installed: 2.4.7]
    - pyasn1 [required: >=0.1.7, installed: 0.4.8]
    - pyasn1-modules [required: >=0.0.5, installed: 0.2.8]
      - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
    - rsa [required: >=3.1.4, installed: 4.8]
      - pyasn1 [required: >=0.1.3, installed: 0.4.8]
    - six [required: >=1.6.1, installed: 1.16.0]
  - six [required: >=1.12.0, installed: 1.16.0]

@yeandy
Copy link
Contributor

yeandy commented Mar 23, 2022

@tvalentyn If I'm understanding this correctly, this doesn't require any additional regeneration, right?

@tvalentyn
Copy link
Contributor

Run Portable_Python PreCommit

@tvalentyn
Copy link
Contributor

Run PythonDocker PreCommit

@tvalentyn
Copy link
Contributor

Run Python PreCommit

@tvalentyn
Copy link
Contributor

@tvalentyn If I'm understanding this correctly, this doesn't require any additional regeneration, right?

right

@tvalentyn
Copy link
Contributor

We need to revert the change to dep_urls_py.yaml.

@tvalentyn
Copy link
Contributor

retest this please

@tvalentyn
Copy link
Contributor

hmm. not seeing test signals for some reason, might be a github issue

@tvalentyn
Copy link
Contributor

ok, triggered now.

@tvalentyn tvalentyn merged commit 2e7f5f3 into apache:master Mar 28, 2022
@chunyang chunyang deleted the cyang/auth-shim branch March 28, 2022 18:04
@aaltay
Copy link
Member

aaltay commented Mar 28, 2022

Congratulations! And thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants