Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11780] Use vendored cloudbuild python client. #13933

Merged
merged 4 commits into from Feb 12, 2021

Conversation

y1chi
Copy link
Contributor

@y1chi y1chi commented Feb 9, 2021

Use vendored cloud build python client so that we can avoid the pain of dealing with third_party dependencies with google internal testing.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Feb 9, 2021

Codecov Report

Merging #13933 (333940b) into master (123901f) will increase coverage by 0.04%.
The diff coverage is 84.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13933      +/-   ##
==========================================
+ Coverage   82.85%   82.89%   +0.04%     
==========================================
  Files         466      469       +3     
  Lines       57596    58217     +621     
==========================================
+ Hits        47722    48260     +538     
- Misses       9874     9957      +83     
Impacted Files Coverage Δ
..._beam/runners/portability/sdk_container_builder.py 40.21% <6.89%> (-3.07%) ⬇️
...nternal/clients/cloudbuild/cloudbuild_v1_client.py 55.17% <55.17%> (ø)
...s/dataflow/internal/clients/cloudbuild/__init__.py 80.00% <80.00%> (ø)
...ernal/clients/cloudbuild/cloudbuild_v1_messages.py 100.00% <100.00%> (ø)
.../python/apache_beam/transforms/periodicsequence.py 96.49% <0.00%> (-1.76%) ⬇️
sdks/python/apache_beam/internal/metrics/metric.py 86.45% <0.00%> (-1.05%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 89.54% <0.00%> (-0.16%) ⬇️
sdks/python/apache_beam/dataframe/expressions.py 91.02% <0.00%> (-0.14%) ⬇️
.../runners/portability/fn_api_runner/translations.py 92.10% <0.00%> (-0.08%) ⬇️
setup.py 0.00% <0.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 123901f...333940b. Read the comment docs.

@y1chi
Copy link
Contributor Author

y1chi commented Feb 9, 2021

R: @tvalentyn @emilymye

@tvalentyn
Copy link
Contributor

  1. What is the process to update or regenerate the GCB code? How does one find out What version of GCB used?
  2. How do we account for GCB dependencies since top-level library is no longer installed ? I am seeing following deps:
google-cloud-build==2.0.0
  - google-api-core [required: >=1.22.0,<2.0.0dev, installed: 1.26.0]
  - libcst [required: >=0.2.5, installed: 0.3.17]
  - proto-plus [required: >=0.4.0, installed: 1.13.0]

It seems strange that we are modifying the external library for internal testing purposes. Can these or similar changes be done upon import instead?

@y1chi
Copy link
Contributor Author

y1chi commented Feb 10, 2021

  1. What is the process to update or regenerate the GCB code? How does one find out What version of GCB used?

we can use

pip install google-apitools

gen_client --discovery_url=cloudbuild.v1 --overwrite --outdir=apache_beam/runners/dataflow/internal/clients/cloudbuild --root_package=. client

to generate the client.

I think the version is only bind to the api url version, in this case we are using cloudbuild v1

. If client is updated it will become v1b1, v1b2 etc.

  1. How do we account for GCB dependencies since top-level library is no longer installed ? I am seeing following deps:
google-cloud-build==2.0.0
  - google-api-core [required: >=1.22.0,<2.0.0dev, installed: 1.26.0]
  - libcst [required: >=0.2.5, installed: 0.3.17]
  - proto-plus [required: >=0.4.0, installed: 1.13.0]

I think if they are solely dependencies for google-cloud-build they won't be needed anymore? Do we need some special handling?

It seems strange that we are modifying the external library for internal testing purposes. Can these or similar changes be done upon import instead?

It will add a lot of hacks into the importing process and I'm guessing we were avoiding that before since there are also other gcp client vendored:

  • apache_beam/runners/dataflow/internal/clients/dataflow
  • apache_beam/io/gcp/internal/clients/bigquery
  • apache_beam/io/gcp/internal/clients/storage

I think it's desirable to use the third_party libraries, but it seems to cause a lot of frictions since the dependencies are often out of sync and have different hacks with copybara internally.

@tvalentyn
Copy link
Contributor

I think if they are solely dependencies for google-cloud-build they won't be needed anymore? Do we need some special handling?

Thanks. I thought originally that this PR checks in the google-cloud-build code. Looks like you generated client code with apitools. google-apitools is deprecated, and we were actually trying to move away from depending on google-apitools and use official google-cloud-python libraries. This may be a step in the opposite direction. cc: @aaltay - do you have any opinion here? Perhaps we should look into addressing the difficulty of managing the third_party deps for internal testing instead.

@aaltay
Copy link
Member

aaltay commented Feb 10, 2021

I think if they are solely dependencies for google-cloud-build they won't be needed anymore? Do we need some special handling?

Thanks. I thought originally that this PR checks in the google-cloud-build code. Looks like you generated client code with apitools. google-apitools is deprecated, and we were actually trying to move away from depending on google-apitools and use official google-cloud-python libraries. This may be a step in the opposite direction. cc: @aaltay - do you have any opinion here? Perhaps we should look into addressing the difficulty of managing the third_party deps for internal testing instead.

do you have any opinion here?

This is just an opinion: It would be better to avoid generated clients. They do end up causing more issues than regular dependencies, they are hard to upgrade, usually stale, and apitools has been deprecated for a long time. I am fine if we want to do this in there interim and unblock some testing, but a much better solution would be to make this a proper dependency. I do not know the specific issues but this is only one of many dependencies and I assume we will have ways to address the issues with using a dependency.

@y1chi
Copy link
Contributor Author

y1chi commented Feb 10, 2021

I think if they are solely dependencies for google-cloud-build they won't be needed anymore? Do we need some special handling?

Thanks. I thought originally that this PR checks in the google-cloud-build code. Looks like you generated client code with apitools. google-apitools is deprecated, and we were actually trying to move away from depending on google-apitools and use official google-cloud-python libraries. This may be a step in the opposite direction. cc: @aaltay - do you have any opinion here? Perhaps we should look into addressing the difficulty of managing the third_party deps for internal testing instead.

do you have any opinion here?

This is just an opinion: It would be better to avoid generated clients. They do end up causing more issues than regular dependencies, they are hard to upgrade, usually stale, and apitools has been deprecated for a long time. I am fine if we want to do this in there interim and unblock some testing, but a much better solution would be to make this a proper dependency. I do not know the specific issues but this is only one of many dependencies and I assume we will have ways to address the issues with using a dependency.

I think it should be easy to switch back to use the third_party library any time(assuming cloud build team will manage the library internally). The difficulties of using the third party google-cloud-build library are:

  • Right now there are dozens of dependencies of google-cloud-build library that are un-managed.
  • or managed by some other teams but with many customized import hacks.
  • some dependencies could have been imported to a certain version to be used by certain targets, the version does not satisfy google-cloud-build requirements.

The third party dependency issue is a lot more chaos and hard to control than using google-apitools to generate the client at the moment.

@aaltay
Copy link
Member

aaltay commented Feb 10, 2021

I think if they are solely dependencies for google-cloud-build they won't be needed anymore? Do we need some special handling?

Thanks. I thought originally that this PR checks in the google-cloud-build code. Looks like you generated client code with apitools. google-apitools is deprecated, and we were actually trying to move away from depending on google-apitools and use official google-cloud-python libraries. This may be a step in the opposite direction. cc: @aaltay - do you have any opinion here? Perhaps we should look into addressing the difficulty of managing the third_party deps for internal testing instead.

do you have any opinion here?

This is just an opinion: It would be better to avoid generated clients. They do end up causing more issues than regular dependencies, they are hard to upgrade, usually stale, and apitools has been deprecated for a long time. I am fine if we want to do this in there interim and unblock some testing, but a much better solution would be to make this a proper dependency. I do not know the specific issues but this is only one of many dependencies and I assume we will have ways to address the issues with using a dependency.

I think it should be easy to switch back to use the third_party library any time(assuming cloud build team will manage the library internally). The difficulties of using the third party google-cloud-build library are:

  • Right now there are dozens of dependencies of google-cloud-build library that are un-managed.
  • or managed by some other teams but with many customized import hacks.
  • some dependencies could have been imported to a certain version to be used by certain targets, the version does not satisfy google-cloud-build requirements.

The third party dependency issue is a lot more chaos and hard to control than using google-apitools to generate the client at the moment.

Sounds reasonable to me to proceed with this PR and at the same time ask cloud build team to improve the internal library story.

@tvalentyn
Copy link
Contributor

Thanks all for the input.

Can we

  1. add instructions on how to regenerate this client somewhere in the code or documentation, and leave pointers on BEAM-11780?
  2. move the generated code to apache_beam/io/gcp/internal/clients/
  3. create an (internal) issue for cloudbuild ownernes that documents the friction points?

Thanks!

@y1chi
Copy link
Contributor Author

y1chi commented Feb 10, 2021

Thanks all for the input.

Can we

  1. add instructions on how to regenerate this client somewhere in the code or documentation, and leave pointers on BEAM-11780?

Done.

  1. move the generated code to apache_beam/io/gcp/internal/clients/

This library is purely used in dataflow runner and has no io functionality, doesn't it make more sense to put it under apache_beam/runners/dataflow/internal/clients?

  1. create an (internal) issue for cloudbuild ownernes that documents the friction points?

Created b/179919397 as a FR for cloud build team.

Thanks!

@y1chi
Copy link
Contributor Author

y1chi commented Feb 12, 2021

friendly ping for another look, @aaltay @tvalentyn

@tvalentyn tvalentyn merged commit adf4555 into apache:master Feb 12, 2021
@y1chi y1chi deleted the cloudbuild_client branch February 12, 2021 01:23
@tvalentyn
Copy link
Contributor

Thanks, @y1chi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants