[BEAM-13732] Switch x-lang BigQueryIO expansion service to GCP one. #16784

Merged: 1 commit, Feb 9, 2022

@youngoli youngoli commented Feb 8, 2022

This keeps our jars from growing much. Putting GCP on the SchemaIO expansion service adds ~60 MB to that jar, whereas putting SchemaIO on the existing GCP expansion service adds only around 1 MB.
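
One way to check where jar growth like this comes from is to total a jar's compressed entry sizes by package prefix. The following is only a sketch and not part of this PR: it uses Python's standard `zipfile` module, the function names are made up for illustration, and any jar path passed to it is up to the reader.

```python
import zipfile
from collections import Counter


def size_by_package(jar_path, depth=3):
    """Sum compressed entry sizes in a jar, grouped by the first `depth` path segments."""
    totals = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            # Drop the file name; keep the first `depth` parts of the package path.
            parts = info.filename.split("/")[:-1]
            key = "/".join(parts[:depth]) or "(root)"
            totals[key] += info.compress_size
    return totals


def largest_packages(jar_path, depth=3, top=10):
    """Return the `top` largest (package, megabytes) pairs, largest first."""
    totals = size_by_package(jar_path, depth)
    return [(pkg, size / 1e6) for pkg, size in totals.most_common(top)]
```

Running `largest_packages(...)` on a shaded expansion service jar built before and after a dependency change should show which packages account for the difference.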


youngoli commented Feb 8, 2022

R: @chamikaramj @lostluck
CC: @riteshghorse

youngoli commented Feb 8, 2022

Sidenote: Is there documentation listing which IOs are in which expansion services that I should be updating when I make a change like this? I've only seen this listed in the IO wrapper documentation, and I'm wondering if there's another location I'm missing or if that's all.

@chamikaramj

> Is there documentation listing which IOs are in which expansion services that I should be updating

Currently no, but this is something we should add.
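
Until such documentation exists, a small lookup table could serve as a starting point. The following is a hypothetical sketch built only from what this PR mentions: the JDBC target is the one quoted in the review thread, the GCP target string is an assumption that should be verified against the repository, and the IO names used as keys are illustrative.

```python
# Hypothetical lookup table: cross-language IO wrapper -> Gradle target of the
# expansion service that hosts it. Not an official Beam listing.
EXPANSION_SERVICES = {
    # Assumed target name for the GCP expansion service; verify before relying on it.
    "bigqueryio": ":sdks:java:io:google-cloud-platform:expansion-service:runExpansionService",
    # Target quoted in this PR's review discussion.
    "jdbcio": ":sdks:java:extensions:schemaio-expansion-service:runExpansionService",
}


def expansion_service_for(io_name):
    """Return the Gradle target hosting `io_name`; raise KeyError listing known names."""
    try:
        return EXPANSION_SERVICES[io_name]
    except KeyError:
        known = ", ".join(sorted(EXPANSION_SERVICES))
        raise KeyError(f"unknown IO {io_name!r}; known IOs: {known}") from None


def ios_by_service():
    """Invert the table: Gradle target -> sorted list of IOs it hosts."""
    grouped = {}
    for io_name, target in EXPANSION_SERVICES.items():
        grouped.setdefault(target, []).append(io_name)
    return {target: sorted(names) for target, names in grouped.items()}
```

Keeping the table in one place means a change like this PR would update a single entry, and `ios_by_service()` gives the per-service listing asked about above.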

youngoli commented Feb 9, 2022

Run Java PreCommit

@youngoli youngoli merged commit 06de99b into apache:master Feb 9, 2022

@chamikaramj chamikaramj left a comment

Sorry, forgot to send some of my changes.

@@ -35,5 +35,13 @@ dependencies {
permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761
implementation project(":sdks:java:io:google-cloud-platform")
permitUnusedDeclared project(":sdks:java:io:google-cloud-platform") // BEAM-11761
implementation project(":sdks:java:extensions:schemaio-expansion-service")
permitUnusedDeclared project(":sdks:java:extensions:schemaio-expansion-service") // BEAM-11761
Contributor

You probably just need to depend on "https://github.com/apache/beam/blob/master/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java", not the full expansion service (including all the shaded jars), right?

I think in the future we should consider just moving that class to "core" or to a new module that both expansion services depend on. This change should be OK if it does not bloat the size of the google-cloud-platform jar too much (certainly better than the other way around).

cc: @TheNeuralBit

Contributor

(let's create a Jira for this)

Member

schemaio-expansion-service actually is just that class already; it just pulls in other dependencies and publishes a shaded jar. With this change, the only other use of the schemaio-expansion-service shaded jar is for jdbc:

serviceGradleTarget = ":sdks:java:extensions:schemaio-expansion-service:runExpansionService"

':sdks:java:extensions:schemaio-expansion-service:shadowJar')

We could instead make a jdbc expansion service, and stop creating a shaded jar for schemaio-expansion-service.

@chamikaramj chamikaramj Feb 9, 2022

Won't SchemaIO be used by other non-GCP connectors in the future?
cc: @pabloem

Member

Yes, but they could publish their own expansion services in the same way, right?

Contributor

Ideally we should maintain a limited set of expansion service jars, each grouping related IOs. Each expansion service jar is a shaded jar and can be large.

Member

Ok fair point, but regardless I think what we're both proposing is aligned. I'm just suggesting that instead of making a new extension to contain ExternalSchemaIOTransformRegistrar.java we can leave it where it is, and move away from publishing a shaded expansion service jar with it.

Contributor

Yeah, makes sense. Thanks.
