-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-13732] Switch x-lang BigQueryIO expansion service to GCP one. #16784
Conversation
This keeps our jars from inflating much. Having GCP on the SchemaIO exp. service adds ~60 MB to the jar, as opposed to around 1 MB to the existing GCP exp. service.
R: @chamikaramj @lostluck |
Sidenote: Is there documentation listing which IOs are in which expansion services that I should be updating when I make a change like this? I've only seen this listed in the IO wrapper documentation, and I'm wondering if there's another location I'm missing or if that's all. |
Currently no but this is something we should add. |
Run Java PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forgot to send some of my changes.
@@ -35,5 +35,13 @@ dependencies { | |||
permitUnusedDeclared project(":sdks:java:expansion-service") // BEAM-11761 | |||
implementation project(":sdks:java:io:google-cloud-platform") | |||
permitUnusedDeclared project(":sdks:java:io:google-cloud-platform") // BEAM-11761 | |||
implementation project(":sdks:java:extensions:schemaio-expansion-service") | |||
permitUnusedDeclared project(":sdks:java:extensions:schemaio-expansion-service") // BEAM-11761 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably you just need to depend on "https://github.com/apache/beam/blob/master/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java" not the full expansion service (including all the shaded jars) right ?
I think in the future we should consider just moving that class to "core" or to a new module that both expansion services depend on. This change should be OK if it does not bloat the size of google-cloud-platform jar too much (certainly better than the other way around).
cc: @TheNeuralBit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(let's create a Jira for this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schemaio-expansion-service actually is just that class already, it just pulls in other dependencies and publishes a shaded jar. With this change the only other use of the schemaio-expansion-service shaded jar is for jdbc:
serviceGradleTarget = ":sdks:java:extensions:schemaio-expansion-service:runExpansionService" |
beam/sdks/python/apache_beam/io/jdbc.py
Line 98 in 8578c9a
':sdks:java:extensions:schemaio-expansion-service:shadowJar') |
We could instead make a jdbc expansion service, and stop creating a shaded jar for schemaio-expansion-service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't SchemaIO be used by other non-GCP connectors in the future ?
cc: @pabloem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but they could publish their own expansion services in the same way right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should maintain a grouped set of limited expansion service jars. Each expansion service jar is a shaded jar and can be large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok fair point, but regardless I think what we're both proposing is aligned. I'm just suggesting that instead of making a new extension to contain ExternalSchemaIOTransformRegistrar.java
we can leave it where it is, and move away from publishing a shaded expansion service jar with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense. Thanks.
This keeps our jars from inflating much. Having GCP on the SchemaIO exp. service adds ~60 MB to the jar, as opposed to around 1 MB to the existing GCP exp. service.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.