Add helper task to print pipeline options for Dataflow portability #6979
Currently it is difficult to manually run a portability pipeline as a Beam user. This change adds a new Gradle task that builds the required artifacts and prints the corresponding command-line options.
Usage: ./gradlew -p runners/google-cloud-dataflow-java printFnApiPipelineOptions
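To illustrate the kind of output such a helper produces, here is a minimal Java sketch that assembles and prints a set of Fn API pipeline options. It is not the implementation of printFnApiPipelineOptions: the class and method names (`FnApiOptionsPrinter`, `buildOptions`, `toCommandLine`) are hypothetical, the values are placeholders, and the flag set (`--dataflowWorkerJar`, `--workerHarnessContainerImage`, `--experiments=beam_fn_api`) is an assumption based on the Dataflow portability options discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: assembles the pipeline options a helper task like
 * printFnApiPipelineOptions might print. All values are placeholders.
 */
public class FnApiOptionsPrinter {

  /** Builds the list of command-line flags for a portable Dataflow run (assumed flag set). */
  static List<String> buildOptions(
      String project, String tempLocation, String workerJar, String harnessImage) {
    List<String> opts = new ArrayList<>();
    opts.add("--runner=DataflowRunner");
    opts.add("--project=" + project);
    opts.add("--tempLocation=" + tempLocation);
    // Locally built worker jar, e.g. from HEAD (assumption).
    opts.add("--dataflowWorkerJar=" + workerJar);
    // Locally built SDK harness container image (assumption).
    opts.add("--workerHarnessContainerImage=" + harnessImage);
    // Experiment flag that enables the Fn API path (assumption).
    opts.add("--experiments=beam_fn_api");
    return opts;
  }

  /** Joins the flags into one space-separated string that can be pasted into a command line. */
  static String toCommandLine(List<String> opts) {
    return String.join(" ", opts);
  }

  public static void main(String[] args) {
    List<String> opts = buildOptions(
        "my-gcp-project",
        "gs://my-bucket/tmp",
        "/path/to/dataflow-worker.jar",
        "gcr.io/my-gcp-project/java-harness:latest");
    System.out.println(toCommandLine(opts));
  }
}
```

Printing a single joined string keeps the output copy-pasteable into a pipeline's `--args` or `mvn exec:java` invocation, which is the manual-run scenario this task targets.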
Run Dataflow PortabilityApi ValidatesRunner
We also launch portable pipelines in javaPrecommit: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/examples/build.gradle#L79. Do you think it's necessary to add this helper to the precommit? @swegner
The reason I found a need for this: I'm trying to run a simple prototype pipeline from outside the Gradle build (as a user, following the Java Quickstart). The command line I need to run is something like:
I don't believe we publish Maven artifacts to fulfill … It seems there's a lot of redundant logic in the examples project you linked above (i.e. …)
For the Maven build, the worker jar is intentionally not published: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/build.gradle#L57. End users are expected to use the worker jar cached by the Dataflow service.
As for the redundant logic, I'm not sure it's worth unifying, since there are only two usages across the whole project, but feel free to refactor it if needed.
7b13226 to 3fa92b6
Run Dataflow PortabilityApi ValidatesRunner
I believe the service also caches the portable worker jar. If the dataflowWorkerJar option is missing, the service picks up the cached worker jar. Here is the design doc, which may explain it better: https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit#heading=h.gh88g5y0rekp
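The fallback described above can be sketched on the client side as "only pass `--dataflowWorkerJar` when a locally built jar exists." This is a minimal illustration under stated assumptions: `WorkerJarFlag` and `workerFlags` are hypothetical names, and `--experiments=beam_fn_api` is assumed to be the portability flag; the actual selection of the cached jar happens inside the Dataflow service and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: emit the worker jar flag only when a local jar is available. */
public class WorkerJarFlag {

  static List<String> workerFlags(Optional<String> workerJar) {
    List<String> flags = new ArrayList<>();
    // Assumed experiment flag for the Fn API path.
    flags.add("--experiments=beam_fn_api");
    // Pass a jar only when one was built locally (e.g. from HEAD); otherwise omit
    // the flag so the service falls back to its cached worker jar.
    workerJar.ifPresent(jar -> flags.add("--dataflowWorkerJar=" + jar));
    return flags;
  }

  public static void main(String[] args) {
    System.out.println(workerFlags(Optional.empty()));
    System.out.println(workerFlags(Optional.of("/tmp/dataflow-worker.jar")));
  }
}
```

With an empty `Optional`, only the experiment flag is emitted, which is the case where the service uses its cached jar. Supplying a path is the HEAD-testing case discussed in the next comment.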
Thanks, good to know. I think this helper is still useful for running with the worker image from HEAD, since the cached version in the service may be updated less frequently. I believe this is ready for review. @boyuanzz PTAL
Yes, it would be very helpful for developers testing their Dataflow worker changes. By the way, the ValidatesRunner test takes about 3 hours to complete.
Run Java PortabilityApi PostCommit
…o print pipeline options for Dataflow portability"
This appears to be breaking post-commit tests: https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Cron/26
This reverts commit 42984a8, reversing changes made to b558c4d.
FYI, this appears to be breaking post-commits: https://issues.apache.org/jira/browse/BEAM-6028
… Add helper task to print pipeline options for Dataflow portability
Currently it is difficult to manually run a portability pipeline as a
Beam user. This change adds a new Gradle task that builds the required
artifacts and prints the corresponding command-line options.
Usage: ./gradlew -p runners/google-cloud-dataflow-java printFnApiPipelineOptions
Sample output: