-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-7722] Add a Python FlinkRunner that fetches and uses released artifacts. #9043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mxm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. Left some comments inline.
What I'm a bit concerned is that this is a Python-specific solution that might be less portable than a bash script, though that could be argued against.
Also I'm not fully convinced of the concept of a FlinkRunner as opposed to a universal PortableRunner. At least the former just complements the latter. However, it would be nice if other Runners could easily make use of the job server feature. Perhaps this is a specific solution that we generalize later on.
robertwb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've separated out the re-usable bits into a new JavaJarJobServer baseclass. I agree with the downsides of moving off of plain-old PortableRunner, but somewhere we need to encode the details of Flink (e.g. where to find the artifacts, what arguments the executable jar file takes (though we could come up with conventions here), etc.). A (thin) FlinkRunner seems the right place to do that. (Part of the difficulty in running BeamPython-on-Flink was precisely having to manually specify these kinds of things externally.)
As for Python vs. bash, I'd say the former is actually more portable these days (especially considering Windows users), and is much more maintainable. We could see if it makes sense to separate this out into a separate script to be used from non-Java, non-Python.
mxm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Your arguments are convincing. Python can be more portable, especially when it comes to Windows machines. Also, adding specific Runners in addition to the PortableRunner seems ok if it improves the user-experience.
| def path_to_jar(self): | ||
| raise NotImplementedError(type(self)) | ||
|
|
||
| def path_to_gradle_target_jar(self, target): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a static method? Looks like this is just a helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
This looks like a great step toward making the portable Flink runner more usable. Is it premature to update the documentation along with this PR? https://beam.apache.org/documentation/runners/flink/ |
|
I updated the documentation accordingly, assuming this makes it into 2.15.0. @mxm Any more comments? |
| <tr> | ||
| <td>>=2.14.0</td> | ||
| <td>1.9.x</td> | ||
| <td>beam-runners-flink-1.9</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following all changes at the moment but I believe 1.9 support is pending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's pending the 2.14 release (which isn't quite out yet, but it's clearly marked here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no PR yet and it looks like it's going to be non trivial: https://issues.apache.org/jira/browse/BEAM-7730
It's marked 2.15 now. I'd include this in the PR which adds 1.9 support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, off by one error. (For some reason I though I had been using 1.9 at head.) #9110
Most module only contain a single runner, and the current structure required a lot of boilerplate for every new runner added.
f954ff7 to
db12afe
Compare
Also refactored the job_server module to be more re-usable and cleaned up the logic in PortableRunner.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.