Skip to content

Conversation

@robertwb
Copy link
Contributor

Also refactored the job_server module to be more re-usable and cleaned up the logic in PortableRunner.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@robertwb robertwb requested a review from mxm July 11, 2019 15:22
Copy link
Contributor

@mxm mxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Left some comments inline.

What I'm a bit concerned is that this is a Python-specific solution that might be less portable than a bash script, though that could be argued against.

Also I'm not fully convinced of the concept of a FlinkRunner as opposed to a universal PortableRunner. At least the former just complements the latter. However, it would be nice if other Runners could easily make use of the job server feature. Perhaps this is a specific solution that we generalize later on.

Copy link
Contributor Author

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've separated out the re-usable bits into a new JavaJarJobServer baseclass. I agree with the downsides of moving off of plain-old PortableRunner, but somewhere we need to encode the details of Flink (e.g. where to find the artifacts, what arguments the executable jar file takes (though we could come up with conventions here), etc.). A (thin) FlinkRunner seems the right place to do that. (Part of the difficulty in running BeamPython-on-Flink was precisely having to manually specify these kinds of things externally.)

As for Python vs. bash, I'd say the former is actually more portable these days (especially considering Windows users), and is much more maintainable. We could see if it makes sense to separate this out into a separate script to be used from non-Java, non-Python.

Copy link
Contributor

@mxm mxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Your arguments are convincing. Python can be more portable, especially when it comes to Windows machines. Also, adding specific Runners in addition to the PortableRunner seems ok if it improves the user-experience.

def path_to_jar(self):
raise NotImplementedError(type(self))

def path_to_gradle_target_jar(self, target):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a static method? Looks like this is just a helper method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ibzib
Copy link

ibzib commented Jul 17, 2019

This looks like a great step toward making the portable Flink runner more usable.

Is it premature to update the documentation along with this PR? https://beam.apache.org/documentation/runners/flink/

@robertwb
Copy link
Contributor Author

I updated the documentation accordingly, assuming this makes it into 2.15.0.

@mxm Any more comments?

<tr>
<td>>=2.14.0</td>
<td>1.9.x</td>
<td>beam-runners-flink-1.9</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following all changes at the moment but I believe 1.9 support is pending.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's pending the 2.14 release (which isn't quite out yet, but it's clearly marked here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no PR yet and it looks like it's going to be non trivial: https://issues.apache.org/jira/browse/BEAM-7730

It's marked 2.15 now. I'd include this in the PR which adds 1.9 support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, off by one error. (For some reason I though I had been using 1.9 at head.) #9110

robertwb added 3 commits July 19, 2019 10:27
Most module only contain a single runner, and the current structure required
a lot of boilerplate for every new runner added.
@robertwb robertwb force-pushed the easy-flink branch 2 times, most recently from f954ff7 to db12afe Compare July 19, 2019 09:48
@robertwb robertwb merged commit 9589835 into apache:master Jul 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants