New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-3287] Add Go support for universal runners, incl Flink #4888
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exciting! I have largely documentation related comments, but otherwise looks good.
@@ -28,12 +28,12 @@ import ( | |||
const ( | |||
// Model constants | |||
|
|||
URNImpulse = "urn:beam:transform:impulse:v1" | |||
URNImpulse = "beam:transform:impulse:v1" | |||
URNParDo = "urn:beam:transform:pardo:v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ParDo still be prefixed with "urn:" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that one is still the old way.
// limitations under the License. | ||
|
||
// Package runnerlib contains utilities for submitting Go pipelines | ||
// to a Beam model runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idle question & statement: Beam Model runner is what's we're going with for the name of portability runners long term? I like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is an agreed upon name long-term. The code package is "universal", so I'm not even internally consistent in this PR :)
// Experiments are additional experiments. | ||
Experiments []string | ||
|
||
// TODO(herohde) 3/17/2018: add further parametrization as needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: parameterization
} | ||
|
||
// Prepare prepares a job to the given endpoint. It returns an id and endpoint, if successful. | ||
func Prepare(ctx context.Context, client jobpb.JobServiceClient, p *pb.Pipeline, opt *JobOptions) (string, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function returns two parameters of the same type, it may be useful to use named parameters to be unambiguous about which returned value is which. eg (id, endpoint string, err error)
This is not advocating for naked returns being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
InternalJavaRunner string | ||
} | ||
|
||
// Prepare prepares a job to the given endpoint. It returns an id and endpoint, if successful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be able to improve this comment, as I'm having difficulty reconciling what's here, and in the implementation.
Knowing nothing about what PrepareJob request on the service does, but reading this implementation, it looks like the idea is to prepare the runner to receive the job, largely by getting a place to stage artifacts (like the worker) to run the job later.
So, it's preparing the service for the job. Is that right?
With that in mind, and looking at how the returns are used in this PR perhaps:
// Prepare prepares a given Beam model runner to receive a job. If successful, Prepare returns an id, and a URL
// endpoint where the runner expects job artifacts to be staged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Rephrased the description.
return resp.GetPreparationId(), resp.GetArtifactStagingEndpoint().GetUrl(), nil | ||
} | ||
|
||
// Submit submits a job to the given endpoint. It returns a jobID, if successful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the comment on Prepare, we should probably change "given endpoint" here to "given Beam model runner" for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made them consistent.
"github.com/apache/beam/sdks/go/pkg/beam/util/grpcx" | ||
) | ||
|
||
// Stage stages the worker binary and any additional content to the given endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "content" be changed to "artifacts" for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to files, which is what this helper function accepts.
I think I'm not really helpful since I don't really know Go but the structure of this looks good! Did you actually try this against the hacking job server we have? What did you get? |
* Extracted a runnerlib for runners that wish to use the portable model, but adds special logic or integration.
@aljoscha Thanks! Yes! That is how I'm testing this stuff. I'm currently hitting an issue with the artifacts on the Flink side:
Flink runner logs (info):
|
@lostluck Thanks! PTAL |
Thanks @aljoscha. I figured it was something simple. |
Now I get:
It seems we're mangling the name. The correct file, named "worker", is indeed in this directory:
|
Yes, you're right. This was a case of misapplied mangling. It should be fixed now: |
Tried to remove the demangling and using the go container and wordcount seems to work:
|
@axelmagn Tested with your changes and it seems to work as well! Awesome! |
beam.RegisterRunner("universal", Execute) | ||
} | ||
|
||
// Execute execute the pipeline on a universal beam runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Execute executes...
beam.RegisterRunner("universal", Execute) | ||
} | ||
|
||
// Execute execute the pipeline on a universal beam runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Execute executes...
beam.RegisterRunner("universal", Execute) | ||
} | ||
|
||
// Execute execute the pipeline on a universal beam runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Execute executes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Extracted a runnerlib for runners that wish to use the portable model, but adds special logic or integration. Updated the portable URNs.
To run against a Flink runner, do from sdks/go:
$ go run examples/wordcount/wordcount.go --runner=flink --output=/tmp/foo --endpoint=localhost:3000