New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-2877][BEAM-2881] Add Java SDK harness container image and support #3928
Conversation
R: @tgroh @wcn3 (Kenn - this PR seems to trivially conflict with your proto shuffle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass.
runners/gcp/gcsproxy/main.go
Outdated
pb.RegisterArtifactStagingServiceServer(gs, proxy) | ||
|
||
default: | ||
log.Fatalf("Invalid mode: '%v', want 'retrieve' or 'stage'", *mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the constants you defined instead of hardcoding the strings here. Otherwise, the constants are used exactly once and wouldn't need to be declared constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sdks/go/BUILD.md
Outdated
# Go build | ||
|
||
This document describes the [Go](golang.org) code layout and build integration | ||
with maven. The setup is non-trivial, because the Go toolchain expects a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maven should be capitalized throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sdks/go/cmd/beamctl/main.go
Outdated
if endpoint == "" { | ||
return nil, nil, errors.New("endpoint not defined") | ||
} | ||
cc, err := grpc.Dial(endpoint, grpc.WithInsecure()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want grpc.WithBlock() here, and possibly a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return fmt.Errorf("no artifact named %v for location %v", l.Name, l.Uri) | ||
} | ||
if !fresh { | ||
return fmt.Errorf("multiple location for %v:%v", l.Name, l.Uri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
manifest string | ||
bucket, root string | ||
blobs map[string]staged | ||
mu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments describing what mu is protecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
manifest := req.GetManifest() | ||
|
||
s.mu.Lock() | ||
loc, err := matchLocations(manifest.GetArtifact(), s.blobs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What ifmatchLocations was a member of StagingServer so it can manage the lock, so internally it can do a
s.mu.Lock() and defer s.mu.Unlock()
This would read a bit cleaner, and be more modular. With this approach, if someone moves/deletes line 77, chaos would ensue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. The lock is currently held in the two public methods only, which lets matchLocations be a pure function. I think the current form is simpler than adding methods and state assumptions, but we can reevaluate if the code becomes more complicated. I wish Go had block-scoped locking, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block-scoped locking would help. That's fine, since the locking policy is cohesive and there are only a few locking methods.
) | ||
|
||
// Execute runs the program with the given arguments. It attaches stdio to the | ||
// child process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attaches other things too. Since this is the godoc, it should be accurate and include everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you read "stdio" as "stdin"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did. Thanks.
// (3) Invoke the Java harness, preserving artifact ordering in classpath. | ||
|
||
os.Setenv("PIPELINE_OPTIONS", options) | ||
os.Setenv("LOGGING_API_SERVICE_DESCRIPTOR", fmt.Sprintf("id: \"1\"\nurl: \"%v\"\n", *loggingEndpoint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't JSON encoded like options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I think they're text proto, now that I think about it. The code does expect this form.
Thanks @wcn3. PTAL |
I'm happy to merge this before the proto shuffle and resolved conflicts. I'm assuming reviews will conclude in that order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The POMs look basically reasonable
sdks/java/container/pom.xml
Outdated
<artifactId>beam-sdks-java-harness</artifactId> | ||
<overWrite>true</overWrite> | ||
</artifactItem> | ||
<!-- java harness dependencies that are not staged --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected these to be bundled into an uberjar and come along inside the harness JAR (and thus not need to be staged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
sdks/java/container/pom.xml
Outdated
</artifactItem> | ||
<artifactItem> | ||
<groupId>org.apache.beam</groupId> | ||
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: BEAM-2566 remove this dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment in harness POM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go code looks good.
Thanks @wcn3 ! |
Thanks @tgroh. PTAL |
* Add support for building Go code and docker container images with maven (see sdks/go/BUILD.md for details). The latter is only done if the "build-containers" profile is used. * Add GCS proxy service for managing artifacts in GCS. * Add GCE md service for metdata-configured provision info in GCE. * Add beamctl tool for manually interacting with these services. This PR is focused on the execution side and would need support from the submission side as well to be functional. The ULR will likely be the first runner to tie everything together. The contents of the java image is kept simple for now.
Fixed artifact materialize to accommodate different umask settings. The test assumed 0022, but would fail with 0027. @tgroh PTAL. |
Oh dang, one more request: Add a parallel go-dependents jenkins build to the |
maven (see sdks/go/BUILD.md for details). The latter is only done
if the "build-containers" profile is used.
This PR is focused on the execution side and would need support from
the submission side as well to be functional. The ULR will likely be
the first runner to tie everything together. The contents of the java
image is kept simple for now.