Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2877][BEAM-2881] Add Java SDK harness container image and support #3928

Closed
wants to merge 1 commit into from

Conversation

herohde
Copy link
Contributor

@herohde herohde commented Oct 1, 2017

  • Add support for building Go code and docker container images with
    maven (see sdks/go/BUILD.md for details). The latter is only done
    if the "build-containers" profile is used.
  • Add GCS proxy service for managing artifacts in GCS.
  • Add GCE md service for metdata-configured provision info in GCE.
  • Add beamctl tool for manually interacting with these services.

This PR is focused on the execution side and would need support from
the submission side as well to be functional. The ULR will likely be
the first runner to tie everything together. The contents of the java
image is kept simple for now.

@herohde
Copy link
Contributor Author

herohde commented Oct 1, 2017

R: @tgroh @wcn3
CC: @lukecwik @kennknowles @aljoscha

(Kenn - this PR seems to trivially conflict with your proto shuffle)

Copy link
Contributor

@wcn3 wcn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass.

pb.RegisterArtifactStagingServiceServer(gs, proxy)

default:
log.Fatalf("Invalid mode: '%v', want 'retrieve' or 'stage'", *mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the constants you defined instead of hardcoding the strings here. Otherwise, the constants are used exactly once and wouldn't need to be declared constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sdks/go/BUILD.md Outdated
# Go build

This document describes the [Go](golang.org) code layout and build integration
with maven. The setup is non-trivial, because the Go toolchain expects a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maven should be capitalized throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if endpoint == "" {
return nil, nil, errors.New("endpoint not defined")
}
cc, err := grpc.Dial(endpoint, grpc.WithInsecure())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want grpc.WithBlock() here, and possibly a timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return fmt.Errorf("no artifact named %v for location %v", l.Name, l.Uri)
}
if !fresh {
return fmt.Errorf("multiple location for %v:%v", l.Name, l.Uri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: locations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

manifest string
bucket, root string
blobs map[string]staged
mu sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments describing what mu is protecting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

manifest := req.GetManifest()

s.mu.Lock()
loc, err := matchLocations(manifest.GetArtifact(), s.blobs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ifmatchLocations was a member of StagingServer so it can manage the lock, so internally it can do a

s.mu.Lock() and defer s.mu.Unlock()

This would read a bit cleaner, and be more modular. With this approach, if someone moves/deletes line 77, chaos would ensue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The lock is currently held in the two public methods only, which lets matchLocations be a pure function. I think the current form is simpler than adding methods and state assumptions, but we can reevaluate if the code becomes more complicated. I wish Go had block-scoped locking, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block-scoped locking would help. That's fine, since the locking policy is cohesive and there are only a few locking methods.

)

// Execute runs the program with the given arguments. It attaches stdio to the
// child process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attaches other things too. Since this is the godoc, it should be accurate and include everything.

Copy link
Contributor Author

@herohde herohde Oct 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you read "stdio" as "stdin"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did. Thanks.

// (3) Invoke the Java harness, preserving artifact ordering in classpath.

os.Setenv("PIPELINE_OPTIONS", options)
os.Setenv("LOGGING_API_SERVICE_DESCRIPTOR", fmt.Sprintf("id: \"1\"\nurl: \"%v\"\n", *loggingEndpoint))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't JSON encoded like options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I think they're text proto, now that I think about it. The code does expect this form.

@herohde
Copy link
Contributor Author

herohde commented Oct 2, 2017

Thanks @wcn3. PTAL

@kennknowles
Copy link
Member

I'm happy to merge this before the proto shuffle and resolved conflicts. I'm assuming reviews will conclude in that order.

Copy link
Member

@tgroh tgroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The POMs look basically reasonable

<artifactId>beam-sdks-java-harness</artifactId>
<overWrite>true</overWrite>
</artifactItem>
<!-- java harness dependencies that are not staged -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected these to be bundled into an uberjar and come along inside the harness JAR (and thus not need to be staged)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

</artifactItem>
<artifactItem>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: BEAM-2566 remove this dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment in harness POM

Copy link
Contributor

@wcn3 wcn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go code looks good.

@herohde
Copy link
Contributor Author

herohde commented Oct 4, 2017

Thanks @wcn3 !

@herohde
Copy link
Contributor Author

herohde commented Oct 4, 2017

Thanks @tgroh. PTAL

 * Add support for building Go code and docker container images with
   maven (see sdks/go/BUILD.md for details). The latter is only done
   if the "build-containers" profile is used.
 * Add GCS proxy service for managing artifacts in GCS.
 * Add GCE md service for metdata-configured provision info in GCE.
 * Add beamctl tool for manually interacting with these services.

This PR is focused on the execution side and would need support from
the submission side as well to be functional. The ULR will likely be
the first runner to tie everything together. The contents of the java
image is kept simple for now.
@herohde
Copy link
Contributor Author

herohde commented Oct 5, 2017

Fixed artifact materialize to accommodate different umask settings. The test assumed 0022, but would fail with 0027. @tgroh PTAL.

@tgroh
Copy link
Member

tgroh commented Oct 6, 2017

Oh dang, one more request: Add a parallel go-dependents jenkins build to the .test-infra/jenkins directory (can be basically copied from python, just with sdks/python replaced with sdks/go, I believe)

@asfgit asfgit closed this in 63b54a5 Oct 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants