Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-4176] Initial implementation for running portable runner tests #5935

Merged
merged 11 commits into from Jul 25, 2018

Conversation

angoenka
Copy link
Contributor

@angoenka angoenka commented Jul 12, 2018

Basic idea is to start a job server using a JobServerDriver which can be initialized with provided parameters.
TestPortableRunner will start a new instance of JobServer for every test and then run the test on it.

I have kept the maxParallelForks to 1 for now as i want to first make sure that all the tests pass with no concurrency.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

@angoenka
Copy link
Contributor Author

@angoenka
Copy link
Contributor Author

Note that the validate runner tests are not yet passing for flink and will have to be debugged separately

@jkff
Copy link
Contributor

jkff commented Jul 12, 2018

Wow, nice!!! (didn't look at code yet) Can we start with ULR then? CC: @youngoli

@angoenka
Copy link
Contributor Author

Yes, though I have not plugged in the gradle task for ULR but it should just follow the Flink task.

@angoenka
Copy link
Contributor Author

verifyPAssertsSucceeded requires support for metrics in pipelineResult which is causing almost all the tests to fail.

@bsidhom
Copy link
Contributor

bsidhom commented Jul 12, 2018

Ah, yes. I had forgotten about that. That will cause problems for us unless we somehow rewrite PAssert to not depend on metrics if we're running portably. I guess a pipeline option could do it.

@angoenka angoenka force-pushed the portable_test_runner branch 2 times, most recently from 4d2d949 to 36a6595 Compare July 12, 2018 18:32
@tweise
Copy link
Contributor

tweise commented Jul 14, 2018

JIRA?

@angoenka angoenka changed the title Initial implementation for running portable runner tests [BEAM-4176] Initial implementation for running portable runner tests Jul 16, 2018
@angoenka
Copy link
Contributor Author

The tests are not passing yet but that seems to be a separate issue.
We can proceed with this PR review and get it in so that people can easily run the tests and iterate.

Copy link
Contributor

@axelmagn axelmagn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does validatesRunner get run as part of any automated framework? I'm happy to merge this and then iterate on getting tests to pass, but I worry about putting any automated tests into a red state.

@Option(name = "--job-host", required = true, usage = "The job server host string")
/** Configuration for the jobServer. */
public static class ServerConfiguration {
@Option(name = "--job-host", usage = "The job server host string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we no longer requiring this?

Copy link
Contributor Author

@angoenka angoenka Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ports can be chosen dynamically if not provided.
This help facilitate launch of multiple jobServices on the same machine by finding an unused port.

Finding and managing unused ports reliably is surprisingly difficult. The major challenge is that once you pick a port without using it, it can be picked and used by some other process without your knowledge.

Please let me know of any other ways/library to get dynamic unused ports in tests case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. that makes sense.

I have no idea about other ways to manage unused ports (short of writing our own manager), so I think you're right that this is the way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a local port? If so, it's quite easy to do in Java by allocating a new TCP server socket at port 0. You can then query it for the chosen port. This works as long as you are able to pass that socket directly to the service implementation. (Note that you can do this with gRPC and we have a Beam-specific tool that does this for you).

If you're trying to find an unused port for a non-Java server or for a server that will run separately (i.e., for which you do not directly control the socket), you can use the same server allocation trick and note the chosen port number. You can then kill the server socket and pass that port into your new server's configuration. Note that doing this approach introduces a race condition but will work in most cases (especially with retries).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be possible.
Comparing the approach

  1. Optional port and query the port back
  2. Passing a socket to the flink driver

I feel option 1 seems to be simpler while keeping initialization flexible.

@angoenka
Copy link
Contributor Author

Ping for the review!
cc: @bsidhom @lukecwik @axelmagn @ryan-williams

boolean streaming
}

def createPortableValidatesRunnerTask(Map m) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use a lambda and the implicit it parameter as it allows you to call the method without any paramemters or with parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG.
I am not very sure with the semantics so please take a look again.

}
}

createPortableValidatesRunnerTask(name: "validatesPortableRunner", jobServerDriver: "org.apache.beam.runners.flink.FlinkJobServerDriver", jobServerConfig: "[]", streaming: false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect jobServerDriver to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for Flink.

But I want to move it to createPortableValidatesRunnerTask reference runner project so that it can be reused. I am not sure how to do it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to move it into the BeamModulePlugin.groovy and expose it like applyJavaNature and the other methods there. This can happen in a follow up PR.

@@ -45,3 +59,55 @@ runShadow {
// Enable remote debugging.
jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}

class PortableValidatesRunnerConfig {
String name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments for what each of these configuration properties do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

* the following projects are evaluated before we evaluate this project. This is because
* we are attempting to reference the "sourceSets.test.output" directly.
*/
evaluationDependsOn(":beam-runners-flink_2.11")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You reference :beam-sdks-java-core and :beam-runners-core-java sourceSets.test.output directly and not :beam-runners-flink_2.11

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

dependencies {
compile project(path: ":beam-runners-flink_2.11", configuration: "shadow")
validatesRunner project(path: ":beam-runners-flink_2.11", configuration: "shadowTest")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add :beam-sdks-java-core / shadowTest and :beam-runners-core-java / shadowTest as a validatesRunner dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

"ArtifactStagingServer stopped on {}", jobServer.getApiServiceDescriptor().getUrl());
artifactStagingServer = null;
} catch (Exception e) {
LOG.error("Error while closing the artifactStagingServer.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

@Required
@Description(
"Fully qualified class name of TestJobServiceDriver capable of managing the JobService.")
String getJobServerDriver();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be of type Class, PipelineOptionsFactory will convert string -> Class for you automatically during parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

void setJobServerDriver(String jobServerDriver);

@Description("String containing comma separated arguments for the JobServer.")
String getJobServerConfig();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PipelineOptionsFactory supports parsing List or String[] automatically. There is no need to do this yourself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

* {@link TestPortableRunner} is a pipeline runner that wraps a {@link PortableRunner} when running
* tests against the {@link TestPipeline}.
*
* <p>This runner requires a JobServerDriver with following methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the FlinkJobServerDriver implement a new interface TestPortableRunner.JobServerDriver which exposes these methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Take it up in subsequent diff as there is some class dependency issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the issue is that TestPortableRunner shouldn't be part of the reference runner subproject but in a place where every runner could refer to it. SGTM as a change in a subsequent PR.

try {
jobServerDriver =
jobServerDriverClass
.getMethod("fromParams", String[].class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

@ryan-williams
Copy link
Contributor

I don't have comments on this change yet, but wanted to just mention this prototype of an ArtifactStagingService that:

  • detects artifacts that have already been uploaded (by MD5)
  • closes the connection to prevent redundant artifacts from being uploaded
  • leaves the manifest with all the entries for each job (just with URIs that may point to where a previous job staged the artifact)

It doesn't help the main pain point here (re-staging ~200MB of artifacts for each test case) because it doesn't work across artifact-server instances, but that's the next extension I'm planning to look into.

Copy link
Contributor Author

@angoenka angoenka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Luke.
Applied the review comments.

* the following projects are evaluated before we evaluate this project. This is because
* we are attempting to reference the "sourceSets.test.output" directly.
*/
evaluationDependsOn(":beam-runners-flink_2.11")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

dependencies {
compile project(path: ":beam-runners-flink_2.11", configuration: "shadow")
validatesRunner project(path: ":beam-runners-flink_2.11", configuration: "shadowTest")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

@@ -45,3 +59,55 @@ runShadow {
// Enable remote debugging.
jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}

class PortableValidatesRunnerConfig {
String name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

boolean streaming
}

def createPortableValidatesRunnerTask(Map m) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG.
I am not very sure with the semantics so please take a look again.

}
}

createPortableValidatesRunnerTask(name: "validatesPortableRunner", jobServerDriver: "org.apache.beam.runners.flink.FlinkJobServerDriver", jobServerConfig: "[]", streaming: false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for Flink.

But I want to move it to createPortableValidatesRunnerTask reference runner project so that it can be reused. I am not sure how to do it though.

@Required
@Description(
"Fully qualified class name of TestJobServiceDriver capable of managing the JobService.")
String getJobServerDriver();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

void setJobServerDriver(String jobServerDriver);

@Description("String containing comma separated arguments for the JobServer.")
String getJobServerConfig();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

* <ul>
* <li>public static Object fromParams(String... params)
* <li>public String start() // Start JobServer and returns the JobServer host and port.
* <li>public void start() // Stop the JobServer and free all resources.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

try {
jobServerDriver =
jobServerDriverClass
.getMethod("fromParams", String[].class)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

* {@link TestPortableRunner} is a pipeline runner that wraps a {@link PortableRunner} when running
* tests against the {@link TestPipeline}.
*
* <p>This runner requires a JobServerDriver with following methods.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Take it up in subsequent diff as there is some class dependency issue.

@tweise
Copy link
Contributor

tweise commented Jul 25, 2018

Run Java PreCommit

@angoenka
Copy link
Contributor Author

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, made some minor changes directly. Will wait for tests to be green to confirm my edits were ok and then will merge.

@angoenka
Copy link
Contributor Author

angoenka commented Jul 25, 2018

Thanks Luke.
I have created a jira to track the 2 changes we discussed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants