
[BEAM-4615] Flink job server wrapper and shadow jar #5726

Merged: jkff merged 1 commit into apache:master from bsidhom:flink-job-server-gradle on Jun 26, 2018

Conversation

bsidhom (Contributor) commented on Jun 21, 2018:

Adds a JavaExec target that can be remotely debugged and a runnable uber jar that packages the Flink job server.
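As a rough illustration of the JavaExec target described here (a sketch only, combining the classpath, debug flags, and main class that appear later in this review thread):

```groovy
// Sketch of a debuggable JavaExec target for the Flink job server.
// The main class and debug flags mirror those discussed in the review below.
task runJobServer(type: JavaExec) {
  classpath = sourceSets.main.runtimeClasspath
  main = "org.apache.beam.runners.flink.FlinkJobServerDriver"
  // Open a JDWP socket on port 5005 so a remote debugger can attach,
  // without suspending the JVM at startup.
  jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}
```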


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

bsidhom (Contributor Author) commented on Jun 21, 2018:

R: @angoenka

}

task runJobServer(type: JavaExec) {
classpath = sourceSets.main.runtimeClasspath
Member:

Why not use a configuration for this like 'jobServer' and use

classpath = configurations.jobServer

This will allow you to run using the post-shaded artifacts.

bsidhom (Contributor Author):

I was going to do that originally, but we would effectively have to duplicate every dependency above in the configuration. In any case, I don't understand how doing so would have different shading semantics from this. I thought that the reason to do it was to maintain separate dependencies between the portable and non-portable runners.

lukecwik (Member), Jun 21, 2018:

You don't need to duplicate every dependency.

configurations {
  jobServer
}

dependencies {
  ...
  jobServer project(path: project.path, configuration: "shadow")
}

task runJobServer(type: JavaExec) {
  classpath = configurations.jobServer
  ...
}

should work.

bsidhom (Contributor Author):

Do you still recommend a distinct configuration now that we actually export the artifact in a new project?

jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}

task jobServerShadowJar(type: com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar) {
lukecwik (Member), Jun 21, 2018:

Why is this needed beyond the javaexec task above?

If it's needed, does this need to be a separate artifact from what we currently ship as the Flink jar with Apache Beam?

bsidhom (Contributor Author):

It's needed in order to make the job server submittable to a Flink cluster via flink run. If we already had an executable uberjar that we publish that would be fine.

Member:

It seems like we should publish both the old Flink artifact which people use right now and this job server artifact with Apache Beam. The easiest way to do this is to create a separate job-server subproject in a directory below the Flink directory and produce this as an artifact.

angoenka (Contributor), Jun 21, 2018:

If we have the uber Flink jar, then we can use it as is.
What is the command to create the uber jar in Gradle, and where does it put the jar?

bsidhom (Contributor Author):

@lukecwik I think that makes sense. I'm not sure how to reach into a different project's configurations in order to construct the dependency set though.

lukecwik (Member), Jun 21, 2018:

I would think that this would add the project and all its dependencies to the current project's compile scope dependency list:

dependencies {
  compile project(path: "beam-runners-flink", configuration: "shadow")
}

bsidhom (Contributor Author):

I ended up doing it in a different way. Let me know what you think.

bsidhom (Contributor Author):

The main downside of my approach is that it requires an evaluation dependency.

bsidhom (Contributor Author):

@angoenka With the new layout, to create the uber jar, you run ./gradlew -p runners/flink/job-server shadowJar. It will output the uber jar at runners/flink/job-server/build/libs/flink-job-server.jar. Note that if we plan to actually publish this, we may want to go back to a versioned artifact name.

In order to debug the job server through gradle: ./gradlew -p runners/flink/job-server runShadow -PjobHost=<server address> -PartifactsDir=<artifact staging directory>.

bsidhom (Contributor Author) commented on Jun 22, 2018:

I'm not sure what's causing the build error in job-server as it builds on my machine just fine.

* the following projects are evaluated before we evaluate this project. This is because
* we reference "sourceSets.shadow.output" directly.
*/
evaluationDependsOn(":beam-runners-flink_2.11")
Contributor:

We should remove the _2.11 version suffix at the end.

bsidhom (Contributor Author):

I think that should be decided and done as a different change, since it already exists that way in master and was explicitly decided.

archiveName = "flink-job-server.jar"
mergeServiceFiles()
append "reference.conf"
def flinkProject = project(":beam-runners-flink_2.11")
Contributor:

Version should be removed.

bsidhom (Contributor Author):

Again, this is how the projects are named in master. We can change those separately if desired, but this is generally the convention used for java artifacts that depend on specific Scala binaries.

// task will not work because the flink runner classes only exist in the shadow
// jar.
runShadow {
def jobHost = project.hasProperty("jobHost") ? project.property("jobHost") : "localhost:8099"
Contributor:

Do we need to set up the classpath and main parameters?

bsidhom (Contributor Author):

No. I've applied the application plugin above in order to get access to the runShadow task. This allows us to configure the main entrypoint globally for this build file via mainClassName.
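For context, a minimal sketch of how the application plugin centralizes the entry point (the plugin IDs are the standard Gradle ones; the main class is the one referenced elsewhere in this thread):

```groovy
// The application plugin contributes the run task and, together with the
// shadow plugin, the runShadow task. mainClassName configures the entry
// point once for every run task (and the jar manifest's Main-Class).
apply plugin: "application"
apply plugin: "com.github.johnrengelman.shadow"

mainClassName = "org.apache.beam.runners.flink.FlinkJobServerDriver"
```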

// NOTE: runShadow must be used in order to run the job server. The standard run
// task will not work because the flink runner classes only exist in the shadow
// jar.
runShadow {
Contributor:

(type: JavaExec)

Member:

You don't have to specify the type since runShadow is already of the correct type.

jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}

shadowJar {
Contributor:

(type: com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar)

Member:

You don't need to specify the type.

append "reference.conf"
def flinkProject = project(":beam-runners-flink_2.11")
from flinkProject.configurations.runtime
from flinkProject.sourceSets.main.output
Contributor:

Should we add a manifest?
manifest { attributes "Main-Class": "org.apache.beam.runners.flink.FlinkJobServerDriver" }

Member:

+1, removes a thing that people have to specify.

bsidhom (Contributor Author):

As I noted above, this is already done globally via mainClassName. This is a feature of the application plugin, which itself is required for runShadow functionality.

jvmArgs = ["-Xdebug", "-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"]
}

shadowJar {
Member:

You don't need to specify the type.

}

shadowJar {
archiveName = "flink-job-server.jar"
Member:

You should make this the shadowClosure for applyJavaNature like:

applyJavaNature(failOnWarning: true, shadowClosure: { ... contents of shadowJar task ... })

bsidhom (Contributor Author):

Ah, I hadn't seen that before! Done.

mergeServiceFiles()
append "reference.conf"
def flinkProject = project(":beam-runners-flink_2.11")
from flinkProject.configurations.runtime
Member:

I would prefer if you were able to drop the from clauses and instead used:

dependencies {
  compile project(path: ":beam-runners-flink", configuration: "shadow")
}

and then you wouldn't need the evaluationDependsOn and the shading plugin should be already configured to bundle all jars.

bsidhom (Contributor Author):

Ah, that's much better! It's also necessary in order to get shadowJar to work via applyJavaNature.

append "reference.conf"
def flinkProject = project(":beam-runners-flink_2.11")
from flinkProject.configurations.runtime
from flinkProject.sourceSets.main.output
Member:

+1, removes a thing that people have to specify.

// NOTE: runShadow must be used in order to run the job server. The standard run
// task will not work because the flink runner classes only exist in the shadow
// jar.
runShadow {
Member:

You don't have to specify the type since runShadow is already of the correct type.

* limitations under the License.
*/

apply from: project(":").file("build_rules.gradle")
Member:

note that build_rules.gradle was deleted and migrated to a binary plugin, you'll want:

apply plugin: org.apache.beam.gradle.BeamModulePlugin

bsidhom (Contributor Author):

Ah, that explains the precommit error. Thanks for pointing that out. Rebased and fixed.

bsidhom force-pushed the flink-job-server-gradle branch from 0ec96ef to 6fd8659 on June 22, 2018, 21:40
bsidhom (Contributor Author) commented on Jun 22, 2018:

Thanks for the feedback. I think it's much cleaner now.

failOnWarning: true,
shadowClosure: {
archiveName = "flink-job-server.jar"
mergeServiceFiles()
lukecwik (Member), Jun 22, 2018:

The defaults are:

        classifier = "shaded"
        mergeServiceFiles()
        into("META-INF/") {
          from "${project.rootProject.projectDir}/LICENSE"
          from "${project.rootProject.projectDir}/NOTICE"
        }

so I think we can drop mergeServiceFiles()

Finally, is it important to set the archiveName? (I'm not sure how this will impact Maven publishing and what artifact gets produced.)

bsidhom (Contributor Author):

I've gone ahead and dropped mergeServiceFiles(). Setting the archive name is not necessary. This is a remnant from when the jar was output in the same project as the core flink jar, so it needed a distinct name. I've removed this as well.

bsidhom (Contributor Author) left a review comment:

This should now use as many shadowJar defaults as possible. Please let me know if there are no more review comments so I can clean up the history before it's merged.


This adds a new subproject that exports a new shadow jar that includes
the job server and all transitive dependencies. It also includes a
runShadow task that can be used for local debugging via gradle.
bsidhom force-pushed the flink-job-server-gradle branch from b4d758d to 45d509c on June 25, 2018, 22:47
bsidhom (Contributor Author) commented on Jun 25, 2018:

Went ahead and squashed the commits.

jkff merged commit f55e80f into apache:master on Jun 26, 2018
bsidhom deleted the flink-job-server-gradle branch on June 28, 2018, 17:36
charlesccychen pushed a commit to charlesccychen/beam that referenced this pull request on Jul 26, 2018: [BEAM-4615] Flink job server wrapper and shadow jar