[FLINK-9624][runtime] Move jar/artifact upload out of jobgraph #6199
Conversation
import static org.junit.Assert.assertEquals;

/**
 * TODO: add javadoc.
missing javadoc
uploadViaBlob(blobServerAddress, blobClientConfig, uploadToBlobServer);

for (Map.Entry<String, DistributedCache.DistributedCacheEntry> userArtifact : distributeViaDFS) {
public void finalizeUserArtifactEntries() {
missing test
for (PermanentBlobKey key : keys) {
jobGraph.addUserJarBlobKey(key);
List<Path> userJars = jobGraph.getUserJars();
Map<String, DistributedCache.DistributedCacheEntry> userArtifacts = jobGraph.getUserArtifacts();
This entire block is effectively duplicated in several classes and could also be moved to ClientUtils, but I wasn't sure whether this wouldn't put too much logic into a single method.
 * @param blobClient client to upload jars with
 * @throws IOException if the upload fails
 */
public static void uploadAndSetUserJars(JobGraph jobGraph, BlobClient blobClient) throws IOException {
JarRunHandler could use this method as well.
Good point, let's do it then.
setUserArtifactBlobKeys(jobGraph, blobKeys);
}

private static Collection<Tuple2<String, PermanentBlobKey>> uploadUserArtifacts(JobID jobID, Map<String, DistributedCache.DistributedCacheEntry> userArtifacts, BlobClient blobClient) throws IOException {
Signature could be changed to accept a Map<String, Path> instead. For consistency of in- and output we could also pass this as a Collection<Tuple2>.
List<Path> userJars = jobGraph.getUserJars();
Map<String, DistributedCache.DistributedCacheEntry> userArtifacts = jobGraph.getUserArtifacts();
if (!userJars.isEmpty() || !userArtifacts.isEmpty()) {
try (BlobClient client = new BlobClient(address, flinkConfig)) {
Alternatively we could refactor the try-with-resource statement and exception handling into a method that accepts a function, which would be used like this:

ClientUtils.withBlobClient(address, flinkConfig, client -> {
    log.info("Uploading jar files.");
    ClientUtils.uploadAndSetUserJars(jobGraph, client);
    log.info("Uploading jar artifacts.");
    ClientUtils.uploadAndSetUserArtifacts(jobGraph, client);
});
I would be in favour of having a ClientUtils#uploadJobGraphFiles(jobGraph, flinkConfig, Supplier<BlobClient>) which basically does what's being done here.
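The shape of the suggested helper can be sketched as follows. This is a minimal illustration using hypothetical stand-in types (SupplierWithException, a dummy BlobClient), not Flink's actual classes: the helper owns the client's lifecycle via try-with-resources, and callers only supply how to obtain a client and what to upload with it.

```java
import java.util.function.Consumer;

public class UploadHelperSketch {
    // Hypothetical stand-in for a supplier that may throw a checked exception.
    interface SupplierWithException<T, E extends Exception> {
        T get() throws E;
    }

    // Dummy stand-in for BlobClient, only tracking whether close() ran.
    static class BlobClient implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // The suggested shape: the helper opens and closes the client itself;
    // the caller passes the upload actions as a function of the client.
    static void uploadJobGraphFiles(
            SupplierWithException<BlobClient, Exception> clientSupplier,
            Consumer<BlobClient> uploadAction) throws Exception {
        try (BlobClient client = clientSupplier.get()) {
            uploadAction.accept(client);
        }
    }

    public static void main(String[] args) throws Exception {
        final BlobClient[] seen = new BlobClient[1];
        uploadJobGraphFiles(BlobClient::new, client -> seen[0] = client);
        // The helper closed the client after the upload action ran.
        System.out.println(seen[0].closed);
    }
}
```

This removes the duplicated try-with-resources and exception handling from every call site while still letting each submission path decide how a BlobClient is created.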
Changes look good to me @zentol. I think it would be a good idea to remove the code redundancy by introducing a ClientUtils#uploadJobGraphFiles method which encapsulates the logic. Moreover, one could get rid of writing the user artifacts into the job configuration, which would avoid the two-phase user artifact upload procedure, which is harder to maintain. What do you think?
Path path = new Path(userArtifact.getValue().filePath);
// only upload local files
if (!path.getFileSystem().isDistributedFS()) {
final PermanentBlobKey blobKey = blobClient.uploadFile(jobID, new Path(userArtifact.getValue().filePath));
we could reuse path here
try {
serializedBlobKey = InstantiationUtil.serializeObject(blobKey);
} catch (IOException e) {
throw new FlinkRuntimeException("Could not serialize blobkey " + blobKey + ".", e);
I would not throw a FlinkRuntimeException here. Instead we could let the IOException bubble up.
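The difference can be sketched with a hypothetical serializer standing in for InstantiationUtil.serializeObject (not Flink's actual implementation): declaring throws IOException lets the checked exception propagate unchanged instead of wrapping it in an unchecked FlinkRuntimeException.

```java
import java.io.IOException;

public class BubbleUpSketch {
    // Hypothetical serializer standing in for InstantiationUtil.serializeObject.
    static byte[] serializeObject(Object o) throws IOException {
        if (o == null) {
            throw new IOException("cannot serialize null");
        }
        return o.toString().getBytes();
    }

    // Declares the checked exception instead of catching and wrapping it,
    // so callers see the original IOException and can handle the failure.
    static byte[] serializeBlobKey(Object blobKey) throws IOException {
        return serializeObject(blobKey);
    }

    public static void main(String[] args) {
        try {
            serializeBlobKey(null);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```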
uploadViaBlob(blobServerAddress, blobClientConfig, uploadToBlobServer);

for (Map.Entry<String, DistributedCache.DistributedCacheEntry> userArtifact : distributeViaDFS) {
public void finalizeUserArtifactEntries() {
Maybe rename to writeUserArtifactEntriesToConfiguration
uploadViaBlob(blobServerAddress, blobClientConfig, uploadToBlobServer);

for (Map.Entry<String, DistributedCache.DistributedCacheEntry> userArtifact : distributeViaDFS) {
public void finalizeUserArtifactEntries() {
I think we would not need this method if we don't write the DistributedCacheEntries into the configuration. If I'm not mistaken, then we send the userArtifacts map anyway to the cluster. The things which are missing are: adding a serialVersionUID to the DistributedCacheEntry, and adding the userArtifacts to the TaskDeploymentDescriptor to send them to the TaskManager.
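The serialVersionUID part of the suggestion can be sketched as follows; this uses a hypothetical, simplified DistributedCacheEntry (the real Flink class has more fields). An explicit serialVersionUID keeps serialized instances compatible across compatible revisions of the class, which matters once the entries travel over the wire instead of through the configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DistributedCacheEntrySketch {
    // Hypothetical simplified entry; the real class carries more state.
    static class DistributedCacheEntry implements Serializable {
        // Pins the serialized form instead of relying on a compiler-generated UID.
        private static final long serialVersionUID = 1L;

        final String filePath;

        DistributedCacheEntry(String filePath) {
            this.filePath = filePath;
        }
    }

    public static void main(String[] args) throws Exception {
        DistributedCacheEntry entry = new DistributedCacheEntry("/tmp/artifact");

        // Round-trip through Java serialization, as a TaskDeploymentDescriptor would.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(entry);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            DistributedCacheEntry copy = (DistributedCacheEntry) ois.readObject();
            System.out.println(copy.filePath);
        }
    }
}
```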
I agree that this would be nice, but I think that this is out of scope of this PR as we would have to touch an entirely new set of classes.
Alright, please create a follow up JIRA issue.
assertEquals(jars.size(), jobGraph.getUserJars().size());
assertEquals(jars.size(), jobGraph.getUserJarBlobKeys().size());
assertEquals(jars.size(), jobGraph.getUserJarBlobKeys().stream().distinct().count());
Assert that we find the blob keys in the blob upload directory.
I will use blobServer.getFile() instead to verify the validity of the blob keys.
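The intent of that check can be sketched with an in-memory stand-in (not the real BlobServer API): fetching each recorded key back fails loudly for any key the server doesn't actually hold, so the test verifies the upload really happened.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class BlobKeyCheckSketch {
    // Hypothetical in-memory blob store standing in for the blob server.
    static class BlobStore {
        private final Map<String, byte[]> blobs = new HashMap<>();

        String put(byte[] data) {
            String key = "key-" + blobs.size();
            blobs.put(key, data);
            return key;
        }

        // Mirrors the idea of blobServer.getFile(key): throws for unknown keys.
        byte[] getFile(String key) {
            byte[] data = blobs.get(key);
            if (data == null) {
                throw new NoSuchElementException("unknown blob key: " + key);
            }
            return data;
        }
    }

    public static void main(String[] args) {
        BlobStore server = new BlobStore();
        List<String> keys = new ArrayList<>();
        keys.add(server.put(new byte[] {1}));
        keys.add(server.put(new byte[] {2}));

        // Verify every recorded key by fetching the blob back.
        boolean allValid = true;
        for (String key : keys) {
            allValid &= server.getFile(key) != null;
        }
        System.out.println(allValid);
    }
}
```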
// 1 unique key for each local artifact, and null for distributed artifacts
assertEquals(localArtifacts.size() + 1, jobGraph.getUserArtifacts().values().stream().map(entry -> entry.blobKey).distinct().count());
for (DistributedCache.DistributedCacheEntry original : localArtifacts) {
assertState(original, jobGraph.getUserArtifacts().get(original.filePath), false);
Assert that the blobs can be found in the blob server storage directory.
Test failure seems to be unrelated.
@tillrohrmann I believe I've addressed all comments.
try {
keys = BlobClient.uploadFiles(address, configuration, jobGraph.getJobID(), jobGraph.getUserJars());
try (BlobClient blobClient = new BlobClient(address, configuration)) {
ClientUtils.uploadAndSetUserJars(jobGraph, blobClient);
We could use uploadJobGraphFiles here, but there isn't really a use-case for uploading distributed cache artifacts when going through the JarRunHandler, since we're already on the server here.
But it would reduce code redundancy, right? If this is the case, then let's do it.
The JarRunHandler now also uses uploadJobGraphFiles().
Nice. +1 for merging.
What is the purpose of the change
This PR moves the logic for uploading jars/artifacts from the jobgraph into a separate utility class usable by all submission methods.
The new ClientUtils class exposes 2 methods for uploading jars/artifacts and setting the respective blob keys on the JobGraph. All existing job-submission methods were updated to use the new utilities and should now behave the same.

The subsumed methods in JobGraph were removed, but remnants of them remain in 2 added methods: ExecutionConfig. We could also do the latter in the JobManager when assembling the TaskDeploymentDescriptor; in any case we can now just shuffle this method around to where we want it.

Verifying this change
ClientUtils is tested in ClientUtilsTest.
JobGraph changes are covered in JobGraphTest.