
[SYSTEMML-2420] Initial version of distributed spark ps #805

Closed
Wanted to merge 15 commits

Conversation

EdgarLGB
Member

Hi @mboehm7,

Here is the PR implementing the initial version of the spark paramserv function. Just for information, the spark ps test contains some FIXMEs indicating that an error is reproduced when the number of epochs is set to 10. I'm hoping you could have a look at them, because I currently have little idea how to approach the problem.

Thanks for the review,
Guobao

Contributor
mboehm7 left a comment

Thanks @EdgarLGB. Sure - I'm happy to help debug this. However, the tests are currently not running because paramserv-test.dml is not included in the PR - could you please fix this? In addition, below you'll find some additional suggestions from a glance over the PR.

DMLScript.RUNTIME_PLATFORM oldRtPlatform = DMLScript.rtplatform;
DMLScript.rtplatform = DMLScript.RUNTIME_PLATFORM.SINGLE_NODE;
Recompiler.recompileProgramBlockHierarchy2Forced(program.getProgramBlocks(), 0, new HashSet<>(), LopProperties.ExecType.CP);
DMLScript.rtplatform = oldRtPlatform;

Contributor

This seems to be some outdated code. During the merge of a previous PR, I fixed the recompilation to CP so that it does not require modifying the global static flags (which could conflict with concurrent recompilation if paramserv is used inside parfor) - instead, we can simply call Recompiler.recompileProgramBlockHierarchy2Forced with a given ExecType et to force all instructions into this et.
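
A minimal sketch of the suggested approach, derived from the snippet above (the save/restore of DMLScript.rtplatform is simply dropped):

// Force all instructions into CP by passing the target ExecType directly;
// no modification of the global execution-mode flag is needed.
Recompiler.recompileProgramBlockHierarchy2Forced(
    program.getProgramBlocks(), 0, new HashSet<>(), LopProperties.ExecType.CP);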

Member Author
EdgarLGB commented Jul 20, 2018

In fact, I have a question about how to launch the script in spark mode. I found that we can execute a script either by creating an MLContext or by running java SystemML.jar -exec spark, so what is the difference between the two? The reason I did it this way is that when I used java -exec spark to launch scripts on spark, it was not able to correctly convert all the instructions into CP, whereas creating an MLContext handles this conversion well.

private static final String TEST_NAME5 = "paramserv-nn-bsp-batch-drr";
private static final String TEST_NAME6 = "paramserv-nn-bsp-batch-dr";
private static final String TEST_NAME7 = "paramserv-nn-bsp-batch-or";
private static final String TEST_NAME = "paramserv-test";

Contributor

Not included in the PR - please add.

@@ -73,5 +88,18 @@ private void configureWorker(Tuple2<Integer, Tuple2<MatrixBlock, MatrixBlock>> i

// Initialize the buffer pool and register it in the JVM shutdown hook so that it is cleaned up at the end
RemoteParForUtils.setupBufferPool(_workerID);

// Create the ps proxy
_ps = PSRpcFactory.createSparkPSProxy(_host);

Contributor

When configuring the worker, you might want to double-check that the list objects are deserialized with the right update status. Otherwise, this might again be one of the reasons for invalid cleanups.

public class SparkPSProxy extends ParamServer {

private TransportClient _client;
private static final long RPC_TIME_OUT = 1000 * 60 * 5; // 5 minutes of timeout

Contributor

Please try to get the spark.rpc.* settings (e.g., spark.rpc.lookupTimeout) from the current Spark configuration (e.g., create a new SparkConf(), which reads the configuration inside the executors from their system properties).
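
A minimal sketch of that suggestion (the 120s fallback is an assumption, not taken from the PR):

import org.apache.spark.SparkConf;

// new SparkConf() loads the spark.* settings from the executor's system properties,
// so the same code works on the driver and inside the executors.
SparkConf conf = new SparkConf();
// Read the RPC lookup timeout instead of hard-coding RPC_TIME_OUT;
// fall back to an assumed default of 120s if the property is not set.
long rpcTimeoutMs = conf.getTimeAsMs("spark.rpc.lookupTimeout", "120s");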

@EdgarLGB
Member Author

Thanks @mboehm7 for the early feedback. I've made some modifications accordingly.

@mboehm7
Contributor

mboehm7 commented Jul 22, 2018

To answer your question on spark execution, I have to separate two things here: (1) APIs and (2) execution modes, both of which are orthogonal, except for certain APIs that only support a limited set of execution modes.

Regarding APIs, we support command line, MLContext, JMLC, ML pipelines, and Keras2DML/Caffe2DML. The command line itself covers several variants: you can run standalone through Java (as you showed above), via the spark-submit script (where SystemML's driver runs in Spark's driver process), or through the hadoop script (where SystemML's driver runs in a client process or YARN container).

However, your actual question is more about execution modes, which you can influence with the command-line flag -exec. There we support singlenode (all operations in CP), hadoop (all matrix operations in MR), spark (all matrix operations in SPARK), hybrid (CP or MR per operation), and hybrid_spark (CP or SPARK per operation). Note that hybrid_spark is the default if you come through MLContext or the spark-submit command line, while singlenode is the default in JMLC. These three configurations of API/exec modes are what most applications use.
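
For illustration, a minimal, hypothetical sketch of the MLContext path mentioned above (the app name and script path are placeholders):

import org.apache.spark.sql.SparkSession;
import org.apache.sysml.api.mlcontext.MLContext;
import org.apache.sysml.api.mlcontext.Script;
import org.apache.sysml.api.mlcontext.ScriptFactory;

public class MLContextExample {
    public static void main(String[] args) {
        // MLContext defaults to the hybrid_spark execution mode.
        SparkSession spark = SparkSession.builder().appName("paramserv-example").getOrCreate();
        MLContext ml = new MLContext(spark);
        // Placeholder script path - point this at the actual DML script.
        Script script = ScriptFactory.dmlFromFile("scripts/paramserv-test.dml");
        ml.execute(script);
        spark.stop();
    }
}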

@mboehm7
Contributor

mboehm7 commented Jul 22, 2018

Furthermore, thanks for catching the issue with the in-place binary operations. As it turned out, these issues occurred in special cases where dense matrices were converted to sparse matrices in CSR format, whereas the in-place sparse binary operation implementation assumed our default MCSR. With the fix in SYSTEMML-2462, all your tests run perfectly fine. The tests with update per batch run very long, so I'll likely reduce the number of epochs there.

Contributor
mboehm7 left a comment

LGTM - thanks @EdgarLGB. Overall this is a very good start, and because it works correctly in local mode, we can already merge it in. However, I would suggest (1) reworking the communication to a deep serialization of the actual matrix blocks, and (2) using accumulators to collect the statistics from remote workers. Both can be done in subsequent PRs.

sb.append(EMPTY);
} else {
flushListObject(_data);
sb.append(ProgramConverter.serializeDataObject(DATA_KEY, _data));

Contributor

Perf: This serialization/deserialization approach should be replaced with serialization of in-memory matrix blocks. Right now, this PR exports (serializes and writes) the matrices to HDFS, which replicates them to multiple nodes via RPC; on the other side, we load (read and deserialize) them again, which causes another RPC for remote lookups if the data was not already replicated to the target node. Instead, we could simply use the existing matrix serializers and deserializers to send the matrices via RPC. For that, I would recommend defining a simple binary format (e.g., a 4-byte int method, a 4-byte int worker id, followed by name-value pairs of matrices, where each value is the byte sequence of a serialized matrix). The size of the output buffer can be determined exactly via MatrixBlock.getExactSizeOnDisk.
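
A hedged sketch of the suggested header layout (class and method names are illustrative, not the committed implementation):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.sysml.runtime.matrix.data.MatrixBlock;

public class PSRpcMessageSketch {
    // Serialize a single (name, matrix) pair together with the method and worker id.
    public static byte[] serialize(int method, int workerID, String name, MatrixBlock mb)
        throws IOException
    {
        // Initial capacity hint: 2 ints + name + exact on-disk size of the block.
        ByteArrayOutputStream bos = new ByteArrayOutputStream(
            (int) (4 + 4 + 2 + name.length() + mb.getExactSizeOnDisk()));
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(method);   // 4-byte int method
        dos.writeInt(workerID); // 4-byte int worker id
        dos.writeUTF(name);     // matrix name
        mb.write(dos);          // matrix payload via the existing block serializer
        dos.flush();
        return bos.toByteArray();
    }
}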

sb.append(EMPTY);
} else {
flushListObject((ListObject) _data);
sb.append(ProgramConverter.serializeDataObject(DATA_KEY, (ListObject) _data));

Contributor

Perf: Same as the previous comment on serialization/deserialization.

protected String bufferToString(ByteBuffer buffer) {
byte[] result = new byte[buffer.limit()];
buffer.get(result, 0, buffer.limit());
return new String(result);

Contributor

This can be removed once we have modified the serialization/deserialization. It's usually not a good idea to convert anything other than metadata into string representations, because the conversion and parsing are very expensive for floating-point data.

// TODO the port should be configurable by the user
public class PSRpcFactory {

private static final int PORT = 5055;

Contributor

Could you please have a look at how Spark assigns ports for RPC communication? It would be great if we could use a similar approach, to keep it consistent and ensure we're not conflicting with Spark and other daemon processes.
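
One possible sketch, assuming we mimic Spark's retry-based port assignment (the helper name and retry scheme are assumptions, not Spark internals):

import java.io.IOException;
import java.net.ServerSocket;

public class PSPortUtilSketch {
    // Try basePort, basePort+1, ... similar in spirit to Spark's own behavior,
    // which retries a bounded number of times (spark.port.maxRetries) before giving up.
    public static int findAvailablePort(int basePort, int maxRetries) throws IOException {
        for (int i = 0; i <= maxRetries; i++) {
            try (ServerSocket probe = new ServerSocket(basePort + i)) {
                return probe.getLocalPort(); // free right now; the RPC server binds to it next
            }
            catch (IOException e) {
                // port already in use - try the next candidate
            }
        }
        throw new IOException("No available port found after " + maxRetries + " retries");
    }
}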


try {
ParamservUtils.doPartitionOnSpark(sec, features, labels, scheme, workerNum) // Do data partitioning
.foreach(worker); // Run remote workers

Contributor

Please use accumulators to collect the statistics (the ones we show with -stats) from all executor processes, unless we're running in local spark mode as indicated by SparkExecutionContext.isLocalMaster() (parfor can serve as an example). Right now, the statistics collection would only work in local mode, where all "remote" tasks are executed in the driver process and hence correctly update the static statistics; in cluster mode, all statistics from the executors would be lost.

Also, it might be a good idea to use additional accumulators for the number of executed batches and epochs. This gives users a rough progress indicator in the Spark UI, which will be very useful for long-running paramserv instances.
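
A minimal sketch of such accumulators (class and accumulator names are hypothetical):

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;

public class PSProgressAccumulators {
    private final LongAccumulator _numBatches;
    private final LongAccumulator _numEpochs;

    public PSProgressAccumulators(JavaSparkContext jsc) {
        // Named accumulators are registered with the driver and shown in the Spark UI.
        _numBatches = jsc.sc().longAccumulator("paramserv executed batches");
        _numEpochs = jsc.sc().longAccumulator("paramserv executed epochs");
    }

    // Called from the remote workers (executor side).
    public void incBatches() { _numBatches.add(1); } // after each processed batch
    public void incEpochs() { _numEpochs.add(1); }   // after each completed epoch
}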

asfgit closed this in 15ecb72 on Jul 22, 2018
@EdgarLGB
Member Author

Thanks @mboehm7 for the final feedback. I will work on it in a follow-up PR.
