[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

Baunsgaard · 2021-06-08T14:24:10Z

This PR connects the workload tree with the compression instruction,
this is done by constructing a global singleton-Hash-Map that can share objects via integer Ids,
this id is then appended to the compression instruction, and allow the parsing of the Workload tree from Hop level to
be accessed at Lop level.

I would like some comments if this is okay, also i am concerned about spark implementation,
since i would have to parse and send the WTree along with the spark instructions to workers.

Baunsgaard · 2021-06-08T17:48:14Z

I could use some feedback on this one, @mboehm7 .

mboehm7 · 2021-06-08T18:42:08Z

thanks - I'll make a detailed pass Friday this week.

phaniarnab · 2021-06-09T10:59:20Z

I noticed that you are transmitting the whole tree, via an integer ID, to the instruction runtime. I am not well aware of the intention, but can we not follow the conventional path: optimize the tree/DAG in the optimizer, extract information, and transmit via the instructions strings of the corresponding instructions; and/or do the same during recompilation? That way you can stick to the supposed separation of compiler and runtime, and can easily scale to distributed runtimes such as Spark and federated.

Baunsgaard · 2021-06-09T11:25:32Z

I noticed that you are transmitting the whole tree, via an integer ID, to the instruction runtime. I am not well aware of the intention, but can we not follow the conventional path: optimize the tree/DAG in the optimizer, extract information, and transmit via the instructions strings of the corresponding instructions; and/or do the same during recompilation? That way you can stick to the supposed separation of compiler and runtime, and can easily scale to distributed runtimes such as Spark and federated.

Excellent question.

The instruction strings then would have to be able to contain nested structures, since the workload tree provide means of separating different loops into sub tree nodes in the structure. If a string instruction containing this should be constructed it means that the string ends up with different lengths depending on how many sub loops are executed.

Providing an id to the structure and storing the tree somewhere else seams more scalable and flexible to future modifications where new types of instructions are added to the WTree.

Also note that currently we look up our matrix objects the same way via an id provided to the instruction string, (just with another back-end that we have worked with much much longer). The reason why I'm not using that back-end is because it require specific data types contained, like Data, MatrixValue, MatrixBlock, FrameBlock, etc.

sebwrede · 2021-06-09T15:37:32Z

src/main/java/org/apache/sysds/runtime/compress/CompressedMatrixBlockFactory.java

+	public static Pair<MatrixBlock, CompressionStatistics> compress(MatrixBlock mb, int k, WTreeRoot root){
+		return compress(mb, k, new CompressionSettingsBuilder().create());
+	}


You are not using root in this method. Is this on purpose?

yup, this PR simply wire up the tree to allow the compression instruction to use it, my WIP PR is the actual implementation.

sebwrede · 2021-06-09T15:52:52Z

I think the way the WTree is added to the instructions seems reasonable if the CompressionCPInstruction needs the entire tree, but is this really the case? Isn't it possible to add the information about compression contained in the tree and add it to the different instructions, so instead of putting all the information into a single CompressionCPInstruction string (which would have variable length), the information should be placed in the relevant instruction strings so that each of the strings contains only the compression information needed for its instruction execution (resulting in an instruction string of fixed length).
Maybe this is because I do not fully understand what you mean by "means of separating different loops into sub tree nodes in the structure". Couldn't the nested structures be represented by adding to the different instructions instead of adding all of it to the same instruction?

Baunsgaard · 2021-06-09T19:06:43Z

I think the way the WTree is added to the instructions seems reasonable if the CompressionCPInstruction needs the entire tree, but is this really the case?

Yes, but that is assuming that i know at compile time what information is needed in the runtime of an arbitrary matrix.
currently we don't have access to this information at compile time, and we probably should not.

Isn't it possible to add the information about compression contained in the tree and add it to the different instructions, so instead of putting all the information into a single CompressionCPInstruction string (which would have variable length), the information should be placed in the relevant instruction strings so that each of the strings contains only the compression information needed for its instruction execution (resulting in an instruction string of fixed length).

Since the Compression instruction is one instruction i don't see how i should create multiple instruction strings? Maybe I'm missing the question.

Maybe this is because I do not fully understand what you mean by "means of separating different loops into sub tree nodes in the structure". Couldn't the nested structures be represented by adding to the different instructions instead of adding all of it to the same instruction?

To clarify the loops:

Lets say i read in a matrix

X = read("...")

This matrix is used in a nested for loop at different locations

d = mean(X)
for(a in ...){
  ...
  c = max(X)
  Xm = X + 2
  for(b in ...){
    ...
    B = t(Xm) %*% X
    ...
}}

for optimizing compression for workflow i need to know

what functions are called on X and how many times(at least an approximation), mean once, max in forloop, compressed multiplication in forloop forloop.
what derived compressed matrices is constructed such as Xm (Still compressed).
what operations is performed on the derived matrices
how many decompressing operations there is

all this can be encapsulated in the "AWTree" object, and is hard to add to the instruction string since this string then would have to support all this complexity.

The information cannot just be extracted at compile time because i need access to the actual data that is being compressed to optimize for the given instructions.

sebwrede · 2021-06-10T08:45:30Z

Well, if you cannot extract the information at compile time and cannot build the tree during runtime, then I agree that the best solution is to put the information somewhere that is not in the instruction string, like you have done in this PR.

Another comment: If you change the _cops field, addCompressedOp, and getCompressOps to something like _highlightedOps, addHighlightedOp, and getHighlightedOps then the class would already be generalized enough to be used for other workload tree representations, such as the one I need for the federated executions.

Baunsgaard · 2021-06-10T11:43:49Z

Well, if you cannot extract the information at compile time and cannot build the tree during runtime, then I agree that the best solution is to put the information somewhere that is not in the instruction string, like you have done in this PR.

Another comment: If you change the _cops field, addCompressedOp, and getCompressOps to something like _highlightedOps, addHighlightedOp, and getHighlightedOps then the class would already be generalized enough to be used for other workload tree representations, such as the one I need for the federated executions.

should be changed now, But it is not enough to make it applicable to federated, for that you need to define

A new IPA
A new Analyzer

also if you decide to do so, we need to move this workload package to some more general location. perhaps a new package called workload in the runtime folder?

sebwrede · 2021-06-10T12:01:57Z

But it is not enough to make it applicable to federated

It is only the tree-structure I intent to reuse. The only purpose is to use it as a representation of execution plans.

perhaps a new package

Yes, that would be a good idea.

mboehm7 · 2021-06-11T18:10:43Z

LGTM. Overall, I think this map is fine - generating a unique token, storing the WTree globally under this token, and compiling the token into the compression instruction for later access is (in my opinion) the best option here. We could further make the WTree part of the normal explain output - that way it's not much different from our dictionaries of functions that are shown in the explain after compilation and then called from multiple different places.

Some minor additional comments:

The rework of the WTree seem to have lost the begin/end line information (for the explain output). Especially in large scripts like GLM with >1000 lines and many removed statement blocks during compilation, this information is crucial for effective debugging.
The global map for token lookups could be hardened for concurrent access by different threads (e.g., JMLC) and be made part of the cleanup logic to prevent resource leaks in programmatic APIs. Furthermore, there is no need to create singleton instances of this map - instead it could be a simple class with a static member variable and static methods (which would avoid unnecessary indirections).

Baunsgaard · 2021-06-21T09:50:08Z

Thanks for the comments, @phaniarnab @mboehm7 @sebwrede
i have modified accordingly.

Closing PR because of continued work in PR : #1320 , will merge after the Release

sebwrede reviewed Jun 9, 2021

View reviewed changes

Baunsgaard force-pushed the CLACostTree branch from 7dd4329 to 1fd9013 Compare June 10, 2021 11:42

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation

ceb52ae

Baunsgaard force-pushed the CLACostTree branch from 99d04e0 to ceb52ae Compare June 14, 2021 10:09

Baunsgaard closed this Jun 21, 2021

Baunsgaard deleted the CLACostTree branch July 1, 2021 13:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

Baunsgaard commented Jun 8, 2021

Baunsgaard commented Jun 8, 2021

mboehm7 commented Jun 8, 2021

phaniarnab commented Jun 9, 2021

Baunsgaard commented Jun 9, 2021

sebwrede Jun 9, 2021

Baunsgaard Jun 9, 2021

sebwrede commented Jun 9, 2021

Baunsgaard commented Jun 9, 2021

sebwrede commented Jun 10, 2021

Baunsgaard commented Jun 10, 2021

sebwrede commented Jun 10, 2021

mboehm7 commented Jun 11, 2021

Baunsgaard commented Jun 21, 2021

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

Conversation

Baunsgaard commented Jun 8, 2021

Baunsgaard commented Jun 8, 2021

mboehm7 commented Jun 8, 2021

phaniarnab commented Jun 9, 2021

Baunsgaard commented Jun 9, 2021

sebwrede Jun 9, 2021

Choose a reason for hiding this comment

Baunsgaard Jun 9, 2021

Choose a reason for hiding this comment

sebwrede commented Jun 9, 2021

Baunsgaard commented Jun 9, 2021

sebwrede commented Jun 10, 2021

Baunsgaard commented Jun 10, 2021

sebwrede commented Jun 10, 2021

mboehm7 commented Jun 11, 2021

Baunsgaard commented Jun 21, 2021