Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYSTEMDS-2994+2991] CLA Workload Analyzer and Workload Representation #1303

Closed
wants to merge 1 commit into from

Conversation

Baunsgaard
Copy link
Contributor

This PR connects the workload tree with the compression instruction,
this is done by constructing a global singleton-Hash-Map that can share objects via integer Ids,
this id is then appended to the compression instruction, and allow the parsing of the Workload tree from Hop level to
be accessed at Lop level.

I would like some comments if this is okay, also i am concerned about spark implementation,
since i would have to parse and send the WTree along with the spark instructions to workers.

@Baunsgaard
Copy link
Contributor Author

I could use some feedback on this one, @mboehm7 .

@mboehm7
Copy link
Contributor

mboehm7 commented Jun 8, 2021

thanks - I'll make a detailed pass Friday this week.

@phaniarnab
Copy link
Contributor

I noticed that you are transmitting the whole tree, via an integer ID, to the instruction runtime. I am not well aware of the intention, but can we not follow the conventional path: optimize the tree/DAG in the optimizer, extract information, and transmit via the instructions strings of the corresponding instructions; and/or do the same during recompilation? That way you can stick to the supposed separation of compiler and runtime, and can easily scale to distributed runtimes such as Spark and federated.

@Baunsgaard
Copy link
Contributor Author

I noticed that you are transmitting the whole tree, via an integer ID, to the instruction runtime. I am not well aware of the intention, but can we not follow the conventional path: optimize the tree/DAG in the optimizer, extract information, and transmit via the instructions strings of the corresponding instructions; and/or do the same during recompilation? That way you can stick to the supposed separation of compiler and runtime, and can easily scale to distributed runtimes such as Spark and federated.

Excellent question.

The instruction strings then would have to be able to contain nested structures, since the workload tree provide means of separating different loops into sub tree nodes in the structure. If a string instruction containing this should be constructed it means that the string ends up with different lengths depending on how many sub loops are executed.

Providing an id to the structure and storing the tree somewhere else seams more scalable and flexible to future modifications where new types of instructions are added to the WTree.

Also note that currently we look up our matrix objects the same way via an id provided to the instruction string, (just with another back-end that we have worked with much much longer). The reason why I'm not using that back-end is because it require specific data types contained, like Data, MatrixValue, MatrixBlock, FrameBlock, etc.

Comment on lines +92 to +94
public static Pair<MatrixBlock, CompressionStatistics> compress(MatrixBlock mb, int k, WTreeRoot root){
return compress(mb, k, new CompressionSettingsBuilder().create());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not using root in this method. Is this on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this PR simply wire up the tree to allow the compression instruction to use it, my WIP PR is the actual implementation.

@sebwrede
Copy link
Contributor

sebwrede commented Jun 9, 2021

I think the way the WTree is added to the instructions seems reasonable if the CompressionCPInstruction needs the entire tree, but is this really the case? Isn't it possible to add the information about compression contained in the tree and add it to the different instructions, so instead of putting all the information into a single CompressionCPInstruction string (which would have variable length), the information should be placed in the relevant instruction strings so that each of the strings contains only the compression information needed for its instruction execution (resulting in an instruction string of fixed length).
Maybe this is because I do not fully understand what you mean by "means of separating different loops into sub tree nodes in the structure". Couldn't the nested structures be represented by adding to the different instructions instead of adding all of it to the same instruction?

@Baunsgaard
Copy link
Contributor Author

I think the way the WTree is added to the instructions seems reasonable if the CompressionCPInstruction needs the entire tree, but is this really the case?

Yes, but that is assuming that i know at compile time what information is needed in the runtime of an arbitrary matrix.
currently we don't have access to this information at compile time, and we probably should not.

Isn't it possible to add the information about compression contained in the tree and add it to the different instructions, so instead of putting all the information into a single CompressionCPInstruction string (which would have variable length), the information should be placed in the relevant instruction strings so that each of the strings contains only the compression information needed for its instruction execution (resulting in an instruction string of fixed length).

Since the Compression instruction is one instruction i don't see how i should create multiple instruction strings? Maybe I'm missing the question.

Maybe this is because I do not fully understand what you mean by "means of separating different loops into sub tree nodes in the structure". Couldn't the nested structures be represented by adding to the different instructions instead of adding all of it to the same instruction?

To clarify the loops:

Lets say i read in a matrix

X = read("...")

This matrix is used in a nested for loop at different locations

d = mean(X)
for(a in ...){
  ...
  c = max(X)
  Xm = X + 2
  for(b in ...){
    ...
    B = t(Xm) %*% X
    ...
}}

for optimizing compression for workflow i need to know

  1. what functions are called on X and how many times(at least an approximation), mean once, max in forloop, compressed multiplication in forloop forloop.
  2. what derived compressed matrices is constructed such as Xm (Still compressed).
  3. what operations is performed on the derived matrices
  4. how many decompressing operations there is

all this can be encapsulated in the "AWTree" object, and is hard to add to the instruction string since this string then would have to support all this complexity.

The information cannot just be extracted at compile time because i need access to the actual data that is being compressed to optimize for the given instructions.

@sebwrede
Copy link
Contributor

Well, if you cannot extract the information at compile time and cannot build the tree during runtime, then I agree that the best solution is to put the information somewhere that is not in the instruction string, like you have done in this PR.

Another comment: If you change the _cops field, addCompressedOp, and getCompressOps to something like _highlightedOps, addHighlightedOp, and getHighlightedOps then the class would already be generalized enough to be used for other workload tree representations, such as the one I need for the federated executions.

@Baunsgaard
Copy link
Contributor Author

Well, if you cannot extract the information at compile time and cannot build the tree during runtime, then I agree that the best solution is to put the information somewhere that is not in the instruction string, like you have done in this PR.

Another comment: If you change the _cops field, addCompressedOp, and getCompressOps to something like _highlightedOps, addHighlightedOp, and getHighlightedOps then the class would already be generalized enough to be used for other workload tree representations, such as the one I need for the federated executions.

should be changed now, But it is not enough to make it applicable to federated, for that you need to define

  1. A new IPA
  2. A new Analyzer

also if you decide to do so, we need to move this workload package to some more general location. perhaps a new package called workload in the runtime folder?

@sebwrede
Copy link
Contributor

But it is not enough to make it applicable to federated

It is only the tree-structure I intent to reuse. The only purpose is to use it as a representation of execution plans.

perhaps a new package

Yes, that would be a good idea.

@mboehm7
Copy link
Contributor

mboehm7 commented Jun 11, 2021

LGTM. Overall, I think this map is fine - generating a unique token, storing the WTree globally under this token, and compiling the token into the compression instruction for later access is (in my opinion) the best option here. We could further make the WTree part of the normal explain output - that way it's not much different from our dictionaries of functions that are shown in the explain after compilation and then called from multiple different places.

Some minor additional comments:

  • The rework of the WTree seem to have lost the begin/end line information (for the explain output). Especially in large scripts like GLM with >1000 lines and many removed statement blocks during compilation, this information is crucial for effective debugging.
  • The global map for token lookups could be hardened for concurrent access by different threads (e.g., JMLC) and be made part of the cleanup logic to prevent resource leaks in programmatic APIs. Furthermore, there is no need to create singleton instances of this map - instead it could be a simple class with a static member variable and static methods (which would avoid unnecessary indirections).

@Baunsgaard
Copy link
Contributor Author

Thanks for the comments, @phaniarnab @mboehm7 @sebwrede
i have modified accordingly.

Closing PR because of continued work in PR : #1320 , will merge after the Release

@Baunsgaard Baunsgaard closed this Jun 21, 2021
@Baunsgaard Baunsgaard deleted the CLACostTree branch July 1, 2021 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants