[SYSTEMDS-3386] Refactor replacement of CP instructions #1628

kev-inn · 2022-06-07T14:03:06Z

This reworks our runtime replacement of CP instructions with FED versions when appropriate.
SP replacement will be done in another PR.

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

Note that not everything is replaced yet, I will work on it while the general approach can already be reviewed.

During the testing I found myself requiring to check if actual FED instructions are executed, is there a way to ensure this in the test framework, or is this already ensured in some way via privacy? Tests checking the heavy hitters do it implicitly.

NOTE: I deleted a fix in QuantilePickCPInstruction (bf19244), as the common layout is no longer necessary and null is cleaner. This is intentional and the fix is no longer necessary.

kev-inn

Some comments to explain myself :)

src/main/java/org/apache/sysds/runtime/instructions/cp/CovarianceCPInstruction.java

kev-inn · 2022-06-07T14:09:42Z

src/main/java/org/apache/sysds/runtime/instructions/cp/QuantilePickCPInstruction.java

-			return new QuantilePickCPInstruction(null, in1, new CPOperand(), out, ptype, inmem, opcode, str);
+			return new QuantilePickCPInstruction(null, in1, out, ptype, inmem, opcode, str);


Our new approach allows us to remove this fix/hack.

src/main/java/org/apache/sysds/runtime/instructions/cp/QuantilePickCPInstruction.java

src/main/java/org/apache/sysds/runtime/instructions/cp/QuantileSortCPInstruction.java

kev-inn · 2022-06-07T14:16:12Z

src/main/java/org/apache/sysds/runtime/instructions/cp/UnaryScalarCPInstruction.java

 import org.apache.sysds.runtime.matrix.operators.UnaryOperator;

-public class UnaryScalarCPInstruction extends UnaryMatrixCPInstruction {
+public class UnaryScalarCPInstruction extends UnaryCPInstruction {


This is a problem due to inheritance. We replace UnaryMatrixCPInstructions, but don't replace UnaryScalarCPInstructions, with FED instructions. I think this inheritance made no sense and probably was a bug.
If it wasn't we can also overload the tryReplaceWithFederated() here.

src/main/java/org/apache/sysds/runtime/instructions/fed/AggregateBinaryFEDInstruction.java

src/main/java/org/apache/sysds/runtime/instructions/fed/MMChainFEDInstruction.java

src/main/java/org/apache/sysds/runtime/instructions/fed/TsmmFEDInstruction.java

kev-inn · 2022-06-07T14:59:22Z

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring
FederatedColAggregateTest.testTernaryColSumDenseMatrix
FederatedCtableTest.federatedCtableSinglenode (configuration [3])
FederatedFullAggregateTest.testTernarySumDenseMatrixCP (configuration [0] and [1])
FederatedWeightedCrossEntropy (configuration [0] and [1])
- federatedWeightedCrossEntropySpark
- federatedWeightedCrossEntropySparkEpsilon
- federatedWeightedCrossEntropySparkNode
- federatedWeightedCrossEntropySparkNodeEpsilon
FederatedWeightedDivMatrixMultTest all or almost all
FederatedWeightedSigmoidTest all
FederatedWeightedSquaredLossTest all
FederatedWeightedUnaryMatrixMultTest all (both configurations)

I remember that this is a known problem.
Edit: I was behind a few commits, so this information might be outdated.

Baunsgaard · 2022-06-08T08:58:16Z

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

* `FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring`

* `FederatedColAggregateTest.testTernaryColSumDenseMatrix`

* `FederatedCtableTest.federatedCtableSinglenode` (configuration [3])

* `FederatedFullAggregateTest.testTernarySumDenseMatrixCP` (configuration [0] and [1])

* `FederatedWeightedCrossEntropy` (configuration [0] and [1])
  
  * `federatedWeightedCrossEntropySpark`
  * `federatedWeightedCrossEntropySparkEpsilon`
  * `federatedWeightedCrossEntropySparkNode`
  * `federatedWeightedCrossEntropySparkNodeEpsilon`

* `FederatedWeightedDivMatrixMultTest` all or almost all

* `FederatedWeightedSigmoidTest` all

* `FederatedWeightedSquaredLossTest` all

* `FederatedWeightedUnaryMatrixMultTest` all (both configurations)

I remember that this is a known problem. Edit: I was behind a few commits, so this information might be outdated.

Yes this is known.

You can run your tests such that they repeat if failing, there you will detect these "flaky" tests.
If you use IntelliJ, you can also add a new execution mode where you repeat failing tests

mvn clean compile test -Dmaven.test.skip=false -Drerun.failingtests.count=1 -D automatedtestbase.outputbuffering=true -Dtest=org.apache.sysds.test.$@

Baunsgaard · 2022-06-08T09:02:39Z

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

I am not sure if this is the best location to edit.
Would it not make more sense to change it when we construct the CP vs SP vs GPU instructions from the string versions of instructions themselves?
This way if we for instance in the future add FPGA instructions it naturally also support changing these into federated instructions, and we do not have to maintain both all instances for each federated instruction which would lead to excessive testing requirements where we need to hit each instruction as either cases.

kev-inn · 2022-06-08T09:14:00Z

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

Baunsgaard · 2022-06-08T11:30:36Z

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

You are right, it is not that simple but try to prioritize a good extensible design and then we can expand upon it.
If you cannot find a better way, then i still can live with the design proposed already.

kev-inn · 2022-06-08T11:40:07Z

Sure, that is the number one priority of this rework, that is why I also would like to remove the dependency for the string representations of those instructions to be similar/matching.
In the future we want to send only CP instructions to the workers and locally choose CP, SP or GPU (maybe even FED?). Keeping this in mind, we could add conversions for all kinds of instructions to CP (and require constructors like the ones I added for FED) or add interfaces with getters for specialized instructions like BinaryAggregationInstruction.
The approach, which should be the nicest to work with, would be a complete rework of the hierarchy of instructions, adding the type of execution only at the leaf specializations. Instead of ComputationCPInstruction and ComputationSPInstruction we would have `ComputationInstruction. This would require a very intrusive change of a lot of core implementations and seems infeasable.

kev-inn · 2022-06-08T18:14:36Z

After a lot of consideration I propose the following:

Switch to the first implementation approach. The refactor makes the code more maintainable, because of less indentation and implementation in the corresponding FEDInstruction. The static methods perform checks for both CPInstruction and SPInstruction, which will necessarily have large overlaps. A constructor with the SPInstruction as input (like the ones I already added with CPInstruction) will be added where necessary.
Add a method to most SPInstructions to convert it to a CPInstruction. This will be used to allow the federated worker to choose which execution type he wants to use. Most SPInstruction can be converted to CPInstruction (and vice versa) by changing the execution type in the instruction string before parsing, those we could simply tag with an interface. This would reduce code quite a bit, but is harder to maintain. Opinions are appreciated.
Some SPInstructions like ReblockSPInstruction don't have an equivalent CPInstruction, those will be sent to the worker as is. A problem arises when the federated worker opts to use the CPInstruction for inputs instead, this would require us to ignore the SPInstruction. Not sure if checking for RDD handles on the inputs is enough(?).

mboehm7 · 2022-06-08T20:39:21Z

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

Baunsgaard · 2022-06-08T21:07:23Z

(Just because i regret deleting it -- i wrote this while you commented Matthias)

I suggest that you make the methods inside CP and SP instructions part of a interface that allows any instruction to change to FED, and then make the footprint of these functions as small as possible inside each of the CP and SP instructions. All of these instructions should call into the fitting FEDInstructions class that then handle the logic of converting the instructions to FEDInstructions though static constructors that can return null.
With this we simply try to call this interface function on the instructions to change them into FEDInstructions when they can before execution. In the static construction it is easy to change the SP instructions (or GPU etc) to CP if possible, otherwise the static constructor would construct SP type.

This design addresses all 3 points with (i think) the least amount of code.

-- Comment to Matthias's point

We are missing one step where we can avoid the whole FEDInstructionUtils code with casting that Kevin i think correctly is addressing in some of in his changes, where we can remove the entire if else tree if we add a interface method that simply map to the static constructors in your step 1.
If we set default for all methods to return null for this interface "toFedInstruction" then we can just overwrite all instances of instructions that can change into FEDInstructions and get much cleaner code from it, --- with the downside of having small overloaded functions that map to the static constructors for FEDInstructions in each CP and SP instruction---

kev-inn · 2022-06-09T11:16:47Z

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

No worries, as @Baunsgaard stated, I think the majority of this PR is still a nice addition and it fits your first step nicely. I would like to have 2/3 methods:

Interface method named toFedInstruction for CP and SP Instructions
Static "constructor" method for FEDInstructions parseInstruction() for SP and CP

EDIT: Method names can be changed, please feel free to give your opinion

kev-inn · 2022-06-13T13:06:45Z

I am sorry that this PR got so large. I believe it is better to do it in one go, but I can also break it down so the review is faster.

I recommend to open the old FEDInstructionUtils.java file on one screen and on the other screen go through all added toFedInstruction() methods. This will allows to easily match the old conditions with the new, more compact and (hopefully) easier to read conditions.
There aren't many changes to the logic, except that I changed (or added, see AppendSPInstruction) some super-classes.

kev-inn · 2022-06-13T13:47:11Z

Not ready for review, found some problems with the spark instructions.

kev-inn · 2022-06-14T13:14:15Z

ConcurrentModificationException seems to be more frequent with this PR, I do not know what the reason is for that, but that is anyway a fix for the future.
EDIT: this should be because of the PR 942a3a2

- Refactor replacement with FED instructions - Fix FED output flag removal Closes apache#1680 Closes apache#1628

Baunsgaard · 2022-08-18T14:11:02Z

Closing to begin the merging phase, Thanks for the contribution @kev-inn

- Refactor replacement with FED instructions - Fix FED output flag removal Closes #1628

- Refactor replacement with FED instructions - Fix FED output flag removal Closes apache#1628

kev-inn commented Jun 7, 2022

View reviewed changes

kev-inn marked this pull request as draft June 7, 2022 15:14

kev-inn force-pushed the cleanup_fed_instructions branch from 352adb3 to 3bdf394 Compare June 7, 2022 18:29

kev-inn marked this pull request as ready for review June 7, 2022 18:30

kev-inn marked this pull request as draft June 8, 2022 09:11

kev-inn marked this pull request as ready for review June 13, 2022 13:07

kev-inn marked this pull request as draft June 13, 2022 13:46

kev-inn marked this pull request as ready for review June 15, 2022 09:56

kev-inn marked this pull request as draft June 15, 2022 12:12

kev-inn force-pushed the cleanup_fed_instructions branch from 2d97961 to 45271b1 Compare June 16, 2022 11:59

kev-inn marked this pull request as ready for review June 16, 2022 12:03

Refactor replacement with FED instructions

f527f82

kev-inn force-pushed the cleanup_fed_instructions branch from 45271b1 to f527f82 Compare July 12, 2022 17:31

Fix FED output flag removal

d48fe3d

Baunsgaard mentioned this pull request Aug 18, 2022

[SYSTEMDS-3386] Refactor replacement of CP instructions #1680

Closed

Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Aug 18, 2022

[SYSTEMDS-3386] Refactor replacement of CP instructions

ec65d16

- Refactor replacement with FED instructions - Fix FED output flag removal Closes apache#1680 Closes apache#1628

Baunsgaard closed this Aug 18, 2022

Baunsgaard pushed a commit that referenced this pull request Aug 19, 2022

[SYSTEMDS-3386] Refactor replacement of CP instructions

94f8c7f

- Refactor replacement with FED instructions - Fix FED output flag removal Closes #1628

fathollahzadeh pushed a commit to fathollahzadeh/systemds that referenced this pull request Dec 7, 2022

[SYSTEMDS-3386] Refactor replacement of CP instructions

142a8a4

- Refactor replacement with FED instructions - Fix FED output flag removal Closes apache#1628

		return new QuantilePickCPInstruction(null, in1, new CPOperand(), out, ptype, inmem, opcode, str);
		return new QuantilePickCPInstruction(null, in1, out, ptype, inmem, opcode, str);

[SYSTEMDS-3386] Refactor replacement of CP instructions #1628

[SYSTEMDS-3386] Refactor replacement of CP instructions #1628

Uh oh!

Conversation

kev-inn commented Jun 7, 2022

Uh oh!

kev-inn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kev-inn Jun 7, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kev-inn Jun 7, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kev-inn commented Jun 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Baunsgaard commented Jun 8, 2022

Uh oh!

Baunsgaard commented Jun 8, 2022

Uh oh!

kev-inn commented Jun 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Baunsgaard commented Jun 8, 2022

Uh oh!

kev-inn commented Jun 8, 2022

Uh oh!

kev-inn commented Jun 8, 2022

Uh oh!

mboehm7 commented Jun 8, 2022

Uh oh!

Baunsgaard commented Jun 8, 2022

Uh oh!

kev-inn commented Jun 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kev-inn commented Jun 13, 2022

Uh oh!

kev-inn commented Jun 13, 2022

Uh oh!

kev-inn commented Jun 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Baunsgaard commented Aug 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kev-inn commented Jun 7, 2022 •

edited

Loading

kev-inn commented Jun 8, 2022 •

edited

Loading

kev-inn commented Jun 9, 2022 •

edited

Loading

kev-inn commented Jun 14, 2022 •

edited

Loading