Skip to content

Conversation

@kev-inn
Copy link
Contributor

@kev-inn kev-inn commented Jun 7, 2022

This reworks our runtime replacement of CP instructions with FED versions when appropriate.
SP replacement will be done in another PR.

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

Note that not everything is replaced yet, I will work on it while the general approach can already be reviewed.

During the testing I found myself requiring to check if actual FED instructions are executed, is there a way to ensure this in the test framework, or is this already ensured in some way via privacy? Tests checking the heavy hitters do it implicitly.

NOTE: I deleted a fix in QuantilePickCPInstruction (bf19244), as the common layout is no longer necessary and null is cleaner. This is intentional and the fix is no longer necessary.

Copy link
Contributor Author

@kev-inn kev-inn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments to explain myself :)

return new QuantilePickCPInstruction(null, in1, new CPOperand(), out, ptype, inmem, opcode, str);
return new QuantilePickCPInstruction(null, in1, out, ptype, inmem, opcode, str);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our new approach allows us to remove this fix/hack.

import org.apache.sysds.runtime.matrix.operators.UnaryOperator;

public class UnaryScalarCPInstruction extends UnaryMatrixCPInstruction {
public class UnaryScalarCPInstruction extends UnaryCPInstruction {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a problem due to inheritance. We replace UnaryMatrixCPInstructions, but don't replace UnaryScalarCPInstructions, with FED instructions. I think this inheritance made no sense and probably was a bug.
If it wasn't we can also overload the tryReplaceWithFederated() here.

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 7, 2022

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

  • FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring
  • FederatedColAggregateTest.testTernaryColSumDenseMatrix
  • FederatedCtableTest.federatedCtableSinglenode (configuration [3])
  • FederatedFullAggregateTest.testTernarySumDenseMatrixCP (configuration [0] and [1])
  • FederatedWeightedCrossEntropy (configuration [0] and [1])
    • federatedWeightedCrossEntropySpark
    • federatedWeightedCrossEntropySparkEpsilon
    • federatedWeightedCrossEntropySparkNode
    • federatedWeightedCrossEntropySparkNodeEpsilon
  • FederatedWeightedDivMatrixMultTest all or almost all
  • FederatedWeightedSigmoidTest all
  • FederatedWeightedSquaredLossTest all
  • FederatedWeightedUnaryMatrixMultTest all (both configurations)

I remember that this is a known problem.
Edit: I was behind a few commits, so this information might be outdated.

@kev-inn kev-inn marked this pull request as draft June 7, 2022 15:14
@kev-inn kev-inn force-pushed the cleanup_fed_instructions branch from 352adb3 to 3bdf394 Compare June 7, 2022 18:29
@kev-inn kev-inn marked this pull request as ready for review June 7, 2022 18:30
@Baunsgaard
Copy link
Contributor

As a side note, when I ran all the tests the following testcases fail, but work when executed in isolation:

* `FederatedWorkerIntegrationCRUDTest.testWorkerAddedForMonitoring`

* `FederatedColAggregateTest.testTernaryColSumDenseMatrix`

* `FederatedCtableTest.federatedCtableSinglenode` (configuration [3])

* `FederatedFullAggregateTest.testTernarySumDenseMatrixCP` (configuration [0] and [1])

* `FederatedWeightedCrossEntropy` (configuration [0] and [1])
  
  * `federatedWeightedCrossEntropySpark`
  * `federatedWeightedCrossEntropySparkEpsilon`
  * `federatedWeightedCrossEntropySparkNode`
  * `federatedWeightedCrossEntropySparkNodeEpsilon`

* `FederatedWeightedDivMatrixMultTest` all or almost all

* `FederatedWeightedSigmoidTest` all

* `FederatedWeightedSquaredLossTest` all

* `FederatedWeightedUnaryMatrixMultTest` all (both configurations)

I remember that this is a known problem. Edit: I was behind a few commits, so this information might be outdated.

Yes this is known.

You can run your tests such that they repeat if failing, there you will detect these "flaky" tests.
If you use IntelliJ, you can also add a new execution mode where you repeat failing tests

mvn clean compile test -Dmaven.test.skip=false -Drerun.failingtests.count=1 -D automatedtestbase.outputbuffering=true -Dtest=org.apache.sysds.test.$@

@Baunsgaard
Copy link
Contributor

This implementation was my second approach, the first one consisted of static methods in the FED instructions. I much prefer this one as it is quite clear which instructions are replaced, which constraints the CP imposes and we don't have to lead every check with an instanceof check. The downside of this approach is that it can get a bit confusing with inheritance, but it is---and will probably stay---manageable.

I am not sure if this is the best location to edit.
Would it not make more sense to change it when we construct the CP vs SP vs GPU instructions from the string versions of instructions themselves?
This way if we for instance in the future add FPGA instructions it naturally also support changing these into federated instructions, and we do not have to maintain both all instances for each federated instruction which would lead to excessive testing requirements where we need to hit each instruction as either cases.

@kev-inn kev-inn marked this pull request as draft June 8, 2022 09:11
@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 8, 2022

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

@Baunsgaard
Copy link
Contributor

You have a point there, I also realized this recently when trying to also replace SP Instructions without duplicating code. I will weigh the options, if I come up with more, but will most likely opt to replace it already during parsing.

Edit: this actually is not that simple, as we create our instructions before we get the runtime information if an object is federated AFAIK. Note that federated instruction compilation exists by now, but this is about a runtime decision. I will search for a smooth approach, currently only interfaces come to mind, but would like to find something better.

You are right, it is not that simple but try to prioritize a good extensible design and then we can expand upon it.
If you cannot find a better way, then i still can live with the design proposed already.

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 8, 2022

Sure, that is the number one priority of this rework, that is why I also would like to remove the dependency for the string representations of those instructions to be similar/matching.
In the future we want to send only CP instructions to the workers and locally choose CP, SP or GPU (maybe even FED?). Keeping this in mind, we could add conversions for all kinds of instructions to CP (and require constructors like the ones I added for FED) or add interfaces with getters for specialized instructions like BinaryAggregationInstruction.
The approach, which should be the nicest to work with, would be a complete rework of the hierarchy of instructions, adding the type of execution only at the leaf specializations. Instead of ComputationCPInstruction and ComputationSPInstruction we would have `ComputationInstruction. This would require a very intrusive change of a lot of core implementations and seems infeasable.

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 8, 2022

After a lot of consideration I propose the following:

  1. Switch to the first implementation approach. The refactor makes the code more maintainable, because of less indentation and implementation in the corresponding FEDInstruction. The static methods perform checks for both CPInstruction and SPInstruction, which will necessarily have large overlaps. A constructor with the SPInstruction as input (like the ones I already added with CPInstruction) will be added where necessary.
  2. Add a method to most SPInstructions to convert it to a CPInstruction. This will be used to allow the federated worker to choose which execution type he wants to use. Most SPInstruction can be converted to CPInstruction (and vice versa) by changing the execution type in the instruction string before parsing, those we could simply tag with an interface. This would reduce code quite a bit, but is harder to maintain. Opinions are appreciated.
  3. Some SPInstructions like ReblockSPInstruction don't have an equivalent CPInstruction, those will be sent to the worker as is. A problem arises when the federated worker opts to use the CPInstruction for inputs instead, this would require us to ignore the SPInstruction. Not sure if checking for RDD handles on the inputs is enough(?).

@mboehm7
Copy link
Contributor

mboehm7 commented Jun 8, 2022

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

@Baunsgaard
Copy link
Contributor

(Just because i regret deleting it -- i wrote this while you commented Matthias)

I suggest that you make the methods inside CP and SP instructions part of a interface that allows any instruction to change to FED, and then make the footprint of these functions as small as possible inside each of the CP and SP instructions. All of these instructions should call into the fitting FEDInstructions class that then handle the logic of converting the instructions to FEDInstructions though static constructors that can return null.
With this we simply try to call this interface function on the instructions to change them into FEDInstructions when they can before execution. In the static construction it is easy to change the SP instructions (or GPU etc) to CP if possible, otherwise the static constructor would construct SP type.

This design addresses all 3 points with (i think) the least amount of code.

-- Comment to Matthias's point

We are missing one step where we can avoid the whole FEDInstructionUtils code with casting that Kevin i think correctly is addressing in some of in his changes, where we can remove the entire if else tree if we add a interface method that simply map to the static constructors in your step 1.
If we set default for all methods to return null for this interface "toFedInstruction" then we can just overwrite all instances of instructions that can change into FEDInstructions and get much cleaner code from it, --- with the downside of having small overloaded functions that map to the static constructors for FEDInstructions in each CP and SP instruction---

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 9, 2022

Thanks for getting started on this issue @kev-inn. Unfortunately, I think the PR goes a bit in the wrong direction. We first want to do a basic cleanup of the federated instructions, without modifying the CP or Spark instructions. The replace in FEDInstructionUtils should stay mostly as it is, and all federated instructions should have overloaded parseInstruction methods to consume either CP or Spark instructions (objects) directly and construct the FED instruction like [1] without replicating the string parsing logic. For the parseInstruction with strings, we can call the parse on the respective CP instruction (for consistency) and use again the above methods.

In a second step, we would remove the replicated instructions like MMFEDInstruction (redundant to AggregateBinaryFEDInstruction) and let all instruction send CP instruction strings to the federated workers.

In a third step, we then extend the federated workers to (under certain conditions) create a hop that represents the CP instruction and generate lops and instructions as needed (e.g., CP, Spark, GPU).

Sorry, if I was not clear during our offline discussions.

[1] https://github.com/apache/systemds/blob/main/src/main/java/org/apache/sysds/runtime/instructions/fed/CentralMomentFEDInstruction.java#L77

No worries, as @Baunsgaard stated, I think the majority of this PR is still a nice addition and it fits your first step nicely. I would like to have 2/3 methods:

  1. Interface method named toFedInstruction for CP and SP Instructions
  2. Static "constructor" method for FEDInstructions parseInstruction() for SP and CP

EDIT: Method names can be changed, please feel free to give your opinion

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 13, 2022

I am sorry that this PR got so large. I believe it is better to do it in one go, but I can also break it down so the review is faster.

I recommend to open the old FEDInstructionUtils.java file on one screen and on the other screen go through all added toFedInstruction() methods. This will allows to easily match the old conditions with the new, more compact and (hopefully) easier to read conditions.
There aren't many changes to the logic, except that I changed (or added, see AppendSPInstruction) some super-classes.

@kev-inn kev-inn marked this pull request as ready for review June 13, 2022 13:07
@kev-inn kev-inn marked this pull request as draft June 13, 2022 13:46
@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 13, 2022

Not ready for review, found some problems with the spark instructions.

@kev-inn
Copy link
Contributor Author

kev-inn commented Jun 14, 2022

ConcurrentModificationException seems to be more frequent with this PR, I do not know what the reason is for that, but that is anyway a fix for the future.
EDIT: this should be because of the PR 942a3a2

@kev-inn kev-inn marked this pull request as ready for review June 15, 2022 09:56
@kev-inn kev-inn marked this pull request as draft June 15, 2022 12:12
@kev-inn kev-inn force-pushed the cleanup_fed_instructions branch from 2d97961 to 45271b1 Compare June 16, 2022 11:59
@kev-inn kev-inn marked this pull request as ready for review June 16, 2022 12:03
@kev-inn kev-inn force-pushed the cleanup_fed_instructions branch from 45271b1 to f527f82 Compare July 12, 2022 17:31
Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Aug 18, 2022
- Refactor replacement with FED instructions
- Fix FED output flag removal

Closes apache#1680
Closes apache#1628
@Baunsgaard
Copy link
Contributor

Closing to begin the merging phase, Thanks for the contribution @kev-inn

@Baunsgaard Baunsgaard closed this Aug 18, 2022
Baunsgaard pushed a commit that referenced this pull request Aug 19, 2022
- Refactor replacement with FED instructions
- Fix FED output flag removal

Closes #1628
fathollahzadeh pushed a commit to fathollahzadeh/systemds that referenced this pull request Dec 7, 2022
- Refactor replacement with FED instructions
- Fix FED output flag removal

Closes apache#1628
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants