
[SYSTEMDS-???] Parameterserver aggregation optimization #1211

Closed
Baunsgaard wants to merge 2 commits into apache:master from Baunsgaard:Conv2d-2

Conversation

@Baunsgaard
Copy link
Contributor

This PR contains two small commits.

  1. Remove an incorrect warning of a native error printed in the MKL call.
  2. AggregateFinalResults: avoid the generic access path by forcing dense format if the formats of the two parts differ.

The second commit brings the execution of the CNN ParameterServer with batch size 32 on MNIST from 260 sec down to 200 sec on my laptop, with minimal changes (10 lines). I don't think this has any bad side effects, but I would like confirmation.
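To illustrate the second commit, here is a minimal, hypothetical sketch of forcing a uniform (dense) format before aggregating two partial results. The `Block` class below is a stand-in for SystemDS's `MatrixBlock`, not the real API; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class UniformFormatAgg {

    /** Minimal stand-in for a matrix block that is either dense or sparse. */
    static class Block {
        int rows, cols;
        double[][] dense;         // non-null when in dense format
        List<double[]> triples;   // {row, col, value} entries when sparse

        static Block dense(double[][] v) {
            Block b = new Block();
            b.rows = v.length; b.cols = v[0].length; b.dense = v;
            return b;
        }
        static Block sparse(int rows, int cols, List<double[]> t) {
            Block b = new Block();
            b.rows = rows; b.cols = cols; b.triples = t;
            return b;
        }
        boolean isSparse() { return dense == null; }

        /** Force a sparse block into dense format (the conversion the PR adds). */
        void toDense() {
            if (!isSparse()) return;
            double[][] d = new double[rows][cols];
            for (double[] t : triples)
                d[(int) t[0]][(int) t[1]] = t[2];
            dense = d; triples = null;
        }
    }

    /**
     * Aggregate (sum) two partial results into the first. If the formats
     * differ, force both dense so the fast dense-dense loop runs instead of
     * a generic format-dispatching path -- the essence of the 10-line change.
     */
    static Block aggregate(Block a, Block b) {
        if (a.isSparse() != b.isSparse()) { // mixed formats: unify to dense
            a.toDense();
            b.toDense();
        }
        if (!a.isSparse()) { // dense-dense kernel
            for (int r = 0; r < a.rows; r++)
                for (int c = 0; c < a.cols; c++)
                    a.dense[r][c] += b.dense[r][c];
            return a;
        }
        // sparse-sparse: append b's entries into a (no duplicate merging here)
        a.triples.addAll(b.triples);
        return a;
    }

    public static void main(String[] args) {
        Block acc = Block.dense(new double[][] {{1, 2}, {3, 4}});
        List<double[]> t = new ArrayList<>();
        t.add(new double[] {0, 0, 5}); // row 0, col 0, value 5
        Block part = Block.sparse(2, 2, t);
        Block out = aggregate(acc, part);
        System.out.println(out.dense[0][0]); // prints 6.0
    }
}
```

The point of the sketch is the format check at the top of `aggregate`: paying a one-time sparse-to-dense conversion keeps every subsequent addition on the tight dense-dense loop.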

@mboehm7
Copy link
Contributor

mboehm7 commented Mar 24, 2021

Thanks for following up on this. Instead of converting the vector back to dense (after it has been copied from dense to sparse), couldn't we directly do ret.copy(rtasks.get(0).get(), false); //for init before the call to aggregateFinalResults, to keep the accumulator dense?
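The suggestion above, initializing the accumulator as a dense copy of the first partial result so later in-place adds never leave the dense path, can be sketched as follows. This is a simplified, hypothetical illustration using flat arrays, not the SystemDS `ret.copy(...)` API itself.

```java
import java.util.Arrays;

public class DenseAccumulatorInit {

    /**
     * Sum partial results from workers, each a flat row-major array.
     * The accumulator is initialized as a *dense* copy of the first
     * partial result (analogous to ret.copy(rtasks.get(0).get(), false)),
     * so every subsequent in-place add stays on the dense code path.
     */
    static double[] aggregate(double[][] partials) {
        // init: dense copy of the first partial result (no sparse detour)
        double[] acc = Arrays.copyOf(partials[0], partials[0].length);
        for (int i = 1; i < partials.length; i++)
            for (int j = 0; j < acc.length; j++)
                acc[j] += partials[i][j];
        return acc;
    }

    public static void main(String[] args) {
        double[] out = aggregate(new double[][] {{1, 2}, {3, 4}, {5, 6}});
        System.out.println(Arrays.toString(out)); // prints [9.0, 12.0]
    }
}
```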

Aggregation of final results would use a generic aggregation if
the inputs were in different formats from each other.
This commit changes the behavior to force a uniform format.

Furthermore, allocating the initial result in dense format improves
performance slightly more, by about one second.

The change improves the performance of:
    uack+   59.179sec   ->  uack+  9.0sec
on the CNN implementation using the parameterserver on MNIST.
@Baunsgaard
Copy link
Contributor Author

Thanks for following up on this. Instead of converting the vector back to dense (after it has been copied from dense to sparse), couldn't we directly do ret.copy(rtasks.get(0).get(), false); //for init before the call to aggregateFinalResults, to keep the accumulator dense?

I just tried it; we still encounter the case where the new incoming values are sparse while our aggregate is dense.
The situation we want to avoid is this mix, because then we end up using the generic LibMatrixAgg.aggregateBinaryMatrixLastRowDenseGeneric() in this code path when we have correction enabled:

[screenshot: the generic aggregation code path in LibMatrixAgg]

But I do get that it makes more sense to force the initial accumulator to be dense, since it most likely will be dense in the end, so I added the copy-to-dense part you suggested. Overall this gives us a one-second-better execution time on batch size 32, 1 epoch.

