Skip to content

[SYSTEMDS-3091] MatrixBlock Initialized with Zeros is Sparse#1362

Closed
ywcb00 wants to merge 3 commits intoapache:masterfrom
ywcb00:exp/improve/runtime/instructions/fed/ctable/plusagg
Closed

[SYSTEMDS-3091] MatrixBlock Initialized with Zeros is Sparse#1362
ywcb00 wants to merge 3 commits intoapache:masterfrom
ywcb00:exp/improve/runtime/instructions/fed/ctable/plusagg

Conversation

@ywcb00
Copy link
Copy Markdown
Contributor

@ywcb00 ywcb00 commented Aug 11, 2021

Hi,
This PR tries to remove the performance bottleneck inside the federated ctable instruction (caused by aggregating the sparse partial result matrices using a dense plus operator) by constructing a MatrixBlock as a sparse matrix if all the values are initialized to zero.

Thanks for review :)

@mboehm7
Copy link
Copy Markdown
Contributor

mboehm7 commented Aug 12, 2021

Since the implications of these changed defaults can be large, I would appreciate if we could address this ctable performance locally by passing the right parameters from outside MatrixBlock. Would that be possible or did you encounter any issues there?

@ywcb00
Copy link
Copy Markdown
Contributor Author

ywcb00 commented Aug 13, 2021

Yes, it is possible to address it locally.
I've changed it accordingly :)

However, I think we shouldn't forget the change of the MatrixBlock right away, because it accelerates the CTable Instruction by a factor of 100 in my experiments. Maybe it's the same with other instructions. 🤔

@mboehm7
Copy link
Copy Markdown
Contributor

mboehm7 commented Aug 13, 2021

LGTM - great catch @ywcb00 and thanks for fixing this performance issue. For local CP and distributed Spark operations we actually decide based on estimated sparsity if we do a sparse hash aggregation or an accumulation into a dense matrix. Likely the federated ctable operations was implemented for correctness so far. In your experiments, however, please test both sparse ctable scenario (like creating permutation matrices) and dense scenarios (like confusion matrices with moderate number of classes). In any case, the revised change is much better, because the previous commit affected the defaults, partially without considering the value, and unintended sparse accumulation can be vastly slower due to binary search and potential shifting. But you're right reevaluated these defaults (e.g., with our revived perftest suite). This reminds me, if feel it's beneficial, please add your federated algorithm tests to this perftest so we can automatically run them before releases.

@asfgit asfgit closed this in 59267c9 Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants