More unit tests and streamlined matrix multiplication#68
Merged
hpages merged 3 commits intoBioconductor:masterfrom Sep 25, 2020
Merged
More unit tests and streamlined matrix multiplication#68hpages merged 3 commits intoBioconductor:masterfrom
hpages merged 3 commits intoBioconductor:masterfrom
Conversation
Switched to bplapply2 from bpiterate to fix error handling.
Contributor
|
Looks good. Thanks! |
Contributor
|
of course I need to merge first, doh |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The general problem of capturing error messages in a
cbindhas been solved by switching frombpiteratetobplapply2, so now errors inside the matrix multiplication lead to errors in the parent process. This has the additional bonus of not requiring BiocParallel to be installed to do matrix multiplication.I also greatly streamlined the decision-making process of choosing the iteration scheme in the matrix multiplication. Now it just prioritizes the larger matrix as the one to split up for distributed computation across workers. This is not really any worse than the previous approach, which relied on the presence of a non-
NULLreturn fromchunkGridas a heuristic to identify which matrix was more likely to be file-backed... and then added an arbitrary penalty of 10 to the access cost of the chunks of that matrix. The new approach is very simple and easy to reason about.The streamlining was also assisted by the realization that
getAutoMultParallelAgnostic()was never actually exported. No users should have been able to use the non-agnostic matrix multiplication scheme, so I just got rid of it in the general case to make maintenance easier. I retained the non-agnostic option for self-cross-products due to its clear performance advantage though the default is still to use the agnostic mode for consistency with%*%.(I don't know what the future plans for the matrix multiplication algorithm will be, but being able to achieve consistent results that are agnostic to the number of workers, block size and chunk dimensions is very logistically useful. I know that there will be situations where an agnostic algorithm just won't cut it, e.g., due to chunk or block size constraints, and we may need a different algorithm in such cases; but having agnosticism as the default, where possible, really makes it easy to scale things up and down while still getting the same results.)
Finally, I enhanced the battery of unit tests.