Description
Yes, I know I really shouldn't be tangling with either of these two variables, but bear with me here...
beachmat implements the colBlockApply() function, which was originally a wrapper around blockApply with grid= just set to colAutoGrid(x) so that I didn't have to keep on typing it all out. (Same for rowBlockApply.) Over time, though, it evolved to add some additional features that were useful to me:
- Avoid the overhead of a *gCMatrix to SparseArraySeed back to *gCMatrix conversion when dealing with functions that were capable of taking *gCMatrix inputs.
- Split up in-memory matrices (specifically, *gCMatrix and ordinary matrices) prior to calling BiocParallel functions, to avoid the cost of serializing the entire matrix when only a fragment is used in each worker.
- Avoid the overhead of block processing altogether when the matrix is in memory and no parallelization is requested, in which case we can just apply FUN on the full matrix directly.
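For reference, the original wrapper form mentioned above amounts to roughly the following. This is only a sketch: the name simple_col_block_apply is made up for illustration and is not beachmat's actual code, and it assumes DelayedArray is installed.

```r
library(DelayedArray)

# Hypothetical stand-in for the original colBlockApply(): forward everything
# to blockApply(), with the grid fixed to column-wise blocks of x.
simple_col_block_apply <- function(x, FUN, ...) {
    blockApply(x, FUN, ..., grid = colAutoGrid(x))
}

# Usage: one result per column block, returned as a list.
x <- matrix(1:20, nrow = 4, ncol = 5)
res <- simple_col_block_apply(x, colSums)
```

The row-wise equivalent would swap in rowAutoGrid(x).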
To do this, sometimes I would pass FUN to blockApply(), and other times I would apply FUN directly to the matrix or its split-up fragments. This worked pretty well, provided I added the grid ID attributes to the matrix prior to calling FUN manually. However, this is no longer the case with the changes I requested from #69. Oops.
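To make the "apply FUN directly" branches concrete, here is a minimal base-R sketch of the dispatch for an in-memory matrix. The function name col_apply_sketch and the n_workers argument are hypothetical, lapply() stands in for a BiocParallel call, and the grid ID bookkeeping that FUN might expect is deliberately left out.

```r
# Hypothetical helper for illustration; not beachmat's actual implementation.
col_apply_sketch <- function(x, FUN, n_workers = 1L, ...) {
    # Fast path: in-memory matrix and no parallelization requested, so call
    # FUN once on the whole matrix and skip block processing entirely.
    if (n_workers <= 1L) {
        return(list(FUN(x, ...)))
    }

    # Otherwise cut the columns into contiguous fragments up front, so that
    # each worker would only receive (and serialize) its own fragment.
    nc <- ncol(x)
    chunk_id <- ceiling(seq_len(nc) * n_workers / nc)
    fragments <- lapply(split(seq_len(nc), chunk_id),
                        function(idx) x[, idx, drop = FALSE])

    # lapply() is a serial stand-in for something like BiocParallel::bplapply().
    unname(lapply(fragments, FUN, ...))
}

# Usage: the direct call and the fragment-wise call give the same column sums.
x <- matrix(1:20, nrow = 4, ncol = 5)
direct <- col_apply_sketch(x, colSums)
split_up <- col_apply_sketch(x, colSums, n_workers = 2L)
```

In both branches FUN is called outside of blockApply(), which is exactly where the grid context has to be supplied by hand.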
I can mimic the creation of current_block_id and effective_grid so that they get found by effectiveGrid() and friends, but this leaves my code pretty fragile to any changes you make in the effectiveGrid() discovery mechanism. So I wonder whether it would be possible to expose a setter mechanism for my use case.
Of course, I'd be happy to push some of my changes to blockApply() itself, and then colBlockApply() could revert to being the wrapper that it used to be. However, some of those changes are a bit opinionated, e.g., it assumes that FUN is capable of taking *gCMatrix inputs because that's also what beachmat v3 supports.