Handling current_block_id and effective_grid for tooling on top of blockApply #71

@LTLA

Yes, I know I really shouldn't be tangling with either of these two variables, but bear with me here...

beachmat implements the colBlockApply() function, which was originally a wrapper around blockApply with grid= just set to colAutoGrid(x) so that I didn't have to keep on typing it all out. (Same for rowBlockApply.) Over time, though, it evolved to add some additional features that were useful to me:
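For readers unfamiliar with the starting point, the original thin wrapper can be sketched roughly as follows. This is a minimal illustration, not beachmat's actual code: the name `simpleColBlockApply` is made up for this example, and only `blockApply()`, `colAutoGrid()` and `DelayedArray()` come from DelayedArray.

```r
library(DelayedArray)

# Hypothetical stand-in for the original wrapper (NOT beachmat's real code):
# it only fills in grid= so that each block spans a set of complete columns.
simpleColBlockApply <- function(x, FUN, ...) {
    blockApply(x, FUN, ..., grid = colAutoGrid(x))
}

# Small example: column sums computed block by block.
x <- DelayedArray(matrix(runif(200), nrow = 10))
res <- simpleColBlockApply(x, colSums)

# blockApply() returns one list element per block.
col.sums <- unlist(res)
```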

  • Avoid the overhead of a *gCMatrix to SparseArraySeed back to *gCMatrix conversion when dealing with functions that were capable of taking *gCMatrix inputs.
  • Split up in-memory matrices (specifically, *gCMatrix and ordinary matrices) prior to calling BiocParallel functions, to avoid the cost of serializing the entire matrix when only a fragment is used in each worker.
  • Avoid the overhead of block processing altogether when the matrix is in memory and no parallelization is requested, in which case we can just apply FUN on the full matrix directly.
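The second and third points can be sketched in base R as below. This is only an illustration under stated assumptions: `applyInMemory` and its `nworkers` argument are hypothetical names, and a plain `lapply()` stands in for the BiocParallel call that would receive the fragments in real code.

```r
# Hypothetical helper (not beachmat's implementation) for an in-memory matrix.
applyInMemory <- function(x, FUN, ..., nworkers = 1L) {
    if (nworkers <= 1L) {
        # No parallelization requested: skip block processing entirely
        # and call FUN once on the full matrix.
        return(list(FUN(x, ...)))
    }

    # Assign consecutive columns to at most 'nworkers' groups.
    chunk.size <- ceiling(ncol(x) / nworkers)
    groups <- ceiling(seq_len(ncol(x)) / chunk.size)

    # Split the matrix up front, so that each worker would only be sent
    # its own fragment rather than the entire matrix.
    fragments <- lapply(split(seq_len(ncol(x)), groups),
                        function(j) x[, j, drop = FALSE])

    # lapply() is a stand-in for a parallel apply over the fragments.
    unname(lapply(fragments, FUN, ...))
}

m <- matrix(1:20, nrow = 4)
serial <- applyInMemory(m, colSums)
chunked <- applyInMemory(m, colSums, nworkers = 2L)
```

Note that in both branches FUN is called without going through blockApply(), which is why the block context has to be supplied by some other means.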

To do this, sometimes I would pass FUN to blockApply(), and other times I would apply FUN directly to the matrix or its split-up fragments. This worked pretty well, provided I added the grid ID attributes to the matrix prior to calling FUN manually. However, that approach no longer works after the changes I requested in #69. Oops.

I can mimic the creation of current_block_id and effective_grid so that they get found by effectiveGrid() and friends, but this makes my code pretty fragile with respect to any changes you make to the effectiveGrid() discovery mechanism. So I wonder whether it would be possible to expose a setter mechanism for my use case.
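To make the request concrete, here is one shape such a setter could take, written in base R. Everything in it is hypothetical: `.block_context`, `withBlockContext()`, `sketchGrid()` and `sketchBlockId()` are invented for illustration and say nothing about how DelayedArray stores or discovers these variables.

```r
# Purely hypothetical private store for the block context.
.block_context <- new.env(parent = emptyenv())

# Hypothetical getters, playing the role that effectiveGrid() and friends
# play for code running inside FUN.
sketchGrid <- function() .block_context$grid
sketchBlockId <- function() .block_context$block_id

# Hypothetical scoped setter: register the context, evaluate 'expr',
# then restore whatever was set before, even if 'expr' throws an error.
withBlockContext <- function(grid, block_id, expr) {
    old.grid <- .block_context$grid
    old.id <- .block_context$block_id
    on.exit({
        assign("grid", old.grid, envir = .block_context)
        assign("block_id", old.id, envir = .block_context)
    })
    assign("grid", grid, envir = .block_context)
    assign("block_id", block_id, envir = .block_context)
    force(expr)  # lazily evaluated, so it runs with the context in place
}

# A caller that applies FUN manually would wrap the call like this.
out <- withBlockContext("placeholder-grid", 3L, sketchBlockId() * 2L)
```

With something along these lines, code that calls FUN outside of blockApply() would depend only on the setter's signature rather than on the internal variable names.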

Of course, I'd be happy to push some of my changes to blockApply() itself, and then colBlockApply() could revert to being the wrapper that it used to be. However, some of those changes are a bit opinionated, e.g., it assumes that FUN is capable of taking *gCMatrix inputs because that's also what beachmat v3 supports.
