Skip to content

Add row/colsum fallbacks for non-DA/base matrix classes.#57

Closed
LTLA wants to merge 1 commit intoBioconductor:masterfrom
LTLA:master
Closed

Add row/colsum fallbacks for non-DA/base matrix classes.#57
LTLA wants to merge 1 commit intoBioconductor:masterfrom
LTLA:master

Conversation

@LTLA
Copy link
Contributor

@LTLA LTLA commented Dec 3, 2019

Pretty much as it says. The aim is to provide some slow inefficient fallbacks for these generics so that they can operate on sparse matrices (or anything with a colSums and rowSums method).

Perhaps these won't live in DelayedArray in the long term, but this is a good enough location for the time being; it allows me to actually use these generics in my various packages. Right now I'm wrapping colsum in another scater generic (.colsum) in order to catch the non-DA/non-base case.

@hpages
Copy link
Contributor

hpages commented Dec 3, 2019

This should not be necessary. DelayedArray:::.BLOCK_rowsum() and DelayedArray:::.BLOCK_colsum() are fast and work on anything that satisfies the "seed contract" (i.e. supports dim(), dimnames(), and extract_array()). That includes sparse matrices:

library(Matrix)
library(DelayedArray)

rowsum_ANY <- function(x, group, reorder=TRUE, ...)
{
    by.group <- split(seq_len(nrow(x)), group)
    out <- lapply(by.group, function(i) colSums(x[i,,drop=FALSE]))
    out <- do.call(rbind, out)
    if (!reorder) {
        out <- out[unique(group),,drop=FALSE]
    }
    out
}

m <- matrix(runif(1.2e8), nrow=12000)
m0 <- as(ifelse(m <= 0.05, m, 0), "dgCMatrix")  # 5% non-zero values
group <- sample(777, nrow(m0), replace=TRUE)

system.time(rs1 <- DelayedArray:::.BLOCK_rowsum(m0, group))
#    user  system elapsed 
#   0.884   0.927   0.941 

system.time(rs2 <- rowsum_ANY(m0, group))
#    user  system elapsed 
#  12.968   0.040  13.008 

identical(rs1, rs2)
# [1] TRUE

So either you wrap your sparse matrix in a DelayedArray object and call rowsum() on it (which is equivalent to calling DelayedArray:::.BLOCK_rowsum() directly on the sparse matrix):

system.time(rs3 <- rowsum(DelayedArray(m0), group))
#    user  system elapsed 
#   0.979   0.822   0.950 

identical(rs1, rs3)
# [1] TRUE

or, if that's not convenient enough, we could just temporarily add the 2 following lines to DelayedArray (until the sparseMatrixStats package provides a native implementation for this):

setMethod("rowsum", "sparseMatrix", .BLOCK_rowsum)
setMethod("colsum", "sparseMatrix", .BLOCK_colsum)

@LTLA
Copy link
Contributor Author

LTLA commented Dec 4, 2019

Oh, good.

or, if that's not convenient enough, we could just temporarily add the 2 following lines to DelayedArray (until the sparseMatrixStats package provides a native implementation for this):

Well, if it's going to be a stopgap, I would like a fallback ANY method that works for any matrix input. Seems like the current ANY method should really be the matrix method, and the true ANY method would actually call .BLOCK_*sum.

@hpages
Copy link
Contributor

hpages commented Dec 4, 2019

Seems like the current ANY method should really be the matrix method, and the true ANY method would actually call .BLOCK_*sum.

Tempting but I don't like the smell of this. I believe in the strong principle of keeping the base function the default method. I don't think the R/CRAN/Bioconductor democracy can work if package developers start feeling that they know what THE default method should do and thus take possession of it. What would prevent other packages from doing the same thing? This would quickly become messy. In other words, we don't own base::rowsum() so we need to be careful about these things.

That being said, I'm not saying that you won't find places in the S4Vectors/IRanges/GenomicRanges stack or in one of my other packages where I'm doing exactly what I'm preaching against (either because I didn't know what I was doing at the time or because it was introduced by someone else ;-) )

@LTLA
Copy link
Contributor Author

LTLA commented Dec 4, 2019

Hm. Sounds like I should just keep on using my .colsum until a general solution becomes available.

@LTLA LTLA closed this Dec 4, 2019
@hpages
Copy link
Contributor

hpages commented Dec 4, 2019

Or you can wrap whatever matrix-like object you have inside a DelayedArray shell and pass it to colsum().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants