Adding 'sink' arg to blockApply() #10

Closed
PeteHaitch opened this issue Mar 12, 2018 · 5 comments

@PeteHaitch
Collaborator

PeteHaitch commented Mar 12, 2018

Hi Hervé,

Will blockApply() be gaining a sink or BACKEND argument? This can be convenient when blockApply()-ing over a matrix-like object to create a normalised matrix-like object, for example.

Thanks!

@PeteHaitch
Collaborator Author

PeteHaitch commented Mar 12, 2018

FYI, this is what I'm currently using, but I'd prefer to retire it in favour of an 'official' solution. It requires (but doesn't check) that the sink has the appropriate type and dimensions.

blockApplyWithRealization <- function(x, FUN, ..., grid = NULL, sink = NULL,
                                      BPREDO = list(), BPPARAM = bpparam()) {
    FUN <- match.fun(FUN)
    grid <- DelayedArray:::.normarg_grid(grid, x)
    nblock <- length(grid)
    bplapply(seq_len(nblock), function(b) {
        if (DelayedArray:::get_verbose_block_processing()) {
            message("Processing block ", b, "/", nblock, " ... ",
                    appendLF = FALSE)
        }
        viewport <- grid[[b]]
        block <- DelayedArray:::extract_block(x, viewport)
        if (!is.array(block)) {
            block <- DelayedArray:::.as_array_or_matrix(block)
        }
        attr(block, "from_grid") <- grid
        attr(block, "block_id") <- b
        block_ans <- FUN(block, ...)
        # NOTE: This is the only part different from DelayedArray::blockApply()
        if (!is.null(sink)) {
            write_block_to_sink(block_ans, sink, viewport)
            block_ans <- NULL
        }
        if (DelayedArray:::get_verbose_block_processing()) {
            message("OK")
        }
        # Return the block result (NULL if it was written to the sink)
        block_ans
    },
    BPREDO = BPREDO,
    BPPARAM = BPPARAM)
}

@hpages
Contributor

hpages commented Oct 2, 2020

Hi Pete,

This one has been patiently sitting in a corner for a while ;-)

blockApply() and family have matured a bit in the last 2 years, and now we have viewportApply() and viewportReduce() in addition to blockApply() and blockReduce().

viewportReduce() would be a better choice in general than blockApply(x, FUN, sink) for block-processing with on-the-fly writing to a realization sink. For your use case, it would look something like this:

library(DelayedArray)

## Normalization function:
my_powerful_normalization_algo <- function(m, shift=0) { m + shift }

## Block-processed version of the normalization function (we want to be as
## backend-agnostic as possible, so no parallelization):
BLOCK_my_powerful_normalization_algo <- function(sink, m, shift=0, verbose=NA)
{
    stopifnot(identical(dim(sink), dim(m)))

    ## By setting 'block.shape' to "first-dim-grows-first" we're guaranteed to be
    ## compatible with realization sinks that only support linear writing (e.g.
    ## TENxRealizationSink objects).
    grid <- defaultAutoGrid(sink, block.shape="first-dim-grows-first")

    ## Define callback function to pass to viewportReduce().
    FUN <- function(viewport, sink, shift) {
        block <- read_block(m, viewport)
        block <- my_powerful_normalization_algo(block, shift)
        write_block(sink, viewport, block)
    }

    viewportReduce(FUN, grid, sink, shift=shift, verbose=verbose)
}

Let's try it:

library(TileDBArray)
## Matrix to normalize:
M <- writeTileDBArray(matrix(1:6000, nrow=50))

Create HDF5 realization sink:

library(HDF5Array)
sink <- HDF5RealizationSink(dim(M), chunkdim=c(20, 20))

Block process:

setAutoBlockSize(8000)
sink <- BLOCK_my_powerful_normalization_algo(sink, M, 0.1, verbose=TRUE)
# \ Processing viewport 1/12 ... OK
# \ Processing viewport 2/12 ... OK
# \ Processing viewport 3/12 ... OK
# \ Processing viewport 4/12 ... OK
# \ Processing viewport 5/12 ... OK
# \ Processing viewport 6/12 ... OK
# \ Processing viewport 7/12 ... OK
# \ Processing viewport 8/12 ... OK
# \ Processing viewport 9/12 ... OK
# \ Processing viewport 10/12 ... OK
# \ Processing viewport 11/12 ... OK
# \ Processing viewport 12/12 ... OK

Close sink and coerce:

close(sink)
as(sink, "DelayedArray")
# <50 x 120> matrix of class HDF5Matrix and type "double":
#         [,1]   [,2]   [,3] ... [,119] [,120]
#  [1,]    1.1   51.1  101.1   . 5901.1 5951.1
#  [2,]    2.1   52.1  102.1   . 5902.1 5952.1
#  [3,]    3.1   53.1  103.1   . 5903.1 5953.1
#  [4,]    4.1   54.1  104.1   . 5904.1 5954.1
#  [5,]    5.1   55.1  105.1   . 5905.1 5955.1
#   ...      .      .      .   .      .      .
# [46,]   46.1   96.1  146.1   . 5946.1 5996.1
# [47,]   47.1   97.1  147.1   . 5947.1 5997.1
# [48,]   48.1   98.1  148.1   . 5948.1 5998.1
# [49,]   49.1   99.1  149.1   . 5949.1 5999.1
# [50,]   50.1  100.1  150.1   . 5950.1 6000.1

This is with DelayedArray 0.15.14.

The major difference with the blockApply(x, FUN, sink) approach is that here we walk on a grid defined on the sink instead of on x. It turns out that for this particular normalization use case the transformation is isometric, so x and the sink have the same geometry, but this is not the case in general. For the general case, we want to walk on a grid defined on the sink, and blockApply(x, FUN, sink) doesn't allow this.
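To illustrate the non-isometric case, here is a minimal sketch (not from the thread) that writes the transpose of a matrix to a sink whose dimensions are the reverse of the input's, walking a grid defined on the sink. BLOCK_transpose is a hypothetical name, and the sketch assumes that ranges() and ArrayViewport() behave as described in ?ArrayViewport; an ordinary in-memory matrix stands in for the on-disk input to keep the example small.

```r
library(DelayedArray)
library(HDF5Array)

## Hypothetical helper: fill 'sink' with t(x) by walking a grid defined on
## the sink (not on 'x'), since the two have different geometries.
BLOCK_transpose <- function(sink, x, verbose=NA)
{
    stopifnot(identical(dim(sink), rev(dim(x))))

    grid <- defaultAutoGrid(sink)

    FUN <- function(viewport, sink) {
        ## Region of 'x' that maps to this sink viewport: same ranges,
        ## with the two dimensions swapped (assumes a 2-dimensional input).
        x_viewport <- ArrayViewport(dim(x), rev(ranges(viewport)))
        block <- read_block(x, x_viewport)
        ## write_block() returns the modified sink, as viewportReduce()
        ## expects from its callback.
        write_block(sink, viewport, t(block))
    }

    viewportReduce(FUN, grid, sink, verbose=verbose)
}

x <- matrix(1:6000, nrow=50)
sink <- HDF5RealizationSink(rev(dim(x)), type="integer")
sink <- BLOCK_transpose(sink, x)
close(sink)
res <- as(sink, "DelayedArray")
```

Each viewport on the sink is mapped back to the region of x it depends on, which is the step that a grid defined on x cannot express when the output geometry differs.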

Hope this makes sense,

H.

@hpages hpages closed this as completed Nov 2, 2020
@hpages
Contributor

hpages commented Apr 27, 2021

Hi @PeteHaitch ,

Was never entirely happy with viewportReduce() as the primary tool for walking on a realization sink and filling it with data. Today I added sinkApply() in DelayedArray 0.17.11 as a slightly better tool for that. Name and interface are a little bit more intuitive, I hope. It's documented in ?sinkApply.
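For readers landing here, a rough sketch of how the normalization example from earlier in this thread might look with sinkApply(). This is an adaptation, not code from the thread: it assumes the callback takes the sink first and the viewport second and must return the modified sink, which is how I read ?sinkApply, so please check the man page for your DelayedArray version. An ordinary matrix replaces the TileDB-backed input to reduce dependencies.

```r
library(DelayedArray)
library(HDF5Array)

## Normalization function (same toy function as earlier in the thread):
my_powerful_normalization_algo <- function(m, shift=0) { m + shift }

## Block-processed version based on sinkApply(). Assumption: the callback
## signature is FUN(sink, viewport, ...) and it must return the sink.
BLOCK_my_powerful_normalization_algo <- function(sink, m, shift=0, verbose=NA)
{
    stopifnot(identical(dim(sink), dim(m)))

    FUN <- function(sink, viewport, shift) {
        block <- read_block(m, viewport)
        block <- my_powerful_normalization_algo(block, shift)
        write_block(sink, viewport, block)
    }

    ## No 'grid' supplied, so sinkApply() picks a default grid for the sink.
    sinkApply(sink, FUN, shift=shift, verbose=verbose)
}

M <- matrix(1:6000, nrow=50)
sink <- HDF5RealizationSink(dim(M), chunkdim=c(20, 20))
sink <- BLOCK_my_powerful_normalization_algo(sink, M, 0.1)
close(sink)
res <- as(sink, "DelayedArray")
```

Compared with the viewportReduce() version, the sink is the first argument throughout and there is no need to build the grid by hand unless a specific block shape is required.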

H.

@PeteHaitch
Collaborator Author

Thanks, Herve. I'm not currently working on anything that requires this but it's good to have it there.

@hpages
Contributor

hpages commented Apr 27, 2021

Yeah I realize this arrives kind of late. Oh well, maybe at some point when I've nothing else to do I'll replace blockApplyWithRealization() with sinkApply() in minfi and bsseq ;-)
