Support for RleListMatrix (?) #62

Open
jonocarroll opened this issue Feb 28, 2020 · 1 comment

Comments

jonocarroll commented Feb 28, 2020

Related to #27, though I note that the following now works:

library(DelayedArray)
DelayedArray(
  matrix(
    IntegerList(
      c(list(c(1L, 1L)), list(c(1L,1L)), list(c(1L,1L)), list(c(2L,2L)))
    ), 
    nrow = 2, ncol = 2)
)
#> <2 x 2> matrix of class DelayedMatrix and type "list":
#>      [,1] [,2]
#> [1,] 1, 1 1, 1
#> [2,] 1, 1 2, 2

Created on 2020-02-28 by the reprex package (v0.3.0)

Is there a motivation to support RleListMatrix? For the same use case as above, I'm using VariantAnnotation to build a CompressedVcf object, and it has matrices of lists. The list elements are in many cases NA, so it may be efficient to store them in an Rle-derived object. I haven't verified that such a structure would actually benefit from Rle, though: would the elements be sufficiently contiguous?

My workaround at the moment is to collapse the list elements into single delimited strings, in which case DelayedArray or RleMatrix work out of the box. In this case the string concatenation shrinks the matrix object by a factor of ~8 (potentially due to global string pooling). Converting to RleMatrix shrinks it again by a further factor of ~16, so the total compression from matrix of lists to character RleMatrix is roughly 128x. If RleListMatrix were able to provide a comparable benefit without converting to strings, that could be very useful.
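
For illustration, the string-collapsing workaround could look roughly like the sketch below. The toy data, the `","` delimiter, and all variable names are assumptions made up for this example, not taken from the real VCF data, and the compression factors quoted above will not show up on a 2 x 2 matrix.

```r
library(DelayedArray)  # also attaches S4Vectors, which provides Rle()

# Toy stand-in for a matrix of lists; NA cells mimic missing entries.
cells <- list(c(1L, 1L), NA_integer_, NA_integer_, c(2L, 2L))
m_list <- matrix(cells, nrow = 2, ncol = 2)

# Collapse each cell into one delimited string ("," is an arbitrary choice).
collapsed <- vapply(cells, function(x) paste(x, collapse = ","), character(1))
m_chr <- matrix(collapsed, nrow = nrow(m_list), ncol = ncol(m_list))

# Run-length encode the column-major vector and wrap it as an RleMatrix.
m_rle <- RleArray(Rle(as.vector(m_chr)), dim = dim(m_chr))

# Splitting a cell on the delimiter recovers the original integer vector.
restored <- as.integer(strsplit(m_chr[2, 2], ",", fixed = TRUE)[[1]])
```

The cost of this approach is that every access to a cell's values needs a `strsplit()` round trip, which is what a native RleListMatrix would avoid.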

I'll link another issue to this one specific to VariantAnnotation, but I thought I'd check if this was a) possible; b) useful; and c) of interest.

Ping @lawremi who first proposed investigating support for this structure.

Contributor

LTLA commented Jan 13, 2021

FYI the BumpyMatrix class may be helpful here, for storing non-scalar elements of the same type/class in each cell of a matrix.
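
As a rough sketch of what that might look like for the example in the original post, assuming the `BumpyMatrix()` constructor from the BumpyMatrix package accepts a compressed list filled in column-major order plus the target dimensions (the toy values and variable names here are made up for illustration):

```r
library(IRanges)      # IntegerList()
library(BumpyMatrix)  # BumpyMatrix()

# One list element per matrix cell, supplied in column-major order.
cells <- IntegerList(list(c(1L, 1L), NA_integer_, NA_integer_, c(2L, 2L)))

# Arrange the four list elements as a 2 x 2 matrix of integer vectors.
bm <- BumpyMatrix(cells, c(2L, 2L))

dim(bm)
```

This keeps the values as integers rather than delimited strings, though whether it matches the memory savings of a character RleMatrix would need to be measured on real data.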
