possible DelayedArray constructor over IntegerList (or other List) #27

Open
Liubuntu opened this issue Aug 23, 2018 · 3 comments

Liubuntu commented Aug 23, 2018

Hi @hpages ,

In the VariantAnnotation package, CollapsedVCF (and ExpandedVCF) objects store their data entries as IntegerList / CharacterList / ... objects. In developing VCFArray, we are trying to represent these data entries as DelayedArray instances. Right now we convert each data entry to an array to give it dimensions, and then call the DelayedArray constructor on that array. Would it be possible for the DelayedArray constructor to work directly on a List object, so that the data are still stored internally in the List structure, which is more efficient than an ordinary list? @mtmorgan

 > XX <- IntegerList(c(list(rep(NA, 2)), list(rep(NA, 2)), list(rep(NA, 2)), list(rep(NA, 2))))
 > XX
 IntegerList of length 4
 [[1]] <NA> <NA>
 [[2]] <NA> <NA>
 [[3]] <NA> <NA>
 [[4]] <NA> <NA>

 > DelayedArray(array(XX))
 <4> DelayedArray object of type "list":          ## the type is ordinary list. 
    [1]    [2]    [3]    [4]
 NA, NA NA, NA NA, NA NA, NA

 > DelayedArray(array(XX))[1:2]
 [[1]]
 [1] NA NA

 [[2]]
 [1] NA NA

## What we want is something like: 

 > DelayedArray(XX)
 <4> DelayedArray object of type "IntegerList":      
    [1]    [2]    [3]    [4]
 NA, NA NA, NA NA, NA NA, NA

 > DelayedArray(XX)[1:2]
 IntegerList of length 2
 [[1]] <NA> <NA>
 [[2]] <NA> <NA>

hpages commented Aug 24, 2018

Hi Qian,

There are 2 ways to deal with this: (1) the quick-and-dirty way that would only let you wrap a List object in a 1-dimensional DelayedArray object, and (2) the more general solution that would let you wrap a List object in a DelayedArray object of arbitrary dimensions.

But first some background: objects passed to DelayedArray() need to comply with the "seed contract", which means that they need to have dimensions. List derivatives don't support dim() in general (with the notable exception of ArrayGrid objects). Setting the "dim" attribute on them directly doesn't work either (for some reason, the methods package doesn't let us put attributes on S4 objects):

x <- rep(IntegerList(1:5, integer(0), NA, 3:-2), 3)
dim(x) <- c(6, 2)
# Error in dim(x) <- c(6, 2) : invalid first argument
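
For contrast, here is a small base R sketch (an illustration only, using an ordinary list with the same shape as x rather than the IntegerList itself). A plain list does accept a "dim" attribute, which is why going through array() works in the original post, at the cost of giving up the List representation:

```r
# Ordinary list with 12 elements, mirroring the layout of x above
# (base R only; no Bioconductor packages needed).
l <- rep(list(1:5, integer(0), NA, 3:-2), 3)

# Unlike the S4 List object, an ordinary list accepts a "dim" attribute.
dim(l) <- c(6, 2)

dim(l)       # [1] 6 2
l[[2, 2]]    # element at row 2, column 2, i.e. linear position 8
```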

Solution (1) only requires that you define "dim" and "extract_array" methods for List objects:

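### A List object is treated as a 1-dimensional array with length(x) elements.
setMethod("dim", "List", function(x) length(x))
setMethod("extract_array", "List",
    function(x, index)
    {
        x_dim <- dim(x)
        ## Dimensions of the result, given the requested subscripts.
        ans_dim <- DelayedArray:::get_Nindex_lengths(index, x_dim)
        ## Only one dimension, so only one subscript (NULL means "everything").
        i <- index[[1L]]
        if (!is.null(i))
            x <- x[i]
        ## Return an ordinary list with a "dim" attribute set on it.
        DelayedArray:::set_dim(as.list(x), ans_dim)
    }
)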

Now that List objects comply with the "seed contract", they can be passed to DelayedArray():

DelayedArray(x)
# <12> DelayedArray object of type "list":
#                [1]                [2]                [3]                  . 
#      1, 2, 3, 4, 5                                    NA                  . 
#               [11]               [12] 
#                 NA 3, 2, 1, 0, -1, -2 

However, I'm not sure what the exact consequences of defining this "dim" method for List objects would be, and I don't have a good feeling about it. There is a lot of code out there that relies on checks like if (is.null(dim(x))) to decide how to operate on an object. This is why I call this a quick-and-dirty solution.
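
To make that concern concrete, here is a minimal base R sketch. The helper describe_shape() is hypothetical and exists only for this illustration; it stands in for any code that branches on is.null(dim(x)). An ordinary list and a 1-D array holding the same elements end up in different branches, and a global "dim" method for List would move every List object from the first branch to the second:

```r
# Hypothetical helper, for illustration only: it branches on is.null(dim(x))
# the way a lot of existing code does.
describe_shape <- function(x) {
    if (is.null(dim(x))) {
        sprintf("vector-like, length %d", length(x))
    } else {
        sprintf("array-like, dim %s", paste(dim(x), collapse=" x "))
    }
}

v <- list(1:5, integer(0), NA)   # ordinary list, dim(v) is NULL
a <- array(v)                    # same elements, but dim(a) is 3

describe_shape(v)   # takes the "vector-like" branch
describe_shape(a)   # takes the "array-like" branch
```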

Solution (2) is more general and much cleaner. It involves the following:

The easiest way to set arbitrary dimensions on a List object is to wrap the object in a thin wrapper that can hold the dim information. Something like this:

setClass("ListArraySeed",
    contains="Array",
    representation(
        dim="integer",
        L="List"
    )
)

Note that we use composition here instead of inheritance, which is a key aspect of this solution and why it is cleaner and safer than solution (1).

seed <- new("ListArraySeed", dim=c(6L,2L), L=x)
dim(seed)
# [1] 6 2
dimnames(seed)  # no dimnames for now but this would be easy to support
# NULL

Before we can pass this to DelayedArray(), we need to define an "extract_array" method:

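### Will work if x@L supports linear (i.e. 1D-style) subsetting and as.list().
setMethod("extract_array", "ListArraySeed",
    function(x, index)
    {
        x_dim <- dim(x)
        ## Dimensions of the result, given the requested subscripts.
        ans_dim <- DelayedArray:::get_Nindex_lengths(index, x_dim)
        ## Translate the multidimensional subscripts into positions along x@L.
        i <- DelayedArray:::to_linear_index(index, x_dim)
        ans <- as.list(x@L[i])
        ## Return an ordinary list with a "dim" attribute set on it.
        DelayedArray:::set_dim(ans, ans_dim)
    }
)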

Then:

DelayedArray(seed)
# <6 x 2> DelayedMatrix object of type "list":
#                   [,1]               [,2]
# [1,]      1, 2, 3, 4, 5                 NA
# [2,]                    3, 2, 1, 0, -1, -2
# [3,]                 NA      1, 2, 3, 4, 5
# [4,] 3, 2, 1, 0, -1, -2                   
# [5,]      1, 2, 3, 4, 5                 NA
# [6,]                    3, 2, 1, 0, -1, -2
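
For readers who want to see what the index-to-linear step in the extract_array method above amounts to, here is a base R sketch. The helper nindex_to_linear() is hypothetical and not part of DelayedArray. It assumes that the internal to_linear_index maps a list of per-dimension subscripts (with NULL meaning "everything along that dimension") to column-major positions; I have not checked the internal implementation against it:

```r
# Hypothetical base R helper (not part of DelayedArray): translate a list
# with one subscript per dimension, where NULL means "everything along that
# dimension", into column-major linear positions.
nindex_to_linear <- function(index, x_dim) {
    stopifnot(is.list(index), length(index) == length(x_dim))
    # Replace each NULL by the full range of its dimension.
    idx <- Map(function(i, d) if (is.null(i)) seq_len(d) else i, index, x_dim)
    # Build an array whose values are their own linear positions; subsetting
    # it with the per-dimension subscripts yields the positions we need.
    positions <- array(seq_len(prod(x_dim)), dim=x_dim)
    as.vector(do.call(`[`, c(list(positions), idx, list(drop=FALSE))))
}

nindex_to_linear(list(c(2L, 4L), NULL), c(6L, 2L))   # 2 4 8 10
nindex_to_linear(list(NULL, 2L), c(6L, 2L))          # 7 8 9 10 11 12
```

Building the full positions array costs memory proportional to prod(x_dim), so this is only meant to show the mapping, not to be efficient.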

And finally, in the same fashion that we have the HDF5ArraySeed/HDF5Array/HDF5Matrix trio, we would need to complete this with the ListArray and ListMatrix classes (which would extend DelayedArray and DelayedMatrix, respectively), and with the ListArray() constructor. So you could just do something like:

M <- ListArray(x, dim=c(6, 2))

and this would return a ListMatrix instance which would degrade to a DelayedMatrix instance as soon as you start operating on it e.g. M[ , 1], t(M), etc...

As an extra convenience, the DelayedArray() constructor could be modified to work directly on a List object x, in which case it would just call ListArray(x, dim=length(x)).
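
A rough sketch of what that trio could look like, patterned on how the HDF5Array package wires up its own classes. None of this exists in DelayedArray: the class names and the ListArray() signature come from this comment, the validation is my own guess, and the whole thing is untested against any particular DelayedArray version. To stay off internal helpers, extract_array is rewritten here with base R subsetting only:

```r
library(DelayedArray)
library(IRanges)

## Thin wrapper holding the List object plus the dimensions to expose.
setClass("ListArraySeed",
    contains="Array",
    representation(dim="integer", L="List")
)

## Base R variant of the extract_array method (assumption: x@L supports
## linear subsetting and as.list()).
setMethod("extract_array", "ListArraySeed",
    function(x, index)
    {
        x_dim <- dim(x)
        ## NULL subscript means "everything along that dimension".
        idx <- Map(function(i, d) if (is.null(i)) seq_len(d) else i,
                   index, x_dim)
        positions <- array(seq_len(prod(x_dim)), dim=x_dim)
        i <- as.vector(do.call(`[`, c(list(positions), idx, list(drop=FALSE))))
        array(as.list(x@L[i]), dim=lengths(idx))
    }
)

## The DelayedArray and DelayedMatrix extensions.
setClass("ListArray",
    contains="DelayedArray",
    representation(seed="ListArraySeed")
)
setClass("ListMatrix", contains=c("ListArray", "DelayedMatrix"))
setMethod("matrixClass", "ListArray", function(x) "ListMatrix")
setAs("ListArray", "ListMatrix", function(from) new("ListMatrix", from))

setMethod("DelayedArray", "ListArraySeed",
    function(seed) new_DelayedArray(seed, Class="ListArray")
)

## Hypothetical constructor: defaults to a 1-dimensional ListArray.
ListArray <- function(x, dim=length(x))
{
    dim <- as.integer(dim)
    stopifnot(is(x, "List"), prod(dim) == length(x))
    DelayedArray(new("ListArraySeed", dim=dim, L=x))
}

x <- rep(IntegerList(1:5, integer(0), NA, 3:-2), 3)
M <- ListArray(x, dim=c(6, 2))
```

Note that blocks extracted from M would still be ordinary lists (type "list"), as in the outputs above; returning IntegerList blocks as in the original request would need more than this.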

So this is feasible, but will require some significant new developments. Adding this to the TODO list, but I don't think I'll be able to get to it before September...

H.

@Liubuntu

The 2nd solution looks good and robust. I guess we are not in a rush for this, but it would be a great feature to add to DelayedArray. We can close the issue for now if you want, as you already have it on your TODO list. :)

hpages commented Aug 24, 2018

Let's keep it open. My TODO list is virtual and open issues are part of it ;-)
