Save delayed operations for portable recovery in other analysis environments.

ArtifactDB/chihaya-R

DelayedArrays to HDF5

Overview

Save delayed operations to HDF5 using the chihaya specification. This extracts the operations from a DelayedArray and stores them in an HDF5 file, where they can be used to reconstitute the same DelayedArray in a new R session or, indeed, in a different analysis framework altogether. The idea is to save the operations, which is usually cheap, rather than the results of the operations, which may be expensive to store for large datasets or when sparsity is broken.

Quick start

If we make a DelayedArray with arbitrary operations:

library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5) 
x <- log2(x + 1)
x
## <5 x 10> matrix of class DelayedMatrix and type "double":
##             [,1]        [,2]        [,3] ...       [,9]      [,10]
## [1,] 1.318228112 1.789374232 1.854133153   . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794   . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104   . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910   . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438   . 1.83012744 2.11469960

We can save it to file with the chihaya R package:

library(chihaya)
fpath <- tempfile(fileext=".h5")
saveDelayed(x, fpath, "my_delayed_array")
rhdf5::h5ls(fpath)
##                                     group             name       otype  dclass      dim
## 0                                       / my_delayed_array   H5I_GROUP
## 1                       /my_delayed_array             base H5I_DATASET   FLOAT    ( 0 )
## 2                       /my_delayed_array           method H5I_DATASET  STRING    ( 0 )
## 3                       /my_delayed_array             seed   H5I_GROUP
## 4                  /my_delayed_array/seed           method H5I_DATASET  STRING    ( 0 )
## 5                  /my_delayed_array/seed             seed   H5I_GROUP
## 6             /my_delayed_array/seed/seed            along H5I_DATASET INTEGER    ( 0 )
## 7             /my_delayed_array/seed/seed           method H5I_DATASET  STRING    ( 0 )
## 8             /my_delayed_array/seed/seed             seed   H5I_GROUP
## 9        /my_delayed_array/seed/seed/seed            index   H5I_GROUP
## 10 /my_delayed_array/seed/seed/seed/index                0 H5I_DATASET INTEGER        5
## 11       /my_delayed_array/seed/seed/seed             seed   H5I_GROUP
## 12  /my_delayed_array/seed/seed/seed/seed             data H5I_DATASET   FLOAT 100 x 10
## 13  /my_delayed_array/seed/seed/seed/seed           native H5I_DATASET INTEGER    ( 0 )
## 14            /my_delayed_array/seed/seed             side H5I_DATASET  STRING    ( 0 )
## 15            /my_delayed_array/seed/seed            value H5I_DATASET   FLOAT        5
## 16                 /my_delayed_array/seed             side H5I_DATASET  STRING    ( 0 )
## 17                 /my_delayed_array/seed            value H5I_DATASET   FLOAT    ( 0 )
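
The nested seed groups in this listing appear to mirror the chain of delayed operations (subset, division, addition, logarithm). As a rough sketch, the in-memory tree can be printed with showtree() from DelayedArray, and individual entries can be read back with rhdf5::h5read(). The dataset paths below are copied from the listing above and are an assumption, as the layout may differ between versions of the package.

```r
library(DelayedArray)
library(chihaya)

# Rebuild the quick-start array and save it.
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5)
x <- log2(x + 1)
fpath <- tempfile(fileext=".h5")
saveDelayed(x, fpath, "my_delayed_array")

# Print the tree of delayed operations held in memory.
showtree(x)

# Read individual entries back from the file.
# These paths are taken from the h5ls() listing and may vary by version.
top_method <- rhdf5::h5read(fpath, "my_delayed_array/method")
row_index <- rhdf5::h5read(fpath, "my_delayed_array/seed/seed/seed/index/0")
```

Comparing the showtree() output with the group hierarchy is a quick way to see which delayed operation each group describes.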

And then reload it in a separate session:

y <- loadDelayed(fpath, "my_delayed_array")
y
## <5 x 10> matrix of class DelayedMatrix and type "double":
##             [,1]        [,2]        [,3] ...       [,9]      [,10]
## [1,] 1.318228112 1.789374232 1.854133153   . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794   . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104   . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910   . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438   . 1.83012744 2.11469960
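
To confirm that the round trip preserves the values and not just the printed preview, both arrays can be realized in memory and compared. This is a minimal sketch that runs in a single session for convenience; it uses only saveDelayed() and loadDelayed() as shown above, plus base R for the comparison.

```r
library(DelayedArray)
library(chihaya)

# Same construction as in the quick start.
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5)
x <- log2(x + 1)

fpath <- tempfile(fileext=".h5")
saveDelayed(x, fpath, "my_delayed_array")
y <- loadDelayed(fpath, "my_delayed_array")

# Realize both delayed arrays as ordinary matrices and compare them.
same_values <- isTRUE(all.equal(as.matrix(x), as.matrix(y)))
same_values
```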

The file at fpath follows the chihaya specification. This provides cross-language portability and ensures that the serialization process is robust to changes in the DelayedArray class structure.

Comments

Many of the basic operations in DelayedArray are supported. However, there are a few operations that are not described by the chihaya specification. An incomplete list is provided below:

  • is.na. This is missing as there is no accepted standard definition of missing-ness. (In comparison, is.nan is well-defined and is supported by the chihaya specification.)
  • All distribution functions, e.g., dpois, qunif and so on. These were omitted from the specification as they do not have native implementations in many frameworks.
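
One possible workaround for an unsupported operation, sketched here as an assumption and not as the package's recommended approach, is to realize the result in memory and save that instead. The example below applies dpois() lazily, realizes the result with as.matrix(), and wraps it in a fresh DelayedArray so that the saved object contains no delayed operations. How saveDelayed() behaves when given an unsupported operation directly is not stated here, so the sketch avoids that case entirely.

```r
library(DelayedArray)
library(chihaya)

counts <- DelayedArray(matrix(rpois(1000, lambda=5), ncol=10))

# dpois() is applied lazily by DelayedArray, but distribution functions
# are listed above as absent from the chihaya specification.
dens <- dpois(counts, lambda=5)

# Realize the result and wrap it as a fresh DelayedArray, so that the
# object to be saved is a plain array with no delayed operations.
realized <- DelayedArray(as.matrix(dens))

fpath <- tempfile(fileext=".h5")
saveDelayed(realized, fpath, "densities")
reloaded <- loadDelayed(fpath, "densities")
```

This stores the results and not the operations, so it gives up the space savings described in the overview; it is only sensible when the realized array fits comfortably in memory and on disk.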
