Skip to content

Latest commit

 

History

History
171 lines (128 loc) · 6.33 KB

README.md

File metadata and controls

171 lines (128 loc) · 6.33 KB

Build Status AppVeyor Build Status codecov.io CRAN_Status_Badge

h5 is an R interface to the HDF5 library under active development. It is available on Github and already released on CRAN for all major platforms (Windows, OS X, Linux). Online documentation for the package is available at http://h5.predictingdaemon.com.

HDF5 is an excellent library and data model to store huge amounts of data in a binary file format. Supporting most major platforms and programming languages it can be used to exchange data files in a language independent format. Compared to R's integrated save() and load() functions it also supports access to only parts of the binary data files and can therefore be used to process data not fitting into memory.

h5 utilizes the HDF5 C++ API through Rcpp and S4 classes. The package is covered by 200+ test cases with a coverage greater than 80%.

Install

h5 has already been released on CRAN, and can therefore be installed using

install.packages("h5")

The most recent development version can be installed from Github using devtools:

library(devtools)
install_github("mannau/h5")

Please note that this version has been tested with the current hdf5 library 1.8.14 (and 1.8.13 for OS X) - you should therefore install the most current hdf5 library including its C++ API for your platform.

Requirements

Windows

This package already ships the library for windows operating systems through h5-libwin. No additional requirements need to be installed.

OS X

Using OS X and Homebrew you can use the following command to install HDF5 library dependencies and headers:

brew install homebrew/science/hdf5 --enable-cxx

Linux (e.g. Debian, Ubuntu)

With Debian-based Linux systems you can use the following command to install the dependencies:

sudo apt-get install libhdf5-dev

For older versions (Debian Squeeze, Ubuntu Precise) it is required to install libhdf5-serial-dev:

sudo apt-get install libhdf5-serial-dev

Since h5 requires the 'new' v18 API version which does not seem to be installed on e.g. Precise it might be necessary to install the dependency libhdf5-serial-dev through the ppa:marutter/rrutter repository (Ubuntu) or soon directly the h5 package via cran2deb (Debian).

Custom Install Parameters

If the hdf5 library is not located in a standard directory recognized by the configure script the parameters CPPFLAGS and LIBS may need to be set manually. This can be done using the --configure-vars option for R CMD INSTALL in the command line, e.g

R CMD INSTALL h5_<version>.tar.gz --configure-vars='LIBS=<LIBS> CPPFLAGS=<CPPFLAGS>'

The most recent version with required paramters can also be directly installed from github using devtools in R:

require(devtools)
install_github("mannau/h5", args = "--configure-vars='LIBS=<LIBS> CPPFLAGS=<CPPFLAGS>'")

A concrete OS X example setting could look like this:

R CMD INSTALL h5_0.9.2.tar.gz --configure-vars='LIBS=-L/usr/local/Cellar/hdf5/1.8.13/lib -L/usr/local/opt/szip/lib  -L. -lhdf5_cpp -lhdf5 -lz -lm CPPFLAGS=-I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include'

Quick Start

We start by creating an HDF5 file holding a numeric vector, an integer matrix and a character array.

library(h5)
testvec <- rnorm(10)
testmat <- matrix(1:9, nrow = 3)
row.names(testmat) <- 1:3
colnames(testmat) <- c("A", "BE", "BU")
letters1 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
letters2 <- paste(LETTERS[runif(45, min = 1, max = length(LETTERS))])
testarray <- array(paste0(letters1, letters2), c(3, 3, 5))

file <- h5file("test.h5")
# Save testvec in group 'test' as DataSet 'testvec'
file["test/testvec"] <- testvec
file["test/testmat"] <- testmat
file["test/testarray"] <- testarray
h5close(file)

We can now retrieve the data from the file

file <- h5file("test.h5")
dataset_testmat <- file["test/testmat"]
# We can now retrieve all data from the DataSet object using e.g. the  subsetting operator
dataset_testmat[]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

We can also subset the data directly, e.g. row 1 and 3

dataset_testmat[c(1, 3), ]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    3    6    9

Note, that we have now lost the row- and column names associated with the testmat object in the retrieved matrix. HDF5 supports metadata with attributes, which we need to add to (retrieve from) the DataSet manually.

h5attr(dataset_testmat, "rownames") <- row.names(testmat)
h5attr(dataset_testmat, "colnames") <- colnames(testmat)

We can now retrieve our matrix including meta-data as follows:

outmat <- dataset_testmat[]
row.names(outmat) <- h5attr(dataset_testmat, "rownames")
colnames(outmat) <- h5attr(dataset_testmat, "colnames")
identical(outmat, testmat)
## [1] TRUE

Do not forget to close the HDF5 file in the end

h5close(file)

License License

This package is shipped with a BSD-2-Clause License.