Permalink
Switch branches/tags
Nothing to show
Find file Copy path
179 lines (115 sloc) 5.27 KB

Getting started

Installation

**** Please note: the central luarocks server has another package called hdf5 (http://colberg.org/lua-hdf5/) - if you use 'luarocks install' you may get that one instead. ****

**** Please note also: torch-hdf5 now requires version 1.8.14 or greater of hdf5! ****

OS X

brew tap homebrew/science
brew install hdf5
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

Note: if luarocks make fails with an unsatisfied dependency, the luarocks being used is likely not the one provided by torch. Try using [torch install directory]/install/bin/luarocks instead.

Ubuntu < 13.04

sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

Ubuntu >= 13.04

sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"

Writing from torch

require 'hdf5'
local myFile = hdf5.open('/path/to/write.h5', 'w')
myFile:write('/path/to/data', torch.rand(5, 5))
myFile:close()

Reading from torch

require 'hdf5'
local myFile = hdf5.open('/path/to/read.h5', 'r')
local data = myFile:read('/path/to/data'):all()
myFile:close()

Reading from Matlab

h5read /path/to/file.h5 /location/of/data

See the Matlab documentation for further information.

Reading from Python

You need to install a library:

$ pip install h5py

Then:

import h5py
myFile = h5py.File('/path/to/file.h5', 'r')

# The '...' means retrieve the whole tensor
data = myFile['location']['of']['data'][...]
print(data)

See also the h5py manual.

Reading from R

You need to install a library:

source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")

Then:

library(rhdf5)
mydata <- h5read("/path/to/file.h5", "/location/of/data")
str(mydata)

Alternative libraries for R include 'h5r' and 'ncdf4'.

More advanced usage

Compression, chunking, and other options

You can optionally pass a DataSetOptions object to specify how you want data to be written:

require 'hdf5'
local myFile = hdf5.open('/path/to/write.h5', 'w')
local options = hdf5.DataSetOptions()
options:setChunked(32, 32)
options:setDeflate()
myFile:write('/path/to/data', torch.rand(500, 500), options)
myFile:close()

Partial reading

You can read from a dataset without loading the whole thing at once:

local myFile = hdf5.open('/path/to/read.h5','r')
-- Specify the range for each dimension of the dataset.
local data = myFile:read('/path/to/data'):partial({start1, end1}, {start2, end2})
myFile:close()

Note that, for efficiency, hdf5 may still load (but not return) more than just the piece you ask for - depending on what options the file was written with. For example, if the dataset is chunked, it should just load the chunks that overlap with the part you ask for.

Size of the data

Getting the size of the dataset without loading the data:

local myFile = hdf5.open('/path/to/read.h5','r')
local dim = myFile:read('/path/to/data'):dataspaceSize()
myFile:close()

Tensor Type of the data

Checking the type of torch.Tensor without loading the data:

local myFile = hdf5.open('/path/to/read.h5','r')
local factory = myFile:read('/path/to/data'):getTensorFactory()
myFile:close()

Reading HDF5 file from multiple threads

If you want to use HDF5 from multiple threads, you will need a thread-safe build of the underlying HDF5 library. Otherwise, you will get random crashes. See the HDF5 docs for how to build a thread-safe version.

If you want to do this from torch you will also need to install torch threads. Then you can

    local mainfile = hdf5.open('/path/to/read.h5','r')
    local nthreads = 2
    local data = nil
    local worker = function(h5file)
        torch.setnumthreads(1)
        print(__threadid)
        return h5file:read("data" .. __threadid):all()
    end
    local pool = threads.Threads(nthreads, function(threadid) require'torch' require'hdf5'end)
    pool:specific(true)

    for i=1,nthreads do
        pool:addjob(i, worker, function(_data) data = _data end, mainfile)
    end
    for i=1,nthreads do
        pool:dojob()
        print(data:size(1)==10)
    end
    mainfile:close()

Command-line

There are also a number of handy command-line tools.

h5ls

Lists specified features of HDF5 file contents.

h5dump

Examine the contents of an HDF5 file and dump those contents to an ASCII file.

h5diff

Compare two HDF5 files.

h5copy

Copies HDF5 objects from a file to a new file

Other

See this page for many more HDF5 tools.

Elsewhere

Libraries for many other languages and tools exist, too. See this list for more information.