The export2hdf5 programme is a utility for fusing multiple data sources into one HDF5 file. For instance, you might have simultaneously recorded physiological signals (e.g., multi-channel data) using several devices, each possibly having its own data format. To make the analysis of the combined data easier, you can use export2hdf5 to fuse all these recordings into one HDF5 file.
You can also add metadata for each channel and decide how channels in the original recording are mapped to paths in the HDF5 file.
Please note that export2hdf5 requires Python 3.
To install the latest version of export2hdf5 into a virtual environment using pip, proceed as follows:
virtualenv -p python3 hdf5example
cd hdf5example
source bin/activate
pip3 install git+https://github.com/bwrc/export2hdf5
After this you can use export2hdf5 directly from the command line or import it into Python scripts (more detailed instructions below).
Please note that export2hdf5 depends on (amongst other packages) NumPy and SciPy, and you might want to install these separately using, e.g., your operating system's package manager. In this case, install export2hdf5 without dependencies as follows:
pip3 install --no-dependencies git+https://github.com/bwrc/export2hdf5
export2hdf5 is tested on GNU/Linux. On Linux you might have to use sudo for the installation if you do not use a virtual environment or want to install export2hdf5 globally.
The export2hdf5 utility requires a configuration file in JSON format. The configuration file specifies the filenames of the data sources. It is assumed that each data source has one or more channels, corresponding to different time series. For biomedical data the time series are typically different biosignals, e.g., brain signals recorded from different scalp locations or different ECG leads. Discrete events are also supported and are stored as a compound HDF5 dataset.
A sample configuration file is provided with export2hdf5 (config_sample.json). A short example with one three-channel recording is given below.
{
    "output": {
        "filename": "/tmp/example.h5"
    },
    "datasets": [
        {
            "filename": "/path/to/the/recording.edf",
            "data_type": "edf",
            "maps": [
                {
                    "path": "ECG/Faros",
                    "channels": [
                        "ECG_1",
                        "ECG_2",
                        "ECG_3"
                    ],
                    "shared_group": 1,
                    "meta": [
                        {
                            "channels": [
                                "*"
                            ],
                            "info": {
                                "comment": "default comment for all channels",
                                "comment2": "another default comment for all channels"
                            }
                        },
                        {
                            "channels": [
                                "ECG_1"
                            ],
                            "info": {
                                "comment": "this is a comment for channel ECG_1",
                                "example": "this is a second comment for ECG_1"
                            }
                        },
                        {
                            "channels": [
                                "ECG_2"
                            ],
                            "info": {
                                "example2": "this is a comment for ECG_2"
                            }
                        }
                    ]
                }
            ]
        }
    ]
}
In this example, output specifies the name of the HDF5 file that is to be created. If it exists it will be overwritten.
Datasets are specified as elements in the 'datasets' list:
- filename: the filename of the data source.
- data_type: defines the type of data so that the correct import module can be used; see below for details on supported data formats.
- maps: defines the mappings, i.e., the mapping of channels in the data source to resources in the HDF5 file. Each map can contain the following fields:
  - path: the resource in the HDF5 file.
  - channels: an array of the channels to be exported to this resource. The wildcard * is supported and means all channels in the dataset, i.e., all channels in the file.
  - shared_group: Boolean (given as 1 in the examples here) defining whether or not all of the channels in the current map should share the same time vector. The channels can share the same time vector if they are sampled simultaneously at the same rate.
  - meta: provides additional metadata. The metadata is given in structures, each listing the channels that the metadata is relevant for. The wildcard * is supported and means all channels. The metadata (e.g., comments) is entered in the info section, in which different tags can be used (e.g., comment or note).
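When many recordings share the same layout, it can be convenient to generate the configuration file from a script instead of writing it by hand. The following is a minimal sketch that only uses the fields described above and the Python standard library. The helper names make_map and make_config are hypothetical and not part of export2hdf5, and the output location is a temporary directory chosen for illustration.

```python
import json
import os
import tempfile


def make_map(path, channels, shared_group=1, meta=None):
    """Build one entry of the "maps" list (hypothetical helper)."""
    entry = {
        "path": path,
        "channels": list(channels),
        "shared_group": shared_group,
    }
    if meta:
        entry["meta"] = meta
    return entry


def make_config(output_filename, datasets):
    """Combine the output section and the datasets list (hypothetical helper)."""
    return {"output": {"filename": output_filename}, "datasets": datasets}


config = make_config(
    "/tmp/example.h5",
    [
        {
            "filename": "/path/to/the/recording.edf",
            "data_type": "edf",
            "maps": [
                make_map(
                    "ECG/Faros",
                    ["ECG_1", "ECG_2", "ECG_3"],
                    meta=[
                        {
                            "channels": ["*"],
                            "info": {"comment": "default comment for all channels"},
                        }
                    ],
                )
            ],
        }
    ],
)

# Write the configuration to a JSON file in a temporary directory.
config_path = os.path.join(tempfile.mkdtemp(), "config_example.json")
with open(config_path, "w", encoding="utf-8") as fh:
    json.dump(config, fh, indent=4)
```

The resulting file has the same structure as the example configuration and can be checked with the validation option of export2hdf5 before any data is exported.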
Channels from the same file can be exported to different groups in the HDF5 file by adding multiple maps to one dataset, each map having a different path and its own set of channels (the channel sets of different maps may overlap). For instance, the (partial) configuration
{ "filename" : "/path/to/embla.edf",
"data_type" : "edf",
"maps" : [
{ "path" : "EEG/Titanium",
"channels" : ["M1", "M2", "E1", "E2", "Fz", "C3", "C4", "Oz"],
"shared_group" : 1
},
{ "path" : "EMG/Titanium",
"channels" : ["ChinL", "ChinR"],
"shared_group" : 1
}
]
}
places the EEG channels in the HDF5 resource EEG/Titanium and the EMG channels in the resource EMG/Titanium.
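To see at a glance where each channel of a dataset entry should end up, the maps can be expanded into a list of HDF5 paths. The sketch below assumes that each channel is stored under its map's path as path/channel, which matches the paths used in the reading examples in this document but is not confirmed here as the exact layout. The function expected_paths is hypothetical, and the wildcard * is passed through unchanged because resolving it would require reading the data source.

```python
def expected_paths(dataset):
    """List the HDF5 paths a dataset entry is expected to produce.

    Assumption: channels are stored as "/<path>/<channel>". Maps without
    a "channels" field (e.g., events or text) yield just "/<path>".
    """
    paths = []
    for mapping in dataset.get("maps", []):
        channels = mapping.get("channels")
        if not channels:
            paths.append("/" + mapping["path"])
            continue
        for channel in channels:
            paths.append("/{}/{}".format(mapping["path"], channel))
    return paths


dataset = {
    "filename": "/path/to/embla.edf",
    "data_type": "edf",
    "maps": [
        {
            "path": "EEG/Titanium",
            "channels": ["M1", "M2", "E1", "E2", "Fz", "C3", "C4", "Oz"],
            "shared_group": 1,
        },
        {
            "path": "EMG/Titanium",
            "channels": ["ChinL", "ChinR"],
            "shared_group": 1,
        },
    ],
}

for path in expected_paths(dataset):
    print(path)
```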
Note that for events (neurone_events is currently the only event type), only path should be given in maps. An example is given next.
{
"filename": "/path/to/the/neurone/recording/",
"data_type": "neurone_events",
"maps": [
{
"path": "Events/Neurone"
}
]
}
Text notes can also be stored as follows.
{
"filename": "/path/to/a/text/file.txt",
"data_type": "text",
"maps": [
{
"path": "Notes/Note_1"
}
]
}
The export2hdf5 utility can be used directly from the command line.
To get usage instructions:
export2hdf5 --help
To only validate that a given configuration file is OK:
export2hdf5 --config <path to config file> --validate-only
To fuse the data into an HDF5 file based on information in the configuration file:
export2hdf5 --config <path to config file>
This produces an HDF5 file in the location configured in the output section of the configuration file. All files are read from the locations specified in the datasets section of the configuration file.
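The command-line calls above can also be driven from a script, for instance to validate a configuration before running the export. The following is a minimal sketch using only the --config and --validate-only options shown above. It assumes that the export2hdf5 executable is on the PATH. The helpers build_command and run_export2hdf5 are hypothetical, and how export2hdf5 reports a failed validation (exit code or message) is not specified here, so the return code is passed back without interpretation.

```python
import subprocess


def build_command(config_path, validate_only=False):
    """Assemble the export2hdf5 command line as an argument list."""
    command = ["export2hdf5", "--config", config_path]
    if validate_only:
        command.append("--validate-only")
    return command


def run_export2hdf5(config_path, validate_only=False):
    """Run export2hdf5 and return its exit code (assumes it is on the PATH)."""
    result = subprocess.run(build_command(config_path, validate_only), check=False)
    return result.returncode


# Show the command that would be run for a validation-only call.
print(build_command("config_example.json", validate_only=True))
```

Passing the arguments as a list avoids shell quoting problems when the configuration path contains spaces.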
The export2hdf5 utility can also be used as a module from Python, e.g., for automation of data export. Below is a brief example of how export2hdf5 can be called from Python. Please note that export2hdf5 requires Python 3.
from export2hdf5 import export_hdf5
# define the configuration file
config_file = "config_example.json"
# validate the configuration file
config_check = export_hdf5.validate_config(config_file)
if config_check is None:
    print("Configuration file OK.")
# export data
export_hdf5.export_hdf5(config_file)
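For automating the export of several recordings, the two calls above can be wrapped in a loop. The sketch below takes the validation and export functions as parameters so that it can be tried without export2hdf5 installed. It assumes, as the example above suggests, that the validation function returns None when the configuration is OK, and it treats any other return value as a failure. The function export_all and the stand-in fake_validate are hypothetical.

```python
def export_all(config_files, validate, export):
    """Validate each configuration file and export only the valid ones.

    Returns two lists: the exported files and the skipped files.
    """
    exported, skipped = [], []
    for config_file in config_files:
        if validate(config_file) is None:
            export(config_file)
            exported.append(config_file)
        else:
            skipped.append(config_file)
    return exported, skipped


# With export2hdf5 installed, the call would look like this:
#   from export2hdf5 import export_hdf5
#   export_all(files, export_hdf5.validate_config, export_hdf5.export_hdf5)

# Stand-in functions for demonstration only.
def fake_validate(config_file):
    return None if config_file.endswith(".json") else "not a JSON file"


calls = []
exported, skipped = export_all(["a.json", "b.txt"], fake_validate, calls.append)
print(exported, skipped)
```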
export2hdf5 currently supports the following data formats:
- edf: data stored in the European Data Format (EDF)
- edf_faros: data recorded using the Mega Electronics Ltd Faros device. The data is stored in EDF. This function automatically converts accelerometer units from mG to G.
- mydarwin_ibi: IBI data exported from the MyDarwin analysis platform
- mydarwin_summary: summary data exported from the MyDarwin analysis platform
- empatica: data recorded using an Empatica E4 device
- shimmer: data recorded using a Shimmer device
- bodyguard_ibi: interbeat interval (IBI) data exported from the Firstbeat Bodyguard platform
- bodyguard_acc: acceleration data exported from the Firstbeat Bodyguard platform
- bodyguard_features: features exported from the Firstbeat Bodyguard platform
- bodyguard_features_misc: more features exported from the Firstbeat Bodyguard platform
- psg_hypnogram: hypnogram data in the RemLogic XML format
- psg_arousal: arousal events in the RemLogic XML format
- neurone: data recorded using a Bittium NeurOne device
- neurone_events: events from data recorded using a Bittium NeurOne device
- actigraph: data recorded using an ActiGraph device. The data must be exported to CSV format. Both 3-axis accelerometer data sampled at 50 Hz and raw data (accelerometer, gyroscope, magnetometer, temperature) sampled at 100 Hz are supported.
- text: general text (UTF-8), e.g., notes
# load library
import h5py
import numpy as np
# get file handle
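with h5py.File('/path/to/example.hdf5', 'r') as h5_file:
    # list groups in the file
    groups = list(h5_file.keys())
    print(groups)
    # read EEG data from one channel into memory; [()] copies the dataset
    # contents so that the data remains usable after the file is closed
    data = h5_file["/EEG/Device/Fz"][()]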
In R, it is recommended to use the rhdf5 package, which supports compound datasets. See the rhdf5 package page on Bioconductor for installation instructions.
## load library
require(rhdf5)
## define filename
h5_file <- '/path/to/example.hdf5'
## list contents of file
rhdf5::h5ls(h5_file)
## read EEG data from one channel
data <- rhdf5::h5read(h5_file, '/EEG/Device/Fz')
MATLAB supports reading of HDF5 files without additional toolboxes.
% define filename
h5_file = '/path/to/example.hdf5'
% list contents of file
hi = h5info(h5_file)
h5disp(h5_file)
h5disp(h5_file, '/EEG/Device/T8')
% read EEG data from one channel
data = h5read(h5_file, '/EEG/Device/Fz')
export2hdf5 is licensed under the MIT license. Please see the file LICENSE for more details.