Overview

The export2hdf5 programme is a utility for fusing multiple data sources into one HDF5 file. For instance, you might have simultaneously recorded physiological signals (e.g., multi-channel data) using several devices, each possibly with its own data format. To make the analysis of the combined data easier, you can use export2hdf5 to fuse all these recordings into one HDF5 file.

You can also add metadata for each channel and decide how channels in the original recording are mapped to paths in the HDF5 file.

Installation

Please note that export2hdf5 requires Python 3.

To install the latest version of export2hdf5 into a virtual environment using pip, proceed as follows:

virtualenv -p python3 hdf5example
cd hdf5example
source bin/activate
pip3 install git+https://github.com/bwrc/export2hdf5

After this you can use export2hdf5 directly from the command line or import it into Python scripts (more detailed instructions below).

Please note that export2hdf5 depends on (amongst other packages) NumPy and SciPy and you might want to install these separately using, e.g., your operating system package manager. In this case, install export2hdf5 without dependencies as follows:

   pip3 install --no-dependencies git+https://github.com/bwrc/export2hdf5

export2hdf5 is tested on GNU/Linux. On Linux you might have to use sudo for the installation if you do not use a virtual environment or want to install export2hdf5 globally.

Using export2hdf5

The configuration file

The export2hdf5 utility requires a configuration file in JSON format. The configuration file specifies the filenames of the data sources. It is assumed that each data source has one or more channels, corresponding to different time series. For biomedical data the time series are typically different biosignals, e.g., brain signals recorded from different scalp locations or different ECG leads. Data sources can also contain discrete events, which are stored as a compound HDF5 dataset.

A sample configuration file is provided with export2hdf5 (config_sample.json). A short example with one three-channel recording is given below.

{
  "output": {
    "filename": "/tmp/example.h5"
  },
  "datasets": [
    {
      "filename": "/path/to/the/recording.edf",
      "data_type": "edf",
      "maps": [
        {
          "path": "ECG/Faros",
          "channels": [
            "ECG_1",
            "ECG_2",
            "ECG_3"
          ],
          "shared_group": 1,
          "meta": [
            {
              "channels": [
                "*"
              ],
              "info": {
                "comment": "default comment for all channels",
                "comment2": "another default comment for all channels"
              }
            },
            {
              "channels": [
                "ECG_1"
              ],
              "info": {
                "comment": "this is a comment for channel ECG_1",
                "example": "this is a second comment for ECG_1"
              }
            },
            {
              "channels": [
                "ECG_2"
              ],
              "info": {
                "example2": "this is a comment for ECG_2"
              }
            }
          ]
        }
      ]
    }
  ]
}

In this example, output specifies the name of the HDF5 file that is to be created. If it exists, it will be overwritten.

Datasets are specified as elements in the 'datasets' list:

filename : the filename of the data source
data_type : defines the type of data so that the correct import module can be used, see below for details on supported data formats
maps : defines the mappings, i.e., mapping of channels in the data source to resources in the HDF5 file. The path in the map gives the resource in the HDF5 file and the channels to be exported to this resource are given in the channels array. The wildcard * is supported and means all channels in the dataset, i.e., all channels in the file.
shared_group : Boolean defining whether or not all of the channels in the current map should share the same time vector. The channels can share the same time vector if they are sampled simultaneously at the same rate.
meta : provide additional metadata. The metadata is given in structures containing information on which channels the metadata is relevant for. The wildcard * is supported and means all channels. The metadata (e.g., comments) is entered in the info section, in which different tags can be used (e.g., comment or note).
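
The nesting described above (datasets, then maps, then channels) can be made concrete with a short Python sketch that flattens a configuration into one row per map. The function name `summarise_config` and the inline configuration are illustrative only and are not part of export2hdf5.

```python
import json

# Inline copy of a minimal configuration; the paths are placeholders.
CONFIG_TEXT = """
{
  "output": {"filename": "/tmp/example.h5"},
  "datasets": [
    {
      "filename": "/path/to/the/recording.edf",
      "data_type": "edf",
      "maps": [
        {"path": "ECG/Faros",
         "channels": ["ECG_1", "ECG_2", "ECG_3"],
         "shared_group": 1}
      ]
    }
  ]
}
"""


def summarise_config(config):
    """Return one (filename, data_type, path, channels) tuple per map."""
    rows = []
    for dataset in config["datasets"]:
        for mapping in dataset["maps"]:
            # Some maps may give only a path, so channels can be absent.
            channels = mapping.get("channels", [])
            rows.append((dataset["filename"], dataset["data_type"],
                         mapping["path"], channels))
    return rows


config = json.loads(CONFIG_TEXT)
for filename, data_type, path, channels in summarise_config(config):
    print(f"{filename} ({data_type}) -> {path}: {', '.join(channels)}")
```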

Exporting several channel groups from the same file to different groups in the HDF5 file is accomplished by adding multiple maps to one dataset, each map having its own path and set of channels (the channel sets may overlap, so a channel can appear under more than one HDF5 resource). For instance, the (partial) configuration

{ "filename" : "/path/to/embla.edf",
  "data_type" : "edf",
  "maps" : [
      { "path" : "EEG/Titanium",
        "channels" : ["M1", "M2", "E1", "E2", "Fz", "C3", "C4", "Oz"],
        "shared_group" : 1
      },
      { "path" : "EMG/Titanium",
        "channels" :  ["ChinL", "ChinR"],
        "shared_group" : 1
      }
  ]
}

places the EEG channels in the HDF5 resource EEG/Titanium and the EMG channels in the resource EMG/Titanium.
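
When a recording has many channel groups, the maps list can be generated rather than typed by hand. The sketch below rebuilds the partial configuration shown above from a plain dictionary; `make_maps` is a hypothetical helper, not something export2hdf5 provides.

```python
import json


def make_maps(groups, shared_group=1):
    """Build a 'maps' list from a {hdf5_path: [channel, ...]} dictionary."""
    return [
        {"path": path, "channels": list(channels), "shared_group": shared_group}
        for path, channels in groups.items()
    ]


dataset = {
    "filename": "/path/to/embla.edf",
    "data_type": "edf",
    "maps": make_maps({
        "EEG/Titanium": ["M1", "M2", "E1", "E2", "Fz", "C3", "C4", "Oz"],
        "EMG/Titanium": ["ChinL", "ChinR"],
    }),
}

# Print the dataset entry as JSON, ready to paste into the "datasets" list.
print(json.dumps(dataset, indent=2))
```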

Note that for events (neurone_events is currently the only event type) only path should be given in maps. An example is given next.

{
  "filename": "/path/to/the/neurone/recording/",
  "data_type": "neurone_events",
  "maps": [
    {
      "path": "Events/Neurone"
    }
  ]
}

Text notes can also be stored as follows.

{
  "filename": "/path/to/a/text/file.txt",
  "data_type": "text",
  "maps": [
    {
      "path": "Notes/Note_1"
    }
  ]
}

Using export2hdf5 from the command line

The export2hdf5 utility can be used directly from the command line.

To get usage instructions:

export2hdf5 --help

To only validate that a given configuration file is OK:

export2hdf5 --config <path to config file> --validate-only

To fuse the data into an HDF5 file based on information in the configuration file:

export2hdf5 --config <path to config file>

This produces an HDF5 file in the location configured in the output section of the configuration file. All files are read from the locations specified in the datasets section of the configuration file.
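
For scripted workflows, the two command-line calls above can be chained so that the export only runs when validation succeeds. This is a minimal sketch that assumes the `export2hdf5` executable is on the PATH and that it signals a failed validation with a non-zero exit status, which the text above does not confirm. `build_command` and `validate_then_export` are illustrative names.

```python
import subprocess


def build_command(config_path, validate_only=False):
    """Return the export2hdf5 argument list for a configuration file."""
    command = ["export2hdf5", "--config", str(config_path)]
    if validate_only:
        command.append("--validate-only")
    return command


def validate_then_export(config_path):
    """Validate the configuration first and export only if that succeeds."""
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumed here to indicate an invalid configuration).
    subprocess.run(build_command(config_path, validate_only=True), check=True)
    subprocess.run(build_command(config_path), check=True)


# Show the commands without running them; the file name is a placeholder.
print(" ".join(build_command("config_sample.json", validate_only=True)))
print(" ".join(build_command("config_sample.json")))
```

Calling `validate_then_export("config_sample.json")` would then run both steps in order.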

Using export2hdf5 as a module from Python

The export2hdf5 utility can also be used as a module from Python, e.g., for automation of data export. Below is a brief example of how export2hdf5 can be called from Python. Please note that export2hdf5 requires Python 3.

from export2hdf5 import export_hdf5

# define the configuration file
config_file = "config_example.json"

# validate the configuration file
config_check = export_hdf5.validate_config(config_file)

if config_check is None:
    print("Configuration file OK.")

# export data
export_hdf5.export_hdf5(config_file)
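
For automation over several recordings, the same two calls can be wrapped in a loop. The sketch below is hedged: it assumes, as the example above suggests, that `validate_config` returns `None` for a valid file, and it takes the two functions as parameters so the loop logic can be tried without real data. `export_all` and the file names are hypothetical.

```python
def export_all(config_files, validate, export):
    """Export every configuration that validates; return the ones that did not."""
    rejected = []
    for config_file in config_files:
        # Assumption: a return value of None means the configuration is valid.
        if validate(config_file) is None:
            export(config_file)
        else:
            rejected.append(config_file)
    return rejected


# Stand-in functions so the sketch runs without export2hdf5 or data files.
exported = []
rejected = export_all(
    ["good.json", "bad.json"],
    validate=lambda path: None if path == "good.json" else "invalid",
    export=exported.append,
)
print("exported:", exported)
print("rejected:", rejected)

# With export2hdf5 installed, the real functions would be passed in instead:
#   from export2hdf5 import export_hdf5
#   export_all(["subject_01.json", "subject_02.json"],
#              export_hdf5.validate_config, export_hdf5.export_hdf5)
```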

Supported data types

export2hdf5 currently supports the following data formats:

edf : data stored in the European Data Format (EDF)
edf_faros : data recorded using the Mega Electronics Ltd Faros device. The data is stored in EDF. The import module automatically converts accelerometer units from mG to G.
mydarwin_ibi : IBI data exported from the MyDarwin analysis platform
mydarwin_summary : Summary data exported from the MyDarwin analysis platform
empatica : data recorded using an Empatica E4 device
shimmer : data recorded using a Shimmer device
bodyguard_ibi : interbeat interval (IBI) data exported from the Firstbeat Bodyguard platform
bodyguard_acc : acceleration data exported from the Firstbeat Bodyguard platform
bodyguard_features : features exported from the Firstbeat Bodyguard platform
bodyguard_features_misc : more features exported from the Firstbeat Bodyguard platform
psg_hypnogram : hypnogram data in the RemLogic XML format
psg_arousal : arousal events in the RemLogic XML format
neurone : data recorded using a Bittium NeurOne device.
neurone_events : events from data recorded using a Bittium NeurOne device.
actigraph : data recorded using an ActiGraph device. The data must be exported to CSV format. Both 3-axis accelerometer data sampled at 50 Hz and raw data (accelerometer, gyroscope, magnetometer, temperature) sampled at 100 Hz are supported.
text : general text (UTF-8), e.g., notes.
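
A quick pre-check of the data_type values in a configuration can catch typos before any files are read. The set below is copied from the list above; the function `unsupported_types` is an illustrative helper and is not the validation that export2hdf5 itself performs.

```python
SUPPORTED_TYPES = {
    "edf", "edf_faros", "mydarwin_ibi", "mydarwin_summary", "empatica",
    "shimmer", "bodyguard_ibi", "bodyguard_acc", "bodyguard_features",
    "bodyguard_features_misc", "psg_hypnogram", "psg_arousal",
    "neurone", "neurone_events", "actigraph", "text",
}


def unsupported_types(config):
    """Return (filename, data_type) pairs whose data_type is not in the list."""
    return [
        (dataset.get("filename"), dataset.get("data_type"))
        for dataset in config.get("datasets", [])
        if dataset.get("data_type") not in SUPPORTED_TYPES
    ]


config = {
    "datasets": [
        {"filename": "/path/to/the/recording.edf", "data_type": "edf"},
        {"filename": "/path/to/a/text/file.txt", "data_type": "txt"},  # typo
    ]
}
print(unsupported_types(config))
```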

Reading HDF5 files in other programming languages

Python

# load library
import h5py
import numpy as np

# get file handle
with h5py.File('/path/to/example.hdf5','r') as h5_file:
    # list groups in the file
    groups = list(h5_file.keys())
    print(groups)

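    # read EEG data from one channel into memory; [()] loads the whole
    # dataset, so the array stays usable after the file is closed
    data = h5_file["/EEG/Device/Fz"][()]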
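
To see what a fused file contains beyond its top-level groups, h5py can walk the whole hierarchy. The sketch below writes a tiny stand-in file so that it is runnable on its own; the dataset path, the array contents and the `comment` attribute are invented for the demonstration and do not reflect how export2hdf5 actually lays out data or metadata.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(filename):
    """Return {path: (shape, dtype, attributes)} for every dataset in the file."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype), dict(obj.attrs))

    with h5py.File(filename, "r") as h5_file:
        h5_file.visititems(visit)
    return found


# Build a small demonstration file (not produced by export2hdf5).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as h5_file:
    dataset = h5_file.create_dataset("EEG/Device/Fz", data=np.zeros(10))
    dataset.attrs["comment"] = "demo channel"

for path, (shape, dtype, attrs) in list_datasets(demo_path).items():
    print(path, shape, dtype, attrs)
```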

R

It is recommended to use the rhdf5 package, which supports compound datasets. See the rhdf5 package page on Bioconductor for installation instructions.

## load library
require(rhdf5)

## define filename
h5_file <- '/path/to/example.hdf5'

## list contents of file
rhdf5::h5ls(h5_file)

## read EEG data from one channel
data <- rhdf5::h5read(h5_file, '/EEG/Device/Fz')

MATLAB

MATLAB supports reading of HDF5 files without additional toolboxes.

% define filename
h5_file = '/path/to/example.hdf5'

% list contents of file
hi = h5info(h5_file)
h5disp(h5_file)
h5disp(h5_file, '/EEG/Device/T8')

% read EEG data from one channel
data = h5read(h5_file, '/EEG/Device/Fz')

License

export2hdf5 is licensed under the MIT license. Please see the file LICENSE for more details.
