# Create hdf5 file from gt3x files

This notebook presents code snippets that collect the data from multiple .gt3x files and store them in a single hdf5 file. Further processing then becomes much quicker, because the gt3x files do not need to be decoded every time.

In [1]:
import h5py
import os
from multiprocessing import Pool

from tqdm.notebook import tqdm
import glob2
import paat
from paat import io

# Set file path to relevant files
GT3X_BASE_PATH = os.path.join(os.sep, 'run', 'media', 'msw', 'LaCie', 'Actigraph_raw')
HDF5_FILE_PATH = os.path.join(os.sep, 'run', 'media', 'msw', 'LaCie', 'ACTIGRAPH_TU7.hdf5')

## Parallel execution

In [None]:
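from time import sleep


def process_file(file_path):
    # Load data from gt3x file
    time, acceleration, meta = paat.io.read_gt3x(file_path)

    # All workers append to the same hdf5 file. Retry only while opening or
    # writing fails with an OSError (assumed to mean that another worker
    # currently holds the file); any other error should surface.
    while True:
        try:
            # Save data to the shared hdf5 file
            with h5py.File(HDF5_FILE_PATH, 'a') as hdf5_file:
                grp = hdf5_file.create_group(meta["Subject_Name"])
                paat.io.save_dset(grp, "acceleration", time, acceleration, meta)
            break
        except OSError:
            # Wait briefly instead of spinning at full speed
            sleep(0.1)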
            
            
# Get all gt3x files
gt3x_files = glob2.glob(os.path.join(GT3X_BASE_PATH, '**', '*.gt3x'))
    
# Process all files
with Pool(16) as p:
    list(tqdm(p.imap(process_file, gt3x_files), total=len(gt3x_files)))
    

  0%|          | 0/6157 [00:00<?, ?it/s]
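
The retry loop in `process_file` keeps reopening the file until the write succeeds, so it can spin indefinitely if the failure is not a temporary lock. A minimal sketch of a bounded alternative follows, assuming that lock conflicts surface as `OSError`. The helper `retry_on_oserror` and its parameters are hypothetical names and not part of paat or h5py.

```python
import time


def retry_on_oserror(action, attempts=50, delay=0.1):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: only OSError is treated as temporary (assumed to
    signal a file held by another process). Any other exception propagates
    immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)
```

Inside `process_file`, the `with h5py.File(...)` block could be moved into a small function and passed as `action`. An error such as a duplicate group name would then stop the worker instead of being retried forever.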

## Iterative execution

``` python
def process_file(file_path):
    # Load data from gt3x file
    time, acceleration, meta = paat.io.read_gt3x(file_path)
    
    # Append to the hdf5 file; mode 'w' would truncate it on every call
    with h5py.File(HDF5_FILE_PATH, 'a') as hdf5_file:
        grp = hdf5_file.create_group(meta["Subject_Name"])
        paat.io.save_dset(grp, "acceleration", time, acceleration, meta)
          
        
# Get all gt3x files
gt3x_files = glob2.glob(os.path.join(GT3X_BASE_PATH, '**', '*.gt3x'))

for file_path in tqdm(gt3x_files):    
    process_file(file_path)
```

<div class="alert alert-info">

**Note:**

The code above takes a considerable amount of time to execute and is therefore not executed here. However, it can be useful if you want to run the conversion on a single core.

</div>
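
Both variants create one group per `meta["Subject_Name"]`, and h5py's `create_group` raises a `ValueError` when a group with that name already exists. Here is a small sketch for spotting such collisions before writing, assuming the subject names have already been collected as `(file_path, subject_name)` pairs. The helper `find_duplicate_subjects` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_subjects(pairs):
    """Return {subject_name: [file_path, ...]} for names used more than once.

    `pairs` is an iterable of (file_path, subject_name) tuples; how they are
    gathered is left open here.
    """
    files_by_subject = defaultdict(list)
    for file_path, subject_name in pairs:
        files_by_subject[subject_name].append(file_path)

    # Keep only the subjects that map to more than one file
    return {
        name: paths
        for name, paths in files_by_subject.items()
        if len(paths) > 1
    }
```

An empty result means every file would get its own group. Any entries that do show up indicate files that need a different group name or a manual check.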
