## Building an HDF5 dataset

__Prerequisite__: The script assumes that the data has already been extracted using `dircad_data_extractor.ipynb`. The function `get_patient_case_dirc` bundles the extracted data into three-dimensional volumes.

The resulting file contains 20 groups, one per patient, named `'0'` through `'19'` (patient number minus one).
Each group holds two datasets: `x` (the volume) and `y` (the segmentation).
During training, the batch generator selects samples from the dataset. For example, slice 45 of group `'3'` is read as `(hdf['3']['x'][45], hdf['3']['y'][45])`.
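
The sampling step described above could look roughly like the following. This is a minimal sketch, not the project's batch generator: the function name `sample_slice` is hypothetical, and it assumes the lowercase dataset names `'x'` and `'y'` that the build cell in this notebook writes. It accepts an open `h5py.File` or any mapping with the same layout.

```python
import random


def sample_slice(hdf, rng=random):
    """Return one random (volume slice, segmentation slice) pair.

    `hdf` is an open h5py.File, or any mapping of group name to a
    mapping that holds 'x' and 'y' (hypothetical helper, for illustration).
    """
    # Pick a patient group, then a slice index along the first axis
    key = rng.choice(sorted(hdf.keys()))
    group = hdf[key]
    z = rng.randrange(len(group['x']))
    # Indexing reads only the requested slice
    return group['x'][z], group['y'][z]
```

With the real file, it would be called inside `with h5py.File('data/training_data.h5', 'r') as hdf:`. Because h5py datasets are indexed lazily, only the selected slice is loaded from disk rather than the whole volume.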

In [None]:
import h5py
import numpy as np
from libs.data_extractor import get_patient_case_dirc
from libs.preprocessing import normalize_data, crop_x_y

The data is stored in `data/training_data.h5`.

In [None]:
for i in range(1, 21):
    # Create the file for the first patient, then append to it
    mode = 'w' if i == 1 else 'r+'
    X, Y = get_patient_case_dirc(i)
    with h5py.File('data/training_data.h5', mode, libver='latest') as hdf:
        # Clip intensities to [-400, 400]
        X = np.clip(X, -400., 400.)

        index = str(i - 1)  # group names run from '0' to '19'
        group = hdf.create_group(index)

        # Crop the volume and the segmentation
        X, Y = crop_x_y(X, Y)

        # Normalize the volume
        X = normalize_data(X)

        # Store the volume with a trailing channel axis, plus the segmentation
        group.create_dataset('x', data=X.reshape(X.shape + (1,)), dtype='float32')
        group.create_dataset('y', data=Y, dtype='uint8')
        hdf.swmr_mode = True
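
After the build, it can be useful to confirm that every group has the expected layout. The helper below is a hedged sketch: `check_layout` is a hypothetical name, and the shape check assumes that `crop_x_y` returns a volume and a segmentation of equal shape, which this notebook does not confirm.

```python
def check_layout(hdf):
    """Return a list of human-readable problems found in the file layout.

    `hdf` is an open h5py.File, or any mapping with the same structure
    (hypothetical helper, for illustration).
    """
    problems = []
    for key in sorted(hdf.keys(), key=int):
        group = hdf[key]
        missing = [name for name in ('x', 'y') if name not in group]
        if missing:
            problems.append(f"group {key}: missing {missing}")
            continue
        x, y = group['x'], group['y']
        if str(x.dtype) != 'float32':
            problems.append(f"group {key}: x has dtype {x.dtype}")
        if str(y.dtype) != 'uint8':
            problems.append(f"group {key}: y has dtype {y.dtype}")
        # The build cell appends a single channel axis to the volume
        if tuple(x.shape[-1:]) != (1,):
            problems.append(f"group {key}: x has no trailing channel axis")
        # Assumption: volume and segmentation share the same spatial shape
        if tuple(x.shape[:-1]) != tuple(y.shape):
            problems.append(f"group {key}: x {x.shape} and y {y.shape} differ")
    return problems
```

Called as `check_layout(hdf)` inside `with h5py.File('data/training_data.h5', 'r') as hdf:`, an empty list means that no problem was detected under these assumptions.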