
Load datasets stored in .h5 file format
=======================================

This example demonstrates how to load data from a stored .h5 file and how to build a
data generator from it.

First, we create a small temporary dataset with `Dataset1`, comprising 5 source cases and using the cross-spectral matrix (CSM) as the input feature.

In [2]:
import os
# reduce TensorFlow log output for doc purposes; set before importing tensorflow
# so that it also applies to messages logged at import time
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from acoupipe import Dataset1

# training dataset
d1 = Dataset1(
        split="training",
        size=5,
        features=["csm"])

# save to .h5 file
d1.save_h5(name="/tmp/dataset.h5")

100%|██████████| 5/5 [00:03<00:00,  1.55it/s]


The AcouPipe toolbox provides the `LoadH5Dataset` class to load datasets stored in HDF5 format.
Each individual sample (source case) can be accessed through the `h5f` attribute of the class. For example, to extract the input feature ('csm' in this case) of the sample stored under the key '1':


In [3]:
from acoupipe import LoadH5Dataset

dataset_h5 = LoadH5Dataset(name="/tmp/dataset.h5")

s1 = dataset_h5.h5f['1']['csm'][:] # we use [:] to copy the data from file into the variable s1

## Building a TensorFlow/Keras Dataset 

With the dataset loaded, a Python generator can be created and consumed by the TensorFlow Dataset API. Here, each sample comprises the location ('loc'), the squared sound pressure ('p2'), and the CSM ('csm').
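To make the generator idea concrete, the following is a minimal plain-Python sketch of the pattern, not AcouPipe's implementation. It assumes that the generator yields one dict per sample, keyed by the requested feature names. The names `make_feature_generator` and `fake_file` are hypothetical, and a nested dict stands in for the HDF5 file.

```python
def make_feature_generator(store, features):
    """Return a zero-argument generator function over `store`.

    `store` is a hypothetical stand-in for the HDF5 file: a dict that maps
    sample keys ('1', '2', ...) to dicts of feature data.
    """
    def generator():
        # visit the samples in numeric key order
        for key in sorted(store, key=int):
            sample = store[key]
            # keep only the requested features
            yield {name: sample[name] for name in features}
    return generator


# toy data: two samples, nested lists instead of arrays
fake_file = {
    "1": {"loc": [[0.1], [0.2], [0.5]], "p2": [[1.0]], "csm": [[1.0]]},
    "2": {"loc": [[0.3], [0.4], [0.5]], "p2": [[2.0]], "csm": [[2.0]]},
}

gen = make_feature_generator(fake_file, features=["loc", "p2"])
first = next(gen())
print(sorted(first))  # ['loc', 'p2']
```

Returning a function rather than a generator object matters because `tf.data.Dataset.from_generator` expects a callable, which lets it restart the iteration for each pass over the data.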

In [6]:

data_generator = dataset_h5.get_dataset_generator(
            features=['loc','p2','csm'], # the desired features to return from the file
            )

# provide the signature of the features
output_signature = {
            'loc' : tf.TensorSpec(shape=(3,None), dtype=tf.float32),
            'p2' : tf.TensorSpec(shape=(None,None), dtype=tf.float32),
            'csm':  tf.TensorSpec(shape=(None,64,64,None), dtype=tf.float32),
            }

dataset = tf.data.Dataset.from_generator(
            generator=data_generator,
            output_signature=output_signature
            )

data = next(iter(dataset))
print(data['loc'])

tf.Tensor(
[[-0.07077019 -0.04793944  0.31192958  0.138183  ]
 [ 0.03267673  0.07899582 -0.09707411  0.26443505]
 [ 0.5         0.5         0.5         0.5       ]], shape=(3, 4), dtype=float32)
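The printed shape `(3, 4)` is compatible with the declared signature `(3, None)` because a `None` entry in a `tf.TensorSpec` shape leaves that dimension unconstrained. The rule can be sketched in plain Python as follows. `shape_matches` is a hypothetical helper written for illustration, not a TensorFlow function, and the four-dimensional shape in the last call is made up.

```python
def shape_matches(shape, spec_shape):
    """Check a concrete shape against a spec in which None means 'any size'."""
    if len(shape) != len(spec_shape):
        return False  # the rank must agree
    return all(s is None or s == d for d, s in zip(shape, spec_shape))


print(shape_matches((3, 4), (3, None)))  # True
print(shape_matches((2, 4), (3, None)))  # False: first dimension must be 3
# hypothetical shape, chosen only to illustrate the 'csm' signature
print(shape_matches((1, 64, 64, 2), (None, 64, 64, None)))  # True
```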
