
Model training with training data generated on the fly
=======================================================

Whether an AcouPipe dataset generates data fast enough to feed model training on the fly depends on the specific use case. 
This example demonstrates how to generate training data on the fly for supervised source localization tasks.

Here, a model is trained to localize a single source, similar to [KHS19], but without predicting the source strength. 
For demonstration, the underlying cross-spectral matrix is created analytically (no time data is simulated).


## Build the dataset generator

First, we modify the dataset config so that only single-source examples are created on a smaller grid of size $51 \times 51$.
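
As a quick check of the grid size, the following sketch computes the number of grid points per axis from the grid increment. It assumes that each grid axis spans a length of 1 and that both end points are included; the helper `points_per_axis` is hypothetical and not part of AcouPipe.

```python
def points_per_axis(length, increment):
    """Number of equidistant grid points on an axis, end points included."""
    # round() guards against floating-point error in the division
    return round(length / increment) + 1

# assumption: each axis spans a length of 1, increment is 1/50
n = points_per_axis(1.0, 1 / 50)
print(n, n * n)  # points per axis, total number of grid points
```

Under these assumptions, an increment of 1/50 gives 51 points per axis and 2601 grid points in total.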

In [1]:
import numpy as np
from acoupipe.datasets.synthetic import DatasetSynthetic1

# training dataset
dataset = DatasetSynthetic1(max_nsources=1, tasks=2, mode='analytic')       

# set the grid increment so that the grid has 51 x 51 points
dataset.config.grid.increment = 1/50

# build TensorFlow datasets for training and validation
training_dataset = dataset.get_tf_dataset(
    features=["sourcemap", "loc"], f=2000, split="training", size=100000)
validation_dataset = dataset.get_tf_dataset(
    features=["sourcemap", "loc"], f=2000, split="validation", size=10000)


The TensorFlow dataset API can be used to build a data pipeline from the data generator. Here, batches with 16 source cases are used.

In [2]:
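import tensorflow as tf

def yield_features_and_labels(data):
    # take the first entry along the leading axis of the source map
    feature = data['sourcemap'][0]
    # normalize the source map by its maximum value
    f_max = tf.reduce_max(feature)
    feature /= f_max
    # use the first two location entries (x, y) as label
    label = data['loc'][:2]
    return (feature, label)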

training_dataset = training_dataset.map(yield_features_and_labels).batch(16).repeat()
validation_dataset = validation_dataset.map(yield_features_and_labels).batch(16)
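
The mapping function above divides each source map by its maximum. The NumPy sketch below illustrates the effect of this step on a hypothetical random map (not AcouPipe output): after normalization the absolute level is gone, which fits a model that predicts the location but not the source strength.

```python
import numpy as np

def normalize_sourcemap(sourcemap):
    """Scale a source map so that its maximum value equals one."""
    return sourcemap / sourcemap.max()

rng = np.random.default_rng(0)
# hypothetical 51 x 51 source map with small positive values
example_map = rng.random((51, 51)) * 1e-3 + 1e-6

normalized = normalize_sourcemap(example_map)
# the same map at a 100 times higher level yields the same normalized input
normalized_louder = normalize_sourcemap(100.0 * example_map)

print(normalized.max())
print(np.allclose(normalized, normalized_louder))
```

The position of the maximum is unchanged by the scaling, so the information needed for localization is preserved.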


## Train the model

Next, build a ResNet50V2 model and fit it to the data. This may take several hours, depending on the computational infrastructure. 

In [3]:
# build model architecture
model = tf.keras.Sequential([
    tf.keras.applications.resnet_v2.ResNet50V2(
        include_top=False,
        weights=None,
        input_shape=(51, 51, 1),
    ),
])
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(2, activation=None))

# compile and fit
# 10e-4 equals 1e-3, so the learning rate is 1.5e-3
model.compile(optimizer=tf.optimizers.Adam(1.5*10e-4), loss='mse')
model.fit(training_dataset, validation_data=validation_dataset, epochs=1, steps_per_epoch=1000)

  20/1000 [..............................] - ETA: 6:35 - loss: 19.2806

KeyboardInterrupt: 
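
Because the training dataset is repeated indefinitely, the length of an epoch is set by `steps_per_epoch` rather than by the dataset size. The short calculation below uses the values from the cells above (batch size 16, 1000 steps, 100000 training samples) to show how many samples one epoch covers.

```python
batch_size = 16          # from .batch(16)
steps_per_epoch = 1000   # from model.fit(...)
training_size = 100000   # from get_tf_dataset(..., size=100000)

# number of source cases drawn in one epoch
samples_per_epoch = batch_size * steps_per_epoch
# number of such epochs needed to pass through the training split once
epochs_for_full_pass = training_size / samples_per_epoch

print(samples_per_epoch, epochs_for_full_pass)
```

With these settings, a single epoch draws 16000 source cases, so 6.25 epochs would be needed to pass through the whole training split once.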

After successful training, the model can be used to predict the source location.

In [None]:
import matplotlib.pyplot as plt
from acoular import L_p

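# take the first example of the first validation batch
sourcemap, labels = next(iter(validation_dataset))
sourcemap = sourcemap[0].numpy()
labels = labels[0].numpy()
sourcemap /= sourcemap.max()
# add a batch dimension before predicting
prediction = model.predict(sourcemap[np.newaxis])[0]

# grid extent, passed to imshow to scale the axes
extent = dataset.config.grid.extend()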

plt.figure()
plt.imshow(L_p(sourcemap.squeeze()).T,
            vmax=L_p(sourcemap.max()),
            vmin=L_p(sourcemap.max())-15,
            extent=extent,
            origin="lower")
plt.plot(prediction[0],prediction[1],'x',label="prediction")
plt.plot(labels[0],labels[1],'x',label="label")
plt.colorbar()
plt.legend()
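
Besides the visual comparison, the distance between the predicted and the true location can be reported as a single number. The sketch below is one possible way to do this; the helper `localization_error` and the coordinates are hypothetical and only illustrate the calculation.

```python
import math

def localization_error(predicted_xy, true_xy):
    """Euclidean distance between predicted and true source location."""
    return math.dist(predicted_xy, true_xy)

# hypothetical coordinates, standing in for prediction[:2] and labels[:2]
predicted_xy = (0.10, -0.05)
true_xy = (0.13, -0.01)

error = localization_error(predicted_xy, true_xy)
print(f"localization error: {error:.3f}")
```

The result is in the same length unit as the grid coordinates.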