## Welcome to this tutorial on building data iterators with the `Harmonia` class from the `NanTex` library!
In this notebook, we will learn how to use the `Harmonia` module of the `NanTex` package to bundle synthetic overlay data generated with the `Tekhne` module and serve it in batches for training and testing the `NanTex` networks.

### Requirements
- Synthetic overlay data generated with the `Tekhne` module of the `NanTex` package (see `tekhne_tutorial.ipynb`)
- `NanTex` package installed (see `installation_guide.md`)
- Basic knowledge of Python and Jupyter Notebooks

If you have not yet generated synthetic overlay data, please refer to the `tekhne_tutorial.ipynb` notebook first.

## Part 0: Dependencies

In [None]:
## Dependencies
import os
import json
import numpy as np
import matplotlib.pyplot as plt

## NanTex modules
from nantex.batching import Harmonia
from nantex.util import pltStyler

## Part I: Configure and Instantiate Harmonia

In [None]:
## Generate Config

# define and create config directory
config_dir = '../configs/'
os.makedirs(config_dir, exist_ok=True)

# generate boilerplate config file
Harmonia.generate_boilerplate_config_file(config_dir)

# show the generated config file
with open(os.path.join(config_dir, 'harmonia_config.json'), 'r') as f:
    print(f.read())

**Heads-up:** Some of our modules contain convenience functions that interface with the Windows file system. If you are using a UNIX-based system (Linux, macOS), you will need to provide the directory paths manually in the configuration files. Please refer to the docstrings of the respective functions for more information.

## UNIX

In [None]:
## define train and validation directories
raw_source: str = '../path/to/directory/containing/training/overlays/'
val_source: str = '../path/to/directory/containing/validation/overlays/'

# read the existing config file
with open(os.path.join(config_dir, 'harmonia_config.json'), 'r') as f:
    config = json.load(f)

# update config with new paths
config['raw_source'] = raw_source
config['val_source'] = val_source

# write updated config back to file
with open(os.path.join(config_dir, 'harmonia_config.json'), 'w') as f:
    json.dump(config, f, indent=4)
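
Before instantiating `Harmonia`, it can help to confirm that the configured directories actually exist. The following is a minimal sketch using only the standard library. `check_config_dirs` is a hypothetical helper (not part of `NanTex`), and it assumes only that `raw_source` and `val_source` hold directory paths, as set above.

```python
import json
import os
import tempfile


def check_config_dirs(config_path, keys=("raw_source", "val_source")):
    """Return the config keys whose values are not existing directories.

    Hypothetical helper for illustration; not part of NanTex.
    """
    with open(config_path, "r") as f:
        config = json.load(f)
    # a missing key is treated the same way as a missing directory
    return [key for key in keys if not os.path.isdir(str(config.get(key, "")))]


# self-contained demo: a temporary config stands in for harmonia_config.json
with tempfile.TemporaryDirectory() as tmp:
    demo_config_path = os.path.join(tmp, "harmonia_config.json")
    demo_config = {"raw_source": tmp, "val_source": os.path.join(tmp, "missing")}
    with open(demo_config_path, "w") as f:
        json.dump(demo_config, f, indent=4)

    missing_keys = check_config_dirs(demo_config_path)
    print("Keys with missing directories:", missing_keys)
```

In the notebook itself, the same check could be run as `check_config_dirs(os.path.join(config_dir, 'harmonia_config.json'))`. An empty list means both directories were found.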

In [None]:
## Instantiate
BatchProvider:Harmonia
BatchProvider = Harmonia.from_config(config_file_path='../configs/harmonia_config.json',
                                     datatype='npy',
                                     DEBUG=True)

# the configuration can also be passed as a dictionary
with open('../configs/harmonia_config.json', 'r') as f:
    config = json.load(f)
    
BatchProvider = Harmonia(config=config,
                         datatype='npy',
                         DEBUG=True)

## Windows

In [None]:
## Instantiate
BatchProvider:Harmonia
BatchProvider = Harmonia.from_config(config_file_path='../configs/harmonia_config.json',
                                     datatype='npy',
                                     DEBUG=True)

### Checkpoint I: Instantiate the `Harmonia` class

In [None]:
## Let's check the configuration
BatchProvider.pprint_config()

## Part II: Build Data Iterators

In [None]:
## Build Data Iterators
train_batcher, validation_batcher = BatchProvider.build()

## Build Train Iterator
train_iterator = iter(train_batcher)

### Checkpoint II: Build Data Iterators

The `Harmonia` class provides a `build()` method that constructs training and validation data iterators from the configuration provided during instantiation. These iterators are used to fetch batches of data during training. As used in this notebook, they support `iter()`, `len()`, and a `.dataset` attribute, which matches the interface of a PyTorch `DataLoader`, so they can be integrated directly into PyTorch training loops.

These iterators yield batches as tuples, where each tuple contains a batch of input images and their corresponding labels (masks). The shape and content of these batches depend on the configuration parameters set when the `Harmonia` class is instantiated. A `StopIteration` exception is raised when the iterator reaches the end of the dataset, i.e. when all batches have been consumed.

Make sure that your dataset contains enough batches to avoid running into this exception during training. You can adjust the `batch_size` and `steps_per_epoch` parameters in the configuration to control the size of each batch and the number of batches. You may also specify a `multiply_factor` to artificially increase the number of batches by repeating the dataset multiple times. However, this may lead to overfitting if the same samples are seen too often during training. Generating more synthetic data with the `Tekhne` module is usually the better option.
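
As a rough guide to how these parameters interact, the sketch below estimates the number of batches available in one pass. It assumes that `multiply_factor` repeats the dataset a whole number of times and that a final, partially filled batch is kept unless `drop_last` is set. Neither assumption is confirmed by the `Harmonia` documentation, and `estimate_num_batches` is a hypothetical helper, so treat `len(train_batcher)` as the authoritative value.

```python
import math


def estimate_num_batches(n_samples, batch_size, multiply_factor=1, drop_last=False):
    """Estimate the number of batches in one pass over the data.

    Hypothetical helper for illustration; not part of NanTex.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # assumption: multiply_factor simply repeats the dataset
    effective_samples = n_samples * multiply_factor
    if drop_last:
        return effective_samples // batch_size
    return math.ceil(effective_samples / batch_size)


print(estimate_num_batches(1000, 32))                     # 32 (31 full batches + 1 partial)
print(estimate_num_batches(1000, 32, multiply_factor=2))  # 63
print(estimate_num_batches(1000, 32, drop_last=True))     # 31
```

Under these assumptions, an estimate below `steps_per_epoch` would mean the iterator runs out before the epoch ends.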

In [None]:
## Demonstrate fetching batches from the iterator
print("Fetching batches from the training iterator:\n")
n_batches = 0  # number of batches fetched so far
while True:
    try:
        x, y = next(train_iterator)
        print(x.shape, y.shape)
        n_batches += 1
    except StopIteration:
        # the iterator is exhausted: all batches have been consumed
        break

print()
print("Total batches in one epoch:", len(train_batcher))
print("Stop iteration hit at:", n_batches)
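
The loop above ends when `StopIteration` is raised. For training that spans more than one pass over the data, a common generic Python pattern is to re-create the iterator whenever it is exhausted. The sketch below uses a plain list as a stand-in for `train_batcher`. `endless_batches` is a hypothetical helper, not part of `NanTex`, and it assumes only that the batcher can be iterated repeatedly.

```python
def endless_batches(batcher):
    """Yield batches forever, starting a new pass whenever one is exhausted.

    Hypothetical helper for illustration; not part of NanTex.
    """
    while True:
        yielded = False
        # each `for` loop calls iter(batcher) and therefore starts a fresh pass
        for batch in batcher:
            yielded = True
            yield batch
        if not yielded:
            # guard against spinning forever on an empty batcher
            raise ValueError("batcher yielded no batches")


# stand-in for train_batcher: three (input, mask) pairs
toy_batcher = [("x0", "y0"), ("x1", "y1"), ("x2", "y2")]

stream = endless_batches(toy_batcher)
fetched = [next(stream) for _ in range(7)]  # more batches than one pass holds
print(fetched[3])  # ('x0', 'y0'): the first batch of the second pass
```

Because the generator never raises `StopIteration` on its own, the training loop has to decide when to stop, for example after a fixed number of steps.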

## Part III: Visualize Batches

In [None]:
## Build Data Iterators
train_batcher, validation_batcher = BatchProvider.build()

In [None]:
# apply stylesheet
pltStyler().enforce_stylesheet()

x, y = train_batcher.dataset[np.random.randint(0, len(train_batcher.dataset))]
x, y = np.asarray(x), np.asarray(y)

# print data ranges
print("Overlay range:", np.min(x), np.max(x))
print("Overlay mean:", np.mean(x), "Overlay std:", np.std(x))
print("Mask ranges:", [(np.min(y[i]), np.max(y[i])) for i in range(y.shape[0])])
print("Mask means:", [np.mean(y[i]) for i in range(y.shape[0])])
print("Mask stds:", [np.std(y[i]) for i in range(y.shape[0])])

# plot input and masks
fig, ax = plt.subplots(1, y.shape[0]+1, figsize=(5*(y.shape[0]+1), 5))

# plot input
ax[0].imshow(x[0], cmap='gray')
ax[0].set_title('Overlay Image', fontsize=16)

# plot masks
for i in range(y.shape[0]):
    ax[i+1].imshow(y[i,...], cmap='gray')
    ax[i+1].set_title(f'Mask {i+1}', fontsize=16)