## Welcome to the Tutorial on Generating Synthetic Overlays of Nanotextural Structures
In this notebook, we will learn how to use `Tekhne`, a part of the `NanTex` library, to generate synthetic overlays of nanotextural structures. The goal is to generate datasets that can be used to train, validate and test the deep learning models developed in the `NanTex` library.

### Requirements
- Basic knowledge of Python
- Python 3.11 or higher
- Poetry

We assume you followed the installation instructions in the README.md file. If you haven't, please do so **before** proceeding.

**Heads-up:** Some of our modules contain convenience functions that interface with the Windows file system. If you are using a different operating system, we advise you to follow the "UNIX" part of the tutorial.

## Part 0: Preparations

### Dependencies

In [None]:
## Dependencies
import os
import numpy as np
from glob import glob
import matplotlib.pyplot as plt

## NanTex modules
from nantex.data_preparation import Tekhne
from nantex.util import pltStyler

### Demo Data
Download the demo dataset from Zenodo [here](https://doi.org/10.5281/zenodo.17120603). The dataset is a compressed file named `NanTex SRM Dataset — SMLM I (ShareLoc).rar`. After downloading, decompress the file to a directory of your choice. You should see nine (9) folders named `$FEATURE$_$USE$` (e.g. *ACT_train*) containing microscopy images.
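
To check that the archive was unpacked as expected, the folder names can be split into their feature and use parts. The following is a minimal sketch using only the standard library. The helper `group_folders_by_use` is hypothetical and not part of `NanTex`, and apart from *ACT_train* the folder names in the demo are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def group_folders_by_use(root):
    """Map each use (e.g. 'train') to the sorted feature names found under root."""
    groups = defaultdict(list)
    for folder in sorted(Path(root).iterdir()):
        # skip files and anything that does not follow the FEATURE_USE pattern
        if not folder.is_dir() or "_" not in folder.name:
            continue
        feature, use = folder.name.rsplit("_", 1)
        groups[use].append(feature)
    return dict(groups)


# demo with a throwaway directory; only ACT_train is taken from the text above,
# the other folder names are placeholders
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ACT_train", "ACT_val", "ACT_test", "XYZ_train"):
        (Path(tmp) / name).mkdir()
    demo_groups = group_folders_by_use(tmp)

print(demo_groups)
```

Pointing `group_folders_by_use` at your own extraction directory should list every feature once per use if all nine folders are present.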

## Part I: Configure and Instantiate the Tekhne instance

In [None]:
# configure the generator
tekhne_config = {
    "mode": 'overlay',          # what output to generate, 'overlay' or 'rotation'
    "multi_core": False,        # use multiple cores, powered by ray
    "augment": False,           # apply data augmentation
    "patches": 32,              # number of patches to extract from each input file
    "patchsize": (256,256),     # size of extracted patches
    "imagesize": (2048,2048),   # size of full output images
    "dtype_out": np.float32,    # output data type
    "dtype_in": np.uint8,       # input data type
    "DEBUG": True               # print debug messages
}

### UNIX

In [None]:
## Define Metaparameters and data paths

# data import paths
root_feature_1:str = 'path/to/your/first/feature'
root_feature_2:str = 'path/to/your/second/feature'
root_feature_3:str = 'path/to/your/third/feature'
... # add more features when needed

# outpath for prepared data
data_path_out = "path/to/your/output/directory"

# grab data
root_features:list[str] = [root_feature_1, 
                           root_feature_2, 
                           root_feature_3] # add more features when needed
# get list of files for each feature
root_features = [glob(f"{root}/*.npy") for root in root_features]
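
A wrong path or file extension makes `glob` return an empty list without raising an error. The sketch below wraps the same globbing step in a hypothetical helper, `collect_feature_files`, that fails early instead. It is not part of `NanTex`, and the demo directories and file names are placeholders.

```python
import tempfile
from glob import glob
from pathlib import Path

import numpy as np


def collect_feature_files(roots, pattern="*.npy"):
    """Glob each feature directory and fail early if one yields no files."""
    files_per_feature = []
    for root in roots:
        files = sorted(glob(str(Path(root) / pattern)))
        if not files:
            raise FileNotFoundError(f"No files matching {pattern!r} in {root!r}")
        files_per_feature.append(files)
    return files_per_feature


# demo with throwaway directories standing in for the feature folders
with tempfile.TemporaryDirectory() as tmp:
    roots = []
    for name, n_files in (("feature_1", 2), ("feature_2", 3)):
        folder = Path(tmp) / name
        folder.mkdir()
        for i in range(n_files):
            np.save(folder / f"img_{i}.npy", np.zeros((4, 4), dtype=np.uint8))
        roots.append(str(folder))
    counts = [len(files) for files in collect_feature_files(roots)]

print(counts)
```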

In [None]:
## Configure and Initialize Tekhne instance
OLGen:Tekhne
OLGen = Tekhne.from_glob(*root_features, data_path_out = data_path_out, **tekhne_config)

### Windows

In [None]:
## setup
OLGen: Tekhne
OLGen = Tekhne.from_explorer(**tekhne_config)

### Checkpoint I
By now you should have a Tekhne instance configured. Let's take a look at the metadata of the instance.

In [None]:
OLGen.metadata

## Part II: Single Core Generation of Full-Frame Synthetic Overlays

In [None]:
## Configure for Single Core Generation of Synthetic Overlays
tekhne_config.update({
    "mode": 'overlay',          # we're going to just overlay features
    "multi_core": False,        # disable multi-core for simplicity
    "patches": 0,               # no patches, full-frame generation
    "imagesize": (2048,2048),   # size of full output images
    "dtype_out": np.uint16,     # 16-bit output
    "DEBUG": True,              # print debug messages
    "disable_auto_standardization": True  # disable auto standardization for full-frame generation
})
OLGen.configure(**tekhne_config)

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (single core)
OLGen.generate_overlay()

### Checkpoint II
You should have generated a number of synthetic overlays. Let's take a look at your data directory and one of the generated overlays.
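
The plotting cell in this checkpoint treats each saved array as a stack of images whose last entry is the overlay. The following is a minimal sketch of such a stack built with plain NumPy. It assumes that the overlay is the element-wise sum of the feature channels, which is an assumption for illustration; the arrays written by `Tekhne` may be composed differently.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# three hypothetical single-feature images with 8-bit intensities
features = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(3)]

# sum in a wider dtype so that the overlay cannot overflow uint8
overlay = np.sum(features, axis=0, dtype=np.uint16)

# assumed layout: one channel per feature, overlay as the last channel
stack = np.concatenate([np.asarray(features, dtype=np.uint16), overlay[None]], axis=0)

print(stack.shape, stack.dtype)
```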

**Before we proceed - try the following:**
* try changing the mode to "rotation"
* try lowering the imagesize
* try increasing the imagesize (zero-padding will be applied if necessary)
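
To build some intuition for what changing the image size does to an image, here is a minimal NumPy sketch of cropping and zero-padding to a target shape. The helper `fit_to_size` is hypothetical, and centering the image is an assumption; `Tekhne` may place the padding differently.

```python
import numpy as np


def fit_to_size(image, target_shape):
    """Center-crop or zero-pad a 2D image so that it matches target_shape."""
    out = np.zeros(target_shape, dtype=image.dtype)
    # size of the region shared by source and target along each axis
    h = min(image.shape[0], target_shape[0])
    w = min(image.shape[1], target_shape[1])
    # top-left corners of the centered shared region
    src_y = (image.shape[0] - h) // 2
    src_x = (image.shape[1] - w) // 2
    dst_y = (target_shape[0] - h) // 2
    dst_x = (target_shape[1] - w) // 2
    out[dst_y:dst_y + h, dst_x:dst_x + w] = image[src_y:src_y + h, src_x:src_x + w]
    return out


small = np.ones((4, 4), dtype=np.uint8)
padded = fit_to_size(small, (8, 8))    # larger target: zero-padding around the image
cropped = fit_to_size(small, (2, 2))   # smaller target: center crop

print(padded.shape, int(padded.sum()), cropped.shape)
```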

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(data_files[0])

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig,axs = plt.subplots(1,data.shape[0], figsize=(data.shape[0] * 5,5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap='magma')
    axs[i].set_title(f"Feature {i+1}")
    axs[i].axis('off')

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## Cleanup the files in the output directory
import shutil, os

# get the filecount in the output directory
data_files_in = glob(f"{OLGen.data_path_out}/*.npy")

# remove all files in the output directory
shutil.rmtree(OLGen.data_path_out)
os.makedirs(OLGen.data_path_out, exist_ok=True)

# check files left
data_files_out = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Cleaned up {len(data_files_in) - len(data_files_out)} files.")

# restore input data from backup
OLGen.restore_input_data_from_backup()

## Part III: Multi-Core Generation of Full-Frame Synthetic Overlays

In [None]:
## Configure for Multi-Core Generation of Full-Frame Synthetic Overlays
tekhne_config.update({
    "mode": 'rotation',         # we're going to overlay features with rotation
    "multi_core": True,         # enable multi-core for faster processing
    "patches": 0,               # no patches, full-frame generation
    "imagesize": (2048,2048),   # size of full output images
    "dtype_out": np.uint16,     # 16-bit output
    "DEBUG": True,              # print debug messages
    "disable_auto_standardization": True  # disable auto standardization for full-frame generation
})
OLGen.configure(**tekhne_config)

In [None]:
## setup multi core (multi core)
OLGen.setup_multi_core(num_cpu=12,               # number of cpu cores to use
                       launch_dashboard=True)   # launch the ray dashboard <- default is True

## The runtime instance is accessible via:
OLGen._ray_instance
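
The value `num_cpu=12` is hard-coded and may exceed the number of cores on your machine. The sketch below derives a worker count from the hardware instead. The helper `pick_num_cpu` and its `reserve` parameter are hypothetical and not part of `NanTex`; how `Tekhne` reacts to a request for more cores than are available is not covered here.

```python
import os


def pick_num_cpu(requested, reserve=1):
    """Clamp a requested worker count to the cores available on this machine."""
    available = os.cpu_count() or 1        # cpu_count() may return None
    usable = max(1, available - reserve)   # leave some cores for the OS and the notebook
    return max(1, min(requested, usable))


num_cpu = pick_num_cpu(12)
print(f"Using {num_cpu} of {os.cpu_count()} cores")
```

The result could then be passed as the `num_cpu` argument in the cell above.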

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()  

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core() # shut down the Ray cluster and release its resources

### Checkpoint III
You should have generated a number of synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig,axs = plt.subplots(1,data.shape[0], figsize=(data.shape[0] * 5,5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap='magma')
    axs[i].set_title(f"Feature {i+1}")
    axs[i].axis('off')

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## Cleanup the files in the output directory
import shutil, os

# get the filecount in the output directory
data_files_in = glob(f"{OLGen.data_path_out}/*.npy")

# remove all files in the output directory
shutil.rmtree(OLGen.data_path_out)
os.makedirs(OLGen.data_path_out, exist_ok=True)

# check files left
data_files_out = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Cleaned up {len(data_files_in) - len(data_files_out)} files.")

# restore input data from backup
OLGen.restore_input_data_from_backup()

## Part IV: Patched Multi-Core Generation of Synthetic Overlays

In [None]:
## Configure for Patched Multi-Core Generation of Synthetic Overlays
tekhne_config.update({
    "mode": 'rotation',         # we're going to overlay features with rotation
    "multi_core": True,         # enable multi-core for faster processing
    "patches": 32,              # 32 patches for patch-based generation
    "patchsize": (256,256),     # size of extracted patches
    "imagesize": (2048,2048),   # size of full output images
    "dtype_out": np.float32,    # 32-bit output
    "dtype_in": np.uint8,       # input data type
    "DEBUG": True,              # print debug messages
    "disable_auto_standardization": False  # keep auto standardization enabled for patch-based generation
})
OLGen.configure(**tekhne_config)
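
With auto standardization enabled, 8-bit input is turned into `float32` output. As a rough illustration, the sketch below z-scores a single image (zero mean, unit standard deviation). That "standardization" means exactly this in `Tekhne` is an assumption, and the helper `standardize` is hypothetical.

```python
import numpy as np


def standardize(image, eps=1e-8):
    """Return a float32 copy of image with zero mean and unit standard deviation."""
    image = image.astype(np.float32)
    standardized = (image - image.mean()) / (image.std() + eps)  # eps avoids division by zero
    return standardized.astype(np.float32)


rng = np.random.default_rng(seed=0)
raw = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)  # stand-in for a uint8 input image
z = standardize(raw)

print(z.dtype, float(z.mean()), float(z.std()))
```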

In [None]:
## setup multi core (multi core)
OLGen.setup_multi_core(num_cpu=16,               # number of cpu cores to use
                       launch_dashboard=True)    # launch the ray dashboard <- default is True

## The runtime instance is accessible via:
OLGen._ray_instance

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()  

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core() # shut down the Ray cluster and release its resources

### Checkpoint IV
You should have generated a number of synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.
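
In this part, each output is a patch rather than a full frame. The sketch below shows one common way to cut random patches out of a larger stack with NumPy. Uniformly random patch positions are an assumption, the helper `random_patches` is hypothetical, and the demo uses a 512x512 stand-in to keep memory use low.

```python
import numpy as np


def random_patches(stack, n_patches, patch_size, rng):
    """Cut n_patches random windows out of a (channels, height, width) stack."""
    _, height, width = stack.shape
    ph, pw = patch_size
    patches = []
    for _ in range(n_patches):
        # top-left corner chosen so that the window stays inside the image
        y = rng.integers(0, height - ph + 1)
        x = rng.integers(0, width - pw + 1)
        patches.append(stack[:, y:y + ph, x:x + pw])
    return np.stack(patches)


rng = np.random.default_rng(seed=0)
stack = rng.random((4, 512, 512), dtype=np.float32)  # small stand-in for a 2048x2048 stack
patches = random_patches(stack, n_patches=32, patch_size=(256, 256), rng=rng)

print(patches.shape)
```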

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig,axs = plt.subplots(1,data.shape[0], figsize=(data.shape[0] * 5,5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap='magma')
    axs[i].set_title(f"Feature {i+1}")
    axs[i].axis('off')

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## Cleanup the files in the output directory
import shutil, os

# get the filecount in the output directory
data_files_in = glob(f"{OLGen.data_path_out}/*.npy")

# remove all files in the output directory
shutil.rmtree(OLGen.data_path_out)
os.makedirs(OLGen.data_path_out, exist_ok=True)

# check files left
data_files_out = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Cleaned up {len(data_files_in) - len(data_files_out)} files.")

# restore input data from backup
OLGen.restore_input_data_from_backup()

## Part V: Augmented Patched Multi-Core Generation of Synthetic Overlays

In [None]:
## Configure for Augmented Patched Multi-Core Generation of Synthetic Overlays
tekhne_config.update({
    "mode": 'rotation',         # we're going to overlay features with rotation
    "multi_core": True,         # enable multi-core for faster processing
    "augment": True,            # enable augmented generation
    "patches": 8,               # 8 patches for patch-based generation
    "patchsize": (256,256),     # size of extracted patches
    "imagesize": (2048,2048),   # size of full output images
    "dtype_out": np.float32,    # 32-bit output
    "dtype_in": np.uint8,       # input data type
    "DEBUG": True,              # print debug messages
    "disable_auto_standardization": False  # keep auto standardization enabled for patch-based generation
})
OLGen.configure(**tekhne_config)

In [None]:
## setup augmentation pipeline
import albumentations as A

# parts
parts = [A.VerticalFlip(p=0.5), A.HorizontalFlip(p=0.5), A.MedianBlur(p=0.3, blur_limit=5)]

# compose and assign
OLGen.augmentation_pipeline = A.Compose(parts)
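
Geometric augmentations such as flips have to be applied identically to every channel of a stack, otherwise the overlay no longer matches its features. The sketch below illustrates this with plain NumPy. It is a stand-in for illustration and not how `albumentations` or `Tekhne` apply the pipeline internally; the helper `random_flip_stack` is hypothetical.

```python
import numpy as np


def random_flip_stack(stack, rng, p=0.5):
    """Flip a (channels, height, width) stack; every channel gets the same flip."""
    if rng.random() < p:
        stack = stack[:, ::-1, :]   # vertical flip (reverse rows)
    if rng.random() < p:
        stack = stack[:, :, ::-1]   # horizontal flip (reverse columns)
    return np.ascontiguousarray(stack)


rng = np.random.default_rng(seed=0)
features = rng.random((2, 8, 8))
# assumed layout: feature channels first, their sum as the last channel
stack = np.concatenate([features, features.sum(axis=0, keepdims=True)])
flipped = random_flip_stack(stack, rng)

# the overlay channel still equals the sum of the flipped feature channels
consistent = bool(np.allclose(flipped[-1], flipped[:-1].sum(axis=0)))
print(consistent)
```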

In [None]:
## setup multi core (multi core)
OLGen.setup_multi_core(num_cpu=12,              # number of cpu cores to use
                       launch_dashboard=True)   # launch the ray dashboard <- default is True

## The runtime instance is accessible via:
OLGen._ray_instance

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()  

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core() # shut down the Ray cluster and release its resources

### Checkpoint V
You should have generated a number of augmented patches extracted from synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig,axs = plt.subplots(1,data.shape[0], figsize=(data.shape[0] * 5,5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap='magma')
    axs[i].set_title(f"Feature {i+1}")
    axs[i].axis('off')

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## Cleanup the files in the output directory
import shutil, os

# get the filecount in the output directory
data_files_in = glob(f"{OLGen.data_path_out}/*.npy")

# remove all files in the output directory
shutil.rmtree(OLGen.data_path_out)
os.makedirs(OLGen.data_path_out, exist_ok=True)

# check files left
data_files_out = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Cleaned up {len(data_files_in) - len(data_files_out)} files.")

# restore input data from backup
OLGen.restore_input_data_from_backup()

## Part VI: EXTRA - A More Sophisticated Augmentation Pipeline

In [None]:
## Dependencies
import albumentations as A
from typing import List

In [None]:
## Setup Pipelines

# Define types
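# the schedules mix image-only transforms (e.g. MedianBlur) with spatial ones (e.g. RandomCrop),
# so we annotate them with the common base class
train_transform_schedule: List[A.BasicTransform]
val_transform_schedule: List[A.BasicTransform]
test_transform_schedule: List[A.BasicTransform]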

In [None]:
# Define the train augmentation pipelines
train_transform_schedule = [
    A.RandomCrop(
        256,
        256,
        p=1,  # <- always apply
    ),  # Randomly crop the image <- choose a random crop of 256x256
    A.HorizontalFlip(p=0.5),  # Randomly flip the image horizontally (50% of the time)
    A.VerticalFlip(p=0.5),  # Randomly flip the image vertically (50% of the time)
    # Apply median blur with a 30% probability; blur_limit sets the maximum kernel size (3 here).
    # Increase it to strengthen the effect, and adjust it if OpenCV raises an
    # "unsupported format or combination of formats" error.
    # (https://stackoverflow.com/questions/13193207/unsupported-format-or-combination-of-formats-when-using-cvreduce-method-in-ope)
    A.MedianBlur(p=0.3, blur_limit=3),
]

# building blocks for various applications in microscopy
# building_blocks = [A.GaussNoise(p=0.5),
#                    A.MedianBlur(p=0.7, blur_limit=(3, 5)),
#                    A.RandomBrightnessContrast(p=0.3, brightness_limit=0.2, contrast_limit=0.2)]

In [None]:
# Define the validation augmentation pipeline
# validation uses the same transformations as training, so that both sets are processed the same way
val_transform_schedule = train_transform_schedule
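
Note that a plain assignment does not copy a list: both names refer to the same object, so a later change to the training schedule also changes the validation schedule. The sketch below shows the difference, using strings as stand-ins for transform objects. The variable names are illustrative only.

```python
train_schedule = ["crop", "hflip", "vflip", "blur"]  # stand-ins for transform objects

val_alias = train_schedule        # same list object under a second name
val_copy = list(train_schedule)   # independent shallow copy

train_schedule.append("noise")    # later change to the training schedule

print(val_alias)  # also contains "noise"
print(val_copy)   # unchanged
```

If the two schedules are meant to diverge later, `list(train_transform_schedule)` keeps them independent.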

In [None]:
# Define the test augmentation pipelines
# note that we do not want to apply blurring or other soft transformations as we assume peak quality for the test set
# in training, we use blurring to make the model more robust to noise
test_transform_schedule = [
    A.RandomCrop(256, 256, p=1),  # p=1 == apply always
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
]

In [None]:
## Compose the transformations
train_augmentation_pipeline: A.Compose = A.Compose(train_transform_schedule)
val_augmentation_pipeline: A.Compose = A.Compose(val_transform_schedule)
test_augmentation_pipeline: A.Compose = A.Compose(test_transform_schedule)