## Welcome to the DEMO on Generating Synthetic Overlays of Nanotextural Structures
In this notebook, we will generate synthetic overlays of nanotextural structures using the NanTex library. We will prepare training, validation, and testing datasets by overlaying multiple features with various augmentations.

### Requirements
- Basic knowledge of Python
- Python 3.11 or higher
- Poetry

We assume you followed the installation instructions in the README.md file. If you haven't, please do so **before** proceeding.


### Estimated Runtime
This notebook should take approximately 20-60 minutes to run, depending on your system's performance.

## Part 0: Preparation

### Data Preparation
Using your favorite web browser, download the demo dataset from Zenodo [here](https://doi.org/10.5281/zenodo.17120603). The dataset is a compressed file named `NanTex SRM Dataset — SMLM I (ShareLoc).rar`. After downloading, decompress the file to a directory of your choice. You should see nine (9) folders named `$FEATURE$_$USE$` (e.g. *ACT_train*) containing microscopy images.
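
Before moving on, it can help to confirm that the archive was unpacked completely. The sketch below is not part of NanTex. It assumes the three feature prefixes used later in this notebook (`ACT`, `LNP`, `MIC`) and the three uses (`train`, `val`, `test`); the helper name `find_missing_folders` is hypothetical.

```python
from pathlib import Path

# Assumed folder naming: the feature prefixes and uses that appear in this notebook
FEATURES = ("ACT", "LNP", "MIC")
USES = ("train", "val", "test")


def find_missing_folders(dataset_root):
    """Return the expected `FEATURE_USE` folder names that are absent under `dataset_root`."""
    root = Path(dataset_root)
    expected = [f"{feature}_{use}" for feature in FEATURES for use in USES]
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list from `find_missing_folders(path_to_demo_data)` would mean all nine folders are in place.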

### Dependencies

In [None]:
## Dependencies
import os
import numpy as np
from glob import glob
import matplotlib.pyplot as plt

## NanTex modules
from nantex.data_preparation import Tekhne
from nantex.util import pltStyler

### Path Setup

In [None]:
## Path setup

# select demo data path
path_to_demo_data = "path/to/NanTex SRM Dataset — SMLM I (ShareLoc)"

# select output path
path_to_output = "path/to/output/folder"

# create output folders
for name in ["train", "val", "test"]:
    os.makedirs(os.path.join(path_to_output, name), exist_ok=True)

## Part I: Training Data Generation

### Data Selection

In [None]:
## DEMO CASE - Training data preparation
data_to_generate = "train"  # options are 'train', 'val', 'test'

# data import paths
root_feature_1: str = os.path.join(path_to_demo_data, f"MIC_{data_to_generate}")
root_feature_2: str = os.path.join(path_to_demo_data, f"LNP_{data_to_generate}")
root_feature_3: str = os.path.join(path_to_demo_data, f"ACT_{data_to_generate}")

# outpath for prepared data
## you are going to use the same routine to prepare data for training, validation, and testing
data_path_out = os.path.join(
    path_to_output, data_to_generate
)  # <- we are gonna start with training data

# grab data
root_features: list[str] = [root_feature_1, root_feature_2, root_feature_3]

# get list of files for each feature
root_features = {path: glob(f"{path}/*.png") for path in root_features}

# choose two (2) images per feature at random for DEMO purposes to cut time
if data_to_generate == "train":
    root_features = [
        np.random.choice(root_features[path], size=2, replace=False).tolist()
        for path in root_features
    ]
else:
    root_features = [root_features[path] for path in root_features]
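
The selection above uses the global NumPy random state, so every run picks different images. If you would like the demo to be repeatable, a seeded generator is one option. This is a minimal sketch, not NanTex functionality; `pick_files` and its `seed` parameter are hypothetical names.

```python
import numpy as np


def pick_files(files, size, seed=0):
    """Pick `size` distinct entries from `files`, reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    # sort first so the result does not depend on the order returned by glob
    return rng.choice(sorted(files), size=size, replace=False).tolist()
```

With this helper, `pick_files(glob(f"{path}/*.png"), size=2)` would return the same two files on every run as long as the folder content is unchanged.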

### Configure and Instantiate the Tekhne instance

In [None]:
## Configure for Multi-Core Generation of Synthetic Overlays
tekhne_config = {
    "mode": "rotation",  # we're going to overlay features with rotation
    "multi_core": True,  # enable multi-core for faster processing
    "augment": True,  # enable augmented generation
    "patches": 32,  # 32 patches for patch-based generation
    "patchsize": (256, 256),  # size of extracted patches
    "imagesize": (2048, 2048),  # size of full output images
    "dtype_out": np.float32,  # 32-bit output
    "dtype_in": np.uint8,  # input data type
    "DEBUG": True,  # print debug messages
    "disable_auto_standardization": False,  # keep auto standardization enabled for patch-based generation
}
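
To make the `patches` and `patchsize` settings more tangible, here is a small NumPy sketch of cutting 32 windows of 256 x 256 pixels out of a 2048 x 2048 stack. It only illustrates the idea. How Tekhne actually positions its patches is not described in this notebook, so the uniformly random placement and the function name `extract_random_patches` are assumptions.

```python
import numpy as np


def extract_random_patches(stack, n_patches, patchsize, seed=None):
    """Cut `n_patches` randomly placed windows out of a (channels, H, W) stack."""
    rng = np.random.default_rng(seed)
    _, height, width = stack.shape
    patch_h, patch_w = patchsize
    patches = []
    for _ in range(n_patches):
        # upper-left corner chosen so that the window stays inside the image
        top = rng.integers(0, height - patch_h + 1)
        left = rng.integers(0, width - patch_w + 1)
        patches.append(stack[:, top : top + patch_h, left : left + patch_w])
    return np.stack(patches)


# dummy stack: 3 features + 1 overlay, same size as "imagesize" above
stack = np.zeros((4, 2048, 2048), dtype=np.uint8)
patches = extract_random_patches(stack, n_patches=32, patchsize=(256, 256), seed=0)
print(patches.shape)  # (32, 4, 256, 256)
```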

In [None]:
## Configure and Initialize Tekhne instance
OLGen: Tekhne
OLGen = Tekhne.from_glob(*root_features, data_path_out=data_path_out, **tekhne_config)
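
For intuition about what a rotational overlay is, the following sketch rotates each feature image and sums the results. It is not the NanTex implementation: rotation in 90° steps and plain summation are assumptions, and `rotational_overlay` is a hypothetical name. The channel layout (features first, overlay last) mirrors how the checkpoint cells in this notebook plot the generated files.

```python
import numpy as np


def rotational_overlay(features, quarter_turns):
    """Rotate each square feature image by k * 90 degrees and stack them with their sum."""
    rotated = [
        np.rot90(image, k).astype(np.float32)
        for image, k in zip(features, quarter_turns)
    ]
    overlay = np.sum(rotated, axis=0)  # pixel-wise sum of all rotated features
    return np.stack(rotated + [overlay])  # shape: (n_features + 1, H, W)
```

Calling `rotational_overlay([img_a, img_b, img_c], quarter_turns=(0, 1, 3))` on three equally sized square images would return a stack with four channels.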

### Generate Training Data

In [None]:
## setup augmentation pipeline
import albumentations as A

# parts
parts = [
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.MedianBlur(p=0.3, blur_limit=5),
]

# compose and assign
OLGen.augmentation_pipeline = A.Compose(parts)
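
If you are unsure what the two flips do, the NumPy sketch below reproduces them for a `(channels, H, W)` stack. It is only an illustration and leaves out the median blur. Applying the same flip to every channel, so that features and overlay stay aligned, is an assumption about how the pipeline is used. `random_flips` is a hypothetical helper.

```python
import numpy as np


def random_flips(stack, p=0.5, seed=None):
    """Flip a (channels, H, W) stack vertically and/or horizontally, each with probability p."""
    rng = np.random.default_rng(seed)
    if rng.random() < p:
        stack = stack[:, ::-1, :]  # vertical flip: reverse the row order
    if rng.random() < p:
        stack = stack[:, :, ::-1]  # horizontal flip: reverse the column order
    return np.ascontiguousarray(stack)
```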

In [None]:
## set up multi-core processing
OLGen.setup_multi_core(
    num_cpu=12,  # number of cpu cores to use
    launch_dashboard=True,  # launch the ray dashboard <- default is True
)

## The runtime instance is accessible via:
OLGen._ray_instance
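
The value `num_cpu=12` suits a workstation with at least twelve cores. On smaller machines you may want to derive the number from the hardware instead. A minimal sketch using only the standard library follows; `pick_num_cpu` and its parameters are hypothetical and not part of NanTex.

```python
import os


def pick_num_cpu(requested=12, reserve=1):
    """Return at most `requested` cores while leaving `reserve` cores for the system."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available - reserve))
```

The result could then be passed on as `num_cpu=pick_num_cpu()`.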

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()
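
As a rough cross-check, you can count the image combinations by hand. The sketch assumes that each overlay combines exactly one image per feature. `estimate_number_of_outputs()` may count differently, for example by also including rotations or patches, so treat the number below as a lower bound for orientation only. The file names are placeholders.

```python
from math import prod


def count_combinations(files_per_feature):
    """Number of ways to pick one image from each feature."""
    return prod(len(files) for files in files_per_feature)


# DEMO case: two images for each of the three features
demo_selection = [
    ["mic_a.png", "mic_b.png"],
    ["lnp_a.png", "lnp_b.png"],
    ["act_a.png", "act_b.png"],
]
print(count_combinations(demo_selection))  # 8
```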

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core()  # shutdown the ray cluster. cleanup the resources

## Checkpoint I
You should have generated a number of augmented patches extracted from synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")
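
Beyond counting the files, it can be useful to check that they all share the expected shape and data type. The helper below is a sketch that relies only on NumPy and the standard library; `summarize_npy_files` is a hypothetical name.

```python
import os
from collections import Counter
from glob import glob

import numpy as np


def summarize_npy_files(folder):
    """Count the .npy files in `folder`, grouped by (shape, dtype)."""
    summary = Counter()
    for path in glob(os.path.join(folder, "*.npy")):
        array = np.load(path)
        summary[(array.shape, str(array.dtype))] += 1
    return dict(summary)
```

Calling `summarize_npy_files(OLGen.data_path_out)` should ideally return a single entry; several entries would hint at mixed outputs in the folder.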

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig, axs = plt.subplots(1, data.shape[0], figsize=(data.shape[0] * 5, 5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap="magma")
    axs[i].set_title(f"Feature {i + 1}")
    axs[i].axis("off")

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## cleanup
OLGen = None  # drop the reference to the Tekhne instance

## Part II: Validation Data Generation

### Data Selection

In [None]:
## DEMO CASE - Validation data preparation
data_to_generate = "val"  # options are 'train', 'val', 'test'

# data import paths
root_feature_1: str = os.path.join(path_to_demo_data, f"MIC_{data_to_generate}")
root_feature_2: str = os.path.join(path_to_demo_data, f"LNP_{data_to_generate}")
root_feature_3: str = os.path.join(path_to_demo_data, f"ACT_{data_to_generate}")

# outpath for prepared data
## you are going to use the same routine to prepare data for training, validation, and testing
data_path_out = os.path.join(
    path_to_output, data_to_generate
)  # <- this time we write the validation data

# grab data
root_features: list[str] = [root_feature_1, root_feature_2, root_feature_3]

# get list of files for each feature
root_features = {path: glob(f"{path}/*.png") for path in root_features}

# the random two-image selection only applies to 'train'; for validation all images are used
if data_to_generate == "train":
    root_features = [
        np.random.choice(root_features[path], size=2, replace=False).tolist()
        for path in root_features
    ]
else:
    root_features = [root_features[path] for path in root_features]

### Configure and Instantiate the Tekhne instance

In [None]:
## Configure for Multi-Core Generation of Synthetic Overlays
tekhne_config = {
    "mode": "rotation",  # we're going to overlay features with rotation
    "multi_core": True,  # enable multi-core for faster processing
    "augment": True,  # enable augmented generation
    "patches": 8,  # 8 patches for patch-based generation
    "patchsize": (256, 256),  # size of extracted patches
    "imagesize": (2048, 2048),  # size of full output images
    "dtype_out": np.float32,  # 32-bit output
    "dtype_in": np.uint8,  # input data type
    "DEBUG": True,  # print debug messages
    "disable_auto_standardization": False,  # keep auto standardization enabled for patch-based generation
}

In [None]:
## Configure and Initialize Tekhne instance
OLGen: Tekhne
OLGen = Tekhne.from_glob(*root_features, data_path_out=data_path_out, **tekhne_config)

### Generate Validation Data

In [None]:
## setup augmentation pipeline
import albumentations as A

# parts
parts = [
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.MedianBlur(p=0.3, blur_limit=5),
]

# compose and assign
OLGen.augmentation_pipeline = A.Compose(parts)

In [None]:
## set up multi-core processing
OLGen.setup_multi_core(
    num_cpu=12,  # number of cpu cores to use
    launch_dashboard=True,  # launch the ray dashboard <- default is True
)

## The runtime instance is accessible via:
OLGen._ray_instance

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core()  # shutdown the ray cluster. cleanup the resources

## Checkpoint II
You should have generated a number of augmented patches extracted from synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig, axs = plt.subplots(1, data.shape[0], figsize=(data.shape[0] * 5, 5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap="magma")
    axs[i].set_title(f"Feature {i + 1}")
    axs[i].axis("off")

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## cleanup
OLGen = None  # drop the reference to the Tekhne instance

## Part III: Generation of Full-Frame Synthetic Overlays for Testing

In [None]:
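## Configure for Multi-Core Generation of Full-Frame Synthetic Overlays
data_to_generate = "test"  # options are 'train', 'val', 'test'

# collect the test images of each feature; write to the test output folder
root_features = [
    glob(os.path.join(path_to_demo_data, f"{feature}_{data_to_generate}", "*.png"))
    for feature in ("MIC", "LNP", "ACT")
]
data_path_out = os.path.join(path_to_output, data_to_generate)

tekhne_config.update(
    {
        "mode": "rotation",  # we're going to overlay features with rotation
        "multi_core": True,  # enable multi-core for faster processing
        "patches": 0,  # no patches, full-frame generation
        "imagesize": (2048, 2048),  # size of full output images
        "dtype_out": np.uint16,  # 16-bit output
        "DEBUG": True,  # print debug messages
        "disable_auto_standardization": True,  # disable auto standardization for full-frame generation
    }
)

# OLGen was set to None in the previous cleanup cell, so build a fresh instance
# instead of calling OLGen.configure(**tekhne_config) on it
OLGen: Tekhne
OLGen = Tekhne.from_glob(*root_features, data_path_out=data_path_out, **tekhne_config)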
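
Full-frame outputs are considerably larger than patches, so a quick estimate of the disk space per file can be worthwhile. The calculation below assumes four channels per file (three features plus the overlay, as suggested by the plotting cells) and ignores the small `.npy` header. `stack_size_mib` is a hypothetical helper.

```python
import numpy as np


def stack_size_mib(channels, imagesize, dtype):
    """Approximate size in MiB of one (channels, H, W) array of the given dtype."""
    n_values = channels * imagesize[0] * imagesize[1]
    return n_values * np.dtype(dtype).itemsize / 2**20


print(stack_size_mib(4, (2048, 2048), np.uint16))  # 32.0 -> one full frame
print(stack_size_mib(4, (256, 256), np.float32))  # 1.0 -> one patch from Part I / II
```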

In [None]:
## set up multi-core processing
OLGen.setup_multi_core(
    num_cpu=12,  # number of cpu cores to use
    launch_dashboard=True,  # launch the ray dashboard <- default is True
)

## The runtime instance is accessible via:
OLGen._ray_instance

In [None]:
## let's check how many combinations we can generate
OLGen.estimate_number_of_outputs()

In [None]:
## generate the overlay (multi core)
OLGen.generate_overlay()

In [None]:
## cleanup (multi core)
OLGen.shutdown_multi_core()  # shutdown the ray cluster. cleanup the resources

## Checkpoint III
You should have generated a number of synthetic rotational overlays. Let's take a look at your data directory and one of the generated overlays.

In [None]:
## let's find your data
data_files = glob(f"{OLGen.data_path_out}/*.npy")
print(f"Found {len(data_files)} files.")

In [None]:
## let's look at one of the generated overlays
data = np.load(np.random.choice(data_files))

# apply stylesheet
pltStyler().enforce_stylesheet()

# plot overlays
fig, axs = plt.subplots(1, data.shape[0], figsize=(data.shape[0] * 5, 5))
for i in range(data.shape[0]):
    axs[i].imshow(data[i], cmap="magma")
    axs[i].set_title(f"Feature {i + 1}")
    axs[i].axis("off")

axs[-1].set_title("Overlay")
plt.tight_layout()

In [None]:
## cleanup
OLGen = None  # drop the reference to the Tekhne instance