# U-Net Quickstart

## 1. Introduction

[U-Net](https://www.nature.com/articles/s41592-018-0261-2) is a convolutional neural network for semantic segmentation of images. This implementation of U-Net, optimized for binary segmentation of biological microscopy images and movies, gives users a high-level interface to (i) augment training data, (ii) train with various loss functions, and (iii) predict large tif files.

If you need help using a function, try running `help(whichever_interesting_function)` or look at the source code. If you need help using a class (one that is directly under the `biu.unet` module), working through the examples in this notebook is probably more helpful than reading that class's documentation.

IMPORTANT: Two packages that depend on your hardware need to be installed manually before running bio-image-unet. Convolutional neural networks run much faster on NVIDIA GPUs than on CPUs. To enable training and prediction on GPUs, install the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) and the corresponding version of PyTorch from [the official PyTorch website](https://pytorch.org/get-started/locally/): select your CUDA version on that page and run the generated command in your terminal. bio-image-unet does not depend on a specific version of CUDA and has been tested with PyTorch 1.7.0 and later.

To install `bio-image-unet` from [PyPI](https://pypi.org/project/bio-image-unet/), execute `pip install bio-image-unet` in your terminal. Finally, to import the U-Net package, write `import biu.unet as unet`.
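Before training, it can help to confirm that the installed PyTorch build actually sees your GPU. The following is a minimal sketch that uses only standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`); the helper `describe_compute_device` is hypothetical and not part of bio-image-unet:

```python
import importlib.util


def describe_compute_device() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.cuda.is_available():
        # CUDA-enabled PyTorch build and a visible NVIDIA GPU
        return (f"GPU: {torch.cuda.get_device_name(0)} "
                f"(PyTorch {torch.__version__}, CUDA {torch.version.cuda})")
    # CPU-only build, or no GPU/driver visible: training will be slow
    return f"CPU only (PyTorch {torch.__version__})"


print(describe_compute_device())
```

If this reports "CPU only" on a machine with an NVIDIA GPU, the PyTorch build most likely does not match the installed CUDA version.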

## 2. Training data generation and augmentation

The `DataProcess` class creates a PyTorch `Dataset` object. The paths of the directories with training images and labels, as well as the parameters for data processing and augmentation, are all specified upon initialization (see below).

Training images and labels need to have the following folder structure (images and labels with exactly identical names in separate folders):

```
path/to/training/data/
|
├── image
│   ├── 1.tif
│   ├── 2.tif
│   ├── 3.tif
│   ├── 4.tif
│   ├── image3.tif
│   ├── whatever_name42.tif
│   ├── 5.tif
│   ├── ...
│   ├── ...
└── label
    ├── 1.tif
    ├── 2.tif
    ├── 3.tif
    ├── 4.tif
    ├── image3.tif
    ├── whatever_name42.tif
    ├── 5.tif
    ├── ...
    ├── ...
```
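Because images and labels are matched by identical file names, a quick consistency check before generating the data set can save a failed run. A minimal standard-library sketch, assuming the `image`/`label` layout above and the `.tif` extension; `check_training_folders` is a hypothetical helper, not part of bio-image-unet:

```python
from pathlib import Path


def check_training_folders(root):
    """Compare tif file names in root/image and root/label.

    Returns (matched, images_without_label, labels_without_image), each sorted.
    """
    root = Path(root)
    images = {p.name for p in (root / "image").glob("*.tif")}
    labels = {p.name for p in (root / "label").glob("*.tif")}
    return sorted(images & labels), sorted(images - labels), sorted(labels - images)
```

For example, `check_training_folders('path/to/training/data/')` should return empty second and third lists when every image has a label with exactly the same name.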

The `DataProcess` class creates a data set when initialized. It takes the following parameters as arguments:

```
Create training data object for network training

1) Create folder structure for training data
2) Move and preprocess training images
3) Split input images into tiles
4) Augment training data
5) Create object of PyTorch Dataset class for training

Parameters
----------
source_dir : Tuple[str, str]
    Path of training data [images, labels]. Images need to be tif files.
dim_out : Tuple[int, int]
    Resize dimensions of images for training
aug_factor : int
    Factor of image augmentation
data_path : str
    Base path of directories for training data
dilate_mask
    Radius of binary dilation of masks [-2, -1, 0, 1, 2]
dilate_kernel : str
    Dilation kernel ('disk' or 'square')
val_split : float
    Validation split for training
invert : bool
    If True, greyscale binary labels are inverted
skeletonize : bool
    If True, binary labels are skeletonized
create : bool, optional
    If False, existing data set in data_path is used
clip_threshold : Tuple[float, float]
    Clip thresholds for intensity normalization of images
shiftscalerotate : Tuple[float, float, float]
    Shift, scale and rotate image during augmentation
noise_amp : float
    Amplitude of Gaussian noise for image augmentation
brightness_contrast : Tuple[float, float]
    Adapt brightness and contrast of images during augmentation
rescale : float, optional
    Rescale all images and labels by factor rescale
```
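The `clip_threshold` parameter controls intensity normalization of the images. A minimal NumPy sketch, assuming the two values are lower and upper percentiles (as the `(0., 99.8)` in the example below suggests) and that the clipped range is rescaled to [0, 1]; `normalize_intensity` is a hypothetical helper, not bio-image-unet's code:

```python
import numpy as np


def normalize_intensity(img, clip_threshold=(0.0, 99.8)):
    """Clip intensities at two percentiles and rescale to [0, 1].

    Assumption: clip_threshold holds (lower, upper) percentiles.
    """
    img = np.asarray(img, dtype=np.float32)
    lo, hi = np.percentile(img, clip_threshold)
    img = np.clip(img, lo, hi)  # suppress outliers, e.g. a few saturated pixels
    # guard against constant images, where hi == lo
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
```

Clipping at the 99.8th percentile rather than the maximum keeps a handful of very bright pixels from compressing the intensity range of the rest of the image.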

```python
# import bio-image-unet package
import biu.unet as unet
import os
```

```python
# paths to training images and labels
dir_images = 'E:/path/to/images/'
dir_masks = 'E:/path/to/labels/'

# directory for training data generation (created automatically; the drive needs enough free space)
data_path = './data/'

# generate the training data set
dataset = unet.DataProcess([dir_images, dir_masks], data_path=data_path, create=True, dilate_mask=2, skeletonize=False,
                           noise_amp=10, brightness_contrast=(0.15, 0.15), aug_factor=10, invert=True,
                           clip_threshold=(0., 99.8), dim_out=(256, 256), shiftscalerotate=(0, 0, 0), rescale=None)
```


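The augmentation parameters in this example (`aug_factor=10`, `noise_amp=10`, `brightness_contrast=(0.15, 0.15)`) can be illustrated with a minimal NumPy sketch of brightness/contrast jitter plus additive Gaussian noise. This is not bio-image-unet's implementation: the 0-255 intensity scale, the way the brightness and contrast limits are applied, and reading `noise_amp` as the noise standard deviation are all assumptions, and `augment` is a hypothetical helper. Intensity perturbations like these apply to the image only; geometric ones such as `shiftscalerotate` would have to be applied identically to the label.

```python
import numpy as np


def augment(img, aug_factor=10, noise_amp=10.0, brightness_contrast=(0.15, 0.15), seed=None):
    """Return aug_factor randomly perturbed copies of an image on a 0-255 scale.

    Assumptions: contrast is scaled by a factor in [1 - c, 1 + c], brightness is
    shifted by up to +/- b * 255, and noise_amp is the Gaussian noise std.
    """
    rng = np.random.default_rng(seed)
    b_lim, c_lim = brightness_contrast
    img = np.asarray(img, dtype=np.float32)
    copies = []
    for _ in range(aug_factor):
        alpha = 1.0 + rng.uniform(-c_lim, c_lim)   # random contrast factor
        beta = 255.0 * rng.uniform(-b_lim, b_lim)  # random brightness shift
        noisy = alpha * img + beta + rng.normal(0.0, noise_amp, img.shape)
        copies.append(np.clip(noisy, 0, 255).astype(np.uint8))
    return np.stack(copies)
```

Passing a fixed `seed` makes the augmented copies reproducible, which is useful when comparing training runs.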


## 3. Training

The `Trainer` class trains the U-Net. The training parameters listed below are specified upon initialization. Once the object has been created, training is started with `trainer.start()`.

```
Class for training of neural network. Creates trainer object, training is started with .start().

Parameters
----------
dataset
    Training data, object of PyTorch Dataset class
num_epochs : int
    Number of training epochs
network
    Network class (Default Unet)
batch_size : int
    Batch size for training
lr : float
    Learning rate
n_filter : int
    Number of convolutional filters in first layer
val_split : float
    Validation split
save_dir : str
    Path of directory to save trained networks
save_name : str
    Base name for saving trained networks
save_iter : bool
    If True, network state is saved after each epoch
load_weights : str, optional
    If not None, network state is loaded before training
loss : str
    Loss function ('BCEDice', 'Tversky' or 'logcoshTversky')
loss_params : Tuple[float, float]
    Parameters of the loss function (their meaning depends on the chosen loss function)
```
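For intuition about the Tversky options, here is a minimal NumPy sketch of the Tversky loss and its log-cosh variant for a soft binary prediction. Treating `loss_params` as the Tversky weights `(alpha, beta)` for false positives and false negatives is an assumption, and the helper names are hypothetical; bio-image-unet computes its losses on PyTorch tensors.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - Tversky index for predictions in [0, 1] and a binary target.

    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 reduces the index to the Dice coefficient.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    tp = np.sum(pred * target)          # soft true positives
    fp = np.sum(pred * (1.0 - target))  # soft false positives
    fn = np.sum((1.0 - pred) * target)  # soft false negatives
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))


def logcosh_tversky_loss(pred, target, alpha=0.5, beta=0.5):
    """log(cosh(.)) applied to the Tversky loss, a smoothed variant."""
    return float(np.log(np.cosh(tversky_loss(pred, target, alpha, beta))))
```

Raising `beta` above `alpha` penalizes missed foreground more than spurious foreground, which can help with thin or sparse structures.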

```python
# choose model
model = unet.Unet
# create trainer
trainer = unet.Trainer(dataset, num_epochs=100, network=model, batch_size=10, lr=0.0001, n_filter=32, val_split=0.2,
                       save_dir='./', save_name='model.pt', save_iter=False, load_weights=False,
                       loss_function='BCEDice', loss_params=(0.5, 0.5))

# test data
test_data_path = 'E:/path/of/test/data/'
result_path = 'E:/path/of/test/data/results/'
os.makedirs(result_path, exist_ok=True)  # create result_path

# start training
trainer.start(test_data_path=test_data_path, result_path=result_path)
```
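The example trains with `loss_function='BCEDice'` and `loss_params=(0.5, 0.5)`. A common form of this loss is a weighted sum of binary cross-entropy and soft Dice loss; the minimal NumPy sketch below assumes that the two values of `loss_params` are the weights of the BCE and Dice terms. That mapping and the helper name `bce_dice_loss` are assumptions, not bio-image-unet's code.

```python
import numpy as np


def bce_dice_loss(pred, target, weights=(0.5, 0.5), eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    Assumption: weights = (weight of BCE term, weight of Dice term).
    """
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)  # avoid log(0)
    target = np.asarray(target, dtype=np.float64)
    bce = -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    dice = (2.0 * np.sum(pred * target) + eps) / (np.sum(pred) + np.sum(target) + eps)
    w_bce, w_dice = weights
    return float(w_bce * bce + w_dice * (1.0 - dice))
```

The BCE term gives smooth per-pixel gradients, while the Dice term measures overlap over the whole image and is less dominated by the typically much larger background.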

## 4. Prediction of data

The `Predict` class predicts single images and movies with U-Net. Prediction starts upon initialization. The initialization method takes the following arguments:
````
Prediction of tif files with standard 2D U-Net

1) Loading and preprocessing (normalization) of file
2) Resizing of images into patches with resize_dim
3) Prediction with U-Net
4) Stitching of predicted patches and averaging of overlapping regions

Parameters
----------
imgs : ndarray/str
    numpy array of images or path of tif file
result_name : str
    path for result
model_params : str
    path of u-net parameters (.pt file)
network
    Network class (Default: U-Net)
resize_dim
    Image dimensions for resizing for prediction
invert : bool
    Invert greyscale of image(s) before prediction
clip_threshold : Tuple[float, float]
    Clip threshold for image intensity before prediction
add_tile : int, optional
    Add additional tiles for splitting large images to increase overlap
normalize_result : bool
    If True, results are normalized to [0, 255]
progress_notifier
    Wrapper to show tqdm progress notifier in gui
````
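Splitting a large image into patches, predicting each patch, and averaging the overlapping regions can be sketched in NumPy as follows. The tile placement rule (evenly spaced offsets, with `add_tile` extra tiles per axis to increase overlap) is an assumption for illustration, as are the helper names `tile_starts` and `predict_tiled`; `predict_fn` stands in for the trained network. The sketch handles one 2D frame; a movie would be processed frame by frame.

```python
import numpy as np


def tile_starts(length, tile, add_tile=0):
    """Evenly spaced start offsets so that tiles of size `tile` cover `length`.

    The minimal number of tiles is ceil(length / tile); add_tile extra tiles
    increase the overlap between neighbouring tiles.
    """
    if length <= tile:
        return [0]  # a single (smaller) patch; a real network may need padding
    n = int(np.ceil(length / tile)) + add_tile
    return [int(round(s)) for s in np.linspace(0, length - tile, n)]


def predict_tiled(img, predict_fn, tile=(256, 256), add_tile=0):
    """Apply predict_fn to overlapping tiles and average the overlaps."""
    h, w = img.shape
    acc = np.zeros((h, w), dtype=np.float64)    # summed patch predictions
    count = np.zeros((h, w), dtype=np.float64)  # number of tiles covering each pixel
    for y in tile_starts(h, tile[0], add_tile):
        for x in tile_starts(w, tile[1], add_tile):
            patch = img[y:y + tile[0], x:x + tile[1]]
            acc[y:y + tile[0], x:x + tile[1]] += predict_fn(patch)
            count[y:y + tile[0], x:x + tile[1]] += 1
    return acc / count
```

Averaging overlaps smooths out the weaker predictions U-Nets tend to make near patch borders, at the cost of more forward passes per image.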

```python
# predict tif file
tif_file = '/path/of/tif/file.tif'
result_name = '/path/of/result/tif/file.tif'
model_params = '/path/of/U-Net/model/params.pt'
prediction = unet.Predict(tif_file, result_name, network=unet.Unet, model_params=model_params, invert=False,
                          resize_dim=(1024, 1024), clip_threshold=(0., 99.8))
```
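Since U-Net outputs a continuous map, a binary segmentation is obtained by thresholding. Once the result tif has been loaded as a NumPy array (for example with the `tifffile` package's `imread`), a minimal sketch could look as follows. The threshold of 0.5 and the helper names are assumptions; pass `max_value=255` for results normalized to [0, 255] with `normalize_result=True`.

```python
import numpy as np


def binarize_prediction(pred, threshold=0.5, max_value=1.0):
    """Turn a predicted map into a boolean mask (threshold relative to max_value)."""
    return np.asarray(pred, dtype=np.float64) >= threshold * max_value


def foreground_fraction(pred, threshold=0.5, max_value=1.0):
    """Fraction of foreground pixels; one value per frame for (frames, H, W) movies."""
    mask = binarize_prediction(pred, threshold, max_value)
    if mask.ndim == 3:
        return mask.reshape(mask.shape[0], -1).mean(axis=1)
    return float(mask.mean())
```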