<hr style="height:2px;">

# Demo: Training data generation for denoising of *Tribolium castaneum*

This notebook demonstrates training data generation for a 3D denoising task, where corresponding pairs of low- and high-quality stacks can be acquired. 

These pairs should be registered, which is best achieved by acquiring both stacks _interleaved_, i.e. as different channels that correspond to the different exposure/laser settings. 

We will use a single pair of Tribolium stacks for demonstration, whereas in your application you should aim to acquire at least 10-50 stacks from different developmental timepoints to ensure a well-trained model. 


More documentation is available at http://csbdeep.bioimagecomputing.com/doc/

In [None]:
from __future__ import print_function, unicode_literals, absolute_import, division
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from IPython.display import display, HTML
display(HTML("<style>.rendered_html { font-size: 16px; }</style>"))
import os
from tifffile import imread

from csbdeep.utils import plot_some
from csbdeep.utils import download_and_extract_zip_file
from csbdeep.data import RawData, create_patches

<hr style="height:2px;">

# Download example data

First we download some example training data, consisting of low-SNR and high-SNR 3D images of Tribolium.

We only download a single stack for demonstration, but you should have ~20-50 different stacks at different developmental timepoints for your own application.



In [None]:
download_and_extract_zip_file(
    url = 'https://cloud.mpi-cbg.de/index.php/s/jKHFIS4isNwagMd/download',
    provides = ('raw_data/tribolium/%s/nGFP_0.1_0.2_0.5_20_13_late.tif'%d for d in ('GT','low'))
)

The low/high images should now reside in your working directory like this (GT stands for 'ground truth' and denotes the high-SNR stacks):

    raw_data/tribolium
    ├── GT
    │   └── nGFP_0.1_0.2_0.5_20_13_late.tif
    └── low
        └── nGFP_0.1_0.2_0.5_20_13_late.tif

We can plot both as maximum projections:

In [None]:
y = imread('raw_data/tribolium/GT/nGFP_0.1_0.2_0.5_20_13_late.tif')
x = imread('raw_data/tribolium/low/nGFP_0.1_0.2_0.5_20_13_late.tif')
print('image size =', x.shape)

plt.figure(figsize=(15,10))
plot_some(np.stack([x,y]),
          title_list=[['low (maximum projection)','GT (maximum projection)']], 
          pmin=2,pmax=99.8);
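
For intuition, the maximum projection and the percentile-based contrast limits used in the plot above can be approximated with plain NumPy. The following is a minimal sketch on a synthetic stack, not the actual implementation of `plot_some`; the helper name `max_project_and_rescale` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def max_project_and_rescale(stack, pmin=2, pmax=99.8, axis=0):
    """Collapse a 3D stack along `axis` and rescale intensities for display.

    Hypothetical helper for illustration; not part of csbdeep.
    """
    proj = stack.max(axis=axis)                  # maximum projection (e.g. along Z)
    lo, hi = np.percentile(proj, [pmin, pmax])   # robust display limits
    scaled = (proj - lo) / max(hi - lo, 1e-20)   # map [lo, hi] to [0, 1]
    return np.clip(scaled, 0, 1)                 # clip outliers outside the limits

# Synthetic stand-in for a ZYX stack (assumption: real data would come from imread)
rng = np.random.default_rng(0)
demo_stack = rng.poisson(5, size=(8, 32, 32)).astype(np.float32)
demo_proj = max_project_and_rescale(demo_stack)
print(demo_proj.shape, demo_proj.min(), demo_proj.max())
```

Percentile limits rather than the raw minimum and maximum keep a few very bright or very dark voxels from dominating the displayed contrast.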

<hr style="height:2px;">

# Training patch generation

CARE expects to be given the training data images as a `RawData` object, which defines how to get the pairs of high/low stacks and the semantics of each axis (e.g. which one is considered a color channel).

In our case, we have two folders, "low" and "GT", in which corresponding low-SNR and high-SNR stacks have identical filenames, so we can simply use `RawData.from_folder` and set axes to `"ZYX"` (as we don't have a multi-channel image). 


In [None]:
raw_data = RawData.from_folder(
    basepath    = 'raw_data/tribolium',
    source_dirs = ['low'],
    target_dir  = 'GT',
    axes        = 'ZYX',
)
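
To illustrate the idea of pairing stacks by identical filenames, here is a minimal, self-contained sketch using only the standard library. It is not how `RawData.from_folder` is implemented; the helper `pair_by_filename` and the placeholder file `stack_01.tif` are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path

def pair_by_filename(basepath, source_dir, target_dir, pattern="*.tif"):
    """Return (source, target) path pairs for files with identical names.

    Hypothetical helper for illustration; not the csbdeep implementation.
    """
    base = Path(basepath)
    pairs = []
    for target in sorted((base / target_dir).glob(pattern)):
        source = base / source_dir / target.name   # same filename, other folder
        if not source.exists():
            raise FileNotFoundError("missing source file for " + target.name)
        pairs.append((source, target))
    return pairs

# Build a throwaway folder layout mirroring low/ and GT/ (empty placeholder files)
with tempfile.TemporaryDirectory() as tmp:
    for d in ("low", "GT"):
        (Path(tmp) / d).mkdir()
        (Path(tmp) / d / "stack_01.tif").touch()
    demo_pairs = [
        (s.relative_to(tmp).as_posix(), t.relative_to(tmp).as_posix())
        for s, t in pair_by_filename(tmp, "low", "GT")
    ]

print(demo_pairs)
```

Raising an error on a missing counterpart, rather than silently skipping it, makes an incomplete acquisition visible before any patches are generated.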

From the corresponding stacks, we now generate some 3D patches of size (16,64,64). Typically, the more training stacks you have, the more patches you should use. By default, patches are sampled from non-background regions (i.e. regions that are above a relative threshold); see the documentation of `create_patches` for details.

In [None]:
X, Y, XY_axes = create_patches(
    raw_data            = raw_data,
    patch_size          = (16,64,64),
    n_patches_per_image = 1024,
    save_file           = 'my_training_data.npz',
)
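
The idea of sampling matching patches only from non-background regions can be sketched in plain NumPy as follows. This is a simplified illustration on synthetic data, not the procedure used by `create_patches`; the helper `sample_patch_pairs`, its thresholding rule, and the `rel_threshold` value are assumptions.

```python
import numpy as np

def sample_patch_pairs(x, y, patch_size, n_patches, rel_threshold=0.4, seed=0):
    """Sample matching patches from x and y, centered on bright voxels of y.

    Hypothetical helper for illustration; the thresholding rule and the
    `rel_threshold` value are assumptions, not taken from csbdeep.
    """
    assert x.shape == y.shape and len(patch_size) == x.ndim
    rng = np.random.default_rng(seed)
    half = [p // 2 for p in patch_size]
    # Restrict candidate centers so that every patch fits inside the stack
    valid = tuple(slice(h, s - p + h + 1) for h, p, s in zip(half, patch_size, x.shape))
    mask = y[valid] > rel_threshold * y.max()      # "non-background" centers
    starts = np.argwhere(mask)                     # index in cropped coords == patch start
    if len(starts) == 0:
        raise ValueError("no voxel above threshold; lower rel_threshold")
    chosen = starts[rng.integers(0, len(starts), size=n_patches)]
    X, Y = [], []
    for c in chosen:
        sl = tuple(slice(ci, ci + p) for ci, p in zip(c, patch_size))
        X.append(x[sl])                            # same slice for both stacks,
        Y.append(y[sl])                            # so the pair stays registered
    return np.stack(X), np.stack(Y)

# Synthetic ZYX pair: a bright block as "specimen", plus a dim noisy copy
rng = np.random.default_rng(1)
gt = np.zeros((16, 64, 64), dtype=np.float32)
gt[4:12, 20:44, 20:44] = 100.0
low = gt * 0.1 + rng.normal(0, 1, gt.shape).astype(np.float32)
Xd, Yd = sample_patch_pairs(low, gt, patch_size=(8, 16, 16), n_patches=32)
print(Xd.shape, Yd.shape)
```

Applying the same slice to both stacks is what makes registration of the original pair essential: any offset between the low and GT stacks would carry over into every patch pair.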

In [None]:
assert X.shape == Y.shape
print("shape of X,Y =", X.shape)
print("axes  of X,Y =", XY_axes)

## Show

The following shows maximum projections of some of the generated patch pairs (even rows: *input*, odd rows: *target*).

In [None]:
for i in range(2):
    plt.figure(figsize=(16,4))
    sl = slice(8*i, 8*(i+1)), 0
    plot_some(X[sl],Y[sl],title_list=[np.arange(sl[0].start,sl[0].stop)])
    plt.show()
None;