# Bigstream Tutorial
---

Bigstream is an extremely powerful and customizable tool. This tutorial shows only a small piece of what bigstream can do, but it's a reasonable place to start. Using a small example dataset, this notebook demonstrates two workflows: in-memory registration of small images and out-of-memory (larger-than-memory) registration of large images.

Our example data is stored in .n5 format, which can be read with the zarr library. Zarr files themselves would also be perfectly fine. For the in-memory registration of small images you simply need numpy arrays.

## Define Inputs
---

In [None]:
# modules for loading/writing data
import numpy as np
import zarr, tifffile

# file paths to tutorial data
# replace the capitalized text below with the path to your copy of the bigstream repository
fix_path = '/PATH/TO/BIGSTREAM/REPOSITORY/resources/fix.n5'
mov_path = '/PATH/TO/BIGSTREAM/REPOSITORY/resources/mov.n5'

# create Zarr file objects (zarr.N5Store is provided by zarr-python 2.x)
fix_zarr = zarr.open(store=zarr.N5Store(fix_path), mode='r')
mov_zarr = zarr.open(store=zarr.N5Store(mov_path), mode='r')

# get pointers to the low res scale level
# still just pointers, no data loaded into memory yet
fix_lowres = fix_zarr['/lowres']
mov_lowres = mov_zarr['/lowres']

# we need the voxel spacings for the low res data sets
# we can compute them from the low res data set metadata
fix_meta = fix_lowres.attrs.asdict()
mov_meta = mov_lowres.attrs.asdict()
fix_lowres_spacing = np.array(fix_meta['pixelResolution']) * fix_meta['downsamplingFactors']
mov_lowres_spacing = np.array(mov_meta['pixelResolution']) * mov_meta['downsamplingFactors']
fix_lowres_spacing = fix_lowres_spacing[::-1]  # put in zyx order to be consistent with image data
mov_lowres_spacing = mov_lowres_spacing[::-1]

# read small image data into memory as numpy arrays
fix_lowres_data = fix_lowres[...]
mov_lowres_data = mov_lowres[...]

# sanity check: print the voxel spacings and lowres dataset shapes
print(fix_lowres_spacing, mov_lowres_spacing)
print(fix_lowres_data.shape, mov_lowres_data.shape)

# get pointers to the high res scale level
fix_highres = fix_zarr['/highres']
mov_highres = mov_zarr['/highres']

# we need the voxel spacings for the high res data sets
# we can compute them from the high res data set metadata
fix_meta = fix_highres.attrs.asdict()
mov_meta = mov_highres.attrs.asdict()
fix_highres_spacing = np.array(fix_meta['pixelResolution']) * fix_meta['downsamplingFactors']
mov_highres_spacing = np.array(mov_meta['pixelResolution']) * mov_meta['downsamplingFactors']
fix_highres_spacing = fix_highres_spacing[::-1]
mov_highres_spacing = mov_highres_spacing[::-1]

# sanity check: print the voxel spacings and highres dataset shapes
print(fix_highres_spacing, mov_highres_spacing)
print(fix_highres.shape, mov_highres.shape)

# write data to view in fiji or similar
# here we write the highres data to disk, which requires loading it all into memory
# this is fine for tutorial data, but in practice you would not do this with your
# large data
tifffile.imwrite('./fix_lowres_data.tiff', fix_lowres_data)
tifffile.imwrite('./mov_lowres_data.tiff', mov_lowres_data)
tifffile.imwrite('./fix_highres_data.tiff', fix_highres[...])
tifffile.imwrite('./mov_highres_data.tiff', mov_highres[...])
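
The spacing computation in the cell above can be seen in isolation with a small sketch. The metadata values and the image shape below are made up for illustration (they are not the tutorial data); only the `pixelResolution` and `downsamplingFactors` keys come from the cell above. The metadata is assumed to be stored in xyz order, which is why it is reversed to match zyx image arrays.

```python
import numpy as np

# hypothetical N5 attributes in xyz order (illustrative values, not the tutorial data)
meta = {'pixelResolution': [0.23, 0.23, 0.42], 'downsamplingFactors': [8, 8, 4]}

# voxel spacing at this scale level = full resolution spacing * downsampling factor
spacing_xyz = np.array(meta['pixelResolution']) * meta['downsamplingFactors']

# reverse to zyx so the spacing lines up with the axes of the image array
spacing_zyx = spacing_xyz[::-1]

# physical extent of a hypothetical (z, y, x) volume at this scale level
shape_zyx = np.array([100, 200, 150])
extent_zyx = shape_zyx * spacing_zyx

print(spacing_zyx)
print(extent_zyx)
```

If the printed spacings look swapped relative to the array shape (for example, the coarse axis paired with the wrong dimension), the axis order of the metadata is the first thing to check.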

## In memory affine registration of smaller images
---

Here, we will use the `alignment_pipeline` function from the `bigstream.align` module. This function allows you to construct complex alignment pipelines including various affine and deformable stages. We will construct a pipeline with two steps: (1) a feature-point and RANSAC-based affine alignment that finds a good initial fit and (2) a gradient-descent-based affine alignment that refines the result of the first step into an even better fit.
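
To give some intuition for the first step, here is a minimal sketch of the general RANSAC idea: fit an affine to small random subsets of matched points, keep the model that agrees with the most matches, then refit on those inliers. This is not bigstream's implementation; `fit_affine`, `ransac_affine`, and the synthetic point matches are hypothetical names and data used only for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 4x4 homogeneous affine that maps src points onto dst points."""
    src_h = np.hstack([src, np.ones((len(src), 1))])      # (N, 4) homogeneous coords
    params, *_ = np.linalg.lstsq(src_h, dst, rcond=None)  # (4, 3): linear part + offset
    affine = np.eye(4)
    affine[:3, :] = params.T
    return affine

def ransac_affine(src, dst, n_iter=200, threshold=1.0, seed=0):
    """Keep the affine supported by the most matches, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # 4 point pairs is the minimum needed to determine a 3D affine
        sample = rng.choice(len(src), size=4, replace=False)
        candidate = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(src_h @ candidate[:3].T - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# synthetic matches: 40 follow a known affine exactly, 10 are corrupted
rng = np.random.default_rng(42)
true_affine = np.eye(4)
true_affine[:3, :3] = [[1.02, 0.05, 0.0], [-0.05, 0.98, 0.01], [0.0, -0.01, 1.0]]
true_affine[:3, 3] = [5.0, -3.0, 2.0]
src_pts = rng.uniform(0, 100, size=(50, 3))
dst_pts = src_pts @ true_affine[:3, :3].T + true_affine[:3, 3]
dst_pts[40:] += rng.uniform(20, 50, size=(10, 3))  # bad matches

estimated, inliers = ransac_affine(src_pts, dst_pts)
print(inliers.sum(), np.allclose(estimated, true_affine, atol=1e-6))
```

The point of the random sampling is robustness: a plain least-squares fit over all 50 matches would be pulled off target by the 10 bad ones, whereas the consensus step ignores them.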

In [None]:
# you should first read the docstrings for alignment_pipeline and apply_transform
# to understand more about what we're going to do
from bigstream.align import alignment_pipeline
from bigstream.transform import apply_transform

print("alignment_pipeline\n", alignment_pipeline.__doc__)
print('\n\n------------------------')
print("apply_transform\n", apply_transform.__doc__)

In [None]:
# in-memory alignment functions are in bigstream.align
# functions for resampling data once a transform has been found are in bigstream.transform
from bigstream.align import alignment_pipeline
from bigstream.transform import apply_transform

# define arguments for the feature point and ransac stage (you'll understand these later)
ransac_kwargs = {'blob_sizes':[6, 20]}

# define arguments for the gradient descent stage (you'll understand these later)
affine_kwargs = {
    'shrink_factors':(2,),
    'smooth_sigmas':(2.5,),
    'optimizer_args':{
        'learningRate':0.25,
        'minStep':0.,
        'numberOfIterations':400,
    },
}

# define the alignment steps
steps = [('ransac', ransac_kwargs), ('affine', affine_kwargs)]

# execute the alignment
affine = alignment_pipeline(
    fix_lowres_data, mov_lowres_data,
    fix_lowres_spacing, mov_lowres_spacing,
    steps,
)

# resample the moving image data using the transform you found
aligned = apply_transform(
    fix_lowres_data, mov_lowres_data,
    fix_lowres_spacing, mov_lowres_spacing,
    transform_list=[affine,],
)

# write results
np.savetxt('./affine.mat', affine)
tifffile.imwrite('./affine_lowres.tiff', aligned)

# load precomputed result (handy to use later if you've already run the cell)
affine = np.loadtxt('./affine.mat')

At this point you should pop open an image viewer and look at `fix_lowres_data.tiff` and `affine_lowres.tiff` to see what the alignment accomplished. Ideally, you should superimpose the aligned data onto the fixed data (e.g. in fiji make a multichannel image) to view the alignment accuracy through the whole volume.
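
If you would rather build the two-channel overlay in Python than in the viewer, one option is to stack the fixed and aligned volumes along a new channel axis. The sketch below uses small random arrays as stand-ins for `fix_lowres_data` and `aligned`; the stand-in shapes and dtype are assumptions for illustration.

```python
import numpy as np

# small random stand-ins for the fixed and aligned volumes, in (z, y, x) order
rng = np.random.default_rng(0)
fixed = rng.random((4, 16, 16)).astype(np.float32)
aligned_stand_in = rng.random((4, 16, 16)).astype(np.float32)

# both volumes live on the fixed image grid, so they can be stacked directly;
# inserting the channel axis at position 1 gives (z, c, y, x)
overlay = np.stack([fixed, aligned_stand_in], axis=1)

print(overlay.shape)  # (4, 2, 16, 16)
```

An array laid out this way can then be written with `tifffile.imwrite`, for example with `imagej=True` and `metadata={'axes': 'ZCYX'}`, so that fiji opens it as a multichannel stack. Treat those options as a suggestion to check against the tifffile documentation for your installed version.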

Because this tutorial data is pretty small, even these global steps are enough to align the data almost perfectly. In a bit we'll proceed with an unnecessary but instructive blockwise deformable alignment just to teach how it works.

## More reading - Get to know the API for fundamental alignment steps
---

In [None]:
# Here are all the things you can chain together with alignment_pipeline
# you should read the docstrings for these functions to learn more about how to build alignments
from bigstream.align import (
    feature_point_ransac_affine_align,
    random_affine_search,
    affine_align,
    deformable_align
)
from bigstream.configure_irm import configure_irm

# these are the alignment functions
print("feature_point_ransac_affine_align\n", feature_point_ransac_affine_align.__doc__)
print('\n\n------------------------')
print("random_affine_search\n", random_affine_search.__doc__)
print('\n\n------------------------')
print("affine_align\n", affine_align.__doc__)
print('\n\n------------------------')
print("deformable_align\n", deformable_align.__doc__)
print('\n\n------------------------')

# also - affine_align and deformable_align both pass additional keyword arguments to this configure_irm function
# configure_irm really is the heart of bigstream and to become an expert user you should eventually become familiar with
# the SimpleITK ImageRegistrationMethod API. But you can save all that for later.
print("configure_irm", configure_irm.__doc__)

## Out of memory affine and deformable registration of big images
---

Here we will use the `distributed_piecewise_alignment_pipeline` function from `bigstream.piecewise_align`. This function breaks the images up into blocks and runs the same `alignment_pipeline` function we used before on each of those blocks separately. The results are smoothly stitched back together into a single transform of the entire (large) moving image. In this case, we will again construct an alignment of two steps: (1) a feature point and ransac alignment to find a good initial fit and (2) a deformable alignment that should tighten up even the smallest edges into correspondence.

Each block is treated as a separate job, and jobs run in parallel up to the number of compute workers you have. If you are on a workstation, compute workers will be cpu cores. If you are on a cluster, you have more control over what a compute worker will be.
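
To get a feel for how many jobs a given blocksize produces, here is a rough sketch that tiles a volume into non-overlapping blocks. The volume shape is hypothetical, and bigstream's own blocking may differ (for example by adding overlaps between neighboring blocks), so treat this as an estimate rather than a description of the library internals.

```python
import numpy as np

shape = np.array([300, 512, 512])   # hypothetical volume shape in voxels (zyx)
blocksize = np.array([128] * 3)     # same form as the blocksize used in this tutorial

# number of blocks along each axis; partial blocks at the edges still count
nblocks = np.ceil(shape / blocksize).astype(int)
print(nblocks, int(np.prod(nblocks)))  # [3 4 4] 48

# slices that crop each block out of the volume, clipped at the volume boundary
blocks = []
for index in np.ndindex(*nblocks):
    start = blocksize * index
    stop = np.minimum(start + blocksize, shape)
    blocks.append(tuple(slice(int(a), int(b)) for a, b in zip(start, stop)))

print(blocks[0])
print(blocks[-1])
```

Comparing the total block count with the number of workers you have gives a quick sense of how many rounds of jobs an alignment will take.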

In [None]:
# out-of-memory alignment functions are in bigstream.piecewise_align
# functions for resampling large images with large transforms are in bigstream.piecewise_transform
from bigstream.piecewise_align import distributed_piecewise_alignment_pipeline
from bigstream.piecewise_transform import distributed_apply_transform

# define arguments for the feature point and ransac stage (if you read the feature_point_ransac_affine_align
# docstring you should understand your options here now)
ransac_kwargs = {'blob_sizes':[6, 20], 'nspots':1000}

# define arguments for the deformable stage (if you read the deformable_align docstring and the configure_irm
# docstring you should understand your options here now)
deform_kwargs = {
    'smooth_sigmas':(0.25,),
    'control_point_spacing':128.0,
    'control_point_levels':(1,),
    'optimizer_args':{
        'learningRate':0.25,
        'minStep':0.,
        'numberOfIterations':10,
    },
}

# define the alignment steps
steps = [('ransac', ransac_kwargs), ('deform', deform_kwargs)]

# define the blocksize for breaking up the image domain, this is in voxels
blocksize = [128,]*3

# configure options for your compute environment, cluster or local workstation
## Distributed computing in bigstream is handled by dask and helped along by another package I wrote
## called ClusterWrap. ClusterWrap will use a local_cluster object by default, which will use the
## resources of your local machine for distributed computing. If you want to distribute your job
## on a cluster, you will need to learn about ClusterWrap and probably dask-jobqueue as well.
cluster_kwargs={}

# Execute alignment, note we are providing the global alignment found with the lowres data as
# an initialization. Also note, the deform will be stored as a zarr array on disk.
deform = distributed_piecewise_alignment_pipeline(
    fix_highres, mov_highres,
    fix_highres_spacing, mov_highres_spacing,
    steps,
    blocksize=blocksize,
    write_path='./deform.zarr',
    static_transform_list=[affine,],
    cluster_kwargs=cluster_kwargs,
)

# apply the two transforms we found (global affine from lowres data and local deform from highres data)
# the resampled data is now located in a zarr array on disk
aligned = distributed_apply_transform(
    fix_highres, mov_highres,
    fix_highres_spacing, mov_highres_spacing,
    transform_list=[affine, deform],
    blocksize=blocksize,
    write_path='./deformed.zarr',
    cluster_kwargs=cluster_kwargs,
)

# We'll write the deformed data into a format that most viewers can read
# you wouldn't normally be able to do this for huge images, you would either look at smaller crops
# or use something like bigdataviewer
tifffile.imwrite('./deformed.tiff', aligned[...])

At this point you should pop open an image viewer and look at `fix_highres_data.tiff` and `deformed.tiff` to see what the alignment accomplished. Ideally, you should superimpose the aligned data onto the fixed data (e.g. in fiji make a multichannel image) to view the alignment accuracy through the whole volume.

For a real out-of-memory application, images could be 100x larger than the example images here. In such a case you will see a significant difference between the quality of the global low resolution alignment and the local out-of-memory high resolution alignment. The strategy shown in this tutorial is still the right one, however: you should align your data in stages from coarse to fine, with global alignments at lower resolution followed by local alignments at higher resolution.

## Next Steps
---

Where to go from here? I suggest trying to adapt a copy of this notebook to data of your own. Get comfortable with the image formats used here (most importantly zarr). Get more knowledgeable about the `bigstream.align`, `bigstream.transform`, `bigstream.piecewise_align`, and `bigstream.piecewise_transform` APIs. Look into the `bigstream.configure_irm` API and the `SimpleITK.ImageRegistrationMethod` API. These are the essential ingredients for doing big data registrations.

Another important step is getting to know your compute environment. If you are computing on a local workstation, you should know how to parameterize dask to create workers on your machine. Similarly, if you're on a cluster you should know how to parameterize dask to create workers in that environment. All of these things can be specified to bigstream through the `cluster_kwargs` argument shown in the cell above.