# Alignment with bigstream
---

BigStream is a library of tools for registering very large images. It uses big data tools such as Zarr and Dask to work with datasets that are too large for memory, and to make costly alignments finish in a practical amount of time by distributing the work across your workstation or cluster.

This tutorial will walk you through the following steps:

    1. Reading image data and metadata using Zarr
    2. Global affine alignment
    3. Local affine alignments
    4. Local deformable alignments

## Preliminary
---

Make sure BigStream is installed: `pip install bigstream`

You should also get the source code, which is located here: https://github.com/GFleishman/bigstream. \
Follow the instructions on GitHub to clone the repository, which contains the example data used for this tutorial.

## Tutorial data
---

In the BigStream repository the `resources` folder contains two images in N5 format.\
We will first access the data and metadata in these files using Zarr, which was installed with BigStream.

In [None]:
# import library for reading
import zarr

# file paths to tutorial data N5 files
fix_path = '/groups/scicompsoft/home/fleishmang/source/bigstream/resources/fix.n5'
mov_path = '/groups/scicompsoft/home/fleishmang/source/bigstream/resources/mov.n5'

# create Zarr file object using N5Stores
fix_zarr = zarr.open(store=zarr.N5Store(fix_path), mode='r')
mov_zarr = zarr.open(store=zarr.N5Store(mov_path), mode='r')

`fix_zarr` and `mov_zarr` are just lazy pointers to the N5 files; no image data has been loaded into memory yet.\
The first alignment step, global affine, only needs low resolution data, which we assume fits into available memory.

In [None]:
# we'll need numpy now
import numpy as np

# get pointers to the low res scale level
# still just pointers, no data loaded into memory yet
fix_lowres = fix_zarr['/lowres']
mov_lowres = mov_zarr['/lowres']

# we need the voxel spacings for the low res data sets
# we can compute them from the low res data set metadata
fix_meta = fix_lowres.attrs.asdict()
mov_meta = mov_lowres.attrs.asdict()
fix_lowres_spacing = np.array(fix_meta['pixelResolution']) * fix_meta['downsamplingFactors']
mov_lowres_spacing = np.array(mov_meta['pixelResolution']) * mov_meta['downsamplingFactors']

# read data into memory as numpy arrays
# Why transpose? zarr reads data as zyx, but we prefer xyz (metadata is already xyz)
fix_lowres_data = fix_lowres[...].transpose(2, 1, 0)
mov_lowres_data = mov_lowres[...].transpose(2, 1, 0)

# sanity check: print the voxel spacings and lowres dataset shapes
print(fix_lowres_spacing, mov_lowres_spacing)
print(fix_lowres_data.shape, mov_lowres_data.shape)
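
The spacing arithmetic and axis reordering in the cell above can be tried without the tutorial files. The following is a minimal sketch using made-up metadata; the attribute values, the units, and the array shape are assumptions chosen for illustration and are not taken from the tutorial data.

```python
import numpy as np

# Hypothetical N5-style attributes; the real values come from
# fix_lowres.attrs.asdict() and will differ for your data.
example_meta = {
    'pixelResolution': [0.23, 0.23, 0.42],   # xyz spacing at full resolution (assumed values)
    'downsamplingFactors': [8, 8, 4],        # xyz downsampling of this scale level (assumed values)
}

# voxel spacing at this scale level = full resolution spacing * downsampling factor
example_spacing = np.array(example_meta['pixelResolution']) * example_meta['downsamplingFactors']

# a small stand-in for an array that was stored in zyx order
example_zyx = np.zeros((10, 20, 30), dtype=np.uint16)

# reversing the axis order gives xyz, which matches the metadata order
example_xyz = example_zyx.transpose(2, 1, 0)

print(example_spacing)
print(example_zyx.shape, '->', example_xyz.shape)
```

After the transpose, axis `i` of the array and element `i` of the spacing refer to the same physical direction, which is what the alignment functions below rely on.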

## Global affine alignment
---

The affine alignment algorithm is composed of these steps:

    1. Key point extraction from fixed and moving images
    2. Correspondence matching between key points in the fixed and moving point sets using neighborhood correlation
    3. Affine alignment using a RANSAC filter on the point correspondences

All of this is accomplished with one function call. The following cell will take some time to run:

In [None]:
# affine alignment functions are in bigstream.affine
from bigstream import affine

# see below for explanation of parameters
global_affine = affine.ransac_affine(
    fix_lowres_data, mov_lowres_data,
    fix_lowres_spacing, mov_lowres_spacing,
    min_radius=6, max_radius=20, match_threshold=0.75,
)

# sanity check: print the result
print(global_affine)

# For tutorial data, should be approximately:
# [[ 9.91070543e-01  3.40205967e-02 -5.63125159e-03 -8.12590407e+01]
#  [-3.86715664e-02  1.02023436e+00 -6.30453307e-03 -1.33468687e+01]
#  [ 1.19321249e-02 -2.21371673e-02  9.67441910e-01 -2.42373194e-01]]
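
To build intuition for step 3, here is a minimal NumPy sketch of a RANSAC affine fit on synthetic point correspondences. It is not BigStream's implementation: key point extraction and correlation matching are skipped, and the helper names `fit_affine` and `ransac_affine_sketch`, the synthetic points, and the trial count are all assumptions made for this illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 3x4 affine that maps src points (N, 3) onto dst points (N, 3)."""
    src_h = np.hstack([src, np.ones((len(src), 1))])         # homogeneous coordinates, (N, 4)
    solution, *_ = np.linalg.lstsq(src_h, dst, rcond=None)   # (4, 3)
    return solution.T                                        # (3, 4)

def ransac_affine_sketch(src, dst, threshold=2.0, n_trials=200, seed=0):
    """Keep the largest set of correspondences that agree with a single affine."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_trials):
        # four point pairs are the minimum needed to determine a 3D affine
        sample = rng.choice(len(src), size=4, replace=False)
        candidate = fit_affine(src[sample], dst[sample])
        # distance between each transformed source point and its match
        residuals = np.linalg.norm(src_h @ candidate.T - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit using every correspondence that agreed with the best candidate
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# synthetic correspondences: 100 matches, the first 20 are deliberately wrong
rng = np.random.default_rng(1)
true_affine = np.array([[ 0.99,  0.03, 0.00, -8.0],
                        [-0.04,  1.02, 0.00, -1.3],
                        [ 0.01, -0.02, 0.97, -0.2]])
fixed_points = rng.uniform(0, 100, size=(100, 3))
moving_points = fixed_points @ true_affine[:, :3].T + true_affine[:, 3]
moving_points[:20] += rng.uniform(20, 50, size=(20, 3))

estimated_affine, inlier_mask = ransac_affine_sketch(fixed_points, moving_points)
print(np.round(estimated_affine, 3))
print(inlier_mask.sum(), 'of', len(inlier_mask), 'matches kept')
```

The `threshold` argument in this sketch plays a role comparable to `align_threshold` described below: a match only counts as agreeing with a candidate affine if the transformed point lands within that distance of its partner.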

### Parameters and output

    min_radius: radius in voxels of smallest expected blob/cell size
    
    max_radius: radius in voxels of largest expected blob/cell size
    
    match_threshold: neighborhood correlation between two key points must exceed this value for it to be a valid match
    
Other optional parameters are:
    
    cc_radius: key points are matched using correlation of the data in their neighborhoods, this is the neighborhood radius in voxels (default: 12)
    
    nspots: the maximum number of key point pairs to use to compute the affine alignment (default: 5000)
    
    align_threshold: points are considered aligned by the affine if they are less than this value apart, in micrometers (default: 2.0)
    
    num_sigma_max: the maximum number of filters to run in the blob detection (default: 10)
    
The return value is:
    
    global_affine: a 3x4 affine transform matrix as a numpy array; it describes the correspondence between points in the fixed image and points in the moving image.

### Applying global affine

The alignment only gave us the affine transform matrix. Here we apply it to the moving image:

In [None]:
# functions for applying transforms are in bigstream.transform
from bigstream import transform

# apply the global affine to the moving image
mov_lowres_aligned = transform.apply_global_affine(
    fix_lowres_data, mov_lowres_data,
    fix_lowres_spacing, mov_lowres_spacing,
    global_affine,
)

Now we can compare the fixed, moving, and aligned data. Try running this cell a few times with different values for `slc`.

In [None]:
# we'll visualize the results with some image plots
from matplotlib import pyplot as plt

# plot some image slices to check on things
slc = 70
f_slc = fix_lowres_data[..., slc]
a_slc = mov_lowres_aligned[..., slc]
m_slc = mov_lowres_data[..., slc]

# normalize for display
f_slc = f_slc.astype(np.float32) / f_slc.max()
a_slc = a_slc.astype(np.float32) / a_slc.max()
m_slc = m_slc.astype(np.float32) / m_slc.max()

# make RGB versions
f_rgb = np.zeros(f_slc.shape + (3,))
f_rgb[..., 0] = f_slc * 2
a_rgb = np.zeros(a_slc.shape + (3,))
a_rgb[..., 0] = f_slc * 2
a_rgb[..., 1] = a_slc * 2
m_rgb = np.zeros(m_slc.shape + (3,))
m_rgb[..., 1] = m_slc * 2

# create figure and subplots
fig = plt.figure(figsize=(12,24))
fig.add_subplot(1, 3, 1)
plt.imshow(f_rgb)
fig.add_subplot(1, 3, 2)
plt.imshow(f_rgb)
plt.imshow(a_rgb)
fig.add_subplot(1, 3, 3)
plt.imshow(m_rgb)
plt.show()
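
The same normalize-and-overlay steps are repeated for every comparison plot in this tutorial. One option is to wrap them in a small helper. The function below is a hypothetical convenience written for this sketch, not part of BigStream; it also clips the result to the [0, 1] range so that `plt.imshow` does not have to clip it.

```python
import numpy as np

def overlay_rgb(red=None, green=None, gain=2.0):
    """Build an RGB image with one 2D slice in red and/or another in green.

    Hypothetical helper for this tutorial's plots; not a BigStream function.
    """
    reference = red if red is not None else green
    rgb = np.zeros(reference.shape + (3,), dtype=np.float32)
    for channel, image in enumerate((red, green)):
        if image is None:
            continue
        image = image.astype(np.float32)
        peak = image.max()
        if peak > 0:  # leave an all-zero slice black instead of dividing by zero
            rgb[..., channel] = image / peak * gain
    # keep values in the range imshow expects for float RGB data
    return np.clip(rgb, 0.0, 1.0)

# tiny stand-in slices; in the tutorial these would be f_slc and a_slc
demo_fixed = np.arange(12).reshape(3, 4)
demo_aligned = demo_fixed[::-1, ::-1]
demo_overlay = overlay_rgb(red=demo_fixed, green=demo_aligned)
print(demo_overlay.shape)
```

With such a helper, the three panels would be `overlay_rgb(red=f_slc)`, `overlay_rgb(red=f_slc, green=a_slc)`, and `overlay_rgb(green=m_slc)`.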

## Local affine alignments
---

The tutorial dataset is small, so the global affine lines up almost all of the data well. With a real dataset, however, we would likely need to refine the global alignment with local alignments. In local affine alignment, the images are carved into overlapping tiles and a separate affine is computed for each tile. For large datasets there can be many tiles. To make this process tractable, a cluster is constructed using [ClusterWrap](https://github.com/GFleishman/ClusterWrap) and Dask, and the local affines are all computed in parallel on their own workers.


The `tiled_ransac_affine` function has a few new arguments related to the tiling:

    blocksize: iterable, length equal to the image dimension. The size of tiles in voxels.
    
    cluster_kwargs: extra arguments to the cluster constructor. See ClusterWrap for details.

All the optional arguments from `ransac_affine` are available here as well.

In [None]:
# Note use of mov_lowres_aligned as moving image rather than mov_lowres_data
# Note also that fix_lowres_spacing is used as the "moving" voxel spacing here
local_affines = affine.tiled_ransac_affine(
    fix_lowres_data, mov_lowres_aligned,
    fix_lowres_spacing, fix_lowres_spacing,
    min_radius=6, max_radius=20, match_threshold=0.75,
    blocksize=[128,]*3,
)

# sanity check, print shape of local_affines
# the first three dimensions index over the tiles
# the last two dimensions are the 3x4 affine matrices for each tile
print(local_affines.shape)
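
To make the layout of `local_affines` concrete, the sketch below builds an array with the same kind of shape from a made-up volume size. It assumes the tile grid is simply the volume shape divided by `blocksize`, rounded up; BigStream's tiles overlap and its exact grid computation may differ, so treat the numbers as illustrative only.

```python
import numpy as np

# hypothetical volume shape (xyz) and the blocksize used above
volume_shape = np.array([300, 260, 130])
blocksize = np.array([128, 128, 128])

# assumed tile grid: enough blocks to cover every axis
tile_grid = np.ceil(volume_shape / blocksize).astype(int)

# one 3x4 matrix per tile; identity matrices stand in for real results
identity_affine = np.eye(4)[:3]
example_local_affines = np.tile(identity_affine, tuple(int(n) for n in tile_grid) + (1, 1))
print(example_local_affines.shape)

# the tile that contains a given voxel, and the affine stored for it
voxel = np.array([200, 50, 129])
tile_index = tuple(int(i) for i in voxel // blocksize)
print(tile_index)
print(example_local_affines[tile_index])
```

Indexing with the first three dimensions selects a tile, and what remains is that tile's 3x4 matrix, as described in the comments of the cell above.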

### Applying local affines

This function is conceptually similar to the `apply_global_affine` function used previously but, like `tiled_ransac_affine`, it distributes the computation over tiles. You need to provide *the same* blocksize used for `tiled_ransac_affine`. Note also that you should include the `global_affine` to ensure both transforms are applied.

There are 3 options for the output of this function. Which of the three options you get is controlled through 2 new parameters:

    write_path: Default is None. But if a file path is provided, the aligned data will be written as a zarr file to disk and a reference to that zarr.Array object is returned. This is useful when your data is too large to fit into memory.
    
    lazy: True or False. If True, the resampling computation is not performed, only a reference to the lazy dask.Array object is returned. If False, the computation is executed but rather than writing to disk (the case if a write_path is given), the result is returned. You must be sure your data fits into RAM to choose this option.

Since we're working with the lowres data, we omit `write_path` and set `lazy=False`. If we were applying local affines to high resolution data, or to a very large dataset in general, we would instead pass `write_path='/path/to/output.zarr'`.

In [None]:
# apply the local affines to the moving image
#   Note we're using mov_lowres_data again - it's better
#   to provide the global and local affines together. They
#   are composed into a single transform - that way the moving
#   image is only resampled one time.
mov_lowres_aligned = transform.apply_local_affines(
    fix_lowres_data, mov_lowres_data,
    fix_lowres_spacing, mov_lowres_spacing,
    local_affines,
    blocksize=[128,]*3,
    global_affine=global_affine,
    lazy=False,
)
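
The comment in the cell above says the global and local affines are composed into a single transform. A minimal sketch of what that means for one tile is shown below, using 4x4 homogeneous matrices. The matrices are made up, and the composition order (local correction first, then global) is an assumption based on the local affines having been estimated against the globally aligned image; it is not taken from BigStream's source.

```python
import numpy as np

def to_homogeneous(affine_3x4):
    """Promote a 3x4 affine to a 4x4 matrix so affines can be chained by multiplication."""
    matrix = np.eye(4)
    matrix[:3, :] = affine_3x4
    return matrix

# made-up example matrices, not results from the tutorial data
global_example = np.array([[1.0, 0.0, 0.0,  5.0],
                           [0.0, 1.0, 0.0, -2.0],
                           [0.0, 0.0, 1.0,  0.5]])
local_example = np.array([[1.01, 0.0,  0.0,  0.3],
                          [0.0,  0.99, 0.0,  0.0],
                          [0.0,  0.0,  1.0, -0.1]])

# assumed order: the local affine acts first, the global affine second
composed = to_homogeneous(global_example) @ to_homogeneous(local_example)

# a fixed image coordinate in homogeneous form
point = np.array([10.0, 20.0, 30.0, 1.0])

# applying the two affines one after the other gives the same answer as the composed matrix
sequential = to_homogeneous(global_example) @ (to_homogeneous(local_example) @ point)
print(composed @ point)
print(np.allclose(composed @ point, sequential))
```

Because the composed matrix gives the final sampling location directly, the moving image only has to be interpolated once, which avoids the extra blurring that two successive resamplings would introduce.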

As before, we can inspect the alignment:

In [None]:
# plot some image slices to check on things
slc = 70
f_slc = fix_lowres_data[..., slc]
a_slc = mov_lowres_aligned[..., slc]
m_slc = mov_lowres_data[..., slc]

# normalize for display
f_slc = f_slc.astype(np.float32) / f_slc.max()
a_slc = a_slc.astype(np.float32) / a_slc.max()
m_slc = m_slc.astype(np.float32) / m_slc.max()

# make RGB versions
f_rgb = np.zeros(f_slc.shape + (3,))
f_rgb[..., 0] = f_slc * 2
a_rgb = np.zeros(a_slc.shape + (3,))
a_rgb[..., 0] = f_slc * 2
a_rgb[..., 1] = a_slc * 2
m_rgb = np.zeros(m_slc.shape + (3,))
m_rgb[..., 1] = m_slc * 2

# create figure and subplots
fig = plt.figure(figsize=(12,24))
fig.add_subplot(1, 3, 1)
plt.imshow(f_rgb)
fig.add_subplot(1, 3, 2)
plt.imshow(f_rgb)
plt.imshow(a_rgb)
fig.add_subplot(1, 3, 3)
plt.imshow(m_rgb)
plt.show()

## Local deformable alignments
---

Deformable alignment is the final step, where every voxel is assigned its own displacement vector. Deformable alignment can take advantage of high resolution features, so we'll be using larger arrays of higher resolution data. Here we assume the data is too large for memory on a single machine and that the computation would be intractably slow on a single machine. This section shows how BigStream's use of distributed computing enables you to align your data despite these difficulties.

Compare the cell below to the second cell in the _Tutorial Data_ section. Note that here we do not read the data into memory but proceed with the lazy zarr.Array objects.

In [None]:
# get pointers to the high res scale level
# still just pointers, no data loaded into memory
fix_highres = fix_zarr['/highres']
mov_highres = mov_zarr['/highres']

# we need the voxel spacings for the high res data sets
# we can compute them from the high res data set metadata
fix_meta = fix_highres.attrs.asdict()
mov_meta = mov_highres.attrs.asdict()
fix_highres_spacing = np.array(fix_meta['pixelResolution']) * fix_meta['downsamplingFactors']
mov_highres_spacing = np.array(mov_meta['pixelResolution']) * mov_meta['downsamplingFactors']

# sanity check: print the voxel spacings and highres dataset shapes
print(fix_highres_spacing, mov_highres_spacing)
print(fix_highres.shape, mov_highres.shape)

Note the voxel spacings are in xyz order but the lazy array shapes are still zyx.


Before we can deform, we need a highres version of the moving data with the global and local affines applied. Compare the cell below to the first cell in the _Applying local affines_ subsection. First, note that we provide the highres lazy zarr.Arrays as inputs. We happen to know they are 2x larger along each axis, so the `blocksize` parameter has been doubled. We are also assuming the aligned data will not fit into memory, so here we use the `write_path` parameter discussed previously. Finally, there is a new parameter:

    transpose: List of 3 boolean values. Should we transpose the axis order of the fixed image, moving image, or transform data.

In this case, since the fixed and moving image data are being read from zarr files, they must be transposed - but because the transform is being constructed from the global and local affine matrices, it does not need to be transposed.

In [None]:
# this time, we'll write the aligned data to disk
write_path = './mov_highres_affine_aligned.zarr'

# apply the affines to the highres moving image
mov_highres_aligned = transform.apply_local_affines(
    fix_highres, mov_highres,
    fix_highres_spacing, mov_highres_spacing,
    local_affines,
    blocksize=[256,]*3,
    global_affine=global_affine,
    write_path=write_path,
    transpose=[True, True, False],
)
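
If you do not already know the scale factor between two resolution levels, it can be derived from the array shapes. The sketch below uses made-up shapes and assumes a simple grid of `blocksize` tiles; it shows why the blocksize is doubled here: the tile grid on the highres data then matches the grid the local affines were computed on.

```python
import numpy as np

# hypothetical shapes as a zarr.Array would report them (zyx order)
lowres_shape_zyx = (65, 130, 150)
highres_shape_zyx = (130, 260, 300)

# reverse to xyz so the shapes line up with blocksize and voxel spacing
lowres_shape = np.array(lowres_shape_zyx[::-1])
highres_shape = np.array(highres_shape_zyx[::-1])

# per-axis scale factor between the two resolution levels
scale = highres_shape / lowres_shape

# scale the blocksize that was used to compute the local affines
lowres_blocksize = np.array([128, 128, 128])
highres_blocksize = np.round(lowres_blocksize * scale).astype(int)
print(highres_blocksize)

# with the scaled blocksize both levels are split into the same number of tiles
lowres_grid = np.ceil(lowres_shape / lowres_blocksize).astype(int)
highres_grid = np.ceil(highres_shape / highres_blocksize).astype(int)
print(lowres_grid, highres_grid)
```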

Ensure the application was successful:

In [None]:
# plot some image slices to check on things
#   Note for highres data we doubled the slice number
#   The fix/mov images still need to be transposed,
#   But mov_highres_aligned, which we made in the previous step
#   was written out in xyz order
slc = 140
f_slc = fix_highres[slc, ...].transpose(1,0)
a_slc = mov_highres_aligned[..., slc]
m_slc = mov_highres[slc, ...].transpose(1,0)

# normalize for display
f_slc = f_slc.astype(np.float32) / f_slc.max()
a_slc = a_slc.astype(np.float32) / a_slc.max()
m_slc = m_slc.astype(np.float32) / m_slc.max()

# make RGB versions
f_rgb = np.zeros(f_slc.shape + (3,))
f_rgb[..., 0] = f_slc * 2
a_rgb = np.zeros(a_slc.shape + (3,))
a_rgb[..., 0] = f_slc * 2
a_rgb[..., 1] = a_slc * 2
m_rgb = np.zeros(m_slc.shape + (3,))
m_rgb[..., 1] = m_slc * 2

# create figure and subplots
fig = plt.figure(figsize=(12,24))
fig.add_subplot(1, 3, 1)
plt.imshow(f_rgb)
fig.add_subplot(1, 3, 2)
plt.imshow(f_rgb)
plt.imshow(a_rgb)
fig.add_subplot(1, 3, 3)
plt.imshow(m_rgb)
plt.show()

Now we can execute the deformable alignment on the highres data sets:

In [None]:
# deform functions are in bigstream.deform
from bigstream import deform

# a write location for the final transform
write_path = './full_highres_transform.zarr'

full_field = deform.tiled_deformable_align(
    fix_highres, mov_highres_aligned,
    fix_highres_spacing, fix_highres_spacing,
    blocksize=[256,]*3,
    transpose=[True, False],
    global_affine=global_affine, local_affines=local_affines,
    write_path=write_path,
    deform_kwargs={'iterations':[1, 1]},
)
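
A transform in which every voxel has its own vector can be pictured as an array with one 3-vector per voxel. The sketch below converts a made-up affine into such an array on a tiny grid. It assumes the field stores, for each fixed voxel, the physical coordinate to sample in the moving image; whether `full_field` uses exactly this layout is an assumption here, not something confirmed by the tutorial.

```python
import numpy as np

# tiny hypothetical grid (xyz) and voxel spacing
grid_shape = (4, 5, 6)
spacing = np.array([2.0, 2.0, 1.0])

# voxel indices for every grid location, shape (4, 5, 6, 3)
indices = np.stack(
    np.meshgrid(*[np.arange(n) for n in grid_shape], indexing='ij'),
    axis=-1,
)

# physical coordinates of every voxel
positions = indices * spacing

# made-up 3x4 affine: a translation plus a 10% stretch along z
affine_example = np.array([[1.0, 0.0, 0.0,  1.0],
                           [0.0, 1.0, 0.0, -0.5],
                           [0.0, 0.0, 1.1,  0.25]])

# position field: where each fixed voxel lands after the affine
position_field = positions @ affine_example[:, :3].T + affine_example[:, 3]

# displacement field: how far each voxel moved
displacement_field = position_field - positions

print(position_field.shape)
print(position_field[1, 2, 3], displacement_field[1, 2, 3])
```

An affine yields a smoothly varying field like this one, whereas a deformable alignment is free to give neighboring voxels different displacements, which is what allows it to correct local distortions.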

### Applying full transform

In [None]:
# a write path for the final result
write_path = './mov_highres_deform_aligned.zarr'

# apply the full transform (position field) to the moving image
mov_highres_aligned = transform.apply_position_field(
    fix_highres, mov_highres,
    fix_highres_spacing, mov_highres_spacing,
    full_field,
    blocksize=[256,]*3,
    transpose=[True, True, False],
    write_path=write_path,
)

In [None]:
# plot some image slices to check on things
slc = 140
f_slc = fix_highres[slc, ...].transpose(1,0)
a_slc = mov_highres_aligned[..., slc]
m_slc = mov_highres[slc, ...].transpose(1,0)

# normalize for display
f_slc = f_slc.astype(np.float32) / f_slc.max()
a_slc = a_slc.astype(np.float32) / a_slc.max()
m_slc = m_slc.astype(np.float32) / m_slc.max()

# make RGB versions
f_rgb = np.zeros(f_slc.shape + (3,))
f_rgb[..., 0] = f_slc * 2
a_rgb = np.zeros(a_slc.shape + (3,))
a_rgb[..., 0] = f_slc * 2
a_rgb[..., 1] = a_slc * 2
m_rgb = np.zeros(m_slc.shape + (3,))
m_rgb[..., 1] = m_slc * 2

# create figure and subplots
fig = plt.figure(figsize=(12,24))
fig.add_subplot(1, 3, 1)
plt.imshow(f_rgb)
fig.add_subplot(1, 3, 2)
plt.imshow(f_rgb)
plt.imshow(a_rgb)
fig.add_subplot(1, 3, 3)
plt.imshow(m_rgb)
plt.show()