# Uploading Image Stacks

In this notebook, we will upload an image stack to BossDB as a dataset.

## Requirements

You will need a directory of images to upload. The images should:
* Sort alphabetically by name into the intended order (e.g., `0001.png`, `0002.png`)
* All have the same dimensions in the horizontal and vertical directions
* Have no missing tiles
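
The first and last requirements can be checked from the file names alone, before uploading anything. The sketch below is one way to do that, assuming the last run of digits in each file name is its index (as in `0001.png`). `natural_key` and `check_names` are hypothetical helpers written for this example; they are not part of intern.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs numerically, so '10' sorts after '2'."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group alternates text, digits, text, ...
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def check_names(names):
    """Return (sorts_alphabetically, missing_indices) for a list of file names.

    Assumption: the last run of digits in each file stem is the image index,
    and the list is not empty.
    """
    natural = sorted(names, key=natural_key)
    sorts_alphabetically = sorted(names) == natural
    indices = [int(re.findall(r"\d+", Path(n).stem)[-1]) for n in natural]
    expected = set(range(min(indices), max(indices) + 1))
    missing = sorted(expected - set(indices))
    return sorts_alphabetically, missing


# Zero-padded names sort correctly, but index 3 is absent:
print(check_names(["0001.png", "0002.png", "0004.png"]))
# Unpadded names do not sort alphabetically ("10.png" lands before "2.png"):
print(check_names(["1.png", "2.png", "10.png"]))
```

If the first value is `False`, renaming the files with zero-padded indices is usually the simplest fix.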

If you are creating a new BossDB channel, you will also need to have `resource-manager` permissions on BossDB. If you don't have these permissions (perhaps you just created an account?), please email ingests@bossdb.org with information about your dataset. We will add the correct permissions, and we are also happy to help you run your ingest!

In [None]:
# Uncomment this cell to install requirements:
# !pip3 install pillow tqdm numpy

# Get the latest version of intern:
# !pip3 install git+https://github.com/jhuapl-boss/intern.git

In [None]:
from pathlib import Path
from intern import array
from PIL import Image
from tqdm.auto import tqdm
import numpy as np

## Parameters

You can update the cell below to match your ingest preferences:

In [None]:
# The directory where the images live, and the file extension to match.
# Don't forget the extension! You can set IMAGE_EXTENSION to "" if EVERY file
# in the directory is an image for upload.
PATH_TO_IMAGES = Path("/path/to/images/dataset1/")
IMAGE_EXTENSION = ".tiff"

# How many z-slices to upload at once.
# For best performance, this number should be a multiple of 16. If you have a
# lot of RAM, you can increase this number to, say, 64. If you start getting
# out-of-memory errors, you can decrease this number.
# Try to avoid reducing it below 16, but if you must, you will get the best 
# performance with 8 or 4.
UPLOAD_INCREMENT = 16

# BossDB URI of the channel, in the form bossdb://collection/experiment/channel.
# The names are entirely up to you; you should put all research for a single
# paper in the same collection, and all channels that should be co-registered
# should live in the same experiment. You should use one channel for each
# z-stack. For more information, see this video:
# https://youtu.be/gbbfWDThELU?t=81
BOSSDB_URI = "bossdb://matelsky2022/notebook_tutorial/em"


# In ZYX order, the size of a voxel:
VOXEL_SIZE = (40, 4, 4)

# The size units of a voxel:
VOXEL_UNITS = "nanometers"

# The number of times to retry a chunk upload before failing. Set this higher
# (perhaps ~10) if you are leaving the job unattended for a long time.
RETRY_MAX = 1

# Data-type of the image data.
DTYPE = "uint8" # "uint8" or "uint64"

# The source channel, if there is one. If there are multiple segmentation 
# channels and one imagery channel, then you can use this to specify the image
# channel name. Otherwise, specify None:
SOURCE_CHANNEL = None # "image" perhaps
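
As a rough guide to choosing `UPLOAD_INCREMENT`, the memory needed to hold one chunk is the number of slices times the slice area times the bytes per voxel. The sketch below estimates that figure in plain Python. `chunk_bytes` is a hypothetical helper, and the 4096 x 4096 slice size is an example value rather than a property of any real dataset. Peak usage during the upload is likely higher than this estimate, since the per-slice arrays and the stacked copy exist at the same time.

```python
# Bytes per voxel for the two dtypes mentioned above.
BYTES_PER_VOXEL = {"uint8": 1, "uint64": 8}


def chunk_bytes(upload_increment, height, width, dtype="uint8"):
    """Size in bytes of one stacked chunk of shape (upload_increment, height, width)."""
    return upload_increment * height * width * BYTES_PER_VOXEL[dtype]


# Example values only: 16 slices of 4096 x 4096 pixels.
for dtype in ("uint8", "uint64"):
    size_mib = chunk_bytes(16, 4096, 4096, dtype) / 2**20
    print(f"{dtype}: {size_mib:.0f} MiB per chunk")
```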

## Run the Ingest

If you are just running a regular ingest, you don't have to edit anything below
this cell; just run the rest of the notebook.

In [None]:
# Calculate the number of Z slices in the stack:
image_paths = sorted(PATH_TO_IMAGES.glob("*" + IMAGE_EXTENSION))
num_z_slices = len(image_paths)

assert len(image_paths) > 0, f"No images found in the specified directory {PATH_TO_IMAGES}/*{IMAGE_EXTENSION}."

# Get the size of the first image, which we will assume is the same size as all
# other images. PIL reports size as (width, height):
with Image.open(image_paths[0]) as image:
    image_size = image.size

shape_zyx = (num_z_slices, image_size[1], image_size[0])

print(f"Found {num_z_slices} z-slices in the stack, for a dataset shape of {shape_zyx}.")
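
The cell above trusts that every image matches the first one. If you would rather verify that before uploading, the following sketch lists any files whose size differs. `find_mismatched_sizes` and `pillow_size` are hypothetical helpers; the `get_size` argument is an assumption added so the check can be tried without real files. By default the size is read with Pillow's `Image.open(...).size`, which does not decode the pixel data.

```python
def pillow_size(path):
    """Return (width, height) of an image file using Pillow."""
    from PIL import Image  # imported lazily so the helper below works without Pillow

    with Image.open(path) as img:
        return img.size


def find_mismatched_sizes(paths, get_size=pillow_size):
    """Return [(path, size)] for every image whose size differs from the first."""
    paths = list(paths)
    if not paths:
        return []
    expected = get_size(paths[0])
    mismatched = []
    for path in paths[1:]:
        size = get_size(path)
        if size != expected:
            mismatched.append((path, size))
    return mismatched


# Stand-in sizes for demonstration; with real data, call
# find_mismatched_sizes(image_paths) instead.
fake_sizes = {"0001.png": (512, 256), "0002.png": (512, 256), "0003.png": (512, 128)}
print(find_mismatched_sizes(sorted(fake_sizes), get_size=fake_sizes.get))
```

An empty list means every image has the same dimensions as the first.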

In [None]:
# Arguments shared by both cases; source_channel is only passed when it is set.
channel_kwargs = dict(
    extents=shape_zyx,
    dtype=DTYPE,
    voxel_size=VOXEL_SIZE,
    voxel_unit=VOXEL_UNITS,
    create_new=True,
)
if SOURCE_CHANNEL:
    channel_kwargs["source_channel"] = SOURCE_CHANNEL

boss_dataset = array(BOSSDB_URI, **channel_kwargs)

In [None]:
# Iterate in groups of UPLOAD_INCREMENT. If there are image tiles left over at
# the end, they are uploaded as a final, smaller chunk.
for i in tqdm(range(0, shape_zyx[0], UPLOAD_INCREMENT)):
    # Slicing clamps to the end of the list, so the last chunk may be shorter.
    chunk_paths = image_paths[i : i + UPLOAD_INCREMENT]
    stacked = np.stack(
        [np.array(Image.open(path), dtype=DTYPE) for path in chunk_paths], axis=0
    )

    retry_count = 0
    while True:
        try:
            boss_dataset[
                i : i + stacked.shape[0], 0 : stacked.shape[1], 0 : stacked.shape[2]
            ] = stacked
            break
        except Exception as e:
            print(f"Error uploading chunk {i}-{i + stacked.shape[0]}: {e}")
            retry_count += 1
            if retry_count > RETRY_MAX:
                # Out of retries: re-raise with the original traceback.
                raise
            print("Retrying...")
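
To see how the loop above partitions the stack, the sketch below computes the same `(start, stop)` z-ranges without touching any images. `chunk_ranges` is a hypothetical helper written for this example. It clamps the end of each range to the stack length, which is all that is needed to handle a short final chunk.

```python
def chunk_ranges(num_slices, increment):
    """Return the (start, stop) z-range of each upload chunk, in order."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    return [
        (start, min(start + increment, num_slices))
        for start in range(0, num_slices, increment)
    ]


# A 40-slice stack uploaded 16 slices at a time ends with a short chunk of 8.
print(chunk_ranges(40, 16))
```

Printing these ranges for your own `shape_zyx[0]` and `UPLOAD_INCREMENT` is a cheap way to confirm how many uploads the ingest will make.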

In [None]:
# Print the visualization link to view the data in Neuroglancer:
print(boss_dataset.visualize)