# Fetch SmartSPIM Data From S3

In this notebook we demonstrate how to fetch SmartSPIM data from the "aind-open-data" AWS S3 bucket outside of CodeOcean. The notebook is intended as a companion to the "RegisterToCCF" notebook, which assumes that data is available via the mounted filesystem on the CodeOcean cloud platform. Metadata on the output image should be set carefully; refer to the "RegisterToCCF" notebook for metadata considerations for SmartSPIM input images.

SmartSPIM sample data is made publicly available by the Allen Institute for Neural Dynamics.

## See Also

CodeOcean capsule: https://codeocean.allenneuraldynamics.org/capsule/5051231/tree/v4

## Initialize the Notebook

In [1]:
import json
import os
import re

import itk
import numpy as np
import s3fs
import zarr

In [2]:
SOURCE_BUCKET_NAME = "aind-open-data"
SUBJECT_ID = "652506"
SAMPLE_CHANNEL = "Ex_561_Em_593"
SAMPLE_LEVEL = 3
SAMPLE_NAME = f"{SUBJECT_ID}_{SAMPLE_CHANNEL}"

SMARTSPIM_OUTPUT_FILENAME = (
    f"data/input/{SUBJECT_ID}/{SAMPLE_NAME}_L{SAMPLE_LEVEL}.nii.gz"
)

print(f"Image will be written to {SMARTSPIM_OUTPUT_FILENAME}")

Image will be written to data/input/652506/652506_Ex_561_Em_593_L3.nii.gz


In [3]:
os.makedirs(os.path.dirname(SMARTSPIM_OUTPUT_FILENAME), exist_ok=True)

## Query AWS S3 For Available Data

The first step in fetching data is knowing what data is available. AIND has made SmartSPIM sample data available in multiscale OME-Zarr format on AWS S3.

AIND uses several naming conventions in its data paths:
 - "aind-open-data": the AIND public S3 bucket where data is stored
 - "SmartSPIM_\<id>_\<date>_stitched_\<date>": one dataset per subject ID, named with the acquisition date and the stitching date
 - "processed/OMEZarr": processed data in OME-Zarr format
 - "Ex_\<num>_Em_\<num>.zarr": one store per imaging channel, named with the excitation and emission wavelengths

Naming conventions are subject to change in the future.
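The naming conventions above can be sketched as a small parser and path builder. This is a minimal sketch, assuming the date fields follow the `YYYY-MM-DD_HH-MM-SS` layout seen in the dataset names listed in this notebook. `parse_dataset_name` and `channel_path` are hypothetical helpers written for illustration and are not part of any AIND library.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the dataset names shown in this notebook.
DATASET_PATTERN = re.compile(
    r"^SmartSPIM_(?P<subject_id>\d+)"
    r"_(?P<acquired>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})"
    r"(?:_stitched_(?P<stitched>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}))?$"
)
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_dataset_name(name):
    """Split a dataset name into subject ID and timestamps (hypothetical helper)."""
    match = DATASET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized dataset name: {name}")
    stitched = match.group("stitched")
    return {
        "subject_id": match.group("subject_id"),
        "acquired": datetime.strptime(match.group("acquired"), TIMESTAMP_FORMAT),
        "stitched": (
            datetime.strptime(stitched, TIMESTAMP_FORMAT) if stitched else None
        ),
    }


def channel_path(bucket, dataset, excitation, emission):
    """Compose the path to one channel's OME-Zarr store (hypothetical helper)."""
    return f"{bucket}/{dataset}/processed/OMEZarr/Ex_{excitation}_Em_{emission}.zarr"


info = parse_dataset_name(
    "SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54"
)
print(info["subject_id"], info["stitched"] is not None)
```

Because the conventions may change, a parser like this should fail loudly (as the `ValueError` does here) rather than silently skip names it does not recognize.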

In [4]:
fs = s3fs.S3FileSystem(anon=True)

In [5]:
# Get all available samples
available_sample_buckets = fs.ls(SOURCE_BUCKET_NAME)
print(
    f"{len(available_sample_buckets)} buckets available at s3://{SOURCE_BUCKET_NAME}/"
)

# Preview the first few entries
for bucket_name in available_sample_buckets[:5]:
    print(bucket_name)
print("...")

578 buckets available at s3://aind-open-data/
aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10
aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35
aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34
aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34_stitched_2023-02-20_19-47-09
aind-open-data/SmartSPIM_018559_2023-01-10_10-40-57
...


In [6]:
# Find the sample bucket of interest
sample_bucket_name = next(
    bucket_name
    for bucket_name in fs.ls(SOURCE_BUCKET_NAME)
    if re.match(f"(.+){SUBJECT_ID}(.+)stitched(.+)", bucket_name)
)

print(f"Selected sample bucket: {sample_bucket_name}")

Selected sample bucket: aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54


In [7]:
# View available sample channels
sample_bucket_channels_name = sample_bucket_name + "/processed/OMEZarr"
print(f"Available channels at {sample_bucket_channels_name}:")
fs.ls(sample_bucket_channels_name)

Available channels at aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr:


['aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/.zgroup',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_445_Em_469.zarr',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_488_Em_525.zarr',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr']

In [8]:
# Select the channel of interest
channel_bucket = sample_bucket_channels_name + f"/{SAMPLE_CHANNEL}.zarr"
print("Channel bucket contents:")
fs.ls(channel_bucket)

Channel bucket contents:


['aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/.zattrs',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/.zgroup',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/0',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/1',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/2',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/3',
 'aind-open-data/SmartSPIM_652506_2023-01-09_10-18-12_stitched_2023-01-13_19-00-54/processed/OMEZarr/Ex_561_Em_593.zarr/4']

## Read Metadata from AWS S3

In [9]:
# View OME-Zarr attributes for the sample channel
with fs.open(channel_bucket + "/.zattrs", "r") as f:
    ome_zarr_metadata = json.loads(f.read())

axis_spacings = {}
axis_units = {}
condensed_axes = []
for axis, spacing in zip(
    ome_zarr_metadata["multiscales"][0]["axes"],
    ome_zarr_metadata["multiscales"][0]["datasets"][SAMPLE_LEVEL][
        "coordinateTransformations"
    ][0]["scale"],
):
    unit = axis.get("unit", "")
    if axis["type"] == "space" and unit not in ("micrometer", "millimeter"):
        raise ValueError(f"Unexpected spatial unit: {unit!r}")
    # Convert micrometer spacings to millimeters; leave other values as-is
    axis_spacings[axis["name"]] = (
        spacing * 1e-3 if unit == "micrometer" else spacing
    )
    print(f"{axis['name']} -> {spacing} {unit} ({axis_spacings[axis['name']]})")

# We know in advance that the 't' and 'c' axes are singleton (size 1).
# ITK and OME-Zarr/NumPy index the spatial axes in opposite orders.
itk_spatial_axes = ("x", "y", "z")

t -> 1.0 millisecond (1.0)
c -> 1.0  (1.0)
z -> 16.0 micrometer (0.016)
y -> 14.4 micrometer (0.014400000000000001)
x -> 14.4 micrometer (0.014400000000000001)
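To illustrate the axis bookkeeping above without touching S3, here is a minimal sketch that uses a hand-written, trimmed stand-in for the `.zattrs` content. The `mock_zattrs` dictionary is an assumption: it holds a single dataset entry with the level-3 scale values printed above, and its axis `type` strings and `path` value are filled in for illustration rather than copied from the real file. The sketch converts micrometer spacings to millimeters and reverses the OME-Zarr `(z, y, x)` order into the `(x, y, z)` order used for ITK.

```python
# Trimmed, hand-written stand-in for the channel's .zattrs (assumption: only
# one dataset entry is included, using the scale printed in this notebook).
mock_zattrs = {
    "multiscales": [
        {
            "axes": [
                {"name": "t", "type": "time", "unit": "millisecond"},
                {"name": "c", "type": "channel"},
                {"name": "z", "type": "space", "unit": "micrometer"},
                {"name": "y", "type": "space", "unit": "micrometer"},
                {"name": "x", "type": "space", "unit": "micrometer"},
            ],
            "datasets": [
                {
                    "path": "3",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [1.0, 1.0, 16.0, 14.4, 14.4]}
                    ],
                }
            ],
        }
    ]
}

# Conversion factors to millimeters for the units this notebook accepts.
TO_MILLIMETERS = {"micrometer": 1e-3, "millimeter": 1.0}

multiscale = mock_zattrs["multiscales"][0]
scale = multiscale["datasets"][0]["coordinateTransformations"][0]["scale"]

# Keep only the spatial axes, in OME-Zarr (z, y, x) order.
spatial_spacing_mm = {
    axis["name"]: value * TO_MILLIMETERS[axis["unit"]]
    for axis, value in zip(multiscale["axes"], scale)
    if axis["type"] == "space"
}

# Reverse to the (x, y, z) order used when setting ITK spacing.
itk_axis_order = tuple(reversed(list(spatial_spacing_mm)))
itk_spacing_mm = [spatial_spacing_mm[name] for name in itk_axis_order]
print(itk_axis_order, itk_spacing_mm)
```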


In [10]:
with fs.open(
    sample_bucket_name.split("_stitched")[0] + "/acquisition.json", "r"
) as f:
    acquisition_metadata = json.loads(f.read())

assert acquisition_metadata["subject_id"] == SUBJECT_ID
assert (
    acquisition_metadata["schema_version"] == "0.4.3"
)  # Version at time of writing
print(acquisition_metadata["axes"])

[{'name': 'X', 'dimension': 2, 'direction': 'Left_to_right', 'unit': 'micrometer'}, {'name': 'Y', 'dimension': 1, 'direction': 'Posterior_to_anterior', 'unit': 'micrometer'}, {'name': 'Z', 'dimension': 0, 'direction': 'Superior_to_inferior', 'unit': 'micrometer'}]


In [11]:
# Acquisition spatial axis directions are encoded in ITK as a bitmap composition
# of enumerated values in tertiary, secondary, primary order.
# https://github.com/InsightSoftwareConsortium/ITK/blob/cce48b3cfa5c8f41be0bf7393183132ab34f1a06/Modules/Core/Common/include/itkSpatialOrientation.h#L111-L112

ITK_COORDINATE_TERMS = {
    "anterior": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Anterior,
    "inferior": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Inferior,
    "left": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Left,
    "posterior": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Posterior,
    "right": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Right,
    "superior": itk.SpatialOrientationEnums.CoordinateTerms_ITK_COORDINATE_Superior,
}

acquisition_space_name = []
itk_direction_enum = 0

for axis_name in reversed(itk_spatial_axes):
    axis = next(
        el
        for el in acquisition_metadata["axes"]
        if el["name"].lower() == axis_name
    )
    axis_direction = axis["direction"].split("_to_")[1]
    acquisition_space_name.append(axis_direction)

    itk_direction_enum <<= 8
    itk_direction_enum |= ITK_COORDINATE_TERMS[axis_direction]

print(f'{"-".join(reversed(acquisition_space_name))}: {itk_direction_enum}')

# Values can be manually double-checked against ITK coordinate space enums, e.g.
print(
    itk.SpatialOrientationEnums.ValidCoordinateOrientations_ITK_COORDINATE_ORIENTATION_RAI
)

right-anterior-inferior: 525570
525570
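The bit packing in the cell above can be checked by hand without ITK installed. The sketch below is a plain-Python illustration, assuming the numeric values of ITK's coordinate terms listed in `COORDINATE_TERMS` (taken from `itkSpatialOrientation.h`; verify them against your ITK version). `pack_orientation` is a hypothetical helper, and the axis list is copied from the `acquisition.json` output printed earlier.

```python
# Numeric values of ITK's coordinate terms. These are assumptions copied from
# itkSpatialOrientation.h; check them against your ITK version.
COORDINATE_TERMS = {
    "right": 2,
    "left": 3,
    "posterior": 4,
    "anterior": 5,
    "inferior": 8,
    "superior": 9,
}

# Axes as printed from acquisition.json in this notebook.
acquisition_axes = [
    {"name": "X", "direction": "Left_to_right"},
    {"name": "Y", "direction": "Posterior_to_anterior"},
    {"name": "Z", "direction": "Superior_to_inferior"},
]


def pack_orientation(axes):
    """Pack the x, y, z direction terms into one integer, 8 bits per axis."""
    by_name = {axis["name"].lower(): axis for axis in axes}
    code = 0
    # x occupies the lowest byte, y the middle byte, z the highest byte.
    for shift, axis_name in zip((0, 8, 16), ("x", "y", "z")):
        term = by_name[axis_name]["direction"].split("_to_")[1].lower()
        code |= COORDINATE_TERMS[term] << shift
    return code


code = pack_orientation(acquisition_axes)
print(code, hex(code))
```

Under these assumptions the result is `525570` (`0x80502`): byte `0x08` for inferior, `0x05` for anterior, and `0x02` for right, which agrees with the value printed by the notebook.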


## Read Zarr Voxel Data From S3

We use the built-in S3 support in `zarr`, backed by an `s3fs` store, to fetch voxel data and convert it to ITK format. Note that the image here is read in with incomplete metadata. Refer to "RegisterToCCF" for proper metadata handling.

In [12]:
# Use zarr S3 support to read store
store = s3fs.S3Map(root=channel_bucket, s3=fs, check=False)
cache = zarr.LRUStoreCache(store, max_size=2**28)
root = zarr.group(store=cache)

root.info

Name        : /
Type        : zarr.hierarchy.Group
Read-only   : False
Store type  : zarr.storage.LRUStoreCache
No. members : 5
No. arrays  : 5
No. groups  : 0
Arrays      : 0, 1, 2, 3, 4


In [13]:
voxel_array = np.squeeze(np.asarray(root[SAMPLE_LEVEL]))
itk_image = itk.image_view_from_array(voxel_array.astype(np.float32))
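Since `np.asarray` pulls the whole resolution level into memory and `astype` makes a `float32` copy, it can help to estimate the footprint before running the cell. The following is a back-of-the-envelope sketch, assuming the level-3 volume has the 925 × 1280 × 464 size that ITK reports for it in this notebook.

```python
# Voxel counts in ITK (x, y, z) order, as reported by ITK in this notebook.
size_xyz = (925, 1280, 464)
bytes_per_float32 = 4

voxel_count = size_xyz[0] * size_xyz[1] * size_xyz[2]
float32_bytes = voxel_count * bytes_per_float32

# The LRU store cache in this notebook is capped at 2**28 bytes.
cache_bytes = 2**28

print(f"voxels:         {voxel_count:,}")
print(f"float32 volume: {float32_bytes / 2**30:.2f} GiB")
print(f"LRU cache cap:  {cache_bytes / 2**20:.0f} MiB")
```

The two figures are not directly comparable, because the store cache holds encoded chunk bytes rather than the decoded array. The estimate is only meant to show that the decoded `float32` level needs roughly 2 GiB of RAM, so a coarser `SAMPLE_LEVEL` may be preferable on small machines.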

In [14]:
# Apply subsequent metadata as done in RegisterToCCF
itk_image.SetSpacing(
    [axis_spacings[axis_name] for axis_name in itk_spatial_axes]
)

print(itk_image)

Image (0000029F395F2BC0)
  RTTI typeinfo:   class itk::Image<float,3>
  Reference Count: 1
  Modified Time: 9
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 0
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  Spacing: [0.0144, 0.0144, 0.016]
  Origin: [0, 0, 0]
  Direction: 
1 0 0
0 1 0
0 0 1

  IndexToPointMatrix: 
0.0144 0 0
0 0.0144 0
0 0 0.016

  PointToIndexMatrix: 
69.4444 0 0
0 69.4444 0
0 0 62.5

  Inverse Direction: 
1 0 0
0 1 0
0 0 1

  PixelContainer: 
    ImportImageContainer (0000029E95EB7490)
      RTTI typeinfo:   class itk::ImportImageContainer<unsigned __int64,float>
      Referenc

In [15]:
# ITK physical space is LPS: axes run right-to-left, anterior-to-posterior,
# and inferior-to-superior.

orient_filter = itk.OrientImageFilter.New(itk_image)
orient_filter.SetGivenCoordinateOrientation(itk_direction_enum)
orient_filter.SetDesiredCoordinateOrientation(
    itk.SpatialOrientationEnums.ValidCoordinateOrientations_ITK_COORDINATE_ORIENTATION_LPS
)
orient_filter.UpdateOutputInformation()

itk_image.CopyInformation(orient_filter.GetOutput())
print(itk_image)

Image (0000029F395F2BC0)
  RTTI typeinfo:   class itk::Image<float,3>
  Reference Count: 2
  Modified Time: 96
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 0
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [925, 1280, 464]
  Spacing: [0.0144, 0.0144, 0.016]
  Origin: [13.3056, 18.4176, 7.408]
  Direction: 
-1 0 0
0 -1 0
0 0 -1

  IndexToPointMatrix: 
-0.0144 0 0
0 -0.0144 0
0 0 -0.016

  PointToIndexMatrix: 
-69.4444 0 0
0 -69.4444 0
0 0 -62.5

  Inverse Direction: 
-1 0 0
0 -1 0
0 0 -1

  PixelContainer: 
    ImportImageContainer (0000029E95EB7490)
      RTTI typeinfo:   class itk::ImportImageContainer<unsigned 
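The new `Origin` in the output above can be reproduced by hand from the printed `Size` and `Spacing`. The sketch below is a plain-Python check, assuming that every axis is flipped (as the printed `Direction` of -1 on each diagonal entry suggests) and that a flipped axis starts at the physical position of the last voxel center along the original axis. It does not call ITK and is not a description of how `OrientImageFilter` is implemented.

```python
# Values printed by ITK in this notebook, in (x, y, z) order.
size = (925, 1280, 464)
spacing_mm = (0.0144, 0.0144, 0.016)
old_origin_mm = (0.0, 0.0, 0.0)

# Diagonal of the new direction matrix: every axis is flipped.
direction_diagonal = (-1, -1, -1)

new_origin_mm = []
for n, s, o, d in zip(size, spacing_mm, old_origin_mm, direction_diagonal):
    # A flipped axis starts at the far end of the original extent, i.e. at
    # the position of the last voxel center: origin + (size - 1) * spacing.
    new_origin_mm.append(o + (n - 1) * s if d < 0 else o)

print([round(value, 4) for value in new_origin_mm])
```

Rounded to four decimals this gives `[13.3056, 18.4176, 7.408]`, matching the `Origin` printed after the orientation metadata was copied onto the image.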

In [16]:
itk.imwrite(itk_image, SMARTSPIM_OUTPUT_FILENAME, compression=True)