# Fetch SmartSPIM Data From S3

In this notebook we demonstrate how to fetch SmartSPIM data from the "aind-open-data" AWS S3 bucket outside of CodeOcean. The notebook is intended as a companion to the "RegisterToCCF" notebook, which assumes that data is available via the mounted filesystem in the CodeOcean cloud platform. Metadata on the output image should be set carefully; refer to the "RegisterToCCF" notebook for metadata considerations for SmartSPIM input images.

SmartSPIM sample data is made publicly available by the Allen Institute for Neural Dynamics.

## Initialize the Notebook

In [1]:
import re

import itk
import ome_zarr
from ome_zarr.reader import Reader as OMEZarrReader
from ome_zarr.io import ZarrLocation
import numpy as np
import sys
import s3fs
import zarr

In [2]:
SOURCE_BUCKET_NAME = "aind-open-data"
SAMPLE_ID = "631680"
SAMPLE_CHANNEL = "Ex_647_Em_690"

SMARTSPIM_OUTPUT_FILENAME = f"../results/{SAMPLE_ID}_{SAMPLE_CHANNEL}.nii.gz"

## Query AWS S3 For Available Data

The first step in fetching data is knowing what data is available to fetch. AIND has made SmartSPIM sample data available in multiscale OME-Zarr format on AWS S3.

AIND uses several naming conventions for its data:
 - "aind-open-data": the AIND public S3 bucket where data is stored
 - "SmartSPIM_\<id>_\<date>_stitched_\<date>": one entry per sample, named by sample ID, collection date, and stitching date
 - "processed/OMEZarr": the processed data in OME-Zarr format
 - "Ex_\<num>_Em_\<num>.zarr": one Zarr store per imaging channel, named by excitation/emission wavelength
 
Naming conventions are subject to change in the future.
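
To make the convention concrete, the following is a minimal sketch that splits a stitched dataset name into its parts and assembles the path to one channel. The regular expression and both helper functions are our own illustration, not part of any AIND tooling, and they assume the date format seen in the listing below (`YYYY-MM-DD_hh-mm-ss`).

```python
import re

# Assumed pattern for stitched dataset names, derived from the convention above.
STITCHED_NAME_PATTERN = re.compile(
    r"^SmartSPIM_(?P<sample_id>[^_]+)"
    r"_(?P<collected>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})"
    r"_stitched_(?P<stitched>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})$"
)


def parse_stitched_name(name):
    """Return the name's components as a dict, or None if it does not match."""
    match = STITCHED_NAME_PATTERN.match(name)
    return match.groupdict() if match else None


def build_channel_path(bucket, dataset_name, channel):
    """Join the convention's components into an S3 path (hypothetical helper)."""
    return f"{bucket}/{dataset_name}/processed/OMEZarr/{channel}.zarr"


name = "SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18"
parts = parse_stitched_name(name)
channel_path = build_channel_path("aind-open-data", name, "Ex_647_Em_690")
print(parts)
print(channel_path)
```

Because the conventions may change, a parser like this should be treated as a convenience rather than a contract; names that do not match simply return `None`.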

In [3]:
fs = s3fs.S3FileSystem(anon=True)

In [4]:
# List the top-level entries in the bucket (one per dataset)
available_sample_buckets = fs.ls(SOURCE_BUCKET_NAME)
print(f"{len(available_sample_buckets)} buckets available at s3://{SOURCE_BUCKET_NAME}/")

# Preview the first five entries
for bucket_name in available_sample_buckets[:5]:
    print(bucket_name)
print("...")

545 buckets available at s3://aind-open-data/
aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10
aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35
aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34
aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34_stitched_2023-02-20_19-47-09
aind-open-data/SmartSPIM_018559_2023-01-10_10-40-57
...


In [5]:
# Find the sample bucket of interest
sample_bucket_name = next(
    bucket_name
    for bucket_name in fs.ls(SOURCE_BUCKET_NAME)
    if re.match(f"(.+){SAMPLE_ID}(.+)stitched(.+)", bucket_name)
)

print(f"Selected sample bucket: {sample_bucket_name}")

Selected sample bucket: aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18
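
Note that `next()` without a default raises a bare `StopIteration` when no entry matches, which can be confusing if `SAMPLE_ID` is mistyped. One possible variation, sketched here with a hypothetical helper `find_stitched_dataset` and a short hand-written list standing in for the `fs.ls` result, reports the failure explicitly:

```python
import re


def find_stitched_dataset(names, sample_id):
    """Return the first stitched dataset name for sample_id.

    Raises LookupError with a descriptive message when nothing matches.
    """
    pattern = re.compile(rf"SmartSPIM_{re.escape(sample_id)}_.+_stitched_.+")
    match = next((name for name in names if pattern.search(name)), None)
    if match is None:
        raise LookupError(f"No stitched dataset found for sample {sample_id!r}")
    return match


# Stand-in for fs.ls(SOURCE_BUCKET_NAME); entries copied from the listing above.
example_names = [
    "aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10",
    "aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35",
    "aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34",
]
print(find_stitched_dataset(example_names, "000393"))
```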


In [6]:
# View available sample channels
sample_bucket_channels_name = sample_bucket_name + "/processed/OMEZarr"
print(f"Available channels at {sample_bucket_channels_name}:")
fs.ls(sample_bucket_channels_name)

Available channels at aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr:


['aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/.zgroup',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_488_Em_525.zarr',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_561_Em_600.zarr',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr']

In [7]:
# Select the channel of interest
channel_bucket = sample_bucket_channels_name + f"/{SAMPLE_CHANNEL}.zarr"
print("Channel bucket contents:")
fs.ls(channel_bucket)

Channel bucket contents:


['aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/.zattrs',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/.zgroup',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/0',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/1',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/2',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/3',
 'aind-open-data/SmartSPIM_631680_2022-09-09_13-52-33_stitched_2022-11-10_17-18-18/processed/OMEZarr/Ex_647_Em_690.zarr/4']

## Read Metadata from AWS S3

In [8]:
# View OME-Zarr attributes for the sample channel
with fs.open(channel_bucket + "/.zattrs", "r") as f:
    print(f.read())

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "t",
                    "type": "time",
                    "unit": "millisecond"
                },
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "x",
                    "type": "space",
                    "unit": "micrometer"
                }
            ],
            "datasets": [
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1.0,
                                1.
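
The attributes follow the OME-Zarr "multiscales" layout: `axes` names each dimension, and every entry in `datasets` carries a `scale` transformation with one value per axis. As a rough sketch of how these could be collected per resolution level, the example below parses a small hand-written attributes string. The helper `scales_by_level` is hypothetical, the `path` key is assumed from the OME-Zarr specification, and the scale values are placeholders rather than the real metadata of this sample.

```python
import json

# Illustrative .zattrs content; the scale values are placeholders only.
EXAMPLE_ZATTRS = """
{
  "multiscales": [
    {
      "axes": [
        {"name": "t", "type": "time", "unit": "millisecond"},
        {"name": "c", "type": "channel"},
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"}
      ],
      "datasets": [
        {"coordinateTransformations": [{"scale": [1.0, 1.0, 2.0, 2.0, 2.0], "type": "scale"}],
         "path": "0"},
        {"coordinateTransformations": [{"scale": [1.0, 1.0, 4.0, 4.0, 4.0], "type": "scale"}],
         "path": "1"}
      ]
    }
  ]
}
"""


def scales_by_level(zattrs_text):
    """Map each dataset path to an {axis name: scale} dict for the first multiscale."""
    multiscale = json.loads(zattrs_text)["multiscales"][0]
    axis_names = [axis["name"] for axis in multiscale["axes"]]
    result = {}
    for dataset in multiscale["datasets"]:
        for transform in dataset.get("coordinateTransformations", []):
            if "scale" in transform:
                result[dataset["path"]] = dict(zip(axis_names, transform["scale"]))
    return result


level_scales = scales_by_level(EXAMPLE_ZATTRS)
print(level_scales["1"])
```

In the notebook itself, the text read from `channel_bucket + "/.zattrs"` could be passed in place of `EXAMPLE_ZATTRS`.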

## Read Zarr Voxel Data From S3

We wrap the S3 path in an `s3fs.S3Map` store, which `zarr` can read directly, to fetch voxel data and convert it to ITK format. Note that the image here is read in with incomplete metadata. Refer to "RegisterToCCF" for proper metadata handling.

In [9]:
# Use zarr S3 support to read store
store = s3fs.S3Map(root=channel_bucket, s3=fs, check=False)
cache = zarr.LRUStoreCache(store, max_size=2**28)
root = zarr.group(store=cache)

root.info

Name        : /
Type        : zarr.hierarchy.Group
Read-only   : False
Store type  : zarr.storage.LRUStoreCache
No. members : 5
No. arrays  : 5
No. groups  : 0
Arrays      : 0, 1, 2, 3, 4


In [10]:
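# Read level "4" (by OME-Zarr convention the most downsampled of the five levels)
# and drop any singleton axes
voxel_array = np.squeeze(np.asarray(root["4"]))
# Convert to float32 and wrap the array as an ITK image
itk_image = itk.image_view_from_array(voxel_array.astype(np.float32))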

# Apply subsequent metadata as done in RegisterToCCF
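
One piece of metadata that is missing at this point is voxel spacing. The sketch below shows one way an OME-Zarr scale could be turned into an ITK spacing; it is not taken from "RegisterToCCF". It assumes the axis order `(t, c, z, y, x)` and micrometer units shown in the attributes above, that ITK expects spacing in `(x, y, z)` order, and that millimeters are the desired output unit. The helper name and the scale values are placeholders.

```python
MICROMETERS_PER_MILLIMETER = 1000.0


def ome_scale_to_itk_spacing(scale_tczyx_um):
    """Convert a (t, c, z, y, x) scale in micrometers to (x, y, z) spacing in millimeters."""
    if len(scale_tczyx_um) != 5:
        raise ValueError("expected five scale values ordered (t, c, z, y, x)")
    z_um, y_um, x_um = scale_tczyx_um[2:]
    # ITK indexes spatial axes as (x, y, z), the reverse of the NumPy array order.
    return [value / MICROMETERS_PER_MILLIMETER for value in (x_um, y_um, z_um)]


# Placeholder scale values for illustration only.
spacing_mm = ome_scale_to_itk_spacing([1.0, 1.0, 16.0, 8.0, 4.0])
print(spacing_mm)

# With an ITK image in hand, the result could then be applied with:
# itk_image.SetSpacing(spacing_mm)
```

Origin and direction would still need to be set separately, so this covers only part of the metadata handling.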

In [11]:
itk.imwrite(itk_image, SMARTSPIM_OUTPUT_FILENAME, compression=True)