# DataJoint U24 - Workflow Volume


### Intro


This notebook describes how to use Element Volume to interact with BossDB.
Before running it, see the
[Element installation instructions](https://datajoint.com/docs/elements/user-guide/) and the [BossDB resources](https://www.youtube.com/watch?v=eVNr6Pzxoh8) for information on creating an account and configuring `intern`.

Importantly, you'll need an `intern` config file, which should look like this:

```cfg
# ~/.intern/intern.cfg
[Default]
protocol = https
host = api.bossdb.io
token = <YOUR_TOKEN>
```
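
Before connecting, it can help to confirm that the config file exists and parses. The following is a minimal sketch using only the Python standard library; `read_intern_config` is a hypothetical helper written for this notebook, not part of `intern`, and it assumes the file layout shown above.

```python
import configparser
from pathlib import Path


def read_intern_config(path=None):
    """Return the [Default] section of an intern config file as a dict.

    Hypothetical helper: `path` defaults to ~/.intern/intern.cfg.
    """
    path = Path(path) if path is not None else Path.home() / ".intern" / "intern.cfg"
    # Disable interpolation so a token containing '%' is read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"No intern config found at {path}")
    if "Default" not in parser:
        raise KeyError(f"{path} has no [Default] section")
    return dict(parser["Default"])
```

Calling `read_intern_config()` with no arguments reads the default location and raises a descriptive error if the file or section is missing.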


In [1]:
import datajoint as dj
import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.conn()

[2023-02-09 12:59:13,967][INFO]: Connecting cbroz@dss-db.datajoint.io:3306
[2023-02-09 12:59:14,356][INFO]: Connected cbroz@dss-db.datajoint.io:3306


DataJoint connection (connected) cbroz@dss-db.datajoint.io:3306

In [2]:
dj.config["custom"]["database.prefix"] = "cbroz_wfboss_"
dj.config["custom"][
    "vol_root_data_dir"
] = "/Users/cb/Documents/data/U24_SampleData/boss/"
from workflow_volume.pipeline import volume, BossDBInterface, bossdb

# volume.Volume.delete_quick()

In [3]:
volume.Volume()

| volume_id (shorthand for this volume) | resolution_id (Shorthand for convention. For BossDB, integer value.) | subject | session_datetime | z_size (total number of voxels in z dimension) | y_size (total number of voxels in y dimension) | x_size (total number of voxels in x dimension) | slicing_dimension (perspective of slices) | channel (data type or modality, e.g., EM, segmentation, etc.) | url (dataset URL) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| takemura/takemura13 | 4 | | | 1460 | 750 | 750 | z | image | https://api.bossdb.io/v1/mgmt/resources/takemura/takemura13/image |


`BossDBInterface` works much like `intern.array`, with additional functionality for managing records in your Element Volume schema. We can optionally link a dataset to a session in our pipeline via a session key.

Note, however, that the notation changes slightly when downloading. We can still index directly into a dataset to get slices, but to download data into Element Volume we must provide the slices as either a string or a tuple.
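
To make the relationship between the two forms concrete, here is a minimal sketch of converting a string slice key into the equivalent tuple. `parse_slice_key` is a hypothetical helper for illustration; how Element Volume parses the string internally is not shown in this notebook.

```python
def parse_slice_key(slice_key: str) -> tuple:
    """Convert a string such as "[300,200:501,0:701]" into a tuple of ints and slices."""
    parts = slice_key.strip().strip("[]").split(",")
    parsed = []
    for part in parts:
        part = part.strip()
        if ":" in part:
            # An empty bound (as in ":" or "5:") becomes None, matching Python slicing.
            bounds = [int(b) if b.strip() else None for b in part.split(":")]
            parsed.append(slice(*bounds))
        else:
            parsed.append(int(part))
    return tuple(parsed)


print(parse_slice_key("[300,200:501,0:701]"))
```

The result is the same object Python builds internally for an expression like `array[300, 200:501, 0:701]`.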


### Testing


In [3]:
data = BossDBInterface(
    "bossdb://takemura/takemura13/image", resolution=4, session_key={}
)
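
The `bossdb://` URI above has three path components. In the `Volume` output shown earlier, the first two appear as the `volume_id` and the third as the `channel`. The sketch below splits a URI on that assumption; `split_bossdb_uri` is a hypothetical helper, not a function of `intern` or Element Volume.

```python
def split_bossdb_uri(uri: str) -> dict:
    """Split "bossdb://collection/experiment/channel" into its named parts."""
    scheme, _, path = uri.partition("://")
    if scheme != "bossdb" or not path:
        raise ValueError(f"Not a bossdb:// URI: {uri!r}")
    # Unpacking raises ValueError unless there are exactly three components.
    collection, experiment, channel = path.strip("/").split("/")
    return dict(
        collection=collection,
        experiment=experiment,
        channel=channel,
        volume_id=f"{collection}/{experiment}",
    )


print(split_bossdb_uri("bossdb://takemura/takemura13/image"))
```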



Using `intern` notation, we can look at Z slice 300, Y voxels 200 to 500, and X voxels 0 to 700.


In [4]:
data[300, 200:501, 0:701]

array([[139, 152, 129, ..., 160, 125, 154],
       [148, 146, 127, ..., 134, 118, 170],
       [ 75, 173, 145, ..., 107, 115, 111],
       ...,
       [ 98,  99,  99, ..., 119,  93,  87],
       [ 91,  86,  88, ...,  86,  79,  84],
       [118, 130, 102, ...,  96,  88, 111]], dtype=uint8)

The same data can be downloaded and loaded into Element Volume using either of the following commands.

Without a session directory provided via `get_session_directory` in `workflow_volume.paths`, we will infer an output directory based on the BossDB path from `get_vol_root_data_dir`.


In [5]:
# data.download(slice_key=(300,slice(200,501),slice(0,701)))
data.download(slice_key="[300,200:501,0:701]")

Our volume is stored in the `Volume` table.

In [7]:
volume.Volume()

| volume_id (shorthand for this volume) | resolution_id (Shorthand for convention. For BossDB, integer value.) | subject | session_datetime | z_size (total number of voxels in z dimension) | y_size (total number of voxels in y dimension) | x_size (total number of voxels in x dimension) | slicing_dimension (perspective of slices) | channel (data type or modality, e.g., EM, segmentation, etc.) | url (dataset URL) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| takemura/takemura13 | 4 | | | 1460 | 750 | 750 | z | image | https://api.bossdb.io/v1/mgmt/resources/takemura/takemura13/image |


Each downloaded slice has a corresponding entry in `Volume.Slice`.

In [8]:
volume.Volume.Slice()

| volume_id (shorthand for this volume) | resolution_id (Shorthand for convention. For BossDB, integer value.) | id (Nth voxel in slicing_dimension) | zoom_id (Shorthand for zoom convention) | file_path (filepath relative to root data directory) |
| --- | --- | --- | --- | --- |
| takemura/takemura13 | 4 | 300 | X0-701_Y200-501 | takemura/takemura13/image/Res4_ZoomX0-701_Y200-501_Z300.png |
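
The `zoom_id` and `file_path` values in this row appear to encode the resolution, the X/Y window, and the Z index. The sketch below reproduces that naming pattern as inferred from this single row; both functions are hypothetical, and Element Volume's own naming logic may differ in cases not shown here.

```python
def make_zoom_id(x_range, y_range):
    """Build a zoom label such as "X0-701_Y200-501" from (start, end) pairs."""
    return f"X{x_range[0]}-{x_range[1]}_Y{y_range[0]}-{y_range[1]}"


def make_slice_path(volume_id, channel, resolution, z, x_range, y_range):
    """Build a slice file path relative to the root data directory (inferred pattern)."""
    file_name = f"Res{resolution}_Zoom{make_zoom_id(x_range, y_range)}_Z{z}.png"
    return f"{volume_id}/{channel}/{file_name}"


print(make_slice_path("takemura/takemura13", "image", 4, 300, (0, 701), (200, 501)))
```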


Each BossDB resolution has a unique entry in the `Resolution` table.

In [9]:
volume.Resolution()

| resolution_id (Shorthand for convention. For BossDB, integer value.) | voxel_unit (e.g., nanometers) | voxel_z_size (size of one z dimension voxel in voxel_units) | voxel_y_size (size of one y dimension voxel in voxel_units) | voxel_x_size (size of one x dimension voxel in voxel_units) |
| --- | --- | --- | --- | --- |
| 4 | nanometers | 70.4 | 70.4 | 704.0 |


The `Zoom` table retains information about the X/Y windows we use.

In [10]:
volume.Zoom()

| zoom_id (Shorthand for zoom convention) | first_start (Starting voxel in first dimension; X if taking Z slices) | first_end (Ending voxel plus 1 in first dimension) | second_start (Starting voxel in second dimension; Y if taking Z slices) | second_end (Ending voxel plus 1 in second dimension) |
| --- | --- | --- | --- | --- |
| Full Image | 0 | | 0 | |
| X0-701_Y200-501 | 0 | 701.0 | 200 | 501.0 |


Changing the resolution, the Z range, or the X/Y window downloads different data.

In [12]:
data.download(slice_key=(slice(300, 311), slice(100, 401), slice(100, 401)))
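
Because Python slices exclude their end value, it is easy to misjudge how much data a slice key covers. The sketch below lists the Z indices and per-image shape for a tuple key, using the volume shape `(1460, 750, 750)` reported in the `Volume` table. `describe_slice_key` is a hypothetical helper and assumes non-negative integer indices.

```python
def describe_slice_key(slice_key, volume_shape):
    """Return the Z indices covered and the (rows, cols) of each image."""
    ranges = []
    for part, size in zip(slice_key, volume_shape):
        if isinstance(part, int):
            # Treat a bare integer as a one-element slice.
            part = slice(part, part + 1)
        ranges.append(range(*part.indices(size)))
    z_range, y_range, x_range = ranges
    return list(z_range), (len(y_range), len(x_range))


z_indices, image_shape = describe_slice_key(
    (slice(300, 311), slice(100, 401), slice(100, 401)), (1460, 750, 750)
)
print(len(z_indices), image_shape)
```

Under these assumptions the key above covers 11 Z slices (300 through 310), each 301 by 301 voxels.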

In [None]:
import logging
import numpy as np
from workflow_volume.pipeline import volume, bossdb, session, subject
from workflow_volume.paths import get_vol_root_data_dir
from element_volume.volume import *

# from workflow_volume.pipeline import BossDBInterface

# em_data = BossDBInterface("bossdb://Kasthuri/ac4/em", resolution=0)
# seg_data = BossDBInterface("bossdb://Kasthuri/ac4/neuron", resolution=0)
# em_data = BossDBInterface("bossdb://witvliet2020/Dataset_1/em", resolution=0)
# seg_data = BossDBInterface("bossdb://witvliet2020/Dataset_1/segmentation", resolution=0)

logger = logging.getLogger("datajoint")

volume_key = dict(volume_id="Thy1")


def drop_schemas():
    from datajoint_utilities.dj_search.lists import drop_schemas

    prefix = dj.config["custom"]["database.prefix"]
    drop_schemas(prefix, dry_run=False, force_drop=True)


def drop_tables():
    tables = [
        volume.Connectome,
        volume.ConnectomeTask,
        volume.ConnectomeParamset,
        volume.Segmentation,
        volume.Segmentation.Cell,
        volume.CellMapping,
        volume.SegmentationTask,
        volume.SegmentationParamset,
    ]
    for t in tables:
        t.drop_quick()


class upload:
    @classmethod
    def manual_entry(cls):
        from datetime import datetime

        subject.Subject.insert1(
            dict(subject="sub1", sex="M", subject_birth_date=datetime.now()),
            skip_duplicates=True,
        )
        session.Session.insert1(
            dict(
                **(subject.Subject & "subject='sub1'").fetch1("KEY"),
                session_id=1,
                session_datetime=datetime.now(),
            ),
            skip_duplicates=True,
        )
        session.SessionDirectory.insert1(
            dict(**session.Session.fetch1("KEY"), session_dir="sample"),
            skip_duplicates=True,
        )
        volume.Resolution.insert1(
            dict(
                resolution_id="990nm",
                voxel_unit="micrometers",
                voxel_z_size=1,
                voxel_y_size=0.5,
                voxel_x_size=0.5,
                downsampling=0,
            ),
            skip_duplicates=True,
        )

        coll, exp, chann, seg = (
            "DataJointTest",
            "test",
            "CalciumImaging",
            "Segmentation",
        )

        bossdb.BossDBURLs.load_bossdb_info(
            collection=coll,
            experiment=exp,
            volume=chann,
            segmentation=seg,
            skip_duplicates=True,
        )
        url_key = (
            bossdb.BossDBURLs.Volume & dict(collection_experiment=f"{coll}/{exp}")
        ).fetch1()

        raw_data = cls.load_sample_data()
        raw_data_shape = raw_data.shape

        volume.Volume.insert1(
            dict(
                volume_id="Thy1",
                resolution_id="990nm",
                session_id=1,
                z_size=raw_data_shape[0],
                y_size=raw_data_shape[1],
                x_size=raw_data_shape[2],
                channel=chann,
                **url_key,
                volume_data=raw_data,
            ),
            skip_duplicates=True,
        )

    @staticmethod
    def load_sample_data():
        from tifffile import TiffFile
        from PIL import Image
        from pathlib import Path

        root_dir = get_vol_root_data_dir()[0]
        image_fp = root_dir + "sample/zstack_Gcamp_00001_00012.tif"
        png_fp = root_dir + "sample/Z%02d.png"
        image_sample = TiffFile(image_fp).asarray()[250:270, 1000:1246, :]
        if not Path(png_fp % 0).exists():
            for z in range(20):
                Image.fromarray(image_sample[z]).save(png_fp % z)
        return image_sample

    @staticmethod
    def upload_from_volume():
        volume.Volume.upload(volume_key)
        # Error uploading chunk 0-20: ndarray is not C-contiguous


class download:
    @staticmethod
    def add_manual_boss_url():
        bossdb.BossDBURLs.load_bossdb_info(
            collection="Kasthuri",
            experiment="ac4",
            volume="em",
            segmentation="neuron",
            skip_duplicates=True,
        )
        bossdb.BossDBURLs.load_bossdb_info(
            collection="witvliet2020",
            experiment="Dataset_1",
            volume="em",
            segmentation="segmentation",
            skip_duplicates=True,
        )

    @staticmethod
    def download_volume_via_classmeth():
        volume.Volume.download(
            url="bossdb://witvliet2020/Dataset_1/em",
            slice_key="[100:120,1000:1500,1000:1500]",
            save_images=True,
            save_ndarray=True,
            image_mode="P",
            skip_duplicates=True,
        )

    @staticmethod
    def download_seg_via_classmeth():
        volume.SegmentationParamset.insert_new_params(
            segmentation_method="bossdb",
            paramset_idx=1,
            params=dict(
                slice_key="[100:120,1000:1500,1000:1500]",
                save_images=True,
                save_ndarray=True,
                image_mode="P",
                skip_duplicates=True,
            ),
        )
        volume.SegmentationTask.insert1(
            dict(
                volume_id="witvliet2020/Dataset_1",
                resolution_id=0,
                task_mode="load",
                paramset_idx=1,
                **(
                    bossdb.BossDBURLs.Segmentation & "collection_experiment LIKE 'wit%'"
                ).fetch1(),
            )
        )
        volume.Segmentation.populate()

    @classmethod
    def run_all(cls):
        cls.add_manual_boss_url()
        cls.download_volume_via_classmeth()
        cls.download_seg_via_classmeth()
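
The comment in `upload_from_volume` records the error "ndarray is not C-contiguous". One possible cause, not confirmed by this notebook, is that `load_sample_data` returns a sliced view of the TIFF array, and slicing an inner dimension produces a view that is not contiguous in memory. The sketch below reproduces that situation with a small synthetic array (the shapes are illustrative) and shows how `np.ascontiguousarray` makes a contiguous copy.

```python
import numpy as np

# Synthetic stand-in for an image stack; the shape is illustrative only.
stack = np.zeros((30, 64, 32), dtype=np.uint8)

# Slicing the second dimension leaves gaps between rows in memory.
window = stack[5:25, 10:40, :]
print(window.flags["C_CONTIGUOUS"])

# Copy into a fresh C-ordered buffer with the same values.
fixed = np.ascontiguousarray(window)
print(fixed.flags["C_CONTIGUOUS"])
```

If this is the cause, wrapping the returned sample in `np.ascontiguousarray` before insertion or upload would be one way to address it.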