tensorstore support #26

JoOkuma · 2023-01-18T19:30:23Z

Tensorstore could be used for the zarr backend.

A few advantages are:

It's faster.
It supports transposes and other transformations on the coordinate space without loading the data.
It loads the data lazily until requested through np.asarray

The text was updated successfully, but these errors were encountered:

ziw-liu · 2023-01-18T21:04:20Z

It supports transposes and other transformations on the coordinate space without loading the data.

This will be useful for alternative viewing in mehta-lab/recOrder#297.

ziw-liu · 2023-01-19T20:03:19Z

I'm not sure if I'm reading this from the TensorStore documentation for the file Key-Value Store driver correctly:

This driver is only supported on Windows 10 RS1 or later, due to its reliance on file operations with POSIX semantics.

Does this mean that it does not support other OS? Or this sentence only applies to Windows since Linux and macOS are naturally POSIX-compatible?

ziw-liu · 2023-01-19T21:46:13Z

Another question related to #18 (comment):
How does TensorStore handle empty chunks? The zarr-python implementation supports sparse arrays.

ziw-liu · 2023-01-19T21:56:19Z

@JoOkuma @AhmetCanSolak From your experience, is TS mature enough to be the only Zarr backend for iohub? Is there any missing feature as compared to zarr-python that we might find blocking in the future?

JoOkuma · 2023-01-19T22:01:55Z

@ziw-liu , they have the fill value parameter.

Or this sentence only applies to Windows since Linux and macOS are naturally POSIX-compatible?

I think that's the case, because I've been using on Linux for a while.

Is there any missing feature as compared to zarr-python that we might find blocking in the future?

Nothing comes to mind right now. I never tried using with datasets with multiple "wells", but ours light-sheet dataset sometimes have multiples keys, so it should work fine.

ziw-liu · 2023-01-19T23:27:52Z

ours light-sheet dataset sometimes have multiples keys, so it should work fine.

Are those multiple keys multiple zarrays or zgroups? I couldn't find documentation about managing dataset hierarchy in TS. Can you try with this example OME-NGFF HCS dataset?

Edit: aka the equivalent for this feature in zarr-python, which is essential for NGFF support.

AhmetCanSolak · 2023-01-19T23:49:00Z

@JoOkuma @AhmetCanSolak From your experience, is TS mature enough to be the only Zarr backend for iohub? Is there any missing feature as compared to zarr-python that we might find blocking in the future?

I think it can be the only Zarr backend, we don't need to support pre-Windows10, we don't have any such machine at hub AFAIK. It is pretty feature complete also.

TensorStore supports nested Zarr groups. What do you want to test @ziw-liu with the example OME-NGFF HCS dataset? If you could give me more details, I am happy to help testing.

ziw-liu · 2023-01-19T23:53:22Z

What do you want to test @ziw-liu with the example OME-NGFF HCS dataset?

Just to see if TS can do basic reading/writing with it. I'm a bit confused as the words 'group' and 'hierarchy' don't seem to exist at all in the doc and issues.

ziw-liu · 2023-01-20T05:00:27Z

Also not sure about how the Zarr attributes (.zattrs file described in the Zarr spec) is handled by TS. The zarr-python implementation takes care of things like caching, consolidation, and (de)serialization. This feature is used by the NGFF as well as some of iohub's downstream packages (e.g. recOrder, microDL) to store metadata.

AhmetCanSolak · 2023-01-20T16:35:31Z

The way I understand it (please correct me where I am wrong @JoOkuma), you can use zarr to open any group load any array with tensorstore. TS supports caching, (de)serialization and so. According to TS documentation, only filters are not supported: https://google.github.io/tensorstore/driver/zarr/index.html#limitations

I opened OME-NGFF HCS Dataset without any issues with following script:

import tensorstore as ts
import zarr


def main():
    store_path = "20200812-CardiomyocyteDifferentiation14-Cycle1.zarr"
    internal_path = "B/03/0/0"
    
    # open zarr store
    zs = zarr.open(store_path, mode='a')
    array = zs[internal_path]

    # read zarr
    dataset = ts.open({
        "driver": "zarr",
        "kvstore": {
            "driver": "file",
            "path": f"{store_path}/{internal_path}",
        },
        "metadata": {
            "shape": array.shape,
            "order": array.order,
            "dtype": array.dtype.str,
        }
    }, open=True).result()

    print(dataset)


if __name__ == '__main__':
    main()

Output:

TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
  },
  'driver': 'zarr',
  'dtype': 'uint16',
  'kvstore': {
    'driver': 'file',
    'path': '20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/0/',
  },
  'metadata': {
    'chunks': [1, 1, 2160, 2560],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': 1,
    },
    'dimension_separator': '/',
    'dtype': '<u2',
    'fill_value': 0,
    'filters': None,
    'order': 'C',
    'shape': [1, 2, 2160, 5120],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1], [2], [2160], [5120]],
    'input_inclusive_min': [0, 0, 0, 0],
  },
})

ziw-liu · 2023-01-20T17:05:30Z

you can use zarr to open any group load any array with tensorstore

Sorry I wasn't quite clear, by:

is TS mature enough to be the only Zarr backend for iohub

I meant to ask if TS is a complete implementation of the Zarr specification and iohub can stop depending on zarr-python if it used TS. And now it seems that TS only implemented the array storage protocol. Then essentially we still need zarr-python around, and adding TS is a '+1 dependency' move. I think it will significantly increase the complexity of code, as the thing (reading an array) we can do with one line via zarr-python now needs 10+ lines constructing an extra json object.

TS supports caching, (de)serialization and so.

I couldn't find how TS handles .zattrs as zarr-python does with zarr.attrs.Attributes. Can you provide a snippet for this too? (e.g. read the channel name OME metadata field from the example dataset).

JoOkuma · 2023-01-20T21:45:06Z

I dug a little deeper into this in some more general use cases that we did not try in DEXP.

tensorstore loads zarr arrays and not groups / hierarchies, so it cant replace zarr-python completely, and it doesn't interact with .zattrs because they can live at a group level.

Therefore, I think it should be an optional backend, and it can be implemented later on top of the default zarr backend, I'm glad to help with this.
I'm ok with some iohub functionalities (e.g., lazy loading of arbitrary chunks) being exclusive to a tensorstore backend if it makes things simpler.

JoOkuma · 2023-01-20T21:48:11Z

I don't think it's worth implementing our own code to transverse the zarr hierarchy to avoid depending on zarr-python.

tensorstore adds more complexity than the default zarr-python, and in my opinion, it shouldn't be a priority since it isn't a drop-in replacement.

ziw-liu · 2023-01-20T23:51:50Z

@JoOkuma @AhmetCanSolak Thanks for the input! For now I will keep using zarr-python. And I imagine adding tensorstore as an alternative array protocol can be done later without too much refactoring. Maybe something like:

from iohub.ngff import HCSDataset
import numpy as np

dataset = HCSDataset.open("hcs.zarr", mode="a")
image = np.random.randint((1, 2, 3, 1024, 1024), dtype=np.uint16)

# uses zarr-python
dataset["B/5/2/0"] = image
# uses tensorstore
write_future = dataset["B/5/3/0"].delayed_write(image)
write_future.result()

AhmetCanSolak · 2023-01-21T00:38:20Z

I would suggest us to have TS as a backend option @ziw-liu in the future. Datasets read with iohub will be pass around for different purposes and I like to make sure once TS backend is selected all assignment/write/read functionalities using TS. It would be a bit of a refactoring, but I am happy to work on it together.

JoOkuma added the enhancement New feature or request label Jan 18, 2023

ziw-liu added the performance Speed and memory usage of the code label Feb 1, 2023

mattersoflight mentioned this issue Feb 15, 2023

identify optimal chunk size for computational imaging, DL, and visualization workflows #33

Open

mattersoflight added this to the 0.2.0 milestone Feb 15, 2023

mattersoflight mentioned this issue Feb 18, 2023

Should ngff.ImageArray have a different name? #60

Closed

ziw-liu added the NGFF OME-NGFF (OME-Zarr format) label Mar 8, 2023

ilan-theodoro mentioned this issue Jun 20, 2023

Add conversion from Zarr array to tensorstore #150

Closed

ziw-liu linked a pull request Jun 21, 2023 that will close this issue

Add conversion from Zarr array to tensorstore #150

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tensorstore support #26

tensorstore support #26

JoOkuma commented Jan 18, 2023

ziw-liu commented Jan 18, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

JoOkuma commented Jan 19, 2023

ziw-liu commented Jan 19, 2023 •

edited

Loading

AhmetCanSolak commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 20, 2023

AhmetCanSolak commented Jan 20, 2023

ziw-liu commented Jan 20, 2023

JoOkuma commented Jan 20, 2023

JoOkuma commented Jan 20, 2023

ziw-liu commented Jan 20, 2023 •

edited

Loading

AhmetCanSolak commented Jan 21, 2023

tensorstore support #26

tensorstore support #26

Comments

JoOkuma commented Jan 18, 2023

ziw-liu commented Jan 18, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

JoOkuma commented Jan 19, 2023

ziw-liu commented Jan 19, 2023 • edited Loading

AhmetCanSolak commented Jan 19, 2023

ziw-liu commented Jan 19, 2023

ziw-liu commented Jan 20, 2023

AhmetCanSolak commented Jan 20, 2023

ziw-liu commented Jan 20, 2023

JoOkuma commented Jan 20, 2023

JoOkuma commented Jan 20, 2023

ziw-liu commented Jan 20, 2023 • edited Loading

AhmetCanSolak commented Jan 21, 2023

ziw-liu commented Jan 19, 2023 •

edited

Loading

ziw-liu commented Jan 20, 2023 •

edited

Loading