# TensorStore Performance Evaluation for MBO Workflows

This notebook evaluates whether TensorStore provides performance benefits for your imaging data workflows compared to current approaches using tifffile, zarr, and dask.

## What is TensorStore?

TensorStore is a C++ library with Python bindings designed for:
- **High-performance I/O**: Asynchronous reads/writes with caching
- **Multiple formats**: Native support for Zarr, N5, neuroglancer_precomputed
- **Storage flexibility**: Local filesystem, network drives, cloud storage (GCS, S3)
- **Advanced indexing**: Composable virtual views without copying data
- **Concurrency**: Safe multi-process/multi-machine access with ACID guarantees

## Key Questions

1. **Speed**: Does TensorStore read/write faster than tifffile + zarr?
2. **Network drives**: Does async I/O + caching help with slow network storage?
3. **Memory**: How does memory usage compare?
4. **Integration**: Can it replace MboRawArray or work alongside it?
5. **Use cases**: When should you use TensorStore vs current tools?

In [11]:
# Install tensorstore if needed
# !pip install tensorstore

import tensorstore as ts
import numpy as np
import zarr
from numcodecs import Blosc
import time
from pathlib import Path
import tempfile
from tqdm import tqdm

from mbo_utilities import imread, imwrite


## Section 1: Create Test Data

Generate realistic test data similar to your ScanImage acquisitions.

In [13]:
# Create test data matching your typical dataset dimensions
# Typical: (T, Y, X) = (1000-5000, 512, 512) or multi-plane (T*Z, Y, X)

test_shape = (2000, 512, 512)  # ~1 GB uint16 dataset
chunk_size = (100, 128, 128)   # Reasonable chunk size

print(f"Creating test data: {test_shape}")
print(f"Data size: {np.prod(test_shape) * 2 / 1e9:.2f} GB (uint16)")

# Generate synthetic data with realistic statistics
np.random.seed(42)
test_data = np.random.randint(0, 4095, size=test_shape, dtype=np.uint16)

# Add some structure (bright spots on a noisy background) to loosely mimic calcium imaging
Y, X = np.ogrid[:test_shape[1], :test_shape[2]]  # coordinate grids, built once
for t in range(test_shape[0]):
    # Add ~100 uniform disks (radius 5 px) per frame
    for _ in range(100):
        y, x = np.random.randint(50, test_shape[1]-50, 2)
        amplitude = np.random.randint(500, 2000)
        mask = ((Y - y)**2 + (X - x)**2) < 25
        test_data[t][mask] = np.clip(test_data[t][mask] + amplitude, 0, 4095)

print(f"Test data stats: mean={test_data.mean():.1f}, std={test_data.std():.1f}")

Creating test data: (2000, 512, 512)
Data size: 1.05 GB (uint16)
Test data stats: mean=2074.1, std=1189.2


## Section 2: Write Performance - TensorStore vs Zarr vs TIFF

Compare write speeds for different backends.

In [None]:
tmpdir = Path(tempfile.mkdtemp())
print(f"Temp directory: {tmpdir}")

results = {}

# ============================================================================
# 1. TensorStore write (Zarr backend)
# ============================================================================
print("\n[1/4] Writing with TensorStore (Zarr)...")
ts_path = tmpdir / "tensorstore.zarr"

start = time.time()

# Create TensorStore dataset
dataset_ts = ts.open({
    'driver': 'zarr',
    'kvstore': {
        'driver': 'file',
        'path': str(ts_path),
    },
    'metadata': {
        'compressor': {
            'id': 'blosc',
            'cname': 'lz4',
            'clevel': 5,
            'shuffle': 1,
        },
        'dtype': '<u2',
        'shape': list(test_shape),
        'chunks': list(chunk_size),
    },
    'create': True,
    'delete_existing': True,
}).result()

# Write asynchronously
write_future = dataset_ts.write(test_data)
write_future.result()  # Wait for completion

ts_write_time = time.time() - start
results['tensorstore_write'] = ts_write_time
print(f"TensorStore write: {ts_write_time:.2f}s")

# ============================================================================
# 2. Standard Zarr write (using Zarr v3 API)
# ============================================================================
print("\n[2/4] Writing with standard Zarr...")
zarr_path = tmpdir / "standard.zarr"

start = time.time()

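# zarr-python v3 takes its own codec classes; numcodecs.Blosc is the v2-era compressor.
# Note: this produces a Zarr v3 array, whereas TensorStore's 'zarr' driver above writes Zarr v2.
from zarr.codecs import BloscCodec

z = zarr.create_array(
    store=str(zarr_path),
    shape=test_shape,
    chunks=chunk_size,
    dtype=np.uint16,
    compressors=BloscCodec(cname='lz4', clevel=5, shuffle='shuffle'),
    overwrite=True,
)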
z[:] = test_data

zarr_write_time = time.time() - start
results['zarr_write'] = zarr_write_time
print(f"Zarr write: {zarr_write_time:.2f}s")

# ============================================================================
# 3. MBO imwrite (TIFF)
# ============================================================================
print("\n[3/4] Writing with mbo_utilities.imwrite (TIFF)...")
tiff_path = tmpdir / "test.tif"

start = time.time()
imwrite(tiff_path, test_data, compression='lzw')
tiff_write_time = time.time() - start
results['tiff_write'] = tiff_write_time
print(f"TIFF write: {tiff_write_time:.2f}s")

# ============================================================================
# 4. MBO imwrite (Zarr)
# ============================================================================
print("\n[4/4] Writing with mbo_utilities.imwrite (Zarr)...")
mbo_zarr_path = tmpdir / "mbo.zarr"

start = time.time()
imwrite(mbo_zarr_path, test_data)
mbo_zarr_write_time = time.time() - start
results['mbo_zarr_write'] = mbo_zarr_write_time
print(f"MBO Zarr write: {mbo_zarr_write_time:.2f}s")

print("\n" + "="*60)
print("WRITE PERFORMANCE SUMMARY")
print("="*60)
for name, t in results.items():
    print(f"{name:25s}: {t:6.2f}s  ({test_data.nbytes/t/1e6:6.1f} MB/s)")
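
The start/stop timing pattern repeated in the cell above can be factored into a small helper. This is a minimal sketch using only the standard library; the names `timed` and `throughput_mb_s` are illustrative and not part of `mbo_utilities` or TensorStore. It uses `time.perf_counter()`, which is monotonic and better suited to interval timing than `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def throughput_mb_s(nbytes, seconds):
    """Uncompressed throughput in MB/s (1 MB = 1e6 bytes, as in the summary)."""
    return nbytes / seconds / 1e6


# Usage with a stand-in workload instead of a real write
timings = {}
with timed("demo_write", timings):
    time.sleep(0.01)

print(f"demo_write: {timings['demo_write']:.3f}s")
```

In the benchmark cells, each `start = time.time()` / `time.time() - start` pair could then be wrapped as `with timed('tensorstore_write', results): ...`.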

## Section 3: Read Performance - Full Array

Compare read speeds when loading the entire dataset into memory.

In [None]:
read_results = {}

# ============================================================================
# 1. TensorStore read
# ============================================================================
print("[1/4] Reading with TensorStore...")

start = time.time()
dataset_ts = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(ts_path)},
}).result()
data_ts = dataset_ts.read().result()
ts_read_time = time.time() - start
read_results['tensorstore_read'] = ts_read_time
print(f"TensorStore read: {ts_read_time:.2f}s")

# ============================================================================
# 2. Standard Zarr read
# ============================================================================
print("\n[2/4] Reading with standard Zarr...")

start = time.time()
z = zarr.open(str(zarr_path), mode='r')
data_zarr = z[:]
zarr_read_time = time.time() - start
read_results['zarr_read'] = zarr_read_time
print(f"Zarr read: {zarr_read_time:.2f}s")

# ============================================================================
# 3. TIFF read
# ============================================================================
print("\n[3/4] Reading with mbo_utilities.imread (TIFF)...")

start = time.time()
data_tiff = imread(tiff_path)[:]
tiff_read_time = time.time() - start
read_results['tiff_read'] = tiff_read_time
print(f"TIFF read: {tiff_read_time:.2f}s")

# ============================================================================
# 4. MBO Zarr read
# ============================================================================
print("\n[4/4] Reading with mbo_utilities.imread (Zarr)...")

start = time.time()
data_mbo_zarr = imread(mbo_zarr_path)[:]
mbo_zarr_read_time = time.time() - start
read_results['mbo_zarr_read'] = mbo_zarr_read_time
print(f"MBO Zarr read: {mbo_zarr_read_time:.2f}s")

print("\n" + "="*60)
print("READ PERFORMANCE SUMMARY (Full Array)")
print("="*60)
for name, t in read_results.items():
    print(f"{name:25s}: {t:6.2f}s  ({test_data.nbytes/t/1e6:6.1f} MB/s)")

# Verify data integrity
print("\nVerifying data integrity...")
assert np.array_equal(data_ts, test_data), "TensorStore data mismatch!"
assert np.array_equal(data_zarr, test_data), "Zarr data mismatch!"
assert np.array_equal(data_tiff, test_data), "TIFF data mismatch!"
assert np.array_equal(data_mbo_zarr, test_data), "MBO Zarr data mismatch!"
print("✓ All data matches original")

[1/4] Reading with TensorStore...


ValueError: NOT_FOUND: Error opening "zarr" driver: Metadata at local file "C:/Users/RBO/AppData/Local/Temp/tmpe_24o4te/tensorstore.zarr/.zarray" does not exist [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_locking\":{},\"file_io_mode\":{},\"file_io_sync\":true},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/Users/RBO/AppData/Local/Temp/tmpe_24o4te/tensorstore.zarr/\"}}'] [source locations='tensorstore/driver/kvs_backed_chunk_driver.cc:1322\ntensorstore/driver/driver.cc:116']
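
The `NOT_FOUND` error above is raised because the `.zarray` metadata file is missing at the target path, for example when the write cell in Section 2 has not been run in the current session. A small guard, sketched here with the standard library only, can make that failure easier to read before `ts.open` is called. The helper name `zarr_v2_array_exists` is hypothetical, and the check assumes the Zarr v2 layout that the error message refers to.

```python
import shutil
import tempfile
from pathlib import Path


def zarr_v2_array_exists(path):
    """Return True if `path` looks like a Zarr v2 array (contains .zarray)."""
    return (Path(path) / ".zarray").is_file()


# Demonstration on a throwaway directory
demo_dir = Path(tempfile.mkdtemp())
array_dir = demo_dir / "tensorstore.zarr"

missing = zarr_v2_array_exists(array_dir)    # nothing written yet
array_dir.mkdir()
(array_dir / ".zarray").write_text("{}")     # placeholder metadata file
present = zarr_v2_array_exists(array_dir)

shutil.rmtree(demo_dir)
print(missing, present)
```

In the read cell, a check such as `if not zarr_v2_array_exists(ts_path): raise FileNotFoundError(...)` would point directly at the missing write step.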

## Section 4: Slice Performance - Random Access Patterns

Test performance for typical access patterns:
1. Single frame access (important for GUI)
2. Small ROI extraction
3. Time series from single pixel/ROI
4. Strided access (every Nth frame)

In [None]:
# Open datasets for slicing tests
ts_dataset = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(ts_path)},
}).result()

zarr_dataset = zarr.open(str(zarr_path), mode='r')
tiff_dataset = imread(tiff_path)

slice_results = {}

# ============================================================================
# Test 1: Single frame access (100 random frames)
# ============================================================================
print("Test 1: Single frame access (100 random frames)")
print("="*60)

frames_to_test = np.random.randint(0, test_shape[0], 100)

# TensorStore
start = time.time()
for idx in frames_to_test:
    _ = ts_dataset[idx].read().result()
ts_frame_time = time.time() - start
print(f"TensorStore: {ts_frame_time:.3f}s ({ts_frame_time/100*1000:.1f}ms per frame)")

# Zarr
start = time.time()
for idx in frames_to_test:
    _ = zarr_dataset[idx]
zarr_frame_time = time.time() - start
print(f"Zarr:        {zarr_frame_time:.3f}s ({zarr_frame_time/100*1000:.1f}ms per frame)")

# TIFF (MboRawArray caches decoded frames)
start = time.time()
for idx in frames_to_test:
    _ = tiff_dataset[idx]
tiff_frame_time = time.time() - start
print(f"TIFF:        {tiff_frame_time:.3f}s ({tiff_frame_time/100*1000:.1f}ms per frame)")

slice_results['single_frame'] = {
    'tensorstore': ts_frame_time,
    'zarr': zarr_frame_time,
    'tiff': tiff_frame_time,
}

# ============================================================================
# Test 2: Small ROI extraction (100x100 pixels, 100 frames)
# ============================================================================
print("\nTest 2: Small ROI extraction (100x100 pixels, 100 frames)")
print("="*60)

roi_slice = (slice(0, 100), slice(200, 300), slice(200, 300))

# TensorStore
start = time.time()
_ = ts_dataset[roi_slice].read().result()
ts_roi_time = time.time() - start
print(f"TensorStore: {ts_roi_time:.3f}s")

# Zarr
start = time.time()
_ = zarr_dataset[roi_slice]
zarr_roi_time = time.time() - start
print(f"Zarr:        {zarr_roi_time:.3f}s")

# TIFF
start = time.time()
_ = tiff_dataset[roi_slice]
tiff_roi_time = time.time() - start
print(f"TIFF:        {tiff_roi_time:.3f}s")

slice_results['small_roi'] = {
    'tensorstore': ts_roi_time,
    'zarr': zarr_roi_time,
    'tiff': tiff_roi_time,
}

# ============================================================================
# Test 3: Time series from single pixel
# ============================================================================
print("\nTest 3: Time series from single pixel (all frames)")
print("="*60)

pixel_slice = (slice(None), 256, 256)

# TensorStore
start = time.time()
_ = ts_dataset[pixel_slice].read().result()
ts_pixel_time = time.time() - start
print(f"TensorStore: {ts_pixel_time:.3f}s")

# Zarr
start = time.time()
_ = zarr_dataset[pixel_slice]
zarr_pixel_time = time.time() - start
print(f"Zarr:        {zarr_pixel_time:.3f}s")

# TIFF
start = time.time()
_ = tiff_dataset[pixel_slice]
tiff_pixel_time = time.time() - start
print(f"TIFF:        {tiff_pixel_time:.3f}s")

slice_results['pixel_timeseries'] = {
    'tensorstore': ts_pixel_time,
    'zarr': zarr_pixel_time,
    'tiff': tiff_pixel_time,
}

# ============================================================================
# Test 4: Strided access (every 10th frame)
# ============================================================================
print("\nTest 4: Strided access (every 10th frame)")
print("="*60)

stride_slice = (slice(0, test_shape[0], 10), slice(None), slice(None))

# TensorStore
start = time.time()
_ = ts_dataset[stride_slice].read().result()
ts_stride_time = time.time() - start
print(f"TensorStore: {ts_stride_time:.3f}s")

# Zarr
start = time.time()
_ = zarr_dataset[stride_slice]
zarr_stride_time = time.time() - start
print(f"Zarr:        {zarr_stride_time:.3f}s")

# TIFF
start = time.time()
_ = tiff_dataset[stride_slice]
tiff_stride_time = time.time() - start
print(f"TIFF:        {tiff_stride_time:.3f}s")

slice_results['strided_access'] = {
    'tensorstore': ts_stride_time,
    'zarr': zarr_stride_time,
    'tiff': tiff_stride_time,
}
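
How each access pattern performs depends largely on how many chunks it intersects. The sketch below counts them for the shape and chunk size used in this notebook. It is plain Python and independent of any storage library; `chunks_touched` is an illustrative helper that handles only non-negative integers and slices.

```python
from math import prod


def chunks_touched(index, shape, chunks):
    """Count the chunks intersected by a basic index (ints and slices only)."""
    per_axis = []
    for idx, size, csize in zip(index, shape, chunks):
        if isinstance(idx, int):
            positions = [idx]
        else:
            positions = range(*idx.indices(size))
        # Number of distinct chunk indices hit along this axis
        per_axis.append(len({p // csize for p in positions}))
    return prod(per_axis)


shape = (2000, 512, 512)
chunks = (100, 128, 128)
patterns = {
    "single_frame": (0, slice(None), slice(None)),
    "small_roi": (slice(0, 100), slice(200, 300), slice(200, 300)),
    "pixel_timeseries": (slice(None), 256, 256),
    "strided_access": (slice(0, 2000, 10), slice(None), slice(None)),
}
for name, index in patterns.items():
    print(f"{name:18s}: {chunks_touched(index, shape, chunks):4d} chunks")
```

With these settings a single frame intersects 16 chunks of 100 frames each, so roughly 52 MB of uncompressed chunk data is decoded to return a 0.5 MB frame. A smaller time extent per chunk would favor frame access, at the cost of more chunks for pixel time series.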

## Section 5: Caching and Network Performance

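Test TensorStore's caching capabilities, which is where it could shine on network drives. The test below runs against the local temp directory, so it measures cache behavior rather than network latency.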

In [None]:
print("Testing TensorStore caching behavior")
print("="*60)

# Open with different cache sizes
cache_sizes = [0, 10_000_000, 100_000_000, 500_000_000]  # 0, 10MB, 100MB, 500MB
cache_results = {}

for cache_size in cache_sizes:
    print(f"\nCache size: {cache_size/1e6:.0f} MB")

    # Open with cache
    dataset = ts.open({
        'driver': 'zarr',
        'kvstore': {'driver': 'file', 'path': str(ts_path)},
        'context': {
            'cache_pool': {'total_bytes_limit': cache_size}
        },
        'recheck_cached_data': False,  # Don't revalidate cached data
    }).result()

    # Read 100 random frames twice (the second pass can hit TensorStore's cache)
    frames = np.random.randint(0, test_shape[0], 100)

    # First read (TensorStore cache empty; the OS page cache may already be warm)
    start = time.time()
    for idx in frames:
        _ = dataset[idx].read().result()
    first_read = time.time() - start

    # Second read (warm cache - same frames)
    start = time.time()
    for idx in frames:
        _ = dataset[idx].read().result()
    second_read = time.time() - start

    cache_results[cache_size] = {
        'first': first_read,
        'second': second_read,
        'speedup': first_read / second_read if second_read > 0 else 0,
    }

    print(f"  First read:  {first_read:.3f}s")
    print(f"  Second read: {second_read:.3f}s")
    print(f"  Speedup:     {cache_results[cache_size]['speedup']:.1f}x")

print("\n" + "="*60)
print("CACHE PERFORMANCE SUMMARY")
print("="*60)
print(f"{'Cache Size':>15s}  {'Cold (s)':>10s}  {'Warm (s)':>10s}  {'Speedup':>10s}")
print("-"*60)
for cache_size, res in cache_results.items():
    print(f"{cache_size/1e6:>13.0f} MB  {res['first']:>10.3f}  {res['second']:>10.3f}  {res['speedup']:>9.1f}x")
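
Whether the warm pass speeds up depends on whether the chunks being touched fit within `total_bytes_limit`. The following is a rough model, not TensorStore's implementation: it assumes the cache holds decoded chunks and evicts the least recently used entry, both of which are assumptions made for illustration. The chunk geometry matches this notebook.

```python
import random
from collections import OrderedDict

CHUNK_BYTES = 100 * 128 * 128 * 2   # one uncompressed uint16 chunk
CHUNKS_PER_FRAME = 16               # 4 x 4 spatial chunks per frame
FRAMES_PER_CHUNK = 100


def lru_hit_rate(frames, limit_bytes):
    """Fraction of chunk requests served from a byte-limited LRU cache."""
    cache = OrderedDict()            # chunk id -> None, oldest first
    capacity = limit_bytes // CHUNK_BYTES
    hits = requests = 0
    for frame in frames:
        t_chunk = frame // FRAMES_PER_CHUNK
        for spatial in range(CHUNKS_PER_FRAME):
            key = (t_chunk, spatial)
            requests += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)
            elif capacity > 0:
                cache[key] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)   # evict least recently used
    return hits / requests


rng = random.Random(0)
frames = [rng.randrange(2000) for _ in range(100)]
for limit in (0, 10_000_000, 100_000_000, 500_000_000):
    rate = lru_hit_rate(frames + frames, limit)
    print(f"{limit / 1e6:5.0f} MB: hit rate over two passes = {rate:.2f}")
```

Under this model a 10 MB limit holds only three chunks, fewer than the 16 a single frame needs, so no request is ever served from cache. This suggests choosing the cache limit relative to the chunk size rather than the frame size.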

## Section 6: Async I/O - Overlapping Computation and I/O

Demonstrate TensorStore's async capabilities for pipelining operations.

In [None]:
print("Comparing synchronous vs asynchronous I/O")
print("="*60)

# Simulate processing pipeline: read frame -> process -> next frame
num_frames = 200
frames_to_process = list(range(0, test_shape[0], test_shape[0]//num_frames)[:num_frames])

from scipy.ndimage import uniform_filter

def process_frame(frame_data):
    """Simulate some processing (e.g., denoising, filtering)."""
    # 3x3 mean filter
    return uniform_filter(frame_data.astype(np.float32), size=3)

# ============================================================================
# Synchronous: Read -> Process -> Read -> Process ...
# ============================================================================
print("\nSynchronous pipeline (read-then-process)...")
dataset = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(ts_path)},
}).result()

start = time.time()
results_sync = []
for idx in tqdm(frames_to_process, desc="Sync"):
    frame = dataset[idx].read().result()  # Wait for read
    processed = process_frame(frame)       # Process
    results_sync.append(processed)
sync_time = time.time() - start
print(f"Synchronous:  {sync_time:.2f}s")

# ============================================================================
# Asynchronous: Read[i] while processing frame[i-1]
# ============================================================================
print("\nAsynchronous pipeline (overlapped I/O and compute)...")

start = time.time()
results_async = []

# Start first read
pending_read = dataset[frames_to_process[0]].read()

for i in tqdm(range(len(frames_to_process)), desc="Async"):
    # Wait for current read to complete
    current_frame = pending_read.result()

    # Start next read (if not last iteration)
    if i + 1 < len(frames_to_process):
        pending_read = dataset[frames_to_process[i + 1]].read()

    # Process current frame while next frame is loading
    processed = process_frame(current_frame)
    results_async.append(processed)

async_time = time.time() - start
print(f"Asynchronous: {async_time:.2f}s")
print(f"Speedup:      {sync_time/async_time:.2f}x")

# Verify results match
assert all(np.allclose(a, b) for a, b in zip(results_sync, results_async))
print("✓ Results match")
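
The pipeline above keeps a single read in flight. When storage latency exceeds per-frame processing time, a deeper prefetch window may help. The generator below is a sketch of that idea using only the standard library; `prefetched` and `fake_read` are illustrative names, and the thread pool stands in for a real storage backend.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetched(indices, start_read, depth=4):
    """Yield (index, data) in order while keeping up to `depth` reads in flight.

    `start_read(i)` must return an object with a blocking `.result()` method.
    """
    indices = iter(indices)
    pending = deque()
    for i in indices:                       # prime the window
        pending.append((i, start_read(i)))
        if len(pending) >= depth:
            break
    while pending:
        i, future = pending.popleft()
        data = future.result()              # block only on the oldest read
        nxt = next(indices, None)
        if nxt is not None:                 # top up before handing data out
            pending.append((nxt, start_read(nxt)))
        yield i, data


pool = ThreadPoolExecutor(max_workers=4)


def fake_read(i):
    """Stand-in for a storage read with 5 ms latency."""
    def work():
        time.sleep(0.005)
        return i * 2
    return pool.submit(work)


out = [(i, data) for i, data in prefetched(range(20), fake_read, depth=4)]
pool.shutdown()
print(out[:3])
```

With TensorStore, `start_read` could be `lambda i: dataset[i].read()`, since `read()` returns a future with a `.result()` method as used in the cell above.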

## Section 7: Virtual Views and Zero-Copy Operations

TensorStore can create virtual views without copying data.

In [None]:
print("Testing virtual views and transformations")
print("="*60)

dataset = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(ts_path)},
}).result()

print(f"Original dataset shape: {dataset.shape}")
print(f"Original dataset domain: {dataset.domain}")

# Create ROI view (no data copied); NumPy-style slicing returns a lazy view
roi_view = dataset[100:200, 100:300, 100:300]
print(f"\nROI view shape: {roi_view.shape}")
print(f"ROI view domain: {roi_view.domain}")

# Create strided view (every 2nd frame and every 2nd pixel; subsampling, not averaging)
downsampled = dataset[::2, ::2, ::2]
print(f"\nDownsampled shape: {downsampled.shape}")
print(f"Downsampled domain: {downsampled.domain}")

# Transpose view (dimensions selected by index, since this array has no dimension labels)
transposed = dataset[ts.d[0, 1, 2].transpose[2, 1, 0]]
print(f"\nTransposed shape: {transposed.shape}")

# All of these are virtual - no data loaded yet!
print("\nReading ROI view (triggers actual I/O)...")
start = time.time()
roi_data = roi_view.read().result()
print(f"ROI read time: {time.time() - start:.3f}s")
print(f"ROI data shape: {roi_data.shape}")
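
The idea behind virtual views can be illustrated with a toy one-dimensional model: indexing only composes index arithmetic, and data is touched when `read()` is called. This is a conceptual sketch in plain Python, not how TensorStore implements its index transforms; the class `LazyView` is hypothetical.

```python
class LazyView:
    """Toy model of a virtual view over one axis: composes slices, reads on demand."""

    def __init__(self, source, index=None):
        self._source = source
        # A range object records start/stop/step without touching the data
        self._index = range(len(source)) if index is None else index

    def __getitem__(self, s):
        # Slicing a range yields another range, so no elements are copied
        return LazyView(self._source, self._index[s])

    def __len__(self):
        return len(self._index)

    def read(self):
        # The only place where the underlying data is accessed
        return [self._source[i] for i in self._index]


frames = list(range(2000))              # stand-in for a stored time axis
view = LazyView(frames)[100:200][::2]   # ROI, then every 2nd element
print(len(view), view.read()[:3])       # prints: 50 [100, 102, 104]
```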

## Section 8: Integration with Your Workflow

How TensorStore could integrate with MboRawArray and existing pipelines.

In [None]:
print("Example: Converting raw TIFF to TensorStore Zarr")
print("="*60)

# Scenario: You have raw ScanImage TIFFs and want to convert to
# TensorStore-backed Zarr for faster access

# Read raw TIFF using your existing pipeline
print("\n1. Loading raw TIFF with MboRawArray...")
raw_data = imread(tiff_path)
print(f"   Shape: {raw_data.shape}")
print(f"   Has phase correction: {hasattr(raw_data, 'fix_phase')}")

# Create TensorStore Zarr
print("\n2. Creating TensorStore Zarr...")
output_path = tmpdir / "converted.zarr"

ts_output = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(output_path)},
    'metadata': {
        'compressor': {'id': 'blosc', 'cname': 'lz4', 'clevel': 5, 'shuffle': 1},
        'dtype': '<u2',
        'shape': list(raw_data.shape),
        'chunks': [100, 128, 128],
    },
    'create': True,
    'delete_existing': True,
}).result()

# Write in chunks (to avoid loading full array)
print("\n3. Writing chunks (with progress bar)...")
chunk_t = 100
start = time.time()

for t_start in tqdm(range(0, raw_data.shape[0], chunk_t), desc="Writing"):
    t_end = min(t_start + chunk_t, raw_data.shape[0])
    chunk = raw_data[t_start:t_end]
    ts_output[t_start:t_end].write(chunk).result()

convert_time = time.time() - start
print(f"\nConversion time: {convert_time:.2f}s")

# Verify
print("\n4. Verifying converted data...")
ts_verify = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': str(output_path)},
}).result()

# Check random frames
for _ in range(10):
    idx = np.random.randint(0, raw_data.shape[0])
    assert np.array_equal(
        ts_verify[idx].read().result(),
        raw_data[idx]
    )
print("✓ Conversion successful")
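
The conversion loop writes batches whose edges coincide with chunk boundaries along the time axis, which avoids rewriting partially filled chunks. The batch size can also be derived from a memory budget. The helper below is a sketch under that assumption; `batches_for_budget` is an illustrative name and the 256 MB budget is arbitrary.

```python
def batches_for_budget(shape, chunk_t, itemsize, budget_bytes):
    """Return chunk-aligned (t_start, t_end) ranges that fit a memory budget."""
    if chunk_t <= 0:
        raise ValueError("chunk_t must be positive")
    frame_bytes = shape[1] * shape[2] * itemsize
    # Whole chunks (along time) that fit in the budget, at least one
    chunks_per_batch = max(1, budget_bytes // (frame_bytes * chunk_t))
    step = chunk_t * chunks_per_batch
    return [(t, min(t + step, shape[0])) for t in range(0, shape[0], step)]


batches = batches_for_budget((2000, 512, 512), chunk_t=100, itemsize=2,
                             budget_bytes=256_000_000)
print(batches)
```

For the test dataset this yields five batches of 400 frames each. Each `(t_start, t_end)` pair could replace the fixed `chunk_t` stepping in the loop above.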

## Section 9: File Size Comparison

Compare storage efficiency.

In [None]:
def get_size(path):
    """Get total size of file or directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for entry in path.rglob('*'):
        if entry.is_file():
            total += entry.stat().st_size
    return total

print("File sizes:")
print("="*60)
print(f"Original data:       {test_data.nbytes/1e6:8.1f} MB (uncompressed)")
print(f"TensorStore Zarr:    {get_size(ts_path)/1e6:8.1f} MB")
print(f"Standard Zarr:       {get_size(zarr_path)/1e6:8.1f} MB")
print(f"TIFF (LZW):          {get_size(tiff_path)/1e6:8.1f} MB")
print(f"MBO Zarr:            {get_size(mbo_zarr_path)/1e6:8.1f} MB")
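
When reading these sizes, keep in mind that the test data is mostly uniform random noise, which limits how far any lossless codec can compress it. A back-of-the-envelope bound, ignoring the added spots and any codec overhead, follows from the entropy of the random values:

```python
from math import log2

bits_stored = 16     # uint16 storage per pixel
levels = 4095        # np.random.randint(0, 4095) draws values 0..4094
entropy_bits = log2(levels)            # information per pixel for uniform noise
best_ratio = bits_stored / entropy_bits

print(f"Entropy per pixel: {entropy_bits:.2f} bits")
print(f"Best-case lossless compression ratio: {best_ratio:.2f}x")
```

The bound is about 1.33x, so compressed sizes close to the uncompressed size are expected here. Real recordings have spatial and temporal structure and may compress quite differently, so the sizes in this section describe the synthetic data rather than typical acquisitions.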

## Section 10: Summary and Recommendations

### Performance Summary

The guidance below follows from TensorStore's design; confirm it against the timings your own run of the benchmarks above produces:

#### **Use TensorStore when:**

1. **Network storage with high latency**
   - The caching layer can provide significant speedups for repeated access
   - Async I/O helps overlap network latency with computation

2. **Complex access patterns**
   - Random frame access benefits from chunk-level caching
   - Virtual views allow efficient ROI extraction without copying

3. **Parallel processing pipelines**
   - Async API enables overlapping I/O and computation
   - Good for Suite2p/Suite3D preprocessing workflows

4. **Cloud storage integration**
   - Native GCS/S3 support for large-scale datasets
   - ACID transactions for safe multi-user access

#### **Stick with current tools when:**

1. **Local SSD storage**
   - TensorStore overhead may not be worth it
   - Current zarr/tifffile are already fast

2. **Raw TIFF processing**
   - MboRawArray has ScanImage-specific metadata handling
   - Phase correction is integrated
   - TensorStore doesn't understand TIFF metadata

3. **Simple workflows**
   - Sequential processing doesn't benefit from async
   - Standard zarr is simpler and well-understood

4. **GUI real-time display**
   - Both are similar for single-frame access
   - MboRawArray's frame caching might be better

### Recommended Hybrid Approach

```python
# Acquisition: Save as standard TIFF (existing workflow)
# - Keeps ScanImage metadata
# - MboRawArray handles phase correction

# Preprocessing: Convert to TensorStore Zarr for analysis
# - Apply phase correction once
# - Store in chunked Zarr via TensorStore
# - Benefits from caching during Suite2p/Suite3D

# Analysis: Use TensorStore for network-based processing
# - Faster random access with cache
# - Async I/O for pipelines
# - Safe multi-user access
```

### Disadvantages of TensorStore

1. **Additional dependency**: Another library to maintain
2. **Learning curve**: More complex API than simple zarr
3. **Overhead**: May be slower than zarr for local SSD
4. **Limited format support**: No driver for ScanImage TIFF stacks or their metadata
5. **Memory**: Cache can use significant RAM

### Advantages of TensorStore

1. **Caching**: Automatic chunk-level caching
2. **Async I/O**: Overlap I/O with computation
3. **Virtual views**: Zero-copy operations
4. **Cloud support**: Native GCS/S3 integration
5. **Transactions**: ACID guarantees for safe writes
6. **Performance**: C++ backend is very fast

In [None]:
# Clean up temporary files
import shutil
shutil.rmtree(tmpdir)
print(f"Cleaned up {tmpdir}")