# Xenium to Zarr Conversion Notebook

This notebook converts each Xenium dataset in a run directory to the Zarr format. Zarr is a chunked, compressed on-disk format, so the converted datasets can be loaded efficiently as `SpatialData` objects for analysis.

**Overview:**
- **Input:** Xenium dataset directory
- **Output:** Zarr store (a `.zarr` directory), one per dataset

In [1]:
import shutil
from pathlib import Path
from tqdm.auto import tqdm
from spatialdata_io import xenium

--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy, cupy-cuda12x

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------

  import cupy

stdout:



stderr:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/opt/conda/lib/python3.11/site-packages/numba/cuda/cudadrv/driver.py", line 295, in __getattr__
    raise CudaSuppor

## Function: `convert_to_zarr`

This function performs the conversion from a Xenium dataset to a Zarr file.

**Parameters:**
- `input_dir` (str or `Path`): Path to the input Xenium dataset directory.
- `output_file` (str or `Path`): Path where the output Zarr store will be written.

**Process:**
1. Reads the Xenium dataset using `spatialdata_io.xenium`.
2. Checks if the target Zarr directory exists and removes it if so.
3. Writes the `SpatialData` object to the Zarr backing store.
4. Prints confirmation of the saved data.
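
Step 2 deletes whatever sits at the output path, so a more defensive variant may be worth considering. The sketch below is not part of the notebook: the helper name `clear_existing_store` and the rule that only directories ending in `.zarr` may be deleted are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def clear_existing_store(output_path):
    """Remove a previous Zarr store, refusing to delete anything else.

    Returns True if a store was removed, False if nothing existed.
    """
    output_path = Path(output_path)
    if not output_path.exists():
        return False
    # Guard (an assumption of this sketch): only directories named *.zarr
    # are treated as stores that are safe to delete.
    if output_path.suffix != ".zarr" or not output_path.is_dir():
        raise ValueError(f"Refusing to delete non-Zarr path: {output_path}")
    shutil.rmtree(output_path)
    return True


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "coronal2.zarr"
    store.mkdir()
    (store / "dummy").write_text("old contents")
    removed = clear_existing_store(store)        # deletes the old store
    still_there = store.exists()
    removed_again = clear_existing_store(store)  # nothing left to delete
```

The guard trades a little convenience for protection against a mistyped `output_file` pointing at an unrelated directory.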


In [2]:
def convert_to_zarr(input_dir, output_file):
    """
    Convert a Xenium dataset to Zarr format using the spatialdata_io xenium reader.
    Zarr is a chunked, compressed on-disk format that lets SpatialData objects be loaded efficiently for analysis.
    Args:
        input_dir (str or Path): Path to the input Xenium dataset directory.
        output_file (str or Path): Path to the output Zarr store.

    """
    input_path = Path(input_dir)
    output_path = Path(output_file)

    sdata = xenium(input_path)

    if output_path.exists():
        shutil.rmtree(output_path)
    sdata.write(output_path)

    print(f"Saved Xenium data:\n{sdata}")
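
Once a store has been written, it can be loaded back for analysis. The sketch below assumes the `spatialdata` package is installed and uses its `read_zarr` reader; the wrapper name `load_converted_store` and the example path in the comment are hypothetical.

```python
from pathlib import Path


def load_converted_store(zarr_path):
    """Load a converted Zarr store as a SpatialData object."""
    zarr_path = Path(zarr_path)
    if not zarr_path.is_dir():
        raise FileNotFoundError(f"No Zarr store at {zarr_path}")
    # Imported lazily so the path check above works even where
    # spatialdata is not installed.
    import spatialdata as sd

    return sd.read_zarr(zarr_path)


# Hypothetical usage, with a placeholder path:
# sdata = load_converted_store("sdata/coronal2.zarr")
# print(sdata)
```

Checking the path first gives a clearer error than a failure raised from inside the reader when a sample name is mistyped.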

## Iterating Through the Datasets

This section iterates over every entry in the run directory. For each one, it:
- Extracts the sample name (for example, `coronal2`) from the directory name by splitting on `_` and taking the element at index 4.
- Generates an output path for the corresponding Zarr file.
- Calls the `convert_to_zarr` function for each dataset.
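
The name extraction and output-path steps can be sketched without running a conversion. The helpers `sample_name` and `plan_conversions` are hypothetical names, and the `is_dir()` filter is an addition that the notebook's own loop does not perform. The example directory name is taken from the log output of this notebook.

```python
import tempfile
from pathlib import Path


def sample_name(run_dir_name):
    # Splitting on single underscores turns each "__" separator into an
    # empty string, so the third "__"-delimited field sits at index 4.
    return run_dir_name.split("_")[4]


def plan_conversions(input_root, output_root):
    """Yield (input_dir, output_file) pairs for every run directory."""
    for input_dir in sorted(Path(input_root).iterdir()):
        if not input_dir.is_dir():
            continue  # skip stray files (extra check, not in the notebook)
        yield input_dir, Path(output_root) / f"{sample_name(input_dir.name)}.zarr"


example = "output-XETG00224__0040870__coronal2__20240814__184433"
name = sample_name(example)  # "coronal2"

# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / example).mkdir()
    (root / "notes.txt").write_text("not a run directory")
    plan = [(i.name, o.name) for i, o in plan_conversions(root, root / "sdata")]
```

Separating the planning from the conversion makes it possible to inspect the input and output pairs before any store is deleted or written.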

In [3]:
# Iterate through all directories in the input directory and convert each dataset to Zarr format
for input_dir in tqdm(Path("/tscc/projects/ps-yeolab5/share/CNT/Naga/synapse/data/").glob("20240814__184334__240814_MEI_YEOLAB_MsCoronal_Sag/*")):
    name = input_dir.name.split('_')[4]
    output_file = Path(r"/tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/") / f"{name}.zarr"
    convert_to_zarr(input_dir, output_file)

0it [00:00, ?it/s]

INFO     reading
         /tscc/projects/ps-yeolab5/share/CNT/Naga/synapse/data/20240814__184334__240814_MEI_YEOLAB_MsCoronal_Sag/output-XETG00224__0040870__coronal2__20240814__184433/cell_feature_matrix.h5


  sdata = xenium(input_path)


INFO     The Zarr backing store has been changed from None the new file path:
         /tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal2.zarr
Saved merscope data:
SpatialData object, with associated Zarr store: /tscc/projects/ps-yeolab3/bay001/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal2.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (5, 23975, 36993), (5, 11987, 18496), (5, 5993, 9248), (5, 2996, 4624), (5, 1498, 2312)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (23975, 36993), (11987, 18496), (5993, 9248), (2996, 4624), (1498, 2312)
│     └── 'nucleus_labels': DataTree[yx] (23975, 36993), (11987, 18496), (5993, 9248), (2996, 4624), (1498, 2312)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (81375, 1) (2D shapes)
│     ├── '

  sdata = xenium(input_path)


INFO     The Zarr backing store has been changed from None the new file path:
         /tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal3.zarr
Saved merscope data:
SpatialData object, with associated Zarr store: /tscc/projects/ps-yeolab3/bay001/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal3.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (5, 30759, 39849), (5, 15379, 19924), (5, 7689, 9962), (5, 3844, 4981), (5, 1922, 2490)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (30759, 39849), (15379, 19924), (7689, 9962), (3844, 4981), (1922, 2490)
│     └── 'nucleus_labels': DataTree[yx] (30759, 39849), (15379, 19924), (7689, 9962), (3844, 4981), (1922, 2490)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (137686, 1) (2D shapes)
│     ├── 

  sdata = xenium(input_path)


INFO     The Zarr backing store has been changed from None the new file path:
         /tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/sagittal1.zarr
Saved merscope data:
SpatialData object, with associated Zarr store: /tscc/projects/ps-yeolab3/bay001/karen_synapse_20240529/permanent_data/dylan_work/sdata/sagittal1.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (5, 47662, 28550), (5, 23831, 14275), (5, 11915, 7137), (5, 5957, 3568), (5, 2978, 1784)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (47662, 28550), (23831, 14275), (11915, 7137), (5957, 3568), (2978, 1784)
│     └── 'nucleus_labels': DataTree[yx] (47662, 28550), (23831, 14275), (11915, 7137), (5957, 3568), (2978, 1784)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (134957, 1) (2D shapes)
│     

  sdata = xenium(input_path)


INFO     The Zarr backing store has been changed from None the new file path:
         /tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/sagittal2.zarr
Saved merscope data:
SpatialData object, with associated Zarr store: /tscc/projects/ps-yeolab3/bay001/karen_synapse_20240529/permanent_data/dylan_work/sdata/sagittal2.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (5, 27309, 39845), (5, 13654, 19922), (5, 6827, 9961), (5, 3413, 4980), (5, 1706, 2490)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (27309, 39845), (13654, 19922), (6827, 9961), (3413, 4980), (1706, 2490)
│     └── 'nucleus_labels': DataTree[yx] (27309, 39845), (13654, 19922), (6827, 9961), (3413, 4980), (1706, 2490)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (72366, 1) (2D shapes)
│     ├── 

  sdata = xenium(input_path)


INFO     The Zarr backing store has been changed from None the new file path:
         /tscc/nfs/home/bay001/projects/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal1.zarr
Saved merscope data:
SpatialData object, with associated Zarr store: /tscc/projects/ps-yeolab3/bay001/karen_synapse_20240529/permanent_data/dylan_work/sdata/coronal1.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (5, 30735, 42643), (5, 15367, 21321), (5, 7683, 10660), (5, 3841, 5330), (5, 1920, 2665)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (30735, 42643), (15367, 21321), (7683, 10660), (3841, 5330), (1920, 2665)
│     └── 'nucleus_labels': DataTree[yx] (30735, 42643), (15367, 21321), (7683, 10660), (3841, 5330), (1920, 2665)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (118618, 1) (2D shapes)
│     ├