# 10X Visium HD Preprocessing

rakaia supports the visualization and analysis of spot- and bin-based spatial transcriptomics technologies such as 10X Visium. The preprocessing steps required for Visium HD datasets differ slightly from those for 10X Visium V1 and V2 data, because the datasets are binned and exported at several micron resolutions. Here we demonstrate how to extract a file for each micron resolution for visualization in rakaia.

In [2]:
from spatialdata_io import visium_hd
import spatialdata as sd
import os
import warnings
warnings.filterwarnings('ignore')

input_dir = "/home/admin/rakaia/visium/hd/"
zarr_path = os.path.join(input_dir, 'zarr')

We will use the mouse small intestine Visium HD dataset that is publicly available from 10X Genomics [here](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine). Using the link provided, select **Batch download** and download all of the files shown to the desired input directory. Once the download has completed, extract the *tar.gz* archive into the input directory; this produces a sub-directory named **binned_outputs**.
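The extraction step can also be scripted. The following is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_archive` and the archive file name in the usage comment are assumptions for illustration, so substitute the name of the file you actually downloaded.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entry names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        members = archive.getnames()
        archive.extractall(path=dest_dir)
    # Collect the distinct top-level entries (for example, "binned_outputs")
    top_level = {os.path.normpath(name).split(os.sep)[0] for name in members}
    top_level.discard(".")
    return sorted(top_level)


# Hypothetical usage; the archive name below is an assumption:
# extract_archive(
#     "/home/admin/rakaia/visium/hd/Visium_HD_Mouse_Small_Intestine_binned_outputs.tar.gz",
#     "/home/admin/rakaia/visium/hd/",
# )
```

The returned list gives a quick check that a **binned_outputs** directory was produced.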

Next, set the `dataset_name` variable below to the experiment name, which can be found in the analysis summary HTML or related output files. For example, the summary [here](https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Mouse_Small_Intestine/Visium_HD_Mouse_Small_Intestine_web_summary.html) shows the dataset name in the title.

In [3]:
dataset_name = "Visium_HD_Mouse_Small_Intestine"

In [5]:
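# Convert the raw outputs to a Zarr store once; later runs reuse the existing store
if not os.path.exists(zarr_path):
    sdata = visium_hd(input_dir)
    sdata.write(zarr_path)
# Read the object back from the Zarr store on disk
sdata = sd.read_zarr(zarr_path)
sdata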

SpatialData object, with associated Zarr store: /home/admin/rakaia/visium/hd/zarr
├── Images
│     ├── 'Visium_HD_Mouse_Small_Intestine_hires_image': DataArray[cyx] (3, 5575, 6000)
│     └── 'Visium_HD_Mouse_Small_Intestine_lowres_image': DataArray[cyx] (3, 558, 600)
├── Shapes
│     ├── 'Visium_HD_Mouse_Small_Intestine_square_002um': GeoDataFrame shape: (5479660, 1) (2D shapes)
│     ├── 'Visium_HD_Mouse_Small_Intestine_square_008um': GeoDataFrame shape: (351817, 1) (2D shapes)
│     └── 'Visium_HD_Mouse_Small_Intestine_square_016um': GeoDataFrame shape: (91033, 1) (2D shapes)
└── Tables
      ├── 'square_002um': AnnData (5479660, 19059)
      ├── 'square_008um': AnnData (351817, 19059)
      └── 'square_016um': AnnData (91033, 19059)
with coordinate systems:
    ▸ 'downscaled_hires', with elements:
        Visium_HD_Mouse_Small_Intestine_hires_image (Images), Visium_HD_Mouse_Small_Intestine_square_002um (Shapes), Visium_HD_Mouse_Small_Intestine_square_008um (Shapes), Visium_HD_Mouse_

As seen above, the Visium HD SpatialData object contains a variety of images, shapes, tables, and coordinate systems. For rakaia visualization, the AnnData tables will be scaled and exported.

By default, Visium HD datasets summarize expression levels at three different micron "bin" resolutions: 2, 8, and 16 microns. Smaller bin sizes produce higher resolution datasets that are closer to the original image resolution, but are slower to visualize in rakaia. It is recommended to use either 8 or 16 micron resolutions in rakaia for performance purposes. 
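To make the trade-off concrete, the following sketch compares the bin counts reported in the SpatialData summary above with the ideal area ratio between bin sizes. The counts are copied from that output; the variable names are illustrative.

```python
# Number of bins per resolution, copied from the SpatialData summary above
bin_counts = {2: 5479660, 8: 351817, 16: 91033}

for bin_size, n_bins in bin_counts.items():
    # One bin of this size covers (bin_size / 2) ** 2 of the 2 micron bins
    ideal_ratio = (bin_size / 2) ** 2
    observed_ratio = bin_counts[2] / n_bins
    print(f"{bin_size:>2} um: {n_bins:>8} bins, "
          f"ideal {ideal_ratio:.0f}x fewer than 2 um, observed {observed_ratio:.1f}x")
```

For this dataset, the 8 and 16 micron tables hold roughly 16 and 60 times fewer bins than the 2 micron table.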

Below, we will extract the feature objects needed for visualization and scale the spatial coordinates by the micron resolution. This is similar to the procedure described in the spatialdata documentation [here](https://spatialdata.scverse.org/en/latest/tutorials/notebooks/notebooks/examples/technlology_visium_hd.html#performant-on-the-fly-data-rasterization), where rasterizing the expression for each bin level produces an image frame that can be queried. We will do this inside the AnnData object to produce an output whose size and resolution are scaled to the bin resolution.

In [7]:
# Create the output directory if it does not already exist
output_dir = os.path.join(input_dir, "anndata")
os.makedirs(output_dir, exist_ok=True)

for bin_size in [2, 8, 16]:
    # Table keys are zero-padded to three digits: square_002um, square_008um, square_016um
    table_key = f"square_{bin_size:03d}um"
    adata = sdata.tables[table_key]
    # Optional rasterization, currently disabled:
    # sdata.tables[table_key].X = sdata.tables[table_key].X.tocsc()
    # rasterized = sd.rasterize_bins(
    #     sdata,
    #     f"{dataset_name}_{table_key}",
    #     table_key,
    #     "array_col",
    #     "array_row",
    # )
    # # for now, don't use the rasterized xarray frames as they are too large to cast to sparse numpy
    # sdata[f"rasterized_{bin_size}um"] = rasterized
    # Scale the spatial coordinates by the bin size
    adata.obsm['spatial'] = adata.obsm['spatial'] / float(bin_size)
    adata.var_names_make_unique()
    # Record the bin size so rakaia can align the expression values to whole slide images
    adata.uns["scaling_visium_hd"] = int(bin_size)
    adata.write_h5ad(os.path.join(output_dir, f"{dataset_name}_{bin_size}um.h5ad"))
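To illustrate what the loop does to each table, here is a small self-contained sketch in plain Python, with no SpatialData objects involved. The helper names `table_key` and `scale_coordinates` and the example coordinates are made up for illustration.

```python
def table_key(bin_size):
    """Build the zero-padded table key, e.g. 8 -> 'square_008um'."""
    return f"square_{bin_size:03d}um"


def scale_coordinates(coords, bin_size):
    """Divide every (x, y) pair by the bin size, as done for adata.obsm['spatial']."""
    return [(x / float(bin_size), y / float(bin_size)) for x, y in coords]


# Made-up coordinates, not taken from the dataset
example_coords = [(160.0, 320.0), (480.0, 64.0)]

for bin_size in [2, 8, 16]:
    print(table_key(bin_size), scale_coordinates(example_coords, bin_size))
```

The same input coordinates shrink as the bin size grows, so the 16 micron table ends up with the smallest spatial extent.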

**IMPORTANT**: The `scaling_visium_hd` key in the `uns` slot is required to align the h5ad expression values to whole slide images (WSI), such as H&E, in rakaia.

The output is three h5ad files, one per bin size, each containing the expression table with its spatial coordinates scaled to that bin size. When the files are imported into rakaia, each has different dimensions, with the largest bin size (16) producing the smallest dimensions.
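Before importing the files into rakaia, it can be useful to confirm that each one carries the `scaling_visium_hd` key and to compare their coordinate extents. The sketch below is one possible check, assuming the files were written to the `anndata` sub-directory as in the loop above. The helper names `exported_paths` and `summarize_exports` are hypothetical and not part of rakaia.

```python
import os


def exported_paths(input_dir, dataset_name, bin_sizes=(2, 8, 16)):
    """Return the h5ad path expected for each bin size."""
    return {
        bin_size: os.path.join(input_dir, "anndata", f"{dataset_name}_{bin_size}um.h5ad")
        for bin_size in bin_sizes
    }


def summarize_exports(input_dir, dataset_name):
    """Print the stored bin size, table shape, and coordinate extent of each file."""
    import anndata as ad  # imported here so the path helper works without anndata

    for bin_size, path in exported_paths(input_dir, dataset_name).items():
        # Backed mode avoids loading the full expression matrix into memory
        adata = ad.read_h5ad(path, backed="r")
        extent = adata.obsm["spatial"].max(axis=0)
        print(bin_size, adata.uns["scaling_visium_hd"], adata.shape, extent)
        adata.file.close()


# Hypothetical usage with the variables defined earlier in this notebook:
# summarize_exports(input_dir, dataset_name)
```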