Space

Reconciling Multiple Spatial Domain Identification Algorithms via Consensus Clustering

1. Introduction

Space is a spatial domain identification method for spatially resolved transcriptomics (SRT) data based on consensus clustering. It integrates 10 state-of-the-art (SOTA) algorithms. Space first selects reliable algorithms by measuring their consistency, then constructs a consensus matrix to integrate their outputs. Space introduces a similarity loss, a spatial loss, and a low-rank loss to improve accuracy and computational efficiency.
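
To make the idea of a consensus matrix concrete, the sketch below builds a generic co-association matrix: entry (i, j) is the fraction of algorithms that place spots i and j in the same domain. This is a textbook construction shown for illustration only and is not Space's implementation; the function name `co_association` and the toy labels are hypothetical.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return an n x n matrix whose (i, j) entry is the fraction of
    labelings that assign items i and j to the same cluster."""
    if not labelings:
        raise ValueError("need at least one labeling")
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same items")

    matrix = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0

    # Normalise the counts by the number of labelings.
    m = len(labelings)
    return [[value / m for value in row] for row in matrix]


# Toy example (hypothetical): three algorithms labelling four spots.
toy_labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, different label names
    [0, 0, 0, 1],
]
consensus = co_association(toy_labelings)
```

Because the matrix only records whether two spots share a label, it is unaffected by how each algorithm happens to name its clusters.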

Figure: Space workflow

Figure: The integrated methods

This repository is organized as follows:

Space/  
├── Data/                    # data for reproducibility
├── Demo/  
│   ├── Reference_Methods/   # the scripts of 10 SOTA algorithms
│   └── Reproduce_Scripts/   # the scripts for reproducing results in manuscript
├── Images/                  # images
├── Space/                   # Space source code
├── CITATION.cff             # citation file
├── LICENSE                  # license file
├── setup.sh                 # installation script
└── environment.yml          # conda/mamba environment file

2. Installation Tutorial

Deploying Space requires a Linux/Unix machine with a GPU. We recommend using conda/mamba and creating a virtual environment to manage all the dependencies. If conda is not yet installed, please install conda/mamba first.

We provide an environment file so that users can quickly deploy Space using the following commands.

# clone or download this repository
git clone https://github.com/Honchkrow/Space

# enter the folder
cd Space

chmod +x setup.sh

# install environment using setup.sh based on conda or mamba
# The script checks the configuration settings automatically.
bash setup.sh conda  # or "bash setup.sh mamba"

# activate environment
conda activate space  # or "mamba activate space"

# install bokeh and stlearn
pip install --no-deps bokeh==3.4.2 stlearn==0.4.12

Note: Space requires CUDA version 11.3.1. If your current CUDA version is inconsistent with this requirement, please refer to this tutorial to adjust it before proceeding with the installation.

Note: If an environment named "space" already exists in conda/mamba, the installation will fail because of the name conflict. Be sure to resolve any such naming conflict in advance.
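
One way to check for such a name conflict before running setup.sh is sketched below. It relies on `conda env list --json`, which prints a JSON object whose "envs" key lists environment paths; mamba is assumed to accept the same arguments. The helper names `env_names_from_json` and `env_exists` and the sample paths are hypothetical and not part of Space.

```python
import json
import os
import subprocess
from typing import List


def env_names_from_json(text: str) -> List[str]:
    """Extract environment names from `conda env list --json` output.

    The name of each environment is taken to be the last component
    of its path."""
    paths = json.loads(text).get("envs", [])
    return [os.path.basename(p.rstrip("/\\")) for p in paths]


def env_exists(name: str, manager: str = "conda") -> bool:
    """Ask conda (or mamba) whether an environment called `name` exists."""
    out = subprocess.run(
        [manager, "env", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return name in env_names_from_json(out)


# Offline demonstration with a hand-written sample of the JSON output.
sample = '{"envs": ["/home/user/miniforge3", "/home/user/miniforge3/envs/space"]}'
has_space = "space" in env_names_from_json(sample)
```

On a machine with conda installed, `env_exists("space")` returning True would indicate that the existing environment needs attention before installation.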

For common installation issues, please refer to the FAQ.

3. How to use Space

In this section, we use an SRT dataset to walk through the functionality of Space in detail.

3.1 Preparing the Datasets

In the manuscript for Space, we present the results of Space on four different datasets. These datasets are:

Human breast cancer: This dataset contains 3,798 spots and 36,601 genes, along with 20 manually annotated regions.
Mouse hypothalamus: This dataset includes 5 tissue sections, with 8 regions manually annotated, and the number of spots varies from 5,488 to 5,926.
Mouse primary visual area: This dataset comprises three slices containing 3390, 4491, and 3545 spots, respectively, with 79 genes in total. Six visual cortex layers (VISP_I to VISP_VI) and the white matter region (VISpwm) were manually annotated.
Mouse visual cortex: This dataset includes three tissue sections (BZ5, BZ9, and BZ14) with 1049, 1053, and 1088 spots, respectively, and 166 genes in total. Four regions were manually annotated.

We have prepared two types of data. The first type consists of the results of the ten SOTA algorithms on the four datasets, stored in CSV format, so that users can quickly reproduce the results of the Space article. This data is already included in this repository and does not need to be downloaded separately. The second type is the processed SRT data, which includes gene expression matrices, spatial location information, and H&E images. Because of its size, this data cannot be uploaded to GitHub and must be downloaded separately.

Download the processed SRT datasets (optional)

To facilitate user access, we have uploaded the processed SRT datasets to Google Drive and BaiduYun. Users can directly download and use them.

To reproduce our results quickly, download and unzip the files, then merge the extracted 'Data' folder with the 'Data' folder in the Space project. The project will then be organized as follows:

# Only the BARISTASeq dataset is shown.
# Mouse_hippocampus_MERFISH, SRARmap_pa and V1_Breast_Cancer_Block_A_Section_1 follow the same layout.
Space/  
├── Data/
│   ├── BARISTASeq/
│   │   ├── BARISTASeq_Sun2021Integrating_Slice_1_data.h5ad
│   │   ├── BARISTASeq_Sun2021Integrating_Slice_2_data.h5ad
│   │   ├── BARISTASeq_Sun2021Integrating_Slice_3_data.h5ad
│   │   ├── result1.csv
│   │   ├── result2.csv
│   │   └── result3.csv
│   ├── Mouse_hippocampus_MERFISH/  # not shown
│   ├── SRARmap_pa/  # not shown
│   └── V1_Breast_Cancer_Block_A_Section_1/  # not shown
├── Demo/  
│   ├── Reference_Methods/
│   └── Reproduce_Scripts/
├── Images/
├── Space/
└── Other Files
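
The merge step can also be scripted. The sketch below uses `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8 or later), which copies the extracted folder into the existing one and leaves files that exist only in the destination untouched; files with the same relative path are overwritten by the downloaded copy. The function name `merge_data_folder` and the placeholder file names in the demonstration are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def merge_data_folder(extracted: Path, project_data: Path) -> None:
    """Copy everything under `extracted` into `project_data`.

    Existing files in `project_data` are kept unless the extracted
    folder contains a file with the same relative path."""
    shutil.copytree(extracted, project_data, dirs_exist_ok=True)


# Self-contained demonstration in a temporary directory (placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    downloaded = root / "download" / "Data" / "BARISTASeq"
    repo = root / "Space" / "Data" / "BARISTASeq"
    downloaded.mkdir(parents=True)
    repo.mkdir(parents=True)
    (downloaded / "slice.h5ad").write_text("placeholder")
    (repo / "result1.csv").write_text("placeholder")

    merge_data_folder(root / "download" / "Data", root / "Space" / "Data")
    merged = sorted(p.name for p in repo.iterdir())
```

In practice, the two arguments would be the unzipped 'Data' folder and the 'Data' folder inside the cloned Space repository.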

3.2 Performing Consensus Clustering using Space (Reproducibility)

In this section, we show how to perform clustering using Space.

To reproduce the results of our article, users can also run the scripts in the Demo folder. The scripts are organized into two folders: Reference_Methods and Reproduce_Scripts. The Reference_Methods folder contains scripts for reproducing the results of the 10 SOTA algorithms. The Reproduce_Scripts folder contains scripts for reproducing the results of Space.

3.2.1 Step-by-step Tutorial for Processing the Breast Cancer Dataset

Here, for quick illustration, we directly apply Space to the results obtained from the 10 SOTA methods. These methods have already been executed: the scripts are saved in the Reference_Methods folder, and their results are saved in the Data folder.

First, load the necessary packages and set the R environment.

Please note that the R environment in the code below must be the one installed with Space. Replace the path according to your own Space installation directory.

import os
import scanpy as sc
import pandas as pd
import Space
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import SpectralClustering
from Space.cons_func import get_results, get_domains
from Space.utils import calculate_location_adj, plot_results_ari, get_bool_martix, plot_ari_with_removal

# Some methods need mclust.
# Please modify this path!
os.environ["R_HOME"] = "/home/zw/software/miniforge3/envs/space/lib/R"
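
As an alternative to hard-coding the path, the sketch below derives it from the active environment. It assumes that R sits under `lib/R` inside the environment prefix, as in the example path above, and that the `CONDA_PREFIX` variable set by `conda activate` is available; the function name `guess_r_home` is hypothetical and not part of Space.

```python
import os
import sys
from pathlib import Path


def guess_r_home() -> Path:
    """Guess R_HOME for the active environment as <prefix>/lib/R.

    Falls back to the prefix of the running Python interpreter when
    CONDA_PREFIX is not set."""
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    return Path(prefix) / "lib" / "R"


r_home = guess_r_home()
if r_home.is_dir():
    # Only set the variable when the guessed directory actually exists.
    os.environ["R_HOME"] = str(r_home)
else:
    print(f"R not found at {r_home}; set R_HOME manually.")
```

If the guessed directory does not exist, the manual assignment shown above remains the way to go.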

Next, load the dataset.

# read the expression data
adata = sc.read_visium(
    path="./Data/V1_Breast_Cancer_Block_A_Section_1", 
    count_file="filtered_feature_bc_matrix.h5"
)

# read the metadata
Ann_df = pd.read_csv(
    "./Data/V1_Breast_Cancer_Block_A_Section_1/metadata.tsv",
    sep="\t",
    header=0,
    na_filter=False,
    index_col=0,
)
adata.var_names_make_unique()

# read the image representation
im_re = pd.read_csv(
    "./Data/V1_Breast_Cancer_Block_A_Section_1/image_representation/ViT_pca_representation.csv",
    header=0,
    index_col=0,
    sep=",",
)

# set variables
adata.obsm["im_re"] = im_re
adata.obs["gt"] = Ann_df["fine_annot_type"]
gt = adata.obs["gt"]

Then, set the parameters.

k = 20                   # number of clusters
epochs = 120             # number of training epochs
seed = 666               # random seed
alpha = 1                # recommended value
learning_rate = 0.0001   # learning rate in training

Now, read the results of the 10 SOTA methods. To reproduce the results quickly, we load the precomputed outputs directly. The code for these methods can be found in the "/Demo/Reference_Methods/breast" folder.

mul_reults = pd.read_csv(
    "./Data/V1_Breast_Cancer_Block_A_Section_1/result.csv", 
    header=0, 
    index_col=0
)
mul_reults = mul_reults.iloc[:, 2:]

Next, we can observe the consistency between the results of different methods and discard the inconsistent methods.

# drop 2 methods that show poor consistency
mul_reults = plot_ari_with_removal(mul_reults, 2)
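
The internals of `plot_ari_with_removal` are not shown here. As a rough illustration of what consistency-based filtering can look like, the sketch below scores each method by its mean adjusted Rand index (ARI) against all other methods and drops the lowest-scoring ones. This is an assumed, simplified procedure and not necessarily what Space does; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, List, Sequence


def adjusted_rand_index(a: Sequence, b: Sequence) -> float:
    """Adjusted Rand index between two labelings of the same items."""
    total = comb(len(a), 2)
    if total == 0:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # e.g. both labelings have a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def drop_least_consistent(results: Dict[str, Sequence], n_drop: int) -> List[str]:
    """Return the method names kept after removing the `n_drop` methods
    with the lowest mean ARI against all other methods."""
    names = list(results)
    if len(names) < 2 or n_drop >= len(names):
        raise ValueError("need at least two methods and n_drop < number of methods")
    mean_ari = {
        name: sum(
            adjusted_rand_index(results[name], results[other])
            for other in names if other != name
        ) / (len(names) - 1)
        for name in names
    }
    dropped = set(sorted(names, key=lambda name: mean_ari[name])[:n_drop])
    return [name for name in names if name not in dropped]


# Toy example (hypothetical): method_d disagrees with the other three.
toy_results = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [1, 1, 1, 0, 0, 0],
    "method_c": [0, 0, 1, 1, 1, 1],
    "method_d": [0, 1, 0, 1, 0, 1],
}
kept = drop_least_consistent(toy_results, 1)
```

With a pandas DataFrame of per-method labels, each column would play the role of one entry in `toy_results`.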