## Running CellART on the VisiumHD colorectal cancer dataset

### Download data

The VisiumHD colorectal cancer dataset can be obtained from the 10x Genomics website [here](https://www.10xgenomics.com/cn/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-colorectal-cancer-tissue), under the name “Visium HD, Sample P2 CRC”. Below is a demo script that creates a new data directory and downloads the required VisiumHD files.

In [None]:
mkdir ./visiumhd_crc
cd ./visiumhd_crc

curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf
curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_alignment_file.json
curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_binned_outputs.tar.gz
curl -O https://cf.10xgenomics.com/samples/spatial-exp/3.0.0/Visium_HD_Human_Colon_Cancer_P2/Visium_HD_Human_Colon_Cancer_P2_spatial.tar.gz

# Unzip files
tar -xzvf Visium_HD_Human_Colon_Cancer_P2_binned_outputs.tar.gz
tar -xzvf Visium_HD_Human_Colon_Cancer_P2_spatial.tar.gz

# Back to root dir
cd ..

After extracting the archives, you will have the binned_outputs and spatial directories. The paired scRNA reference, restricted to patient 2, can be downloaded [here](https://drive.google.com/file/d/1kzNZq7h4V-JyaBcjJ1Kcz-JSlLFNAQrY/view?usp=drive_link). Save the reference file adata_sc_p2.h5ad in the data directory (./visiumhd_crc) as well. All the raw data needed to run CellART is now in place.
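Before moving on, it can help to confirm that every input ended up where the later cells expect it. The snippet below is a minimal sketch and not part of CellART: `missing_inputs` is a hypothetical helper, and the paths are simply those used in the download script and in the preprocessing cell of this tutorial.

```python
from pathlib import Path

# Entries that the preprocessing cell of this tutorial reads,
# relative to the data directory (./visiumhd_crc).
EXPECTED_INPUTS = [
    "binned_outputs/square_002um",
    "spatial",
    "Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf",
    "adata_sc_p2.h5ad",
]


def missing_inputs(data_dir, expected=EXPECTED_INPUTS):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_inputs("./visiumhd_crc")
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All raw inputs are in place.")
```

An empty list means the directory layout matches what the preprocessing cell reads; it does not verify that the downloads are complete or uncorrupted.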

### Preprocess

In [None]:
from cellart.utils.preprocess import SingleCellPreprocessor, VisiumHDPreprocessor
from cellart.utils.io import load_list
import scanpy as sc

# Processed data save dir
save_dir = './preprocessed_visiumhd_crc/'
# Path to 002um spot data
path = "./visiumhd_crc/binned_outputs/square_002um/"
# Path to the H&E tissue image
source_image_path = "./visiumhd_crc/Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf"
# Path to spatial dir
spaceranger_image_path = "./visiumhd_crc/spatial/"

# Set up the VisiumHD preprocessor and run nuclei segmentation
st_preprocessor = VisiumHDPreprocessor(path, source_image_path, spaceranger_image_path, save_dir)
st_preprocessor.get_nuclei_segmentation()

# Load the paired scRNA reference and preprocess it against the ST gene list
sc_adata = sc.read("./visiumhd_crc/adata_sc_p2.h5ad")
sc_preprocessor = SingleCellPreprocessor(
    sc_adata,
    celltype_col="celltype",
    save_path=save_dir,
    st_gene_list=load_list(save_dir + "/st_gene_list.txt"),
)
sc_preprocessor.preprocess(hvg_method="seurat_v3", n_hvg=3000)

# Prepare the spatial data using the filtered gene list
st_preprocessor.prepare_sst(load_list(save_dir + "/filtered_gene_names.txt"))

The preprocessed files are now in the preprocessed_visiumhd_crc directory. You can inspect the gene map and the segmentation mask to check that they are aligned.

In [None]:
# Load the gene map and the nuclei segmentation mask for a visual check
import numpy as np
import matplotlib.pyplot as plt

gene_map = np.load(save_dir + "/gene_map.npy")
segmentation_mask = np.load(save_dir + "/segmentation_mask.npy")

gene_map_sum = gene_map.sum(axis=-1)

In [None]:
# Show the summed gene map and the nuclei mask side by side
fig, ax = plt.subplots(1,2, figsize=(12,5))
ax[0].imshow(gene_map_sum)
ax[0].set_title("Gene expression map sum")
ax[1].imshow(segmentation_mask > 0)
ax[1].set_title("Nuclei segmentation mask")
plt.show()
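The visual check above can be complemented by a numeric one. The following is a minimal sketch under two assumptions that CellART itself does not state here: that gene_map has shape (height, width, genes), as the sum over the last axis suggests, and that nonzero entries of segmentation_mask mark nuclei. `alignment_report` is a hypothetical helper, and the small arrays at the bottom are synthetic stand-ins for the saved files.

```python
import numpy as np


def alignment_report(gene_map, segmentation_mask):
    """Compare the spatial grids of a gene map and a nuclei mask.

    Assumes gene_map is (H, W, G) and segmentation_mask is (H, W),
    with 0 meaning background in the mask.
    """
    same_grid = gene_map.shape[:2] == segmentation_mask.shape
    report = {"same_grid": same_grid}
    if same_grid:
        counts = gene_map.sum(axis=-1)       # total counts per pixel
        nuclei = segmentation_mask > 0       # boolean nuclei mask
        total = counts.sum()
        report["nuclei_pixel_fraction"] = float(nuclei.mean())
        report["count_fraction_in_nuclei"] = (
            float(counts[nuclei].sum() / total) if total > 0 else 0.0
        )
    return report


# Synthetic example: a 4x4 grid with 2 genes and a single nucleus pixel.
toy_gene_map = np.zeros((4, 4, 2))
toy_gene_map[1, 1] = [3, 1]
toy_gene_map[3, 3] = [0, 4]
toy_mask = np.zeros((4, 4), dtype=int)
toy_mask[1, 1] = 1

print(alignment_report(toy_gene_map, toy_mask))
```

On the real data, pass the arrays loaded with `np.load` in the cell above. A false `same_grid` would indicate that the two files were produced on different grids.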

### Running CellART

NOTE: This part of the code may take hours to run, so it is highly recommended not to run it directly in the notebook.
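One way to follow this advice is to move the training cells into a standalone script and start it as a separate process that writes to a log file. The sketch below uses only the Python standard library. The script name `run_cellart.py` and the log file name `train.log` are hypothetical, and `launch_in_background` is an illustrative helper rather than part of CellART.

```python
import subprocess
import sys
from pathlib import Path


def launch_in_background(script_path, log_path):
    """Start a Python script as a separate process, sending its output to a log file."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_file = open(log_path, "w")
    try:
        proc = subprocess.Popen(
            [sys.executable, str(script_path)],
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX only: run in a new session
        )
    finally:
        # The child holds its own copy of the descriptor, so the parent can close its handle.
        log_file.close()
    return proc


if __name__ == "__main__":
    script = Path("run_cellart.py")  # hypothetical script holding the training cells
    if script.exists():
        proc = launch_in_background(script, "./results_visiumhd_crc/train.log")
        print("Started training with PID", proc.pid)
    else:
        print("Create run_cellart.py from the training cells first.")
```

The returned process object can be polled with `proc.poll()`, and progress can be followed in the log file or in wandb.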

In [None]:
import cellart
from pathlib import Path
import wandb
import os

# Preprocessed data
save_dir = './preprocessed_visiumhd_crc/'
# Directory to store all results
log_dir = "./results_visiumhd_crc/"

manager = cellart.ExperimentManager(
    # Basic input data settings (must be specified)
    gene_map=os.path.join(save_dir, "gene_map.npy"),
    nuclei_mask=os.path.join(save_dir, "segmentation_mask.npy"),
    basis=os.path.join(save_dir, "basis.npy"),
    gene_names=os.path.join(save_dir, "filtered_gene_names.txt"),
    celltype_names=os.path.join(save_dir, "celltype_names.txt"),
    log_dir=log_dir,

    # Training parameters (adjust based on convergence and wandb visualization)
    epoch=200, 
    seg_training_epochs=10,
    deconv_warmup_epochs=100,

    pred_period=50,
    gpu="0"
)

# Get the experiment options and print them for inspection
opt = manager.get_opt()
print(opt)

In [None]:
# Set up wandb for logging and visualization
run = wandb.init(project="CellART", dir=manager.get_log_dir(), config=opt,
                 name=os.path.basename(os.path.normpath(manager.get_log_dir())))

In [None]:
# Set up dataset
dataset = cellart.SSTDataset(manager)
gene_map_shape = dataset.gene_map.shape

# Initialize and train the CellART model
model = cellart.CellARTModel(manager, gene_map_shape, len(dataset.coords_starts))
model.train_model(dataset)

### Check the output of CellART