# Tutorial: Spatial clustering on 10x Visium (DLPFC dataset)

Here we present our re-analysis of sample 151675 from the dorsolateral prefrontal cortex (DLPFC) dataset. Maynard et al. manually annotated the DLPFC layers and white matter (WM) based on morphological features and gene markers.

This tutorial demonstrates how to identify spatial domains on 10x Visium data using RGAST.

DLPFC data can be downloaded from [SpatialLIBD](https://github.com/LieberInstitute/HumanPilot/). Extract the data and place it in the data/DLPFC folder.
Note that scalefactors_json.json and tissue_positions_list.csv can be found in the 10X folder in SpatialLIBD.
For convenience, we have put three files within the data folder here. We recommend arranging your folder structure like this:

In [None]:
RGAST
 ├── data
 │   └── DLPFC
 │       └── 151507
 │           ├── filtered_feature_bc_matrix.h5
 │           ├── metadata.tsv
 │           └── spatial
 │               ├── scalefactors_json.json
 │               ├── tissue_positions_list.csv
 │               ├── full_image.tif
 │               ├── tissue_hires_image.png
 │               └── tissue_lowres_image.png
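
Before loading a sample, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of RGAST: the helper name `missing_visium_files` and the `EXPECTED_FILES` list are hypothetical, and `full_image.tif` is left out on the assumption that it is not needed for loading.

```python
from pathlib import Path

# Files taken from the tree above, relative to one sample folder.
# This list is an assumption for illustration, not an official RGAST requirement.
EXPECTED_FILES = [
    "filtered_feature_bc_matrix.h5",
    "metadata.tsv",
    "spatial/scalefactors_json.json",
    "spatial/tissue_positions_list.csv",
    "spatial/tissue_hires_image.png",
    "spatial/tissue_lowres_image.png",
]


def missing_visium_files(sample_dir):
    """Return the expected files that are absent from sample_dir."""
    root = Path(sample_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_visium_files('../data/DLPFC/151675')` should return an empty list when the sample folder is complete; any returned entries name the files that still need to be downloaded.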

## Preparation

In [2]:
import os,sys
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import warnings
import RGAST
warnings.filterwarnings("ignore")
from sklearn.metrics.cluster import adjusted_rand_score

## Read data

In [3]:
adata = sc.read_visium(path='../data/DLPFC/151675', count_file='filtered_feature_bc_matrix.h5')
adata.var_names_make_unique()

### Read metadata

In [None]:
df_meta = pd.read_csv('../data/DLPFC/151675/metadata.tsv', sep='\t')
adata.obs = adata.obs.join(df_meta)
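
`DataFrame.join` aligns rows on the index, so the annotation columns only carry over when the metadata index holds the same spot barcodes as `adata.obs_names`. If the two do not match (for example, a metadata table from a different sample, or one read with a default integer index), the joined columns are filled with NaN without any error. A small check, sketched here with a hypothetical helper name, makes that failure visible:

```python
def barcode_overlap(obs_names, meta_index):
    """Fraction of spot barcodes in obs_names that also appear in meta_index."""
    spots = set(obs_names)
    if not spots:
        return 0.0
    return len(spots & set(meta_index)) / len(spots)


# Hypothetical usage in this notebook, before the join:
# overlap = barcode_overlap(adata.obs_names, df_meta.index)
# print(f"{overlap:.1%} of spots have metadata")
```

An overlap well below 100% suggests that the metadata file and the count matrix do not belong to the same sample.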

## Preprocessing

In [4]:
#preprocess
sc.pp.filter_genes(adata, min_cells=5)
sc.pp.normalize_total(adata, target_sum=1, exclude_highly_expressed=True)
sc.pp.scale(adata)
sc.pp.pca(adata, n_comps=200)

## Constructing gene expression similarity and spatial neighborhood relationships

In [None]:
RGAST.Cal_Spatial_Net(adata)
RGAST.Cal_Expression_Net(adata)
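
The two calls above build neighbor graphs over the spots. Their parameters and internals are not shown in this tutorial, so the following is only a generic illustration of the idea, not RGAST's implementation: a brute-force k-nearest-neighbor graph over 2D spot coordinates. The function name `knn_edges` is hypothetical.

```python
import math


def knn_edges(coords, k):
    """Return directed (i, j) edges linking each spot i to its k nearest spots j."""
    edges = []
    for i, p in enumerate(coords):
        # Distance from spot i to every other spot, sorted ascending.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(coords) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges
```

The same construction applied to low-dimensional expression coordinates instead of spatial coordinates would link spots with similar expression profiles. This brute-force version is quadratic in the number of spots and is meant for reading, not for real Visium samples.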

## Model training

In [None]:
train_RGAST = RGAST.Train_RGAST(adata)
# with early stopping
# train_RGAST.train_RGAST(label_key="layer_guess",save_path=dir_output,n_clusters=n_clusters)
# without early stopping
train_RGAST.train_RGAST(early_stopping=False, save_path='.', n_epochs=500)
train_RGAST.train_with_dec() #optional

### You can also load the model parameters trained in the study

In [None]:
train_RGAST.load_model(path='../model_path/DLPFC_151675.pth')
z, _ = train_RGAST.process()
adata.obsm['RGAST'] = z.to('cpu').detach().numpy()

## Clustering

In [14]:
from RGAST.utils import res_search_fixed_clus
n_clusters = 7
sc.pp.neighbors(adata, use_rep='RGAST')
sc.tl.umap(adata)
_ = res_search_fixed_clus(adata, n_clusters)
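
Leiden clustering takes a resolution rather than a cluster count, so obtaining exactly `n_clusters` groups requires searching over resolutions. The search strategy used inside `res_search_fixed_clus` is not shown here; the sketch below is one possible approach (bisection), under the assumption that the number of clusters does not decrease as the resolution grows. The names `search_resolution` and `count_clusters` are hypothetical.

```python
def search_resolution(count_clusters, target, lo=0.01, hi=2.0, max_iter=30):
    """Bisect on resolution until count_clusters(resolution) == target.

    count_clusters is any callable returning the number of clusters found at
    a given resolution; it is assumed to be non-decreasing in the resolution.
    Returns the matching resolution, or None if none was found.
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        n = count_clusters(mid)
        if n == target:
            return mid
        if n < target:
            lo = mid  # too few clusters: raise the resolution
        else:
            hi = mid  # too many clusters: lower the resolution
    return None


# Hypothetical usage with scanpy:
# def count_leiden(res):
#     sc.tl.leiden(adata, resolution=res)
#     return adata.obs['leiden'].nunique()
# best = search_resolution(count_leiden, target=7)
```

In practice the cluster count is not strictly monotone in the resolution, so a search like this can fail to hit the target; returning `None` makes that case explicit.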

In [15]:
adata.obs['refine'] = RGAST.refine_spatial_cluster(adata, adata.obs['leiden'])  # optional
RGAST.plot_clustering(adata, "refine", title='151675')
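
Spatial refinement steps of this kind typically smooth the labels by letting each spot adopt the label that dominates among its spatial neighbors. How `RGAST.refine_spatial_cluster` does this is not described in this tutorial, so the following is a generic majority-vote sketch under that assumption, with hypothetical names throughout.

```python
from collections import Counter


def refine_labels(labels, neighbors):
    """Relabel each spot when a strict majority of its neighbors share another label.

    labels: list of cluster labels, one per spot.
    neighbors: list of lists; neighbors[i] holds the indices of spot i's neighbors.
    Votes are always counted on the original labels, not the refined ones.
    """
    refined = []
    for i, label in enumerate(labels):
        votes = Counter(labels[j] for j in neighbors[i])
        if votes:
            top_label, top_count = votes.most_common(1)[0]
            # Switch only when more than half of the neighbors agree on a different label.
            if top_label != label and top_count > len(neighbors[i]) / 2:
                label = top_label
        refined.append(label)
    return refined
```

For example, a single spot labeled `'b'` whose neighbors are all labeled `'a'` is reassigned to `'a'`, while spots that already agree with their neighborhood are left unchanged.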

## Evaluation

In [None]:
obs_df = adata.obs.dropna(subset='layer_guess')
ARI = adjusted_rand_score(obs_df['leiden'], obs_df['layer_guess'])
print('Adjusted rand index = %.2f' %ARI)
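
For reference, the adjusted Rand index compares two labelings by counting pairs of spots that are grouped together in both, corrected for the agreement expected by chance. The sketch below computes it from the standard contingency-table formula in plain Python; it is meant to show what `adjusted_rand_score` measures, not to replace it, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of items placed together in both labelings, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

The score is 1 when the two labelings define the same partition regardless of how the clusters are named, is close to 0 for random assignments, and can be negative when agreement is worse than chance.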