# Run *SDePER* on sequencing-based simulated data: Scenario 1 + scRNA-seq data as reference + WITHOUT spatial correlation constraint

In this Notebook we run an ablation test of SDePER on simulated data. For generating **sequencing-based** simulated data via the coarse-graining procedure, please refer to [generate_simulated_spatial_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c963d08f74f4591c2ef6f132177795297793d878/Simulation_seq_based/Generate_simulation_data/generate_simulated_spatial_data.nb.html) in the [Generate_simulation_data](https://github.com/az7jh2/SDePER_Analysis/tree/main/Simulation_seq_based/Generate_simulation_data) folder.

**Scenario 1** means the reference data for deconvolution includes all single cells with the **matched 12 cell types**.

**scRNA-seq data as reference** means the reference data is another scRNA-seq dataset ([GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746)) from the same tissue as the simulated spatial data; therefore, a **platform effect exists**.

**WITHOUT spatial correlation constraint** means we fit the graph Laplacian regularized model without the spatial correlation constraint, essentially removing Laplacian regularization (by setting the command-line option `--lambda_g` to 0).
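To see why setting `--lambda_g` to 0 amounts to dropping the spatial correlation constraint, here is a minimal sketch of a generic graph Laplacian penalty, `lambda_g * trace(theta^T L theta)` with `L = D - A` built from the spot adjacency matrix `A`. This textbook form is an assumption made for illustration and is not copied from SDePER's source code; `laplacian_penalty`, `theta` and the toy matrices are hypothetical.

```python
import numpy as np

def laplacian_penalty(theta: np.ndarray, A: np.ndarray, lambda_g: float) -> float:
    """Generic graph Laplacian penalty lambda_g * trace(theta^T L theta).

    theta: spots x cell types matrix of proportions (hypothetical input)
    A:     symmetric spots x spots 0/1 adjacency matrix
    """
    D = np.diag(A.sum(axis=1))  # degree matrix
    L = D - A                   # unnormalized graph Laplacian
    return float(lambda_g * np.trace(theta.T @ L @ theta))

# Toy example: 3 spots on a line (edges 0-1 and 1-2), 2 cell types
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
theta = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.2, 0.8]])

# Equivalent pairwise form: sum over edges of squared differences between neighbors
pairwise = sum(A[i, j] * np.sum((theta[i] - theta[j]) ** 2)
               for i in range(3) for j in range(i + 1, 3))

print(laplacian_penalty(theta, A, lambda_g=1.0), pairwise)
print(laplacian_penalty(theta, A, lambda_g=0.0))  # the penalty vanishes
```

The pairwise form makes the intuition explicit: adjacent spots are penalized for having different proportions, scaled by `lambda_g`, so with `lambda_g = 0` neighboring spots are effectively fitted independently.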

==================================================================================================================

So here we use the **3 input files** shown below:

!!! NOTE: here we directly use the CVAE-transformed spatial transcriptomic data and scRNA-seq marker gene profile as input, which can be found in the [diagnosis file](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_WITH_CVAE_diagnosis.tar) generated in the baseline run [S1_ref_scRNA_SDePER_WITH_CVAE.ipynb](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_WITH_CVAE.ipynb) !!!

1. nUMI-like counts of transformed spatial transcriptomic data (spots × genes): `spatial_spots_transformToscRNA_decoded.csv`
2. reference cell type specific marker gene expression (cell types × genes): `scRNA_decoded_avg_exp_bycelltypes.csv`, modified to include only the selected marker genes in `redo_DE_celltype_markers.csv`
3. adjacency matrix of spots in simulated spatial transcriptomic data (spots × spots): [sim_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_adjacency_matrix.csv)
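Before launching the run, it can help to confirm that the three input tables fit together. The helper below is a hypothetical sketch, not part of SDePER: it assumes each CSV is read with `index_col=0` so that row labels (spot IDs or cell types) land in the index, and that the adjacency matrix rows are labelled with the same spot IDs as the spatial file.

```python
import numpy as np
import pandas as pd

def check_inputs(spatial_df: pd.DataFrame, marker_df: pd.DataFrame,
                 adj_df: pd.DataFrame) -> dict:
    """Basic consistency checks on the three input tables (hypothetical helper).

    spatial_df: spots x genes counts
    marker_df:  cell types x genes expression
    adj_df:     spots x spots adjacency matrix
    """
    shared_genes = spatial_df.columns.intersection(marker_df.columns)
    if len(shared_genes) == 0:
        raise ValueError('no genes shared between spatial data and marker profile')
    if adj_df.shape[0] != adj_df.shape[1]:
        raise ValueError('adjacency matrix must be square')
    if not np.allclose(adj_df.values, adj_df.values.T):
        raise ValueError('adjacency matrix is not symmetric')
    if not adj_df.index.equals(spatial_df.index):
        raise ValueError('adjacency matrix rows do not match spatial spots')
    return {'n_spots': spatial_df.shape[0],
            'n_cell_types': marker_df.shape[0],
            'n_shared_genes': len(shared_genes)}

# Toy tables; for the real files, e.g. pd.read_csv('spatial_spots_transformToscRNA_decoded.csv', index_col=0)
spatial = pd.DataFrame([[5, 0, 3], [1, 2, 0]], index=['s1', 's2'], columns=['g1', 'g2', 'g3'])
markers = pd.DataFrame([[1.0, 0.2], [0.1, 2.0]], index=['A', 'B'], columns=['g1', 'g3'])
adj = pd.DataFrame([[0, 1], [1, 0]], index=['s1', 's2'], columns=['s1', 's2'])
summary = check_inputs(spatial, markers, adj)
print(summary)
```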

==================================================================================================================

SDePER settings are the same as in the baseline run [S1_ref_scRNA_SDePER_WITH_CVAE.ipynb](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_WITH_CVAE.ipynb); we discarded unneeded command-line options and further disabled the additive platform effect term by manually adjusting the source code:

* number of CPU cores used, `n_cores`: 64

ALL other options are left at their default values.

**For the ablation test, set the hyper-parameter for the graph Laplacian constraint, `lambda_g`, to 0.**

==================================================================================================================

The `bash` command to start cell type deconvolution is

`runDeconvolution -q spatial_spots_transformToscRNA_decoded.csv -m scRNA_decoded_avg_exp_bycelltypes.csv -a sim_spatial_spot_adjacency_matrix.csv -n 64 --lambda_g 0`
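The notebook passes this command to the shell as a single string. A possible alternative, sketched below, is to assemble it as an argument list, which avoids shell quoting issues if a file path ever contains spaces. It uses only the flags that appear in the command above; `build_command` is a hypothetical helper, and the actual call is left commented out since it requires SDePER to be installed.

```python
import shlex

def build_command(spatial_file: str, marker_file: str, adj_file: str,
                  n_cores: int = 64, lambda_g: float = 0) -> list:
    """Assemble the runDeconvolution call shown above as an argument list."""
    return ['runDeconvolution',
            '-q', spatial_file,
            '-m', marker_file,
            '-a', adj_file,
            '-n', str(n_cores),
            '--lambda_g', str(lambda_g)]

args = build_command('spatial_spots_transformToscRNA_decoded.csv',
                     'scRNA_decoded_avg_exp_bycelltypes.csv',
                     'sim_spatial_spot_adjacency_matrix.csv')
print(shlex.join(args))  # printable shell form, e.g. for logging

# To execute without a shell (requires SDePER installed):
# import subprocess
# subprocess.run(args, check=True, text=True)
```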

Note that this Notebook uses **SDePER v1.2.1**. The cell type deconvolution result is renamed to [S1_ref_scRNA_SDePER_Ablation_NO_Laplacian_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Ablation/Ablation_simulation_seq_based/S1_ref_scRNA_SDePER_Ablation_NO_Laplacian_celltype_proportions.csv).
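A quick sanity check on the renamed result file could look like the sketch below. It assumes (not confirmed by this notebook) that the CSV has spots as rows with spot IDs in the first column, cell types as columns, and non-negative proportions summing to about 1 per spot; `summarize_proportions` and the toy cell type names are hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_proportions(prop_df: pd.DataFrame, tol: float = 1e-3) -> pd.Series:
    """Sanity-check a spots x cell types proportion table (hypothetical helper).

    Raises ValueError on negative values or rows that do not sum to ~1,
    then returns the mean proportion of each cell type, largest first.
    """
    if (prop_df.values < 0).any():
        raise ValueError('negative proportions found')
    row_sums = prop_df.sum(axis=1)
    if not np.allclose(row_sums, 1.0, atol=tol):
        raise ValueError('some spots do not sum to 1')
    return prop_df.mean(axis=0).sort_values(ascending=False)

# Toy table; for the real result, e.g.
# pd.read_csv('S1_ref_scRNA_SDePER_Ablation_NO_Laplacian_celltype_proportions.csv', index_col=0)
toy = pd.DataFrame({'type_A': [0.6, 0.2], 'type_B': [0.4, 0.8]}, index=['spot1', 'spot2'])
print(summarize_proportions(toy))
```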

```python
# modify cell type specific marker gene expression profile to include only selected markers
import pandas as pd

marker_df = pd.read_csv('scRNA_decoded_avg_exp_bycelltypes.csv', index_col=0)
print(f'got {marker_df.shape[1]} genes for {marker_df.shape[0]} cell types')

de_gene_df = pd.read_csv('redo_DE_celltype_markers.csv')
de_gene_list = list(de_gene_df.loc[de_gene_df['selected']==1, 'gene'].unique())
print(f'got {len(de_gene_list)} marker genes for downstream analysis')

marker_df[de_gene_list].to_csv('scRNA_decoded_avg_exp_bycelltypes.csv')
```

```
got 829 genes for 12 cell types
got 568 marker genes for downstream analysis
```
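The cell above overwrites `scRNA_decoded_avg_exp_bycelltypes.csv` in place and raises a `KeyError` if any selected marker gene is not a column of the profile. A slightly more defensive variant is sketched below: it keeps only the selected genes that are present, reports the missing ones, and returns the filtered table so it can be written under a separate name. `filter_marker_profile` and the output file name are hypothetical.

```python
import pandas as pd

def filter_marker_profile(marker_df: pd.DataFrame, de_gene_df: pd.DataFrame):
    """Keep only selected marker genes (selected == 1) present in marker_df.

    Returns the filtered profile and the list of selected genes that were missing.
    """
    selected = de_gene_df.loc[de_gene_df['selected'] == 1, 'gene'].unique()
    present = [g for g in selected if g in marker_df.columns]
    missing = [g for g in selected if g not in marker_df.columns]
    return marker_df[present], missing

# Toy inputs standing in for the two CSV files
profile = pd.DataFrame([[1.0, 0.5, 0.0], [0.2, 3.0, 1.1]],
                       index=['type_A', 'type_B'], columns=['g1', 'g2', 'g3'])
de = pd.DataFrame({'gene': ['g1', 'g3', 'g3', 'g9', 'g2'],
                   'selected': [1, 1, 1, 1, 0]})
filtered, missing = filter_marker_profile(profile, de)
print(filtered.columns.tolist(), missing)
# filtered.to_csv('marker_profile_selected.csv')  # hypothetical output name
```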


```python
import subprocess

# run SDePER with lambda_g = 0, i.e. without the graph Laplacian (spatial correlation) constraint
cmd = '''runDeconvolution -q spatial_spots_transformToscRNA_decoded.csv \
                          -m scRNA_decoded_avg_exp_bycelltypes.csv \
                          -a sim_spatial_spot_adjacency_matrix.csv \
                          -n 64 \
                          --lambda_g 0
'''

subprocess.run(cmd, check=True, text=True, shell=True)
```


```
SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.2.1


running options:
spatial_file: /home/exouser/Spatial/spatial_spots_transformToscRNA_decoded.csv
ref_file: None
ref_celltype_file: None
marker_file: /home/exouser/Spatial/scRNA_decoded_avg_exp_bycelltypes.csv
loc_file: None
A_file: /home/exouser/Spatial/sim_spatial_spot_adjacency_matrix.csv
n_cores: 64
threshold: 0
use_cvae: False
use_imputation: False
diagnosis: False
verbose: True
use_fdr: True
p_val_cutoff: 0.05
fc_cutoff: 1.2
pct1_cutoff: 0.3
pct2_cutoff: 0.1
sortby_fc: True
n_marker_per_cmp: 20
filter_cell: True
filter_gene: True
n_hv_gene: 200
n_pseudo_spot: 500000
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 8
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.01
num_hidden_layer: 2
use_batch_norm: True
cvae_train_epoch: 500
use_spatial_pseudo: False
redo_de: True
seed: 383
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: 0.0
diameter: 200
impute_diameter: [160, 114

CompletedProcess(args='runDeconvolution -q spatial_spots_transformToscRNA_decoded.csv                           -m scRNA_decoded_avg_exp_bycelltypes.csv                           -a sim_spatial_spot_adjacency_matrix.csv                           -n 64                           --lambda_g 0\n', returncode=0)
```
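The log echoes every running option, so one way to confirm that the ablation setting actually took effect is to capture the output and parse the `name: value` lines after the `running options:` header. The sketch below assumes the options are printed one per line in that format, as in the log above, and that the log goes to stdout; `parse_running_options` and `sample_log` are hypothetical.

```python
def parse_running_options(log_text: str) -> dict:
    """Collect 'name: value' lines that follow the 'running options:' header."""
    options = {}
    in_block = False
    for line in log_text.splitlines():
        if line.strip() == 'running options:':
            in_block = True
            continue
        if in_block:
            if ': ' not in line:
                if options:  # first non-option line after the block ends it
                    break
                continue
            key, value = line.split(': ', 1)
            options[key.strip()] = value.strip()
    return options

# Abbreviated copy of the log above; with a real run one could use
# result = subprocess.run(args, check=True, text=True, capture_output=True)
# and pass result.stdout instead.
sample_log = """SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.2.1

running options:
spatial_file: /home/exouser/Spatial/spatial_spots_transformToscRNA_decoded.csv
n_cores: 64
lambda_g: 0.0
diameter: 200
"""
opts = parse_running_options(sample_log)
print(float(opts['lambda_g']))
```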