# Run *SDePER* on sequencing-based simulated data: Scenario 1 + scRNA-seq data as reference + NO platform effect removal

In this Notebook we run an ablation test of SDePER on simulated data. For how the **sequencing-based** simulated data were generated via the coarse-graining procedure, please refer to [generate_simulated_spatial_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c963d08f74f4591c2ef6f132177795297793d878/Simulation_seq_based/Generate_simulation_data/generate_simulated_spatial_data.nb.html) in the [Simulation_seq_based](https://github.com/az7jh2/SDePER_Analysis/tree/main/Simulation_seq_based) folder.

**Scenario 1** means the reference data for deconvolution includes all single cells with the **matched 12 cell types**.
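One quick way to confirm that a reference annotation file matches Scenario 1 is to count the distinct cell type labels. The sketch below is a hypothetical helper (the name `summarize_reference_celltypes` is ours), assuming the annotation CSV stores cell IDs in its first column and the cell type label in its second column:

```python
import pandas as pd


def summarize_reference_celltypes(celltype_csv: str) -> pd.Series:
    """Count reference cells per annotated cell type.

    Hypothetical helper. Assumes the first column holds cell IDs (used as the
    index) and the next column holds the cell type label.
    """
    annot = pd.read_csv(celltype_csv, index_col=0)
    labels = annot.iloc[:, 0]  # the single cell type column (cells x 1)
    return labels.value_counts()  # number of cells per cell type
```

For Scenario 1 one would expect `len(summarize_reference_celltypes("ref_scRNA_cell_celltype.csv"))` to be 12.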

**scRNA-seq data as reference** means the reference data is another scRNA-seq dataset ([GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746)) from the same tissue as the simulated spatial data; therefore, a **platform effect exists**.

**NO platform effect removal** means we conduct cell type deconvolution **disregarding the platform effect**: neither the CVAE nor the additive gene-wise platform effect term is used.
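To make the ablated component concrete, here is a toy illustration, assuming a simplified mean model in which a gene-wise platform effect `gamma` enters additively on the log scale of the expected spot counts. This is our own hypothetical sketch, not SDePER's actual likelihood or source code; the function and argument names are made up. Dropping `gamma` (passing `None`) mimics the "no platform effect term" setting of this ablation run.

```python
import numpy as np


def expected_spot_counts(theta, mu, n_umi, gamma=None):
    """Toy mean model for spot-level counts (hypothetical, illustration only).

    theta : (n_spots, n_celltypes) cell type proportions, rows sum to 1
    mu    : (n_celltypes, n_genes) reference expression profiles (relative abundances)
    n_umi : (n_spots,) sequencing depth of each spot
    gamma : (n_genes,) gene-wise platform effect on the log scale;
            None drops the term, as in this ablation run
    """
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n_umi = np.asarray(n_umi, dtype=float)
    # log of expected counts: log depth + log of the proportion-weighted profile
    log_mean = np.log(n_umi)[:, None] + np.log(theta @ mu)
    if gamma is not None:
        # additive gene-wise shift on the log scale (multiplicative on counts)
        log_mean = log_mean + np.asarray(gamma, dtype=float)[None, :]
    return np.exp(log_mean)


if __name__ == "__main__":
    theta = [[0.5, 0.5]]
    mu = [[0.2, 0.8], [0.6, 0.4]]
    print(expected_spot_counts(theta, mu, [100]))  # no platform effect term
    print(expected_spot_counts(theta, mu, [100], gamma=[np.log(2), 0.0]))
```

With `gamma` present, every spot's expected count for a gene is rescaled by the same factor `exp(gamma_g)`; without it, the reference profiles are assumed to transfer to the spatial platform unchanged.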

==================================================================================================================

So here we use the **4 input files** as shown below:

1. raw nUMI counts of simulated spatial transcriptomic data (spots × genes): [sim_seq_based_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Generate_simulation_data/sim_seq_based_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `scRNA_data_full.csv`. Because the CSV file of the raw nUMI matrix for all 23,178 cells and 45,768 genes is up to 2.3 GB, we do not provide it in our repository. It is simply the **matrix transpose** of [GSE115746_cells_exon_counts.csv.gz](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE115746&format=file&file=GSE115746%5Fcells%5Fexon%5Fcounts%2Ecsv%2Egz) in [GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746), so that it satisfies the file format requirement of rows as cells and columns as genes.
3. cell type annotations for cells of **the matched 12 cell types** in reference scRNA-seq data (cells × 1): [ref_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/ref_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in simulated spatial transcriptomic data (spots × spots): [sim_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_adjacency_matrix.csv)
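The transpose in item 2 could be produced with pandas along the lines of the sketch below. It assumes the GEO file stores genes as rows (gene names in the first column) and cells as columns (cell IDs in the header row); the helper name `transpose_counts` is hypothetical. Note that the full matrix has about 1.06 billion entries, so loading it as 64-bit integers needs on the order of 8.5 GB of RAM before any copies.

```python
import pandas as pd


def transpose_counts(in_path: str, out_path: str) -> tuple:
    """Convert a genes x cells count matrix into the cells x genes layout.

    Hypothetical helper. Assumes the first input column holds gene names and
    the header row holds cell IDs.
    """
    counts = pd.read_csv(in_path, index_col=0)  # gzip is inferred from the .gz suffix
    cells_by_genes = counts.T  # rows become cells, columns become genes
    cells_by_genes.to_csv(out_path)
    return cells_by_genes.shape


if __name__ == "__main__":
    shape = transpose_counts("GSE115746_cells_exon_counts.csv.gz", "scRNA_data_full.csv")
    print(shape)
```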

==================================================================================================================

SDePER settings are the same as in the baseline run [S1_ref_scRNA_SDePER_WITH_CVAE.ipynb](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_WITH_CVAE.ipynb), with unneeded command-line options discarded:

* number of selected TOP marker genes for each comparison in Differential analysis `n_marker_per_cmp`: 20
* number of used CPU cores `n_core`: 64

ALL other options are left as default.

**For the ablation test, we disable both the CVAE and the additive platform effect term**.

Because there is no command-line option to disable the additive platform effect term, we manually modified the source code to disable it.

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv -r scRNA_data_full.csv -c ref_scRNA_cell_celltype.csv -a sim_spatial_spot_adjacency_matrix.csv --n_marker_per_cmp 20 -n 64 --use_cvae false`
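As an alternative to passing a single shell string, the same command can be assembled as an argument list and run without `shell=True`, which sidesteps quoting problems in file paths. This is a sketch of our own; the helper `build_deconvolution_args` is hypothetical, and it only uses the flags that appear in the command above.

```python
import shlex
import shutil
import subprocess


def build_deconvolution_args(spatial, ref, celltype, adjacency,
                             n_marker_per_cmp=20, n_cores=64, use_cvae=False):
    """Assemble runDeconvolution arguments as a list (hypothetical helper)."""
    return [
        "runDeconvolution",
        "-q", spatial,
        "-r", ref,
        "-c", celltype,
        "-a", adjacency,
        "--n_marker_per_cmp", str(n_marker_per_cmp),
        "-n", str(n_cores),
        "--use_cvae", "true" if use_cvae else "false",
    ]


if __name__ == "__main__":
    args = build_deconvolution_args(
        "sim_seq_based_spatial_spot_nUMI.csv",
        "scRNA_data_full.csv",
        "ref_scRNA_cell_celltype.csv",
        "sim_spatial_spot_adjacency_matrix.csv",
    )
    print(shlex.join(args))  # the equivalent single-line shell command
    if shutil.which("runDeconvolution"):
        # A list argument is executed directly, with no shell involved
        subprocess.run(args, check=True, text=True)
    else:
        print("runDeconvolution not found on PATH")
```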

Note that this Notebook uses **SDePER v1.2.1**. The cell type deconvolution result is renamed to [S1_ref_scRNA_SDePER_Ablation_NO_PlatEffRmv_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Ablation/Ablation_simulation_seq_based/S1_ref_scRNA_SDePER_Ablation_NO_PlatEffRmv_celltype_proportions.csv).
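Before comparing the renamed result against ground truth, a basic sanity check can catch a malformed file. The sketch below assumes, without confirmation from this Notebook, that the proportions CSV has spot IDs in its first column and one column per cell type, with each row summing to 1; the helper `check_proportions` and the tolerance `atol` are hypothetical.

```python
import pandas as pd


def check_proportions(csv_path: str, atol: float = 1e-6) -> pd.DataFrame:
    """Load a spots x cell types proportion table and run basic sanity checks.

    Hypothetical helper. Assumes spot IDs in the first column and per-cell-type
    proportions in the remaining columns.
    """
    props = pd.read_csv(csv_path, index_col=0)
    if (props < 0).to_numpy().any():
        raise ValueError("negative proportions found")
    row_sums = props.sum(axis=1)
    bad = row_sums[(row_sums - 1).abs() > atol]  # spots whose proportions do not sum to 1
    if not bad.empty:
        raise ValueError(f"{len(bad)} spot(s) do not sum to 1")
    return props


if __name__ == "__main__":
    props = check_proportions("S1_ref_scRNA_SDePER_Ablation_NO_PlatEffRmv_celltype_proportions.csv")
    print(props.shape)
```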

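import subprocess

# Build the runDeconvolution command. Inside a regular (non-raw) Python string,
# each backslash-newline pair is removed, so the shell receives a single line.
cmd = '''runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv \
                          -r scRNA_data_full.csv \
                          -c ref_scRNA_cell_celltype.csv \
                          -a sim_spatial_spot_adjacency_matrix.csv \
                          --n_marker_per_cmp 20 \
                          -n 64 \
                          --use_cvae false \
'''

# Run through the shell; check=True raises CalledProcessError on a non-zero exit code.
subprocess.run(cmd, check=True, text=True, shell=True)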


SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.2.1


running options:
spatial_file: /home/exouser/Spatial/sim_seq_based_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/scRNA_data_full.csv
ref_celltype_file: /home/exouser/Spatial/ref_scRNA_cell_celltype.csv
marker_file: None
loc_file: None
A_file: /home/exouser/Spatial/sim_spatial_spot_adjacency_matrix.csv
n_cores: 64
threshold: 0
use_cvae: False
use_imputation: False
diagnosis: False
verbose: True
use_fdr: True
p_val_cutoff: 0.05
fc_cutoff: 1.2
pct1_cutoff: 0.3
pct2_cutoff: 0.1
sortby_fc: True
n_marker_per_cmp: 20
filter_cell: True
filter_gene: True
n_hv_gene: 200
n_pseudo_spot: 500000
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 8
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.01
num_hidden_layer: 2
use_batch_norm: True
cvae_train_epoch: 500
use_spatial_pseudo: False
redo_de: True
seed: 383
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0.268, 0.72, 1.931

CompletedProcess(args='runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv                           -r scRNA_data_full.csv                           -c ref_scRNA_cell_celltype.csv                           -a sim_spatial_spot_adjacency_matrix.csv                           --n_marker_per_cmp 20                           -n 64                           --use_cvae false ', returncode=0)
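The `lambda_r` grid printed in the log consists of 8 values that are consistent with points evenly spaced on a log10 scale between 0.1 and 100, rounded to 3 decimals. The short check below reproduces it with `numpy.logspace`; this only shows that the numbers match and is not a claim about how SDePER builds the grid internally.

```python
import numpy as np

# Values printed for lambda_r in the log above
lambda_r_logged = [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]

# 8 points evenly spaced in log10 between 10**-1 and 10**2
grid = np.logspace(-1, 2, num=8)

print(np.round(grid, 3).tolist())
print(np.allclose(np.round(grid, 3), lambda_r_logged))
```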