# Run *SDePER* on sequencing-based simulated data: Scenario 1 + Spatial data as reference + WITH CVAE

In this notebook we run SDePER on simulated data. For how the **sequencing-based** simulated data are generated via a coarse-graining procedure, please refer to [generate_simulated_spatial_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c963d08f74f4591c2ef6f132177795297793d878/Simulation_seq_based/Generate_simulation_data/generate_simulated_spatial_data.nb.html) in the [Simulation_seq_based](https://github.com/az7jh2/SDePER_Analysis/tree/main/Simulation_seq_based) folder.

**Scenario 1** means the reference data for deconvolution includes all single cells with the **matched 12 cell types**.

**Spatial data as reference** means the reference is the same [GSE102827](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102827) scRNA-seq data that was used to generate the simulated data, so it is **free of platform effect**.

**WITH CVAE** means we still use the CVAE to remove platform effect, even though the reference is free of platform effect here.

==================================================================================================================

We therefore use the **4 input files** listed below:

1. raw nUMI counts of simulated spatial transcriptomic data (spots × genes): [sim_seq_based_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Generate_simulation_data/sim_seq_based_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference GSE102827 scRNA-seq data (cells × genes): `GSE102827_scRNA_cell_nUMI.csv`. Because the CSV file of the raw nUMI matrix for all 65,539 cells and 25,187 genes reaches 3.1 GB, we do not provide it in our repository. It is simply the **matrix transpose** of [GSE102827_merged_all_raw.csv.gz](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE102827&format=file&file=GSE102827%5Fmerged%5Fall%5Fraw%2Ecsv%2Egz) from [GSE102827](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102827), which satisfies the file format requirement that rows are cells and columns are genes.
3. cell type annotations for the 2,002 cells selected from the [GSE102827](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102827) scRNA-seq data for simulated data generation (cells × 1): [GSE102827_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_spatial/GSE102827_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in simulated spatial transcriptomic data (spots × spots): [sim_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_adjacency_matrix.csv)
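
The transpose mentioned in item 2 could be done along these lines. This is a minimal sketch, not part of the SDePER package: the function name `transpose_counts` is hypothetical, and it assumes that the first column of the GEO file holds the gene names and that the whole matrix fits in memory (which may not hold on a small machine, given the 3.1 GB output size).

```python
import pandas as pd


def transpose_counts(src_path, dst_path):
    """Read a genes x cells count table and write it as cells x genes.

    Hypothetical helper for illustration only.
    """
    # Assumption: the first column of the source file holds the gene names.
    # pandas infers gzip compression from a ".gz" file extension.
    genes_by_cells = pd.read_csv(src_path, index_col=0)

    # After the transpose, rows are cells and columns are genes.
    cells_by_genes = genes_by_cells.T
    cells_by_genes.to_csv(dst_path)
    return cells_by_genes.shape


# Example call (file names as used in this notebook):
# transpose_counts("GSE102827_merged_all_raw.csv.gz", "GSE102827_scRNA_cell_nUMI.csv")
```

The returned shape gives a quick check that the number of rows matches the expected number of cells.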

==================================================================================================================

SDePER settings are:

* number of selected TOP marker genes for each comparison in differential analysis `n_marker_per_cmp`: 20
* number of used CPU cores `n_core`: 64
* initial learning rate for training CVAE `cvae_init_lr`: 0.003
* number of hidden layers in encoder and decoder of CVAE `num_hidden_layer`: 1
* whether to use Batch Normalization `use_batch_norm`: false
* CVAE training epochs `cvae_train_epoch`: 1000
* for diagnostic purposes set `diagnosis`: true

All other options are left at their defaults.
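
For reference, the non-default settings above can be kept in one place and turned into command-line arguments. The sketch below is illustrative only: the helper `settings_to_args` is hypothetical, and the flag names are copied from the `runDeconvolution` command used in this notebook rather than from a full listing of the CLI.

```python
# Non-default settings, keyed by the flag as it appears in this notebook's command.
settings = {
    "--n_marker_per_cmp": 20,
    "-n": 64,
    "--cvae_init_lr": 0.003,
    "--num_hidden_layer": 1,
    "--use_batch_norm": "false",
    "--cvae_train_epoch": 1000,
    "--diagnosis": "true",
}


def settings_to_args(settings):
    """Flatten a {flag: value} mapping into a list of argument strings.

    Hypothetical helper for illustration only.
    """
    args = []
    for flag, value in settings.items():
        args.extend([flag, str(value)])
    return args


# The joined string is the options part of the command line.
print(" ".join(settings_to_args(settings)))
```

Keeping the settings in a dictionary makes it easier to reuse them across scenarios that differ only in their input files.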

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv -r GSE102827_scRNA_cell_nUMI.csv -c GSE102827_scRNA_cell_celltype.csv -a sim_spatial_spot_adjacency_matrix.csv --n_marker_per_cmp 20 -n 64 --cvae_init_lr 0.003 --num_hidden_layer 1 --use_batch_norm false --cvae_train_epoch 1000 --diagnosis true`

Note that this notebook uses **SDePER v1.2.1**. The cell type deconvolution result is renamed to [S1_ref_spatial_SDePER_WITH_CVAE_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_spatial/S1_ref_spatial_SDePER_WITH_CVAE_celltype_proportions.csv). The folder of diagnostic plots is compressed and renamed to [S1_ref_spatial_SDePER_WITH_CVAE_diagnosis.tar](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation_seq_based/Run_SDePER_on_simulation_data/Scenario_1/ref_spatial/S1_ref_spatial_SDePER_WITH_CVAE_diagnosis.tar).
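
A quick sanity check on the result file might look like the following. This is a sketch under assumptions not stated in this notebook: that the CSV has one row per spot and one column per cell type, with spot IDs in the first column, and that the proportions of each spot sum to 1. The helper `summarize_proportions` is hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_proportions(props, tol=1e-6):
    """Summarize a spots x cell types table of estimated proportions.

    Hypothetical helper; assumes each row should be non-negative and sum to 1.
    """
    row_sums = props.sum(axis=1)
    return {
        "n_spots": props.shape[0],
        "n_celltypes": props.shape[1],
        "all_nonnegative": bool((props.to_numpy() >= 0).all()),
        "rows_sum_to_one": bool(np.allclose(row_sums, 1.0, atol=tol)),
    }


# Example call (assumes spot IDs are stored in the first column):
# props = pd.read_csv("S1_ref_spatial_SDePER_WITH_CVAE_celltype_proportions.csv", index_col=0)
# print(summarize_proportions(props))
```

For Scenario 1, one would expect `n_celltypes` to equal the 12 matched cell types if the output follows the assumed layout.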

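```python
import subprocess

# Shell command with the non-default SDePER options described above.
# Each trailing backslash joins the next line, so the string is a single command line.
cmd = '''runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv \
                          -r GSE102827_scRNA_cell_nUMI.csv \
                          -c GSE102827_scRNA_cell_celltype.csv \
                          -a sim_spatial_spot_adjacency_matrix.csv \
                          --n_marker_per_cmp 20 \
                          -n 64 \
                          --cvae_init_lr 0.003 \
                          --num_hidden_layer 1 \
                          --use_batch_norm false \
                          --cvae_train_epoch 1000 \
                          --diagnosis true
'''

# shell=True passes the string to the shell; check=True raises CalledProcessError on a non-zero exit code.
subprocess.run(cmd, check=True, text=True, shell=True)
```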


```text
SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.2.1

running options:
spatial_file: /home/exouser/Spatial/sim_seq_based_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/GSE102827_scRNA_cell_nUMI.csv
ref_celltype_file: /home/exouser/Spatial/GSE102827_scRNA_cell_celltype.csv
marker_file: None
loc_file: None
A_file: /home/exouser/Spatial/sim_spatial_spot_adjacency_matrix.csv
n_cores: 64
threshold: 0
use_cvae: True
use_imputation: False
diagnosis: True
verbose: True
use_fdr: True
p_val_cutoff: 0.05
fc_cutoff: 1.2
pct1_cutoff: 0.3
pct2_cutoff: 0.1
sortby_fc: True
n_marker_per_cmp: 20
filter_cell: True
filter_gene: True
n_hv_gene: 200
n_pseudo_spot: 500000
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 8
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.003
num_hidden_layer: 1
use_batch_norm: False
cvae_train_epoch: 1000
use_spatial_pseudo: False
redo_de: True
seed: 383
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0

CompletedProcess(args='runDeconvolution -q sim_seq_based_spatial_spot_nUMI.csv                           -r GSE102827_scRNA_cell_nUMI.csv                           -c GSE102827_scRNA_cell_celltype.csv                           -a sim_spatial_spot_adjacency_matrix.csv                           --n_marker_per_cmp 20                           -n 64                           --cvae_init_lr 0.003                           --num_hidden_layer 1                           --use_batch_norm false                           --cvae_train_epoch 1000                           --diagnosis true\n', returncode=0)
```
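
To confirm programmatically that the echoed options match the requested settings, the `running options:` block can be parsed into a dictionary. This sketch assumes the log text has been captured as a string (for example by passing `capture_output=True` to `subprocess.run`) and that each option is printed as `name: value`, as in the output above. `parse_running_options` is a hypothetical helper, not part of SDePER.

```python
def parse_running_options(log_text):
    """Collect the 'name: value' lines that follow 'running options:'.

    Hypothetical helper; values are returned as unparsed strings.
    """
    options = {}
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "running options:":
            in_block = True
            continue
        if not in_block:
            continue
        if not stripped:
            # A blank line after at least one option ends the block.
            if options:
                break
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = stripped.partition(":")
        if not sep:
            break
        options[key.strip()] = value.strip()
    return options


# Example check against the settings used in this notebook:
# opts = parse_running_options(captured_stdout)
# assert opts["n_marker_per_cmp"] == "20" and opts["use_batch_norm"] == "False"
```

Because the values stay as strings, comparisons should be made against the text as printed (for example `"False"`, not `False`).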