# Run Ablation Test on *SDePER* on simulated data: Scenario 1 + scRNA-seq data as reference + NO platform effect removal

In this Notebook we run an **ablation test** of SDePER on simulated data. For how the simulated data are generated via the coarse-graining procedure, please refer to [generate_simulated_spatial_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c60dcb036816bd61b5a8b3752d473a5b591b52b6/Simulation/Generate_simulation_data/generate_simulated_spatial_data.nb.html).

**Scenario 1** means the reference data for deconvolution includes all single cells with the **matched 12 cell types**.

**scRNA-seq data as reference** means the reference data is scRNA-seq data ([GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746)) from the same tissue as the simulated spatial data, so a **platform effect exists**.

**NO platform effect removal** means we conduct cell type deconvolution **disregarding platform effect**, meaning that neither CVAE nor an additive gene-wise platform effect term is utilized.

==================================================================================================================

So here we use the **4 input files** as shown below:

1. raw nUMI counts of simulated spatial transcriptomic data (spots × genes): [sim_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `scRNA_data_full.csv`. Since the CSV file of the raw nUMI matrix for all 23,178 cells and 45,768 genes is up to 2.3 GB, we do not provide this file in our repository. It is simply a **matrix transpose** of [GSE115746_cells_exon_counts.csv.gz](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE115746&format=file&file=GSE115746%5Fcells%5Fexon%5Fcounts%2Ecsv%2Egz) in [GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746), made to satisfy the file format requirement that rows are cells and columns are genes.
3. cell type annotations for cells of **the matched 12 cell types** in reference scRNA-seq data (cells × 1): [ref_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/ref_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in simulated spatial transcriptomic data (spots × spots): [sim_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_adjacency_matrix.csv)
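
The matrix transpose mentioned for input file 2 can be sketched as follows. This is a minimal illustration, not the script actually used to produce `scRNA_data_full.csv`: the helper name `transpose_csv` is hypothetical, and it assumes the GEO file is a plain (optionally gzipped) CSV with gene IDs in the first column and cell IDs in the header row. It holds the whole table in memory, so for the full 2.3 GB matrix a machine with plenty of RAM (or a chunked approach) would be needed.

```python
import csv
import gzip


def transpose_csv(src_path, dst_path):
    """Transpose a CSV table, e.g. genes x cells -> cells x genes.

    Hypothetical helper: assumes the whole table fits in memory.
    """
    # Read gzipped input transparently when the name ends with .gz
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt", newline="") as src:
        rows = list(csv.reader(src))
    # zip(*rows) turns the i-th input column into the i-th output row
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(zip(*rows))
```

A call such as `transpose_csv("GSE115746_cells_exon_counts.csv.gz", "scRNA_data_full.csv")` would then write the cells × genes layout described above.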

==================================================================================================================

SDePER settings are the same as in the baseline run [S1_ref_scRNA_SDePER_WITH_CVAE.ipynb](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_WITH_CVAE.ipynb):

* number of included highly variable genes `n_hv_gene`: 500
* number of selected TOP marker genes for each comparison in differential expression analysis `n_marker_per_cmp`: 50
* seed for random values `seed`: 2
* number of used CPU cores `n_core`: 64

ALL other options are left as default.

**For the ablation test, we disable CVAE and the additive platform effect term**.

Because there is no command-line option to disable the additive platform effect term, we manually adjusted the source code to disable it.

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q sim_spatial_spot_nUMI.csv -r scRNA_data_full.csv -c ref_scRNA_cell_celltype.csv -a sim_spatial_spot_adjacency_matrix.csv --n_hv_gene 500 --n_marker_per_cmp 50 --seed 2 -n 64 --use_cvae false`
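
As a small sketch of how this command line relates to the settings listed above, the options can be kept in a dictionary and assembled into an argument list. Only flags that appear in the command above are used; the names `settings` and `build_command` are illustrative and not part of SDePER.

```python
import shlex

# Flags and values copied from the command above (insertion order is kept)
settings = {
    "-q": "sim_spatial_spot_nUMI.csv",
    "-r": "scRNA_data_full.csv",
    "-c": "ref_scRNA_cell_celltype.csv",
    "-a": "sim_spatial_spot_adjacency_matrix.csv",
    "--n_hv_gene": 500,
    "--n_marker_per_cmp": 50,
    "--seed": 2,
    "-n": 64,
    "--use_cvae": "false",
}


def build_command(options, program="runDeconvolution"):
    """Flatten a {flag: value} mapping into an argument list."""
    args = [program]
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


args = build_command(settings)
# Show the equivalent single-line shell command
print(shlex.join(args))
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`, which avoids `shell=True` and any quoting concerns.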

Note that this Notebook uses **SDePER v1.0.0**. The cell type deconvolution result is renamed to [S1_ref_scRNA_SDePER_Ablation_NO_PlatEffRmv_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Ablation/Ablation_simulation_data/S1_ref_scRNA_SDePER_Ablation_NO_PlatEffRmv_celltype_proportions.csv).
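
Once the result file exists, a quick sanity check is to confirm that the estimated proportions of each spot sum to 1. The sketch below is illustrative only and rests on an assumption about the file layout that is not stated in this Notebook: a header row of cell type names, then one row per spot with the spot ID in the first column. The function name `spots_not_summing_to_one` is hypothetical.

```python
import csv


def spots_not_summing_to_one(path, tol=1e-6):
    """Return IDs of spots whose proportions do not sum to 1 within tol.

    Assumed layout (not confirmed by the source): header row of cell type
    names, then one row per spot with the spot ID in the first column.
    """
    bad = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        for row in reader:
            spot_id = row[0]
            total = sum(float(v) for v in row[1:])
            if abs(total - 1.0) > tol:
                bad.append(spot_id)
    return bad
```

An empty list returned for the proportions file would indicate that every spot passes the check under the assumed layout.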

In [1]:
import subprocess

cmd = '''runDeconvolution -q sim_spatial_spot_nUMI.csv \
                          -r scRNA_data_full.csv \
                          -c ref_scRNA_cell_celltype.csv \
                          -a sim_spatial_spot_adjacency_matrix.csv \
                          --n_hv_gene 500 \
                          --n_marker_per_cmp 50 \
                          --seed 2 \
                          -n 64 \
                          --use_cvae false
'''

subprocess.run(cmd, check=True, text=True, shell=True)


SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.0.0


running options:
spatial_file: /home/exouser/Spatial/sim_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/scRNA_data_full.csv
ref_celltype_file: /home/exouser/Spatial/ref_scRNA_cell_celltype.csv
marker_file: None
loc_file: None
A_file: /home/exouser/Spatial/sim_spatial_spot_adjacency_matrix.csv
n_cores: 64
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
use_cvae: False
threshold: 0
n_hv_gene: 500
n_marker_per_cmp: 50
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 8
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.003
redo_de: True
seed: 2
diagnosis: False
verbose: True
use_imputation: False
diameter: 200
impute_diameter: [160, 114, 80]


######### Preprocessing... #########

######### Building CVAE skipped... #########

read spatial data from file /home/exouser/Spatial/sim_spatial_spot_nUMI.csv
total 581 spots;

    33 |      0.073 |    120.632 |      0.239 |      3.268 |    4096.00 |    8192.00 |    8.701 |    0.000 |    0.003 |   0.001050 |   0.000525
    34 |      0.062 |    144.878 |      0.263 |      3.576 |    8192.00 |    8192.00 |    8.779 |    0.000 |    0.003 |   0.000872 |   0.000436
    35 |      0.052 |    158.991 |      0.277 |      3.839 |    8192.00 |    8192.00 |    8.141 |    0.000 |    0.003 |   0.000713 |   0.000357
    36 |      0.046 |    139.617 |      0.258 |      4.059 |    8192.00 |   16384.00 |    7.660 |    0.000 |    0.003 |   0.000617 |   0.000309
    37 |      0.039 |    167.556 |      0.286 |      4.463 |   16384.00 |   16384.00 |    7.893 |    0.000 |    0.003 |   0.000515 |   0.000258
    38 |      0.033 |    201.736 |      0.320 |      4.808 |   16384.00 |   16384.00 |    7.423 |    0.000 |    0.003 |   0.000422 |   0.000211
    39 |      0.028 |    180.808 |      0.299 |      5.086 |   16384.00 |   32768.00 |    6.670 |    0.000 |    0.003 |   0.000354 |   0

CompletedProcess(args='runDeconvolution -q sim_spatial_spot_nUMI.csv                           -r scRNA_data_full.csv                           -c ref_scRNA_cell_celltype.csv                           -a sim_spatial_spot_adjacency_matrix.csv                           --n_hv_gene 500                           --n_marker_per_cmp 50                           --seed 2                           -n 64                           --use_cvae false\n', returncode=0)
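
The eight default values printed for `lambda_r` and `lambda_g` in the running options above coincide with a log-spaced grid between 0.1 and 100. The sketch below reproduces the printed numbers; whether SDePER generates the grid internally in this way is an assumption, and `log_spaced_grid` is a hypothetical helper.

```python
def log_spaced_grid(low_exp, high_exp, n, digits=3):
    """Return n values evenly spaced in log10 from 10**low_exp to 10**high_exp."""
    span = high_exp - low_exp
    return [round(10 ** (low_exp + i * span / (n - 1)), digits) for i in range(n)]


# 8 points from 10**-1 = 0.1 to 10**2 = 100
grid = log_spaced_grid(-1, 2, 8)
print(grid)
```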