# Run *SDePER* on simulated data with downsampled reference: Scenario 1 + scRNA-seq data as reference + NO CVAE

In this Notebook we run SDePER on simulated data **with a downsampled reference**. For how the simulated data is generated via the coarse-graining procedure, please refer to [generate_simulated_spatial_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c60dcb036816bd61b5a8b3752d473a5b591b52b6/Simulation/Generate_simulation_data/generate_simulated_spatial_data.nb.html). For how the reference dataset with downsampled Astro cells is generated, please refer to [generate_downsampled_ref_data.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/5dc5cf8a6d97237304017c260f96ed0d3e41cb51/Simulation/Generate_downsampled_ref_data/generate_downsampled_ref_data.nb.html).

**Scenario 1** means the reference data for deconvolution includes all single cells with the **matched 12 cell types**.

**scRNA-seq data as reference** means the reference data is scRNA-seq data ([GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746)) from the same tissue as the simulated spatial data, therefore a **platform effect exists**.

**NO CVAE** means we DO NOT use CVAE to remove the platform effect, although a platform effect exists here.

==================================================================================================================

So here we use the **4 input files** as shown below:

1. raw nUMI counts of simulated spatial transcriptomic data (spots × genes): [sim_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `scRNA_data_full.csv`. The csv file of the raw nUMI matrix for all 23,178 cells and 45,768 genes is up to 2.3 GB, so we do not provide it in our repository. It is simply a **matrix transpose** of [GSE115746_cells_exon_counts.csv.gz](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE115746&format=file&file=GSE115746%5Fcells%5Fexon%5Fcounts%2Ecsv%2Egz) in [GSE115746](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115746), done to satisfy the file format requirement that rows are cells and columns are genes
3. cell type annotations for **downsampled** cells in reference scRNA-seq data (cells × 1): [scRNA_cell_annotation_Astro_5cells.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data_with_downsampled_ref/Astro/Scenario_1/ref_scRNA_seq/scRNA_cell_annotation_Astro_5cells.csv)
4. adjacency matrix of spots in simulated spatial transcriptomic data (spots × spots): [sim_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Generate_simulation_data/sim_spatial_spot_adjacency_matrix.csv)
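
The transpose described in item 2 could be done along the following lines. This is only a minimal sketch: it assumes pandas is installed, that the first column of the GEO file holds the gene identifiers, and that the machine has enough memory to hold the full matrix. The helper name `transpose_counts` and the tiny demo table are hypothetical and not part of SDePER.

```python
import io

import pandas as pd


def transpose_counts(src, dst):
    """Read a genes x cells count table and write it as cells x genes."""
    # Assumption: the first column contains the row names (gene identifiers).
    counts = pd.read_csv(src, index_col=0)
    transposed = counts.T  # rows become cells, columns become genes
    transposed.to_csv(dst)
    return transposed.shape


# Made-up 3 genes x 2 cells table standing in for the real file.
demo = io.StringIO(",cell_A,cell_B\nGene1,0,5\nGene2,2,0\nGene3,1,1\n")
out = io.StringIO()
shape = transpose_counts(demo, out)
print(shape)
print(out.getvalue())
```

For the real data one would presumably call `transpose_counts("GSE115746_cells_exon_counts.csv.gz", "scRNA_data_full.csv")`; pandas infers gzip compression from the `.gz` extension when given a path.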

==================================================================================================================

SDePER settings are the same as those used in the run that includes all cells in the reference dataset ([S1_ref_scRNA_SDePER_NO_CVAE.ipynb](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_NO_CVAE.ipynb)):

* number of included highly variable genes `n_hv_gene`: 500
* number of selected TOP marker genes for each comparison in differential expression analysis `n_marker_per_cmp`: 50
* seed for random values `seed`: 2
* number of used CPU cores `n_core`: 64
* **whether to use CVAE to remove platform effect `use_cvae`: false**

ALL other options are left as default.

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q sim_spatial_spot_nUMI.csv -r scRNA_data_full.csv -c scRNA_cell_annotation_Astro_5cells.csv -a sim_spatial_spot_adjacency_matrix.csv --n_hv_gene 500 --n_marker_per_cmp 50 --seed 2 -n 64 --use_cvae false`
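
As an aside, the same command can be assembled from the settings as an argument list, which avoids going through a shell. The sketch below is illustrative only: the `settings` dictionary and the `build_command` helper are hypothetical names, and the flags are exactly those in the command above.

```python
import shlex

# Flags and values copied from the command above; insertion order is preserved.
settings = {
    "-q": "sim_spatial_spot_nUMI.csv",
    "-r": "scRNA_data_full.csv",
    "-c": "scRNA_cell_annotation_Astro_5cells.csv",
    "-a": "sim_spatial_spot_adjacency_matrix.csv",
    "--n_hv_gene": 500,
    "--n_marker_per_cmp": 50,
    "--seed": 2,
    "-n": 64,
    "--use_cvae": "false",
}


def build_command(options, program="runDeconvolution"):
    """Flatten a {flag: value} mapping into an argument list."""
    args = [program]
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


cmd_args = build_command(settings)
print(shlex.join(cmd_args))  # single-line form for comparison with the command above
```

The resulting list could then be passed as `subprocess.run(cmd_args, check=True)` without `shell=True`.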

Note that this Notebook uses **SDePER v1.0.0**. The cell type deconvolution result is renamed to [S1_ref_scRNA_SDePER_NO_CVAE_ref5Astro_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/Simulation/Run_SDePER_on_simulation_data_with_downsampled_ref/Astro/Scenario_1/ref_scRNA_seq/S1_ref_scRNA_SDePER_NO_CVAE_ref5Astro_celltype_proportions.csv).
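
Before using the result file downstream, a quick sanity check may be worthwhile. The sketch below assumes the result is a csv with spots as rows, cell types as columns, and proportions that are non-negative and sum to 1 per spot; these are assumptions about the file layout, not something this Notebook verifies. The helper `check_proportions` and the demo values are made up for illustration.

```python
import csv
import io
import math


def check_proportions(handle, tol=1e-6):
    """Return the cell-type names and the spots whose proportions look invalid."""
    reader = csv.reader(handle)
    cell_types = next(reader)[1:]  # header row; the first field is the spot column
    bad_spots = []
    for row in reader:
        spot, values = row[0], [float(v) for v in row[1:]]
        sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
        if min(values) < 0 or not sums_to_one:
            bad_spots.append(spot)
    return cell_types, bad_spots


# Made-up two-spot table; the second row deliberately sums to 1.1.
demo = io.StringIO(",Astro,L4\nspot_0,0.25,0.75\nspot_1,0.5,0.6\n")
cell_types, bad_spots = check_proportions(demo)
print(cell_types, bad_spots)
```

For the real file, one would open the renamed csv with `open(path, newline="")` and pass the handle to the same function.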

In [1]:
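import subprocess

# Same command as above. Each trailing backslash inside the string joins the
# next line, so the shell receives the command as a single line.
cmd = '''runDeconvolution -q sim_spatial_spot_nUMI.csv \
                          -r scRNA_data_full.csv \
                          -c scRNA_cell_annotation_Astro_5cells.csv \
                          -a sim_spatial_spot_adjacency_matrix.csv \
                          --n_hv_gene 500 \
                          --n_marker_per_cmp 50 \
                          --seed 2 \
                          -n 64 \
                          --use_cvae false
'''

# check=True raises CalledProcessError if runDeconvolution exits with a non-zero status.
subprocess.run(cmd, check=True, text=True, shell=True)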


SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.0.0


running options:
spatial_file: /home/exouser/Spatial/sim_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/scRNA_data_full.csv
ref_celltype_file: /home/exouser/Spatial/scRNA_cell_annotation_Astro_5cells.csv
marker_file: None
loc_file: None
A_file: /home/exouser/Spatial/sim_spatial_spot_adjacency_matrix.csv
n_cores: 64
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
use_cvae: False
threshold: 0
n_hv_gene: 500
n_marker_per_cmp: 50
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 8
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.003
redo_de: True
seed: 2
diagnosis: False
verbose: True
use_imputation: False
diameter: 200
impute_diameter: [160, 114, 80]


######### Preprocessing... #########

######### Building CVAE skipped... #########

read spatial data from file /home/exouser/Spatial/sim_spatial_spot_nUMI.csv
total

    29 |      0.139 |    133.235 |      0.251 |      2.628 |    2048.00 |    2048.00 |    8.664 |    0.000 |    0.003 |   0.002119 |   0.001060
    30 |      0.122 |    111.412 |      0.229 |      2.755 |    2048.00 |    4096.00 |    8.116 |    0.000 |    0.003 |   0.001848 |   0.000924
    31 |      0.103 |    135.202 |      0.253 |      2.991 |    4096.00 |    4096.00 |    8.201 |    0.000 |    0.003 |   0.001542 |   0.000771
    32 |      0.088 |    145.462 |      0.264 |      3.202 |    4096.00 |    4096.00 |    7.574 |    0.000 |    0.003 |   0.001280 |   0.000640
    33 |      0.079 |    122.221 |      0.240 |      3.384 |    4096.00 |    8192.00 |    7.058 |    0.000 |    0.003 |   0.001135 |   0.000567
    34 |      0.067 |    164.459 |      0.283 |      3.720 |    8192.00 |    8192.00 |    7.508 |    0.000 |    0.003 |   0.000944 |   0.000472
    35 |      0.056 |    184.599 |      0.303 |      4.005 |    8192.00 |    8192.00 |    6.591 |    0.000 |    0.003 |   0.000759 |   0

    33 |      0.004 |    296.373 |      0.414 |      0.854 |    1024.00 |          / |    4.649 |    0.000 |    0.004 |   0.000059 |   0.000032
early stop!
Terminated (optimal) in 34 iterations.
One optimization by ADMM finished. Elapsed time: 5.65 minutes.


stage 2 finished. Elapsed time: 103.57 minutes.

GLRM fitting finished. Elapsed time: 130.81 minutes.


Post-processing estimated cell-type proportion theta...
hard thresholding small theta values with threshold 0


cell type deconvolution finished. Estimate results saved in /home/exouser/Spatial/celltype_proportions.csv. Elapsed time: 2.23 hours.


######### No imputation #########


whole pipeline finished. Total elapsed time: 2.23 hours.


CompletedProcess(args='runDeconvolution -q sim_spatial_spot_nUMI.csv                           -r scRNA_data_full.csv                           -c scRNA_cell_annotation_Astro_5cells.csv                           -a sim_spatial_spot_adjacency_matrix.csv                           --n_hv_gene 500                           --n_marker_per_cmp 50                           --seed 2                           -n 64                           --use_cvae false\n', returncode=0)
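
A note on the `lambda_r` and `lambda_g` candidates printed in the running options above: the eight values match a grid that is evenly spaced on a log10 scale from 0.1 to 100, rounded to three decimals. The sketch below only reproduces the printed numbers; it is an observation about the log, not a statement of how SDePER builds the grid internally, and the function name `log_spaced_grid` is hypothetical.

```python
def log_spaced_grid(low_exp=-1.0, high_exp=2.0, n=8, digits=3):
    """Return n values evenly spaced on a log10 scale between 10**low_exp and 10**high_exp."""
    step = (high_exp - low_exp) / (n - 1)
    return [round(10 ** (low_exp + k * step), digits) for k in range(n)]


grid = log_spaced_grid()
print(grid)
```

With NumPy available, `numpy.logspace(-1, 2, 8)` gives the same unrounded values.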