# Run *SDePER* on Real data: Breast Cancer

In this notebook we run SDePER on one real dataset: human **HER2+ breast cancer**.

To download and preprocess the original spatial and reference scRNA-seq data for cell type deconvolution, please refer to [Breast_Cancer_preprocess.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/0a429908645a665c1f9d345df013d5b9fcde20b3/RealData/Breast_Cancer/Breast_Cancer_preprocess.nb.html).

==================================================================================================================

We use **5 input files**, listed below:

1. raw nUMI counts of spatial transcriptomic data (spots × genes): [Breast_Cancer_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Breast_Cancer/Breast_Cancer_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `Breast_Cancer_ref_scRNA_cell_nUMI.csv`, obtained by decompressing the gzipped file [Breast_Cancer_ref_scRNA_cell_nUMI.csv.gz](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Breast_Cancer/Breast_Cancer_ref_scRNA_cell_nUMI.csv.gz)
3. cell type annotations for cells of selected 7 cell types in reference scRNA-seq data (cells × 1): [Breast_Cancer_ref_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Breast_Cancer/Breast_Cancer_ref_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in spatial transcriptomic data (spots × spots): [Breast_Cancer_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Breast_Cancer/Breast_Cancer_spatial_spot_adjacency_matrix.csv)
5. row/column integer index (x,y) of spatial spots (spots × 2): [Breast_Cancer_spatial_spot_loc.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Breast_Cancer/Breast_Cancer_spatial_spot_loc.csv)
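
Before starting a long run, it can be worth confirming that the five files agree on the number of spots and cells. The following is a minimal sketch using only the Python standard library; it is not part of SDePER, and the helper names `decompress_gzip`, `csv_shape`, and `check_shapes` are hypothetical. It assumes each CSV has one header row and one leading index column (row names), which is an assumption about the file layout and should be checked against the actual files.

```python
import csv
import gzip
import shutil


def decompress_gzip(src, dst):
    """Decompress the gzipped file `src` into `dst` and return `dst`."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def csv_shape(path):
    """Return (rows, columns), ASSUMING one header row and one index column."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return n_rows, len(header) - 1


def check_shapes(spatial, ref, celltype, adjacency, loc):
    """Return a list of dimension mismatches (empty if all files agree)."""
    n_spots = csv_shape(spatial)[0]
    n_cells = csv_shape(ref)[0]
    expected = {
        celltype: (n_cells, 1),         # cells × 1
        adjacency: (n_spots, n_spots),  # spots × spots
        loc: (n_spots, 2),              # spots × 2
    }
    return [
        f"{path}: expected {want}, found {csv_shape(path)}"
        for path, want in expected.items()
        if csv_shape(path) != want
    ]
```

For this dataset, one would first call `decompress_gzip("Breast_Cancer_ref_scRNA_cell_nUMI.csv.gz", "Breast_Cancer_ref_scRNA_cell_nUMI.csv")` and then pass the five file names to `check_shapes`; an empty list means the dimensions are consistent under the stated layout assumption.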

==================================================================================================================

SDePER settings are:

* number of included highly variable genes `n_hv_gene`: 1500
* number of selected TOP marker genes for each comparison in differential expression analysis `n_marker_per_cmp`: 20
* minimum number of cells in one pseudo-spot for building CVAE `pseudo_spot_min_cell`: 20
* maximum number of cells in one pseudo-spot for building CVAE `pseudo_spot_max_cell`: 70
* initial learning rate for training CVAE `cvae_init_lr`: 0.001
* seed for random values `seed`: 4
* number of used CPU cores `n_core`: 32
* whether to perform imputation `use_imputation`: true

All other options are left at their defaults.

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q Breast_Cancer_spatial_spot_nUMI.csv -r Breast_Cancer_ref_scRNA_cell_nUMI.csv -c Breast_Cancer_ref_scRNA_cell_celltype.csv -a Breast_Cancer_spatial_spot_adjacency_matrix.csv -l Breast_Cancer_spatial_spot_loc.csv --n_hv_gene 1500 --n_marker_per_cmp 20 --pseudo_spot_min_cell 20 --pseudo_spot_max_cell 70 --cvae_init_lr 0.001 --seed 4 -n 32 --use_imputation true`
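
As an illustration of how the settings map onto the command, the sketch below assembles the same command from two dictionaries. Every flag and value is copied from the command above; the names `INPUT_FILES`, `SETTINGS`, and `build_command` are hypothetical and only serve this example.

```python
import shlex

# Input files and their flags, copied from the command above.
INPUT_FILES = {
    "-q": "Breast_Cancer_spatial_spot_nUMI.csv",
    "-r": "Breast_Cancer_ref_scRNA_cell_nUMI.csv",
    "-c": "Breast_Cancer_ref_scRNA_cell_celltype.csv",
    "-a": "Breast_Cancer_spatial_spot_adjacency_matrix.csv",
    "-l": "Breast_Cancer_spatial_spot_loc.csv",
}

# Non-default settings, in the same order as in the command above.
SETTINGS = {
    "--n_hv_gene": 1500,
    "--n_marker_per_cmp": 20,
    "--pseudo_spot_min_cell": 20,
    "--pseudo_spot_max_cell": 70,
    "--cvae_init_lr": 0.001,
    "--seed": 4,
    "-n": 32,
    "--use_imputation": "true",
}


def build_command(inputs, settings, program="runDeconvolution"):
    """Flatten flag/value pairs into an argument list."""
    args = [program]
    for flag, value in {**inputs, **settings}.items():
        args.extend([flag, str(value)])
    return args


cmd_args = build_command(INPUT_FILES, SETTINGS)
print(shlex.join(cmd_args))  # single-line shell form of the argument list
```

Keeping the settings in a dictionary makes it easy to change one value and rebuild the command; the resulting list can also be passed directly to `subprocess.run(cmd_args, check=True)` without `shell=True`.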

Note that this notebook uses **SDePER v1.0.1**. The cell type deconvolution result is renamed to [Breast_Cancer_SDePER_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/tree/main/RealData/Breast_Cancer/Breast_Cancer_SDePER_celltype_proportions.csv). The imputation results are compressed into one zip file and deposited in the folder [Breast_Cancer](https://zenodo.org/record/8334656/files/Breast_Cancer.zip) at [10.5281/zenodo.8334655](https://doi.org/10.5281/zenodo.8334655).
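
Once the run has finished, the proportions file can be inspected with a few lines of standard-library Python. This is a hedged sketch rather than part of SDePER: the helpers `load_proportions` and `dominant_cell_type` are hypothetical, and the code assumes the CSV has a header row of cell type names, with spot IDs in the first column and one proportion column per cell type. That layout is an assumption and should be verified against the actual file.

```python
import csv


def load_proportions(path):
    """Read a proportions CSV into {spot: {cell_type: proportion}}.

    ASSUMPTION: the first column holds spot IDs and each remaining
    column is one cell type; check this against the actual file.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        cell_types = next(reader)[1:]
        return {
            row[0]: dict(zip(cell_types, map(float, row[1:])))
            for row in reader
            if row  # ignore blank lines
        }


def dominant_cell_type(proportions):
    """Map each spot to the cell type with the largest proportion."""
    return {spot: max(p, key=p.get) for spot, p in proportions.items()}
```

For example, `dominant_cell_type(load_proportions("Breast_Cancer_SDePER_celltype_proportions.csv"))` would give one label per spot, which is a quick way to eyeball whether the estimated proportions look plausible.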

In [1]:
import subprocess

cmd = '''runDeconvolution -q Breast_Cancer_spatial_spot_nUMI.csv \
                          -r Breast_Cancer_ref_scRNA_cell_nUMI.csv \
                          -c Breast_Cancer_ref_scRNA_cell_celltype.csv \
                          -a Breast_Cancer_spatial_spot_adjacency_matrix.csv \
                          -l Breast_Cancer_spatial_spot_loc.csv \
                          --n_hv_gene 1500 \
                          --n_marker_per_cmp 20 \
                          --pseudo_spot_min_cell 20 \
                          --pseudo_spot_max_cell 70 \
                          --cvae_init_lr 0.001 \
                          --seed 4 \
                          -n 32 \
                          --use_imputation true
'''
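# shell=True passes the command string to the shell; check=True raises CalledProcessError on a non-zero exit code
subprocess.run(cmd, check=True, text=True, shell=True)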


SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.0.1


running options:
spatial_file: /home/exouser/Spatial/Breast_Cancer_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/Breast_Cancer_ref_scRNA_cell_nUMI.csv
ref_celltype_file: /home/exouser/Spatial/Breast_Cancer_ref_scRNA_cell_celltype.csv
marker_file: None
loc_file: /home/exouser/Spatial/Breast_Cancer_spatial_spot_loc.csv
A_file: /home/exouser/Spatial/Breast_Cancer_spatial_spot_adjacency_matrix.csv
n_cores: 32
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
use_cvae: True
threshold: 0
n_hv_gene: 1500
n_marker_per_cmp: 20
pseudo_spot_min_cell: 20
pseudo_spot_max_cell: 70
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.001
redo_de: True
seed: 4
diagnosis: False
verbose: True
use_imputation: True
diameter: 200
impute_diameter: [160, 114, 80]


######### Preprocessing... #########

######### First build CVAE... ###

Epoch 48/1000
Epoch 49/1000
Epoch 50/1000
Epoch 51/1000
Epoch 52/1000
Epoch 53/1000
Epoch 54/1000
Epoch 55/1000
Epoch 56/1000
Epoch 57/1000
Epoch 58/1000
Epoch 59/1000
Epoch 60/1000
Epoch 61/1000
Epoch 62/1000
Epoch 63/1000
Epoch 64/1000
Epoch 65/1000
Epoch 66/1000
Epoch 67/1000
Epoch 68/1000
Epoch 69/1000
Epoch 70/1000
Epoch 71/1000
Epoch 72/1000
Epoch 73/1000
Epoch 74/1000
Epoch 75/1000
Epoch 76/1000
Epoch 77/1000
Epoch 78/1000
Epoch 79/1000
Epoch 80/1000
Epoch 81/1000
Epoch 82/1000
Epoch 83/1000
Epoch 84/1000
Epoch 85/1000
Epoch 86/1000
Epoch 87/1000
Epoch 88/1000
Epoch 89/1000
Epoch 90/1000
Epoch 91/1000
Epoch 92/1000
Epoch 93/1000
Epoch 94/1000
Epoch 95/1000
Epoch 96/1000
Epoch 97/1000
Epoch 98/1000
Epoch 99/1000
Epoch 100/1000
Epoch 101/1000
Epoch 102/1000
Epoch 103/1000
Epoch 104/1000
Epoch 105/1000
Epoch 106/1000
Epoch 107/1000
Epoch 108/1000
Epoch 109/1000
Epoch 110/1000
Epoch 111/1000
Epoch 112/1000
Epoch 113/1000
Epoch 114/1000

training finished in 114 epochs (early stop), 

    34 |      0.043 |    110.166 |      0.215 |      2.830 |    8192.00 |    8192.00 |    5.426 |    0.000 |    0.003 |   0.000465 |   0.000233
    35 |      0.035 |    126.089 |      0.231 |      3.020 |    8192.00 |    8192.00 |    5.288 |    0.000 |    0.002 |   0.000362 |   0.000181
    36 |      0.030 |    121.922 |      0.227 |      3.170 |    8192.00 |   16384.00 |    5.090 |    0.000 |    0.002 |   0.000292 |   0.000146
    37 |      0.025 |    146.072 |      0.251 |      3.429 |   16384.00 |   16384.00 |    4.653 |    0.000 |    0.002 |   0.000232 |   0.000116
    38 |      0.021 |    158.120 |      0.263 |      3.643 |   16384.00 |   16384.00 |    4.595 |    0.000 |    0.002 |   0.000181 |   0.000091
    39 |      0.017 |    155.099 |      0.260 |      3.816 |   16384.00 |   32768.00 |    4.412 |    0.000 |    0.002 |   0.000148 |   0.000074
    40 |      0.013 |    208.729 |      0.314 |      4.096 |   32768.00 |   32768.00 |    4.122 |    0.000 |    0.002 |   0.000112 |   0

CompletedProcess(args='runDeconvolution -q Breast_Cancer_spatial_spot_nUMI.csv                           -r Breast_Cancer_ref_scRNA_cell_nUMI.csv                           -c Breast_Cancer_ref_scRNA_cell_celltype.csv                           -a Breast_Cancer_spatial_spot_adjacency_matrix.csv                           -l Breast_Cancer_spatial_spot_loc.csv                           --n_hv_gene 1500                           --n_marker_per_cmp 20                           --pseudo_spot_min_cell 20                           --pseudo_spot_max_cell 70                           --cvae_init_lr 0.001                           --seed 4                           -n 32                           --use_imputation true\n', returncode=0)
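
The `lambda_r` and `lambda_g` candidates printed in the running options above look like eight values evenly spaced on a log10 scale between 0.1 and 100. The sketch below checks that observation numerically; it is not taken from the SDePER source, and the function name `log_spaced_grid` is hypothetical.

```python
# Values copied from the `lambda_r` / `lambda_g` lines of the log above.
PRINTED_LAMBDA = [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]


def log_spaced_grid(exp_low, exp_high, n_points, ndigits=3):
    """Return n_points values 10**e, with e evenly spaced in [exp_low, exp_high]."""
    step = (exp_high - exp_low) / (n_points - 1)
    return [round(10 ** (exp_low + k * step), ndigits) for k in range(n_points)]


# Exponents -1 and 2 correspond to the end points 0.1 and 100.
grid = log_spaced_grid(-1, 2, 8)
print(grid)
print(grid == PRINTED_LAMBDA)  # compares the reconstruction with the printed values
```

If the comparison holds, the default grid spans three orders of magnitude with a constant ratio of 10^(3/7) between neighbouring candidates.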