# Run *SDePER* on Real data: Melanoma

In this notebook we run SDePER on one real dataset: human **Melanoma**.

For downloading and preprocessing of the original spatial and reference scRNA-seq data used for cell type deconvolution, please refer to [Melanoma_preprocess.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/0a429908645a665c1f9d345df013d5b9fcde20b3/RealData/Melanoma/Melanoma_preprocess.nb.html).

==================================================================================================================

We use the following **5 input files**:

1. raw nUMI counts of spatial transcriptomic data (spots × genes): [Melanoma_spatial_spot_nUMI.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Melanoma/Melanoma_spatial_spot_nUMI.csv)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `Melanoma_ref_scRNA_cell_nUMI.csv`, obtained by decompressing the gzipped file [Melanoma_ref_scRNA_cell_nUMI.csv.gz](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Melanoma/Melanoma_ref_scRNA_cell_nUMI.csv.gz)
3. cell type annotations for cells of the 7 selected cell types in reference scRNA-seq data (cells × 1): [Melanoma_ref_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Melanoma/Melanoma_ref_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in spatial transcriptomic data (spots × spots): [Melanoma_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Melanoma/Melanoma_spatial_spot_adjacency_matrix.csv)
5. row/column integer index (x,y) of spatial spots (spots × 2): [Melanoma_spatial_spot_loc.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/Melanoma/Melanoma_spatial_spot_loc.csv)
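
Before starting a long run it can help to confirm that the five files agree with each other on spot, cell, and gene labels. The sketch below is a minimal illustration and not part of SDePER: it assumes each CSV stores column names in its first row and row names in its first column, that the adjacency matrix is labeled by spot name on both axes, and that every annotated cell appears in the reference matrix. The helper names `read_labels` and `check_inputs` are hypothetical, and the toy data stands in for the real files.

```python
import csv
import io


def read_labels(handle):
    """Return (row_labels, column_labels) for a CSV with names in its first row and column."""
    reader = csv.reader(handle)
    header = next(reader)
    column_labels = header[1:]  # assumption: the first header cell belongs to the index column
    row_labels = [record[0] for record in reader if record]
    return row_labels, column_labels


def check_inputs(spatial, ref, celltype, adjacency, loc):
    """Compare labels across the five inputs; each argument is an open text handle."""
    spots, spatial_genes = read_labels(spatial)
    cells, ref_genes = read_labels(ref)
    annotated_cells, _ = read_labels(celltype)
    adj_rows, adj_cols = read_labels(adjacency)
    loc_spots, loc_cols = read_labels(loc)
    return {
        "genes_shared": bool(set(spatial_genes) & set(ref_genes)),
        "annotated_cells_in_ref": set(annotated_cells) <= set(cells),
        "adjacency_square": adj_rows == adj_cols,
        "adjacency_matches_spots": set(adj_rows) == set(spots),
        "loc_matches_spots": set(loc_spots) == set(spots),
        "loc_has_two_columns": len(loc_cols) == 2,
    }


# Toy stand-ins for the five files (2 spots, 2 cells, 2 genes); all names are made up.
toy = {
    "spatial": ",geneA,geneB\nspot1,3,0\nspot2,1,4\n",
    "ref": ",geneA,geneB\ncell1,5,0\ncell2,0,7\n",
    "celltype": ",celltype\ncell1,type_A\ncell2,type_B\n",
    "adjacency": ",spot1,spot2\nspot1,0,1\nspot2,1,0\n",
    "loc": ",x,y\nspot1,0,0\nspot2,0,1\n",
}
result = check_inputs(*(io.StringIO(text) for text in toy.values()))
print(result)
```

For the real files, each `io.StringIO(...)` would be replaced by `open(path, newline="")`; any check that comes back `False` points at a label mismatch worth inspecting before running the deconvolution.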

==================================================================================================================

SDePER settings are:

* number of highly variable genes included `n_hv_gene`: 300
* number of top marker genes selected for each comparison in differential analysis `n_marker_per_cmp`: 10
* minimum number of cells in one pseudo-spot for building CVAE `pseudo_spot_min_cell`: 5
* maximum number of cells in one pseudo-spot for building CVAE `pseudo_spot_max_cell`: 40
* seed for random values `seed`: 3
* number of CPU cores used `n_core`: 32
* whether to perform imputation `use_imputation`: true

All other options are left at their defaults.
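
To make the two pseudo-spot settings concrete, the sketch below builds one pseudo-spot by pooling between 5 and 40 reference cells and summing their counts. It is only an illustration of the idea under stated assumptions, not SDePER's implementation: drawing the cell number uniformly, sampling cells with replacement, and the helper name `make_pseudo_spot` are all assumptions, and the toy cells and type names are invented.

```python
import random
from collections import Counter


def make_pseudo_spot(cells, min_cell=5, max_cell=40, rng=None):
    """Pool a random number of cells into one pseudo-spot.

    cells: list of (cell_type, {gene: nUMI}) tuples.
    Returns (n_cells, summed gene counts, cell-type proportions).
    """
    rng = rng or random.Random()
    n_cells = rng.randint(min_cell, max_cell)  # both bounds are inclusive
    picked = [rng.choice(cells) for _ in range(n_cells)]  # with replacement (assumption)
    counts, types = Counter(), Counter()
    for cell_type, umi in picked:
        counts.update(umi)  # add this cell's nUMI to the pooled profile
        types[cell_type] += 1
    proportions = {t: k / n_cells for t, k in types.items()}
    return n_cells, dict(counts), proportions


# Invented reference cells: two of "type_A", one of "type_B".
toy_cells = [
    ("type_A", {"gene1": 4, "gene2": 0}),
    ("type_A", {"gene1": 3, "gene2": 1}),
    ("type_B", {"gene1": 0, "gene2": 6}),
]
n_cells, counts, proportions = make_pseudo_spot(toy_cells, 5, 40, random.Random(3))
print(n_cells, counts, proportions)
```

Because the true cell-type proportions of such a pseudo-spot are known by construction, pooled profiles of this kind can serve as labeled training examples.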

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q Melanoma_spatial_spot_nUMI.csv -r Melanoma_ref_scRNA_cell_nUMI.csv -c Melanoma_ref_scRNA_cell_celltype.csv -a Melanoma_spatial_spot_adjacency_matrix.csv -l Melanoma_spatial_spot_loc.csv --n_hv_gene 300 --n_marker_per_cmp 10 --pseudo_spot_min_cell 5 --pseudo_spot_max_cell 40 --seed 3 -n 32 --use_imputation true`
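
The same command can also be assembled from the input files and settings listed earlier, which keeps the values in one place and avoids hand-editing a long string. The sketch below is a minimal illustration: the flags and values are the ones shown in the command above, while the dictionaries and the helper `build_args` are hypothetical names introduced here.

```python
import shlex

# Input files, keyed by the short flags used in the command above.
inputs = {
    "-q": "Melanoma_spatial_spot_nUMI.csv",
    "-r": "Melanoma_ref_scRNA_cell_nUMI.csv",
    "-c": "Melanoma_ref_scRNA_cell_celltype.csv",
    "-a": "Melanoma_spatial_spot_adjacency_matrix.csv",
    "-l": "Melanoma_spatial_spot_loc.csv",
}

# Non-default settings, in the same order as the command above.
settings = {
    "--n_hv_gene": 300,
    "--n_marker_per_cmp": 10,
    "--pseudo_spot_min_cell": 5,
    "--pseudo_spot_max_cell": 40,
    "--seed": 3,
    "-n": 32,
    "--use_imputation": "true",
}


def build_args(inputs, settings):
    """Flatten the flag/value pairs into an argument list for subprocess."""
    args = ["runDeconvolution"]
    for flag, value in {**inputs, **settings}.items():
        args.extend([flag, str(value)])
    return args


args = build_args(inputs, settings)
command = shlex.join(args)  # shell-quoted, single-line form of the command
print(command)
```

Passing the list directly, as in `subprocess.run(args, check=True)`, runs the program without going through a shell, so no quoting of file names is needed.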

Note that this notebook uses **SDePER v1.0.1**. The cell type deconvolution result is renamed to [Melanoma_SDePER_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/tree/main/RealData/Melanoma/Melanoma_SDePER_celltype_proportions.csv). The imputation results are compressed into one zipped file and deposited in the folder [Melanoma](https://zenodo.org/record/8334656/files/Melanoma.zip) in [10.5281/zenodo.8334655](https://doi.org/10.5281/zenodo.8334655).

```python
import subprocess

cmd = '''runDeconvolution -q Melanoma_spatial_spot_nUMI.csv \
                          -r Melanoma_ref_scRNA_cell_nUMI.csv \
                          -c Melanoma_ref_scRNA_cell_celltype.csv \
                          -a Melanoma_spatial_spot_adjacency_matrix.csv \
                          -l Melanoma_spatial_spot_loc.csv \
                          --n_hv_gene 300 \
                          --n_marker_per_cmp 10 \
                          --pseudo_spot_min_cell 5 \
                          --pseudo_spot_max_cell 40 \
                          --seed 3 \
                          -n 32 \
                          --use_imputation true
'''
subprocess.run(cmd, check=True, text=True, shell=True)
```


SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.0.1


running options:
spatial_file: /home/exouser/Spatial/Melanoma_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/Melanoma_ref_scRNA_cell_nUMI.csv
ref_celltype_file: /home/exouser/Spatial/Melanoma_ref_scRNA_cell_celltype.csv
marker_file: None
loc_file: /home/exouser/Spatial/Melanoma_spatial_spot_loc.csv
A_file: /home/exouser/Spatial/Melanoma_spatial_spot_adjacency_matrix.csv
n_cores: 32
lambda_r: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
lambda_g: [0.1, 0.268, 0.72, 1.931, 5.179, 13.895, 37.276, 100.0]
use_cvae: True
threshold: 0
n_hv_gene: 300
n_marker_per_cmp: 10
pseudo_spot_min_cell: 5
pseudo_spot_max_cell: 40
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.003
redo_de: True
seed: 3
diagnosis: False
verbose: True
use_imputation: True
diameter: 200
impute_diameter: [160, 114, 80]


######### Preprocessing... #########

######### First build CVAE... #########

read spatial data f

Epoch 50/1000
Epoch 51/1000
Epoch 52/1000
Epoch 53/1000
Epoch 54/1000
Epoch 55/1000
Epoch 56/1000
Epoch 57/1000
Epoch 58/1000
Epoch 59/1000
Epoch 60/1000
Epoch 61/1000
Epoch 62/1000
Epoch 63/1000
Epoch 64/1000
Epoch 65/1000
Epoch 66/1000
Epoch 67/1000
Epoch 68/1000
Epoch 69/1000
Epoch 70/1000
Epoch 71/1000
Epoch 72/1000
Epoch 73/1000
Epoch 74/1000
Epoch 75/1000
Epoch 76/1000
Epoch 77/1000
Epoch 78/1000
Epoch 79/1000
Epoch 80/1000
Epoch 81/1000
Epoch 82/1000
Epoch 83/1000
Epoch 84/1000
Epoch 85/1000
Epoch 86/1000
Epoch 87/1000
Epoch 88/1000
Epoch 89/1000
Epoch 90/1000
Epoch 91/1000
Epoch 92/1000
Epoch 93/1000
Epoch 94/1000
Epoch 95/1000
Epoch 96/1000
Epoch 97/1000
Epoch 98/1000
Epoch 99/1000
Epoch 100/1000
Epoch 101/1000
Epoch 102/1000
Epoch 103/1000
Epoch 104/1000
Epoch 105/1000
Epoch 106/1000
Epoch 107/1000
Epoch 108/1000
Epoch 109/1000
Epoch 110/1000
Epoch 111/1000
Epoch 112/1000
Epoch 113/1000
Epoch 114/1000


Epoch 115/1000
Epoch 116/1000
Epoch 117/1000
Epoch 118/1000
Epoch 119/1000
Epoch 120/1000
Epoch 121/1000
Epoch 122/1000
Epoch 123/1000
Epoch 124/1000
Epoch 125/1000
Epoch 126/1000

training finished in 126 epochs (early stop), transform data to adjust the platform effect...


re-run DE on CVAE transformed scRNA-seq data!
Differential analysis across cell-types on scRNA-seq data...
finally selected 160 cell-type marker genes


platform effect adjustment by CVAE finished. Elapsed time: 10.47 minutes.


use the marker genes derived from CVAE transformed scRNA-seq for downstream regression!

gene filtering before modeling...
15 genes with nUMIs<5 in all spatial spots and need to be excluded
finally use 145 genes for modeling

spot filtering before modeling...
all spots passed filtering


######### Start GLRM modeling... #########

GLRM settings:
use SciPy minimize method:  L-BFGS-B
global optimization turned off, local minimum will be used in GLRM
use hybrid version of GLRM
Numba detected 

    28 |      0.126 |     28.083 |      0.092 |      0.797 |     512.00 |    1024.00 |    2.175 |    0.000 |    0.002 |   0.002587 |   0.001293
    29 |      0.104 |     36.400 |      0.100 |      0.870 |    1024.00 |    1024.00 |    2.029 |    0.000 |    0.002 |   0.002117 |   0.001059
    30 |      0.085 |     40.959 |      0.105 |      0.928 |    1024.00 |    1024.00 |    1.972 |    0.000 |    0.002 |   0.001659 |   0.000829
    31 |      0.072 |     36.468 |      0.101 |      0.973 |    1024.00 |    2048.00 |    2.122 |    0.000 |    0.002 |   0.001373 |   0.000686
    32 |      0.058 |     45.574 |      0.110 |      1.046 |    2048.00 |    2048.00 |    2.152 |    0.000 |    0.002 |   0.001079 |   0.000540
    33 |      0.049 |     58.273 |      0.122 |      1.104 |    2048.00 |    2048.00 |    1.894 |    0.000 |    0.002 |   0.000853 |   0.000427
    34 |      0.044 |     37.893 |      0.102 |      1.153 |    2048.00 |    4096.00 |    1.763 |    0.000 |    0.002 |   0.000729 |   0

    33 |      0.006 |    352.072 |      0.416 |      0.618 |    1024.00 |    1024.00 |    1.259 |    0.000 |    0.003 |   0.000148 |   0.000074
    34 |      0.002 |    351.852 |      0.416 |      0.618 |    1024.00 |          / |    1.135 |    0.000 |    0.003 |   0.000042 |   0.000023
early stop!
Terminated (optimal) in 35 iterations.
One optimization by ADMM finished. Elapsed time: 0.89 minutes.


stage 2 finished. Elapsed time: 21.43 minutes.

GLRM fitting finished. Elapsed time: 25.61 minutes.


Post-processing estimated cell-type proportion theta...
hard thresholding small theta values with threshold 0


cell type deconvolution finished. Estimate results saved in /home/exouser/Spatial/celltype_proportions.csv. Elapsed time: 0.60 hours.


######### Start imputation #########
imputation for 160 µm finished. Elapsed time: 0.35 minutes
imputation for 114 µm finished. Elapsed time: 0.91 minutes
imputation for 80 µm finished. Elapsed time: 2.88 minutes


whole pipeline finished. Total 

CompletedProcess(args='runDeconvolution -q Melanoma_spatial_spot_nUMI.csv                           -r Melanoma_ref_scRNA_cell_nUMI.csv                           -c Melanoma_ref_scRNA_cell_celltype.csv                           -a Melanoma_spatial_spot_adjacency_matrix.csv                           -l Melanoma_spatial_spot_loc.csv                           --n_hv_gene 300                           --n_marker_per_cmp 10                           --pseudo_spot_min_cell 5                           --pseudo_spot_max_cell 40                           --seed 3                           -n 32                           --use_imputation true\n', returncode=0)
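
The log reports an elapsed time for each stage, in either minutes or hours. If the output is captured (for example with `capture_output=True` in `subprocess.run`, which the cell above does not do), the timings can be collected into one table. The sketch below is a minimal illustration using lines copied from the output above; the regular expression and the helper `stage_minutes` are assumptions made here, not part of SDePER.

```python
import re

# Lines copied from the run output above.
LOG_EXCERPT = """\
platform effect adjustment by CVAE finished. Elapsed time: 10.47 minutes.
GLRM fitting finished. Elapsed time: 25.61 minutes.
cell type deconvolution finished. Estimate results saved in /home/exouser/Spatial/celltype_proportions.csv. Elapsed time: 0.60 hours.
imputation for 160 µm finished. Elapsed time: 0.35 minutes
imputation for 114 µm finished. Elapsed time: 0.91 minutes
imputation for 80 µm finished. Elapsed time: 2.88 minutes
"""

# "<stage> finished. [...] Elapsed time: <value> <minutes|hours>"
TIMING = re.compile(
    r"^(?P<stage>.+?) finished\..*?Elapsed time: "
    r"(?P<value>\d+(?:\.\d+)?) (?P<unit>minutes|hours)",
    re.MULTILINE,
)


def stage_minutes(log_text):
    """Map each reported stage to its elapsed time, converted to minutes."""
    timings = {}
    for match in TIMING.finditer(log_text):
        factor = 60.0 if match["unit"] == "hours" else 1.0
        timings[match["stage"]] = float(match["value"]) * factor
    return timings


timings = stage_minutes(LOG_EXCERPT)
for stage, minutes in timings.items():
    print(f"{stage}: {minutes:.2f} min")
```

Converting everything to minutes makes the stages directly comparable: in this run the GLRM fitting accounts for most of the 0.60 hours reported for deconvolution, while the three imputation steps together take only a few minutes.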