# Run *SDePER* on Real data: IPF

In this Notebook we run SDePER on one real dataset: the human **Idiopathic pulmonary fibrosis (IPF)** dataset.

For downloading and preprocessing the original spatial and reference scRNA-seq data for cell type deconvolution, please refer to [IPF_preprocess.nb.html](https://rawcdn.githack.com/az7jh2/SDePER_Analysis/c9b4698ecd9d0b1b0d2794df963127efe01ec231/RealData/IPF/IPF_preprocess.nb.html).

==================================================================================================================

Here we use the following **5 input files**:

1. raw nUMI counts of spatial transcriptomic data (spots × genes): `IPF_spatial_spot_nUMI.csv`, obtained by decompressing the gzipped file [IPF_spatial_spot_nUMI.csv.gz](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/IPF/IPF_spatial_spot_nUMI.csv.gz)
2. raw nUMI counts of reference scRNA-seq data (cells × genes): `IPF_ref_scRNA_cell_nUMI.csv`, obtained by decompressing the gzipped file [IPF_ref_scRNA_cell_nUMI.csv.gz](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/IPF/IPF_ref_scRNA_cell_nUMI.csv.gz)
3. cell type annotations for the cells of the 26 selected cell types in the reference scRNA-seq data (cells × 1): [IPF_ref_scRNA_cell_celltype.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/IPF/IPF_ref_scRNA_cell_celltype.csv)
4. adjacency matrix of spots in spatial transcriptomic data (spots × spots): [IPF_spatial_spot_adjacency_matrix.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/IPF/IPF_spatial_spot_adjacency_matrix.csv)
5. 2,534 manually selected cell type-specific marker genes from the scRNA-seq data (26 cell types × 2,534 genes): [IPF_selected_2534_celltype_markers.csv](https://github.com/az7jh2/SDePER_Analysis/blob/main/RealData/IPF/IPF_selected_2534_celltype_markers.csv)
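A minimal standard-library sketch for preparing these inputs: it decompresses the two gzipped count files and runs a basic consistency check across the files. It assumes (this is not confirmed by the notebook) that every CSV has a header row and stores row IDs in its first column; the function names are hypothetical.

```python
import csv
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path):
    """Decompress e.g. IPF_spatial_spot_nUMI.csv.gz to IPF_spatial_spot_nUMI.csv."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so the whole file is never held in memory
    return out_path


def read_header(csv_path):
    """Column IDs of a CSV (header row without its first, index cell). Assumed layout."""
    with open(csv_path, newline="") as fh:
        return next(csv.reader(fh))[1:]


def read_row_ids(csv_path):
    """Row IDs of a CSV (first column), skipping the header row. Assumed layout."""
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)
        return [row[0] for row in reader if row]


def check_inputs(spatial_csv, ref_csv, celltype_csv, adjacency_csv):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    spots = set(read_row_ids(spatial_csv))
    # the adjacency matrix is spots x spots, so both its rows and columns should be spot IDs
    if set(read_row_ids(adjacency_csv)) != spots or set(read_header(adjacency_csv)) != spots:
        problems.append("spot IDs differ between spatial nUMI file and adjacency matrix")
    # every annotated cell should have counts in the reference nUMI file
    if not set(read_row_ids(celltype_csv)) <= set(read_row_ids(ref_csv)):
        problems.append("some annotated cells are missing from the reference nUMI file")
    return problems
```

Usage: call `decompress_gz("IPF_spatial_spot_nUMI.csv.gz")` and `decompress_gz("IPF_ref_scRNA_cell_nUMI.csv.gz")`, then `check_inputs("IPF_spatial_spot_nUMI.csv", "IPF_ref_scRNA_cell_nUMI.csv", "IPF_ref_scRNA_cell_celltype.csv", "IPF_spatial_spot_adjacency_matrix.csv")`. The checks compare ID sets rather than order, since this sketch does not assume SDePER requires matching row order.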

==================================================================================================================

SDePER settings are:

* number of included highly variable genes `n_hv_gene`: 2000
* number of top marker genes selected for each comparison in differential expression (DE) analysis `n_marker_per_cmp`: 20
* maximum number of cells in one pseudo-spot when building the CVAE `pseudo_spot_max_cell`: 10
* hyper-parameter for Adaptive Lasso `lambda_r`: 0.72
* hyper-parameter for the graph Laplacian constraint `lambda_g`: 13.895
* seed for random values `seed`: 1
* number of used CPU cores `n_core`: 64

All other options are left at their defaults.
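This notebook does not spell out how `lambda_g` enters SDePER's objective. As a hedged illustration only, the sketch below assumes the common form of a graph Laplacian constraint: a penalty `lambda_g * tr(Theta^T L Theta)` with `L = D - A`, where `A` is the spot adjacency matrix (input file 4), `D` its diagonal degree matrix, and `Theta` a spots × cell types proportion matrix. It computes the penalty both ways to show that it equals half the adjacency-weighted sum of squared differences between spots, which is why, under this assumed form, a larger `lambda_g` pushes neighbouring spots toward similar proportions. All function names and the toy data are hypothetical.

```python
def graph_laplacian(adj):
    """Unnormalized graph Laplacian L = D - A (A given as a symmetric nested list)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    return [[(degree[i] if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]


def laplacian_penalty(adj, theta, lambda_g):
    """lambda_g * tr(Theta^T L Theta), summed over the cell-type columns of theta."""
    lap = graph_laplacian(adj)
    n, k = len(theta), len(theta[0])
    quad = sum(
        lap[i][j] * theta[i][c] * theta[j][c]
        for c in range(k) for i in range(n) for j in range(n)
    )
    return lambda_g * quad


def pairwise_penalty(adj, theta, lambda_g):
    """Equivalent form: lambda_g * 1/2 * sum_ij A_ij * ||theta_i - theta_j||^2."""
    n = len(theta)
    total = sum(
        adj[i][j] * sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
        for i in range(n) for j in range(n)
    )
    return lambda_g * 0.5 * total


# Hypothetical toy example: 3 spots in a chain, 2 cell types
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
theta = [[1.0, 0.0],
         [0.5, 0.5],
         [0.0, 1.0]]
# the two forms agree; identical rows in theta would give a penalty of 0
print(laplacian_penalty(A, theta, 13.895))
print(pairwise_penalty(A, theta, 13.895))
```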

==================================================================================================================

The `bash` command to start cell type deconvolution is:

`runDeconvolution -q IPF_spatial_spot_nUMI.csv -r IPF_ref_scRNA_cell_nUMI.csv -c IPF_ref_scRNA_cell_celltype.csv -a IPF_spatial_spot_adjacency_matrix.csv -m IPF_selected_2534_celltype_markers.csv --n_hv_gene 2000 --n_marker_per_cmp 20 --pseudo_spot_max_cell 10 --lambda_r 0.72 --lambda_g 13.895 --seed 1 -n 64`
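The settings above map one-to-one onto command-line flags. As a sketch (the dict and function names are hypothetical, and only the flags already used in this command appear), the same command can be assembled from Python data structures and passed to `subprocess.run` as an argument list, which avoids `shell=True` and any shell quoting issues:

```python
import shlex
import subprocess

# Input files, keyed by the short flags used in the command above
inputs = {
    "-q": "IPF_spatial_spot_nUMI.csv",
    "-r": "IPF_ref_scRNA_cell_nUMI.csv",
    "-c": "IPF_ref_scRNA_cell_celltype.csv",
    "-a": "IPF_spatial_spot_adjacency_matrix.csv",
    "-m": "IPF_selected_2534_celltype_markers.csv",
}
# Non-default settings listed above
settings = {
    "--n_hv_gene": 2000,
    "--n_marker_per_cmp": 20,
    "--pseudo_spot_max_cell": 10,
    "--lambda_r": 0.72,
    "--lambda_g": 13.895,
    "--seed": 1,
    "-n": 64,
}


def build_args(inputs, settings, program="runDeconvolution"):
    """Flatten flag/value pairs into an argv list, preserving dict insertion order."""
    args = [program]
    for flag, value in {**inputs, **settings}.items():
        args += [flag, str(value)]
    return args


def run_deconvolution(args):
    """Run without a shell: each flag and value is a separate argv entry."""
    return subprocess.run(args, check=True)


args = build_args(inputs, settings)
print(shlex.join(args))  # the same command line as shown above
```

Calling `run_deconvolution(args)` would then start the run; keeping the settings in a dict makes it easy to log or vary them between runs.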

Note that this Notebook uses **SDePER v1.0.1**. The cell type deconvolution result was renamed to [IPF_SDePER_celltype_proportions.csv](https://github.com/az7jh2/SDePER_Analysis/tree/main/RealData/IPF/IPF_SDePER_celltype_proportions.csv).

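```python
import subprocess

# The trailing backslashes join the lines inside the string,
# so the shell receives the runDeconvolution command as a single line
cmd = '''runDeconvolution -q IPF_spatial_spot_nUMI.csv \
                          -r IPF_ref_scRNA_cell_nUMI.csv \
                          -c IPF_ref_scRNA_cell_celltype.csv \
                          -a IPF_spatial_spot_adjacency_matrix.csv \
                          -m IPF_selected_2534_celltype_markers.csv \
                          --n_hv_gene 2000 \
                          --n_marker_per_cmp 20 \
                          --pseudo_spot_max_cell 10 \
                          --lambda_r 0.72 \
                          --lambda_g 13.895 \
                          --seed 1 \
                          -n 64
'''
# check=True raises CalledProcessError if runDeconvolution exits with a non-zero status
subprocess.run(cmd, check=True, text=True, shell=True)
```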


```
SDePER (Spatial Deconvolution method with Platform Effect Removal) v1.0.1


running options:
spatial_file: /home/exouser/Spatial/IPF_spatial_spot_nUMI.csv
ref_file: /home/exouser/Spatial/IPF_ref_scRNA_cell_nUMI.csv
ref_celltype_file: /home/exouser/Spatial/IPF_ref_scRNA_cell_celltype.csv
marker_file: /home/exouser/Spatial/IPF_selected_2534_celltype_markers.csv
loc_file: None
A_file: /home/exouser/Spatial/IPF_spatial_spot_adjacency_matrix.csv
n_cores: 64
lambda_r: 0.72
lambda_g: 13.895
use_cvae: True
threshold: 0
n_hv_gene: 2000
n_marker_per_cmp: 20
pseudo_spot_min_cell: 2
pseudo_spot_max_cell: 10
seq_depth_scaler: 10000
cvae_input_scaler: 10
cvae_init_lr: 0.003
redo_de: True
seed: 1
diagnosis: False
verbose: True
use_imputation: False
diameter: 200
impute_diameter: [160, 114, 80]


######### Preprocessing... #########

######### First build CVAE... #########

read spatial data from file /home/exouser/Spatial/IPF_spatial_spot_nUMI.csv
total 3532 spots; 32078 genes

read scRNA-seq data f

CompletedProcess(args='runDeconvolution -q IPF_spatial_spot_nUMI.csv                           -r IPF_ref_scRNA_cell_nUMI.csv                           -c IPF_ref_scRNA_cell_celltype.csv                           -a IPF_spatial_spot_adjacency_matrix.csv                           -m IPF_selected_2534_celltype_markers.csv                           --n_hv_gene 2000                           --n_marker_per_cmp 20                           --pseudo_spot_max_cell 10                           --lambda_r 0.72                           --lambda_g 13.895                           --seed 1                           -n 64\n', returncode=0)
```
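To sanity-check the result file, the sketch below assumes (not stated in this notebook) that `IPF_SDePER_celltype_proportions.csv` has one row per spot with the spot ID in the first column and one column per cell type, and that each spot's proportions are non-negative and sum to 1. The function name and the default tolerance are hypothetical choices; loosen `tol` if the file was written with reduced precision.

```python
import csv


def check_proportions(csv_path, tol=1e-4):
    """Return (n_spots, cell_types, bad_spots) for an assumed spots x cell types CSV.

    A spot is reported as bad if any proportion is negative or if its
    proportions do not sum to 1 within tol.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        cell_types = next(reader)[1:]  # header row without the index cell
        n_spots, bad_spots = 0, []
        for row in reader:
            if not row:
                continue
            n_spots += 1
            values = [float(v) for v in row[1:]]
            if any(v < 0 for v in values) or abs(sum(values) - 1.0) > tol:
                bad_spots.append(row[0])
    return n_spots, cell_types, bad_spots
```

Usage: `n_spots, cell_types, bad = check_proportions("IPF_SDePER_celltype_proportions.csv")`. If the file holds one column per reference cell type, `len(cell_types)` should be 26, and `bad` should be empty.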