Minimal examples demonstrating the usage of DeSide
DeSide_mini_example
├── DeSide_model # the pre-trained model; one large file needs to be downloaded separately
├── E1 - Using pre-trained model.ipynb
├── E2 - Training a model from scratch.ipynb
├── E3 - Synthesizing bulk tumors.ipynb
├── LICENSE
├── README.md
├── datasets # three large files need to be downloaded separately
├── results # the results of the three examples
├── plot_fig # the figures and relevant data in the manuscript
├── main_workflow_demo.py # the main workflow of the manuscript, only for archiving the code
└── single_cell_dataset_integration # the single-cell dataset used in the manuscript
- DeSide is needed to reproduce the results. Please see the installation instructions for DeSide.
- Three files larger than 100 MB in the datasets folder are not uploaded to GitHub. Please download and unzip them to the right place.
  - simu_bulk_exp_Mixed_N100K_D1.h5ad: the synthesized bulk gene expression profiles (GEPs) after filtering (Dataset D1), used in Example 2 as the training dataset. Download link (~2.2G)
  - simu_bulk_exp_SCT_N10K_S1_16sct.h5ad: the synthesized single-cell-type GEPs (sctGEPs, Dataset S1), used in Example 3 as the source of single-cell GEPs for simulation. Download link (~7G)
  - merged_tpm.csv: gene expression profiles of 19 cancer types in TCGA (TPM format), used in Example 3 as the reference dataset to guide the filtering steps. Download link (~300M)
datasets
├── TCGA
│   ├── pca_model_0.9 # the PCA model fitted on the TCGA dataset for GEP-level filtering
│   │   ├── gene_list_for_pca.csv
│   │   ├── tcga_pca_model_for_gep_filtering.pkl # generated during dataset generation
│   │   └── tcga_pca_ref.csv
│   └── tpm
│       ├── LUAD
│       │   └── LUAD_TPM.csv
│       ├── merged_tpm.csv # merged TPM of 19 cancer types (needs to be downloaded separately)
│       └── tcga_sample_id2cancer_type.csv
├── gene_set # used as the pathway profiles
│   ├── c2.cp.kegg.v2023.1.Hs.symbols.gmt
│   └── c2.cp.reactome.v2023.1.Hs.symbols.gmt
├── simu_bulk_exp_SCT_N10K_S1_16sct.h5ad # Dataset S1 (needs to be downloaded separately)
└── simulated_bulk_cell_dataset
    ├── D1
    │   ├── gene_list_filtered_by_high_corr_gene_and_quantile_range.csv # gene list after gene-level filtering (this list can differ slightly between datasets)
    │   ├── gene_list_filtered_by_quantile_range_q_0.5_q_99.5.csv
    │   └── simu_bulk_exp_Mixed_N100K_D1.h5ad # Dataset D1 (needs to be downloaded separately)
    └── D2
        ├── corr_cell_frac_with_gene_exp_D2.csv
        └── gene_list_filtered_by_high_corr_gene.csv # the list of high correlation genes (the same one used for the filtering step in other datasets)
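The two files under gene_set use the GMT format, where each line holds a gene set name, a description field, and then the member gene symbols, all separated by tabs. To inspect these files independently of DeSide, a minimal parser might look like the sketch below. The function names `parse_gmt_text` and `read_gmt` are hypothetical helpers written for this illustration and are not part of the DeSide package, which may load these files differently.

```python
from pathlib import Path
from typing import Dict, List


def parse_gmt_text(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {gene_set_name: [gene symbols]}.

    Hypothetical helper for illustration only; not a DeSide function.
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


def read_gmt(path) -> Dict[str, List[str]]:
    """Read a .gmt file from disk and parse it with parse_gmt_text."""
    return parse_gmt_text(Path(path).read_text(encoding="utf-8"))
```

For example, `read_gmt("datasets/gene_set/c2.cp.kegg.v2023.1.Hs.symbols.gmt")` would return a dictionary keyed by pathway name, assuming the script runs from the repository root.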
- The following file in the DeSide_model folder is larger than 100 MB and has not been uploaded to GitHub. Please download it and put it in the right place.
  - model_DeSide.h5: the pre-trained model, used in Example 1. Download link (~100M)
DeSide_model
├── celltypes.txt
├── genes.txt
├── genes_for_gep.txt
├── genes_for_pathway_profile.txt
├── history_reg.csv
├── key_params.txt
├── loss.png
└── model_DeSide.h5 # the pre-trained model (needs to be downloaded separately)
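Before opening the notebooks, it can help to confirm that the four separately downloaded files are where the examples expect them. The sketch below uses only the Python standard library. The relative paths are read off the directory trees above and should be treated as an assumption, and `find_missing_files` is a hypothetical helper, not a DeSide function.

```python
from pathlib import Path
from typing import List, Sequence

# Relative paths as read from the directory trees above (an assumption;
# adjust them if your checkout is laid out differently).
LARGE_FILES = [
    "datasets/simu_bulk_exp_SCT_N10K_S1_16sct.h5ad",
    "datasets/simulated_bulk_cell_dataset/D1/simu_bulk_exp_Mixed_N100K_D1.h5ad",
    "datasets/TCGA/tpm/merged_tpm.csv",
    "DeSide_model/model_DeSide.h5",
]


def find_missing_files(repo_root, relative_paths: Sequence[str] = LARGE_FILES) -> List[str]:
    """Return the relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

Calling `find_missing_files(".")` from the repository root returns an empty list once all four files are in place. Otherwise it lists the paths that still need to be downloaded.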
Using the pre-trained model to predict cell type proportions in a new dataset.
- Jupyter notebook: E1 - Using pre-trained model.ipynb
Training a model from scratch using the DeSide package and the synthesized bulk GEP dataset.
- Jupyter notebook: E2 - Training a model from scratch.ipynb
Synthesizing bulk tumors using the DeSide package.
- Jupyter notebook: E3 - Synthesizing bulk tumors.ipynb