DeSide_mini_example

Minimal examples demonstrating the usage of DeSide

Folder structure of `DeSide_mini_example`:

DeSide_mini_example
├── DeSide_model  # the pre-trained model; one large file needs to be downloaded separately
├── E1 - Using pre-trained model.ipynb
├── E2 - Training a model from scratch.ipynb
├── E3 - Synthesizing bulk tumors.ipynb
├── LICENSE
├── README.md
├── datasets  # three large files need to be downloaded separately
├── results   # the results of the three examples
├── plot_fig  # the figures and relevant data in the manuscript
├── main_workflow_demo.py  # the main workflow of the manuscript, only for archiving the code
└── single_cell_dataset_integration  # the single-cell dataset used in the manuscript

Dependencies

DeSide is required to reproduce the results. Please refer to the DeSide installation instructions.
Three files larger than 100 MB in the `datasets` folder are not uploaded to GitHub. Please download and unzip them, then place them in the locations shown in the folder structure of `datasets`.
- simu_bulk_exp_Mixed_N100K_D1.h5ad: the synthesized bulk gene expression profiles (GEPs) after filtering (Dataset D1), used in Example 2 as the training dataset. Download link (~2.2 GB)
- simu_bulk_exp_SCT_N10K_S1_16sct.h5ad: the synthesized single-cell-type GEPs (sctGEPs, Dataset S1), used in Example 3 as the source of single-cell GEPs for simulation. Download link (~7 GB)
- merged_tpm.csv: gene expression profiles of 19 cancer types in TCGA (TPM format), used as the reference dataset to guide the filtering steps in Example 3. Download link (~300 MB)
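Before running the notebooks, it can help to confirm that the three downloaded files ended up where the notebooks expect them. The snippet below is a minimal sketch, not part of DeSide: the relative paths are copied from the folder structure of `datasets` in this README, and the helper name `missing_large_files` is hypothetical.

```python
from pathlib import Path

# Expected locations of the separately downloaded files, relative to the
# repository root (taken from the folder structure of `datasets`).
LARGE_FILES = [
    "datasets/simulated_bulk_cell_dataset/D1/simu_bulk_exp_Mixed_N100K_D1.h5ad",
    "datasets/simu_bulk_exp_SCT_N10K_S1_16sct.h5ad",
    "datasets/TCGA/tpm/merged_tpm.csv",
]


def missing_large_files(repo_root="."):
    """Hypothetical helper: return the expected large files not found under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LARGE_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_large_files(".")
    if missing:
        print("Missing files (download them separately):")
        for rel in missing:
            print("  -", rel)
    else:
        print("All large dataset files are in place.")
```

Run it from the repository root; an empty result means every file is in the expected location.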

Folder structure of `datasets`:

datasets
├── TCGA
│ ├── pca_model_0.9  # the PCA model fitted by the TCGA dataset for GEP-level filtering
│ │ ├── gene_list_for_pca.csv
│ │ ├── tcga_pca_model_for_gep_filtering.pkl  # generated during dataset generation
│ │ └── tcga_pca_ref.csv
│ └── tpm
│     ├── LUAD
│     │ └── LUAD_TPM.csv
│     ├── merged_tpm.csv # merged TPM of 19 cancer types (need to be downloaded separately)
│     └── tcga_sample_id2cancer_type.csv
├── gene_set  # used as the pathway profiles
│ ├── c2.cp.kegg.v2023.1.Hs.symbols.gmt
│ └── c2.cp.reactome.v2023.1.Hs.symbols.gmt
├── simu_bulk_exp_SCT_N10K_S1_16sct.h5ad # Dataset S1 (need to be downloaded separately)
└── simulated_bulk_cell_dataset
    ├── D1
    │ ├── gene_list_filtered_by_high_corr_gene_and_quantile_range.csv  # gene list after gene-level filtering (may differ slightly between datasets)
    │ ├── gene_list_filtered_by_quantile_range_q_0.5_q_99.5.csv
    │ └── simu_bulk_exp_Mixed_N100K_D1.h5ad # Dataset D1 (need to be downloaded separately)
    └── D2
        ├── corr_cell_frac_with_gene_exp_D2.csv
        └── gene_list_filtered_by_high_corr_gene.csv # the list of high correlation genes (the same one used for the filtering step in other datasets)
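The two files in `gene_set` are KEGG and Reactome collections in the standard GMT format, where each line holds a gene-set name, a description, and then one gene symbol per tab-separated field. The reader below is a generic sketch for inspecting those files by hand; it is not the function DeSide uses internally to build pathway profiles, and `read_gmt` is a hypothetical name.

```python
def read_gmt(path):
    """Parse a GMT file into a dict mapping gene-set name -> list of gene symbols.

    Each GMT line is tab-separated: set name, description, then one gene per field.
    """
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            name, _description, *genes = fields
            gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets
```

For example, `read_gmt("datasets/gene_set/c2.cp.kegg.v2023.1.Hs.symbols.gmt")` would return every KEGG pathway with its member genes.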

The following file in the `DeSide_model` folder is larger than 100 MB and has not been uploaded to GitHub. Please download it and place it in the location shown in the folder structure of `DeSide_model`.
- model_DeSide.h5: the pre-trained model, used in Example 1. Download link (~100 MB)

Folder structure of `DeSide_model`:

DeSide_model
├── celltypes.txt
├── genes.txt
├── genes_for_gep.txt
├── genes_for_pathway_profile.txt
├── history_reg.csv
├── key_params.txt
├── loss.png
└── model_DeSide.h5 # the pre-trained model (need to be downloaded separately)
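The text files next to the model record which cell types the model predicts and which genes it expects as input. The sketch below loads `celltypes.txt` and `genes.txt` for inspection, under the assumption (not confirmed here) that each file lists one entry per line; check the files themselves before relying on it. The helper names are hypothetical.

```python
from pathlib import Path


def read_list_file(path):
    """Read a plain-text list, ASSUMING one entry per line (blank lines ignored)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_model_metadata(model_dir="DeSide_model"):
    """Hypothetical helper: collect the cell types and input genes of a saved model."""
    model_dir = Path(model_dir)
    return {
        "cell_types": read_list_file(model_dir / "celltypes.txt"),
        "genes": read_list_file(model_dir / "genes.txt"),
    }
```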

Example 1: Using pre-trained model

Using the pre-trained model to predict cell type proportions in a new dataset.

Jupyter notebook: E1 - Using pre-trained model.ipynb
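A new dataset rarely covers exactly the genes the pre-trained model was trained on. The sketch below shows one way to reorder a single sample's expression values to the model's gene order and report which genes are absent. Filling missing genes with 0 is an assumption made for illustration; DeSide's own prediction code (see the E1 notebook) may handle gene matching differently, and `align_to_model_genes` is a hypothetical name.

```python
def align_to_model_genes(sample_expr, model_genes, fill_value=0.0):
    """Reorder one sample's expression values to match the model's gene order.

    sample_expr: dict mapping gene symbol -> expression value (e.g. TPM)
    model_genes: gene symbols in the order the model expects
    Genes missing from the sample get fill_value (an ASSUMPTION); extra genes are dropped.
    """
    missing = [g for g in model_genes if g not in sample_expr]
    values = [float(sample_expr.get(g, fill_value)) for g in model_genes]
    return values, missing
```

A long `missing` list is a useful warning sign that the new dataset uses different gene identifiers (for example Ensembl IDs instead of gene symbols).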

Example 2: Training a model from scratch

Training a model from scratch using the DeSide package and the synthesized bulk GEP dataset.

Jupyter notebook: E2 - Training a model from scratch.ipynb

Example 3: Synthesizing bulk tumors

Synthesizing bulk tumors using the DeSide package.

Jupyter notebook: E3 - Synthesizing bulk tumors.ipynb
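The core idea behind synthesizing a bulk sample is linear mixing: draw a set of cell-type proportions, then take the proportion-weighted sum of cell-type GEPs. The sketch below illustrates only that idea with toy vectors and a symmetric Dirichlet draw. It is not DeSide's synthesis procedure, which samples from Dataset S1 and adds GEP-level and gene-level filtering guided by the TCGA reference; all names are hypothetical.

```python
import random


def sample_proportions(n_cell_types, alpha=1.0, rng=random):
    """Draw cell-type proportions from a symmetric Dirichlet(alpha) distribution."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_cell_types)]
    total = sum(draws)
    return [d / total for d in draws]  # normalized gamma draws are Dirichlet-distributed


def mix_bulk(cell_type_geps, proportions):
    """Bulk GEP as the proportion-weighted sum of cell-type GEPs.

    cell_type_geps: list of per-cell-type expression vectors (same gene order)
    proportions: one weight per cell type, summing to 1
    """
    bulk = [0.0] * len(cell_type_geps[0])
    for gep, p in zip(cell_type_geps, proportions):
        for i, value in enumerate(gep):
            bulk[i] += p * value
    return bulk


if __name__ == "__main__":
    rng = random.Random(0)
    geps = [[10.0, 0.0, 5.0], [0.0, 20.0, 5.0]]  # two toy cell types, three genes
    props = sample_proportions(len(geps), rng=rng)
    print(props, mix_bulk(geps, props))
```

The sampled proportions double as the ground-truth labels for each synthetic bulk sample, which is what makes such datasets usable for training.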

Repository: OnlyBelter/DeSide_mini_example

Folders and files

Latest commit

History

Repository files navigation

DeSide_mini_example

Folder structure of DeSide_mini_example:

Dependencies

Folder structure of datasets:

Folder structure of DeSide_model:

Example 1: Using pre-trained model

Example 2: Training a model from scratch

Example 3: Synthesizing bulk tumors

About

Resources

License

Stars

Watchers

Forks

Languages

Folder structure of `DeSide_mini_example`:

Folder structure of `datasets`:

Folder structure of `DeSide_model`: