PathDSP

This repository demonstrates how to use the IMPROVE library v0.1.0-2024-09-27 for building a drug response prediction (DRP) model using PathDSP, and provides examples with the benchmark cross-study analysis (CSA) dataset.

This version, tagged v0.1.0-2024-09-27, introduces a new API designed to encourage broader adoption of IMPROVE and its curated models by the research community.

Dependencies

Installation instructions are detailed below in Step-by-step instructions.

Conda yml file: PathDSP_env_conda.yml

ML framework:

Torch -- deep learning framework for building the prediction model

IMPROVE dependencies:

IMPROVE v0.1.0-2024-09-27

Dataset

Benchmark data for cross-study analysis (CSA) can be downloaded from this site.

The data tree is shown below:

csa_data/raw_data/
├── splits
│   ├── CCLE_all.txt
│   ├── CCLE_split_0_test.txt
│   ├── CCLE_split_0_train.txt
│   ├── CCLE_split_0_val.txt
│   ├── CCLE_split_1_test.txt
│   ├── CCLE_split_1_train.txt
│   ├── CCLE_split_1_val.txt
│   ├── ...
│   ├── GDSCv2_split_9_test.txt
│   ├── GDSCv2_split_9_train.txt
│   └── GDSCv2_split_9_val.txt
├── x_data
│   ├── cancer_copy_number.tsv
│   ├── cancer_discretized_copy_number.tsv
│   ├── cancer_DNA_methylation.tsv
│   ├── cancer_gene_expression.tsv
│   ├── cancer_miRNA_expression.tsv
│   ├── cancer_mutation_count.tsv
│   ├── cancer_mutation_long_format.tsv
│   ├── cancer_mutation.parquet
│   ├── cancer_RPPA.tsv
│   ├── drug_ecfp4_nbits512.tsv
│   ├── drug_info.tsv
│   ├── drug_mordred_descriptor.tsv
│   └── drug_SMILES.tsv
└── y_data
    └── response.tsv
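
Before preprocessing, it can help to confirm that the download produced the layout shown above. The following is a minimal sketch, not part of PathDSP or IMPROVE: the helper name `missing_files` and the `EXPECTED` table are illustrative, and only a few representative files from the tree are checked.

```python
from pathlib import Path

# Sub-directories and a few representative files taken from the data tree.
# This table is illustrative; extend it with any other files you rely on.
EXPECTED = {
    "splits": ["CCLE_all.txt", "CCLE_split_0_train.txt"],
    "x_data": ["cancer_gene_expression.tsv", "drug_SMILES.tsv"],
    "y_data": ["response.tsv"],
}


def missing_files(raw_data_dir):
    """Return the expected paths that are absent under raw_data_dir."""
    root = Path(raw_data_dir)
    missing = []
    for subdir, names in EXPECTED.items():
        for name in names:
            path = root / subdir / name
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    absent = missing_files("csa_data/raw_data")
    if absent:
        print("missing:\n" + "\n".join(absent))
    else:
        print("all expected files found")
```

An empty result only means the listed files exist; it does not validate their contents.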

Model scripts and parameter file

PathDSP_preprocess_improve.py - takes benchmark data files and transforms into files for training and inference
PathDSP_train_improve.py - trains the PathDSP model
PathDSP_infer_improve.py - runs inference with the trained PathDSP model
model_params_def.py - definitions of parameters that are specific to the model
PathDSP_params.txt - default parameter file

Step-by-step instructions

1. Clone the model repository

git clone https://github.com/JDACS4C-IMPROVE/PathDSP
cd PathDSP
git checkout v0.1.0-2024-09-27

2. Set computational environment

Create the conda environment from the yml file:

conda env create -f PathDSP_env_conda.yml -n PathDSP_env
conda activate PathDSP_env

3. Run `setup_improve.sh`.

source setup_improve.sh

This will:

Download the cross-study analysis (CSA) benchmark data into ./csa_data/.
Clone the IMPROVE repo (checking out tag v0.1.0-2024-09-27) outside the PathDSP model repo.
Set up the env variables IMPROVE_DATA_DIR (set to ./csa_data/) and PYTHONPATH (extended with the IMPROVE repo).
Download the model-specific supplemental data (aka author data) and set up the env variable AUTHOR_DATA_DIR.
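
Because the script is run with `source`, the variables it exports exist only in the current shell session, so a new terminal needs the script sourced again. The sketch below is one way to check that the three variables named above are set before running the later steps; the helper `unset_variables` is a hypothetical name, not an IMPROVE function.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("IMPROVE_DATA_DIR", "AUTHOR_DATA_DIR", "PYTHONPATH")


def unset_variables(environ=None):
    """Return the names from REQUIRED_VARS that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = unset_variables()
    if missing:
        print("not set: " + ", ".join(missing))
        print("run 'source setup_improve.sh' in this shell first")
    else:
        for name in REQUIRED_VARS:
            print(f"{name}={os.environ[name]}")
```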

4. Preprocess CSA benchmark data (raw data) to construct model input data (ML data)

python PathDSP_preprocess_improve.py --input_dir ./csa_data/raw_data --output_dir exp_result

Preprocesses the CSA data and creates train, validation (val), and test datasets.

Generates:

three model input data files: train_data.txt, val_data.txt, test_data.txt

exp_result
├── tmpdir_ssgsea
├── EXP.txt
├── cnv_data.txt
├── CNVnet.txt
├── DGnet.txt
├── MUTnet.txt
├── drug_mbit_df.txt
├── drug_target.txt
├── mutation_data.txt
├── test_data.txt
├── train_data.txt
├── val_data.txt
└── x_data_gene_expression_scaler.gz
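
A quick sanity check after preprocessing is to confirm that the three model input files exist and are non-empty. The sketch below counts lines without assuming anything about the file format; `count_lines` is an illustrative helper, and whether the first line of each file is a header is not something this README states.

```python
from pathlib import Path

# The three model input files named in the step above.
SPLIT_FILES = ("train_data.txt", "val_data.txt", "test_data.txt")


def count_lines(output_dir):
    """Map each split file name to its line count, or None if the file is absent."""
    counts = {}
    for name in SPLIT_FILES:
        path = Path(output_dir) / name
        if path.is_file():
            # Read in binary mode so the count does not depend on text encoding.
            with path.open("rb") as handle:
                counts[name] = sum(1 for _ in handle)
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, n_lines in count_lines("exp_result").items():
        status = "missing" if n_lines is None else f"{n_lines} lines"
        print(f"{name}: {status}")
```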

5. Train PathDSP model

python PathDSP_train_improve.py --input_dir exp_result --output_dir exp_result

Trains PathDSP using the model input data: train_data.txt (training), val_data.txt (for early stopping).

Generates:

trained model: model.pt
predictions on val data (tabular data): val_y_data_predicted.csv
prediction performance scores on val data: val_scores.json

exp_result
├── model.pt
├── checkpoint.pt
├── Val_Loss_orig.txt
├── val_scores.json
└── val_y_data_predicted.csv
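
To inspect the validation scores without opening the file by hand, something like the following may be enough. It is a sketch under one assumption: that val_scores.json holds a single flat JSON object mapping score names to values. The README does not list the score names, so the code prints whatever keys it finds; `load_scores` and `format_scores` are illustrative helpers.

```python
import json
from pathlib import Path


def load_scores(path):
    """Load a scores JSON file and return its parsed content."""
    with Path(path).open() as handle:
        return json.load(handle)


def format_scores(scores):
    """Render a flat mapping as one 'name: value' line per entry, sorted by name.

    Assumes `scores` is a dict; the structure of the real file is not
    documented here, so treat this as a starting point.
    """
    return "\n".join(f"{name}: {value}" for name, value in sorted(scores.items()))


if __name__ == "__main__":
    scores_path = Path("exp_result") / "val_scores.json"
    if scores_path.is_file():
        print(format_scores(load_scores(scores_path)))
    else:
        print(f"{scores_path} not found; run the training step first")
```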

6. Run inference on test data with the trained model

python PathDSP_infer_improve.py --input_data_dir exp_result --input_model_dir exp_result --output_dir exp_result --calc_infer_score True

Evaluates the performance on a test dataset with the trained model.

Generates:

predictions on test data (tabular data): test_y_data_predicted.csv
prediction performance scores on test data: test_scores.json

exp_result
├── test_scores.json
└── test_y_data_predicted.csv
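
The prediction table can be summarized with the standard library alone. The sketch below reports the column names and the number of data rows; it assumes only that test_y_data_predicted.csv is comma-separated with a header row, since the column names are not documented in this README. `summarize_csv` is an illustrative helper.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    predictions = Path("exp_result") / "test_y_data_predicted.csv"
    if predictions.is_file():
        columns, n_rows = summarize_csv(predictions)
        print("columns:", ", ".join(columns))
        print("rows:", n_rows)
    else:
        print(f"{predictions} not found; run the inference step first")
```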
