DRP 2022

The Drug Response Prediction 2022 (DRP 2022) project of the Computational Biology and Artificial Intelligence (COMBINE) Laboratory, McGill University.

1 Setup

1.1 Cloning the repository

In the console, type the following command.

git clone https://github.com/AntonioShen/MTDRP.git

1.2 Installing dependencies

Using CONDA for dependency management is preferred. With the project root (/MTDRP/) as the current working directory, type the following command in the console to create a new environment and install all required packages.

conda create --name env --file ./requirements.txt

Activate the newly created CONDA environment (named env in the command above) with conda activate env.

2 Data Preparation

2.1 Downloading dataset

Download DRP2022_preprocessed.zip (not yet disclosed; it will be made available in the future), unzip it, and merge the extracted folder into ./data/DRP2022_preprocessed.

2.2 Parsing .csv files

The dataset contains multiple .csv files. This step extracts the numerical values from them and creates dataset objects (subclasses of torch.utils.data.Dataset) for convenient training and testing.

2.2.1 Selecting data folding and the 2nd-stage preprocessing method

A particular set of folds (for cross-validation) and an optional additional data preprocessing rule must be chosen. In the example below (see 2.2.3), the first fold (index 0) of cl_fold and zero-mean standardization are used to create PyTorch datasets.

A new 2nd-stage preprocessing method can be defined in ./datahandlers/custom_preprocess_rules.py (see 2.2.2). Min-max normalization and zero-mean standardization rules are provided out of the box.
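For readers unfamiliar with the two provided rules, the following is a minimal sketch of what min-max normalization and zero-mean standardization compute on a single feature column. It is not the code in custom_preprocess_rules.py: the function names are hypothetical, plain Python lists stand in for tensors, and the choices to take statistics from the training split only and to use the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def min_max_normalize(train, test):
    """Rescale values so the training split spans [0, 1] (hypothetical helper)."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(x - lo) / span for x in train], [(x - lo) / span for x in test]


def zero_mean_standardize(train, test):
    """Shift and scale so the training split has mean 0 and unit variance."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # population std; assumption, see lead-in
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]


# Toy feature column: three training values and one test value.
train_col, test_col = [1.0, 2.0, 3.0], [4.0]
print(min_max_normalize(train_col, test_col))
print(zero_mean_standardize(train_col, test_col))
```

Reusing training statistics for the test split keeps information about the test data out of the preprocessing step; whether the provided rules do the same should be checked in the source file.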

2.2.2 (Optional) Defining a new 2nd-stage preprocessing method

Every preprocessing method should be packaged as a class that inherits from datahandlers.dataset_handler.PreprocessRule and implements its preprocess() interface, returning a list of two torch.Tensor objects for training and testing, respectively.
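The structure described above might look like the sketch below. The only facts taken from this README are the base class name and the requirement that preprocess() return a list of two tensors. The argument list of preprocess() (train, test), the stand-in base class, and the MaxAbsScaling rule itself are assumptions made for illustration; check datahandlers/dataset_handler.py for the real signature before adapting it.

```python
import torch


class PreprocessRule:
    """Stand-in for datahandlers.dataset_handler.PreprocessRule.

    In the repository, import the real base class instead. The
    (train, test) signature used here is an assumption.
    """

    def preprocess(self, train, test):
        raise NotImplementedError


class MaxAbsScaling(PreprocessRule):
    """Hypothetical rule: divide each feature by its largest absolute training value."""

    def __init__(self, eps=1e-8):
        self.eps = eps  # lower bound on the scale, guards all-zero columns

    def preprocess(self, train, test):
        # Per-feature scale computed from the training split only.
        scale = train.abs().amax(dim=0, keepdim=True).clamp_min(self.eps)
        # Return a list of two tensors: training first, then testing.
        return [train / scale, test / scale]


train = torch.tensor([[1.0, -4.0], [2.0, 2.0]])
test = torch.tensor([[4.0, 2.0]])
train_scaled, test_scaled = MaxAbsScaling().preprocess(train, test)
print(train_scaled)
print(test_scaled)
```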

2.2.3 Example of parsing and saving the first CL fold from the GDSC dataset

In the Python console, run the following.

>>> from datahandlers.dataset_handler import DRPGeneralDataset
>>> from datahandlers.custom_preprocess_rules import Standardization
>>> GDSC = DRPGeneralDataset()
>>> GDSC.load_from_csv('GDSC',
    'data/DRP2022_preprocessed/sanger/sanger_broad_ccl_log2tpm.csv',
    'data/DRP2022_preprocessed/drug_features/gdsc_drug_descriptors.csv',
    'data/DRP2022_preprocessed/drug_response/gdsc_tuple_labels_folds.csv')
>>> train, test = GDSC.get_fold('cl_fold', 0, preprocess=Standardization(), save=True)
>>> print(len(train), len(test))
259386 66319

In the above example, passing save=True saves all tensor files (.pt) under ./tensors/Standardization/GDSC/cl_fold0/. Saving is recommended, because the training command in 3.1 loads its data from this directory.
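To confirm that the tensors were written, the output directory can be inspected with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, and the directory layout ./tensors/<rule>/<dataset>/<fold name><fold index>/ is inferred from the single path shown above rather than documented. The names of the individual .pt files are not specified in this README, so the sketch only lists whatever is present.

```python
from pathlib import Path


def expected_tensor_dir(rule, dataset, fold_name, fold_index, root="tensors"):
    """Build the directory path, assuming the layout inferred from the example."""
    return Path(root) / rule / dataset / f"{fold_name}{fold_index}"


def list_tensor_files(directory):
    """Return the sorted names of all .pt files in a directory ([] if it is missing)."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.pt"))


target = expected_tensor_dir("Standardization", "GDSC", "cl_fold", 0)
print(target.as_posix())
print(list_tensor_files(target))
```

An empty list means either that save=True was not passed or that the layout assumption does not hold for the chosen rule and fold.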

3 Running Experiments

3.1 Loading from existing tensor files

In the console, type the following command with arguments source_path, batch_size, epochs and lr (the learning rate).

python train.py --source_path ./tensors/Standardization/GDSC/cl_fold0/ --batch_size 20 --epochs 100 --lr 1e-4
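As a reading aid for the four arguments, the sketch below shows how a command line of this shape could be parsed with argparse. It is not the contents of train.py: only the argument names come from this README, while the types, defaults, and help texts are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the arguments named in this README."""
    parser = argparse.ArgumentParser(description="Train on saved DRP tensors (sketch)")
    parser.add_argument("--source_path", type=str, required=True,
                        help="directory containing the saved .pt tensor files")
    parser.add_argument("--batch_size", type=int, default=20,
                        help="number of samples per batch")
    parser.add_argument("--epochs", type=int, default=100,
                        help="number of passes over the training data")
    parser.add_argument("--lr", type=float, default=1e-4,
                        help="learning rate")
    return parser


# Parse the same arguments as the command above, passed as an explicit list.
args = build_parser().parse_args([
    "--source_path", "./tensors/Standardization/GDSC/cl_fold0/",
    "--batch_size", "20",
    "--epochs", "100",
    "--lr", "1e-4",
])
print(args.source_path, args.batch_size, args.epochs, args.lr)
```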
