GitHub - Luigi-Ferraro/MOViDA

MOViDA

Introduction

MOViDA predicts drug sensitivity for cancer cell lines using an explainable AI (XAI) model built on the Gene Ontology hierarchy. Predictions are made from multi-omics data: mutations, amplifications, deletions, and pathway activity. By taking the specific genetic makeup of each cancer cell line into account, this approach aims to predict drug sensitivity more accurately. The tool is intended for researchers and clinicians looking to improve treatment outcomes for patients with cancer.

Environment set up

Hardware: GPU with CUDA >= 11
Software:
- Python >= 3.7
- Miniconda
- PyTorch >= 1.13 (check the PyTorch website for the version that matches your system)
```
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
```
- Python libraries
  - networkx
  - numpy
  - pandas
  - torchmetrics
  - torchvision
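
Once the environment is created, it can help to check which of the listed libraries are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of MOViDA.

```python
import importlib.util

# Import names of the libraries listed above ("torch" is the import name of PyTorch).
REQUIRED = ["torch", "torchvision", "torchmetrics", "networkx", "numpy", "pandas"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```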

Files

code
- main.py : starts training or testing
- param_data.py : defines a class that represents the input data
- param_train.py : defines a class that represents the training parameters
- prepare_directories.py : creates the directories and files for the experiment
- models : directory containing the classes that define the structure and parameters of the models, as well as the functions needed to train them
  - MOViDA.py : class for drug sensitivity prediction
  - MOViDA_synergy.py : class for synergistic drug combination prediction
- parameters : directory containing the .txt files that describe the data and the training parameters (see below)

Parameters

A parameters file is a .txt file that contains a Python dictionary with all the necessary information for data and training. An example of such a file would be:

{
    "input_dir"         : "data", 
    "names_mo"          : ["mut", "amp", "del"], 
    ...
}
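
Because the file holds a Python dictionary literal, one way to read it is with `ast.literal_eval`, which evaluates literals without executing arbitrary code. This is a sketch under that assumption; how `main.py` actually parses these files is not shown here, and `load_param_file` is a hypothetical helper.

```python
import ast


def load_param_file(path):
    """Read a .txt file containing a Python dict literal and return the dict."""
    with open(path, "r", encoding="utf-8") as handle:
        content = handle.read()
    # literal_eval only accepts literals (dicts, lists, strings, numbers, booleans, None).
    params = ast.literal_eval(content)
    if not isinstance(params, dict):
        raise ValueError(f"{path} does not contain a dictionary")
    return params
```

A file written like the example above, with the `...` replaced by the remaining entries, would come back as a regular `dict`, so `params["names_mo"]` would be the list `["mut", "amp", "del"]`.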

data parameters : a Python dictionary that describes the input data needed to launch a training process. It contains the following key-value pairs:
- "input_dir": This is the path to the directory where the input data files are located.
- "names_mo": This is a list of strings that contains the names of the different types of genetic modifications (e.g. "mut", "amp", "del")
- "drug2id": This is the path to the file that contains the mapping from drug names to drug IDs.
- "cell2id": This is the path to the file that contains the mapping from cell line names to cell line IDs.
- "mo_gene2id": This is a list of strings that contains the paths to the files that map the gene names to the genetic modification IDs for each type of modification.
- "train_file": This is the path to the file that contains the training set of drug-cell line pairs.
- "val_file": This is the path to the file that contains the validation set of drug-cell line pairs.
- "test_file": This is the path to the file that contains the test set of drug-cell line pairs.
- "all_file": This is the path to the file that contains all the drug-cell line pairs.
- "cell2features": This is a list of strings that contains the paths to the files that contain the features of the cell lines for each type of modification.
- "names_drfeat": This is a list of strings that contains the names of the different types of drug features (e.g. "pc", "vs").
- "drug2features": This is a list of strings that contains the paths to the files that contain the features of the drugs for each type of feature.
- "pathway_act_file": This is the path to the file that contains the pathway activity data.
- "ontology": This is the path to the file that contains the ontology data.
- "mo_gene2ontology": This is a list of strings that contains the paths to the files that map the genes to the ontology for each type of modification.
- "synergy_bool": This is a boolean value that indicates whether the training process should include synergy information (i.e. whether the output should predict synergy scores or not).
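
Several of the keys above are lists with one entry per modification type or per drug feature type, so a quick consistency check before launching a run can catch mismatched lengths. The sketch below assumes that every key listed above is required and that each per-type list has exactly as many entries as `names_mo` or `names_drfeat`; `check_data_dict` is a hypothetical helper, not part of the repository.

```python
# Keys described in the list above that hold a single path.
PATH_KEYS = ["input_dir", "drug2id", "cell2id", "train_file", "val_file",
             "test_file", "all_file", "pathway_act_file", "ontology"]

# List-valued keys and the key whose length they are assumed to match.
PER_TYPE_KEYS = {
    "mo_gene2id": "names_mo",
    "cell2features": "names_mo",
    "mo_gene2ontology": "names_mo",
    "drug2features": "names_drfeat",
}


def check_data_dict(data):
    """Return a list of human-readable problems found in a data dictionary."""
    problems = []
    required = PATH_KEYS + ["names_mo", "names_drfeat", "synergy_bool"] + list(PER_TYPE_KEYS)
    for key in required:
        if key not in data:
            problems.append(f"missing key: {key}")
    for key, names_key in PER_TYPE_KEYS.items():
        if key in data and names_key in data and len(data[key]) != len(data[names_key]):
            problems.append(
                f"{key} has {len(data[key])} entries, expected {len(data[names_key])}"
            )
    return problems
```

Calling `check_data_dict` on a loaded dictionary returns an empty list when nothing is flagged, which makes it easy to stop early with a clear message instead of failing partway through data loading.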

training parameters : a Python dictionary that contains the training parameters needed to launch a training process. It contains the following key-value pairs:
- "epoch": This is the number of training epochs.
- "lr": This is the learning rate used for training.
- "batchsize": This is the number of samples in a batch used for training.
- "loss_func": This is the name of the loss function used for training.
- "metric_func": This is the name of the metric used to evaluate the performance of the model.
- "classifier": This is a boolean value that indicates whether the training process is for classification or regression.
- "wloss_bool": This is a boolean value that indicates whether the loss function is weighted or not.
- "focal_bool": This is a boolean value that indicates whether the Focal Loss is used or not.
- "f_alpha": This is the value used for Focal Loss alpha.
- "f_gamma": This is the value used for Focal Loss gamma.
- "sampler_bool": This is a boolean value that indicates whether the data should be oversampled or not.
- "ccl_hiddens": This is the number of nodes in hidden layers in the cell line encoder.
- "drug_hiddens": This is a string that contains the number of hidden layers in the drug encoder and how many nodes they have.
- "final_hiddens": This is the number of nodes in hidden layers in the final network.
- "max_val": This is the maximum value used for the target variable.
- "eps_weights": This is the epsilon parameter used to balance the weights.
- "expdir": This is the directory where the results of all experiments will be stored.
- "currexpdir": This is the directory where the results of the current experiment will be stored; it is located inside "expdir".
- "result_file": This is the path to the file where the test set predictions will be stored.
- "result_all_file": This is the path to the file where the predictions for all drug-cell line pairs (see "all_file") will be stored.
- "model_name": This is the name of the Python file that stores the model to use (without the .py extension).
- "load_model": This is the path to the pre-trained model if any.
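
The `focal_bool`, `f_alpha`, and `f_gamma` parameters refer to Focal Loss. As a reminder of what alpha and gamma control, here is a scalar sketch of the standard binary focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is written in plain Python for illustration only; the implementation in the MOViDA model code may differ, `binary_focal_loss` is a hypothetical name, and the default values are commonly used ones rather than MOViDA's.

```python
import math


def binary_focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for one predicted probability.

    prob   : predicted probability of the positive class, in (0, 1)
    target : true label, 0 or 1
    alpha  : weight of the positive class (negatives get 1 - alpha)
    gamma  : focusing parameter; 0 reduces to alpha-weighted cross-entropy
    """
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction contributes much less than an uncertain one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.55, 1)
print(easy < hard)
```

Larger values of gamma shrink the contribution of well-classified samples more aggressively, while alpha shifts the balance between the positive and negative class.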

Start training

Prepare the dictionary files for the data and the training parameters.

Then run this command with the correct data_dict and param_dict:

python code/main.py -data_dict data_dict -param_dict param_dict -cuda 0
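
To launch several experiments from a script, the command above can be assembled programmatically. This sketch only builds the argument list from the flags shown above; `build_command` and `run_training` are hypothetical helpers, and the values passed for `data_dict` and `param_dict` are whatever `main.py` expects.

```python
import subprocess
import sys


def build_command(data_dict, param_dict, cuda=0, script="code/main.py"):
    """Return the argument list for one run, using the flags from the command above."""
    return [
        sys.executable, script,
        "-data_dict", str(data_dict),
        "-param_dict", str(param_dict),
        "-cuda", str(cuda),
    ]


def run_training(data_dict, param_dict, cuda=0):
    """Run main.py from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_command(data_dict, param_dict, cuda), check=True)
```

Using `sys.executable` keeps the run inside the same Python interpreter, and therefore the same conda environment, as the calling script.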
