Skip to content

Predicting the inhibitory response of drugs using Graph Convolutional Networks trained on censored data.

License

Notifications You must be signed in to change notification settings

alejogiley/ChemGraphs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ChemGraphs

Build Codacy Badge Coverage Status License: GPL v3

Predicting inhibitory response of ligands using Graph convolutional networks (GCNs) trained on censored data.

Background

Censored datasets are common in drug discovery and other fields where the data being measured is below a certain detection limit. Bioactivity assays are typically performed over a limited range of compound concentrations therefore some $IC50$ or $EC50$ values may be reported as being above or below a maximum or minimum concentration, resulting in censored data.

When analyzing censored data, traditional regression models may lead to biased estimates and incorrect conclusions. This is because traditional regression models assume that the censored values are missing at random, which is often not the case in practice. Ignoring the censored data or using imputation methods to estimate the missing values can also result in biased estimates and incorrect conclusions.

Censored regression models, on the other hand, are specifically designed to handle censored data. These models take into account the fact that the censored data is not missing at random and use likelihood-based methods to estimate the parameters of the model. This leads to unbiased estimates and more accurate predictions of the binding affinities.

Censored Datasets

Using a Tobit model as the loss function is a common approach when dealing with censored data in regression problems. The tobit model is a type of censored regression model that takes into account the censored values and estimates the parameters of the model using maximum likelihood methods.

Using a graph neural network (GNN) to predict the binding affinity from the ligand 2D structure is also a promising approach. GNNs are designed to handle graph-structured data, such as molecular structures, and have been shown to be effective in predicting molecular properties and activities.

By combining a GNN with a tobit loss function, this approach can improve the accuracy of the predictions and provide a powerful tool for drug discovery.

Installation

To install all dependencies, first you need to create a conda enviroment using the environment file provided here:

conda env create -f environment.yml
conda activate ChemGraph

To download and install the latest version from github:

git clone https://github.com/alejogiley/ChemGraphs.git
cd ChemGraphs
pip install .

Usage

The bioactivity datasets are provided in the datasets directory in SDF format. Data is taken from the The Binding Database.

To process the raw dataset into a format suitable for training the predictive model, you can use the setup_dataset.py application. You can select the metric type to use for the predictions, e.g. IC50 or Ki.

setup_dataset.py \
    --binding datasets/estrogen_receptor.sdf \
    --data_path datasets \
    --file_name "estrogen_dataset" \
    --metric_type "IC50"

To train a model, you can use the train_gcnn.py application. This script takes as input a formated dataset and trains a GCNN model using the tobit loss function. The model is saved to the specified path. You can define the number of epochs, batch size, learning rate, number of channels and layers, and the seed for the random number generator.

train_gcnn.py \
    --data_path datasets/estrogen_dataset.lz4 \
    --record_path history.csv \
    --model_path model.h5 \
    --metrics_path metrics.dat \
    --epochs 100 \
    --batch_size 32 \
    --learning_rate 1e-2 \
    --channels 64 16 \
    --n_layers 2 \
    --seed 0 \
    "maxlike_tobit_loss" 

Testing

To run the unit and integration tests, you can use the test.sh script provided here:

bash test.sh

About

Predicting the inhibitory response of drugs using Graph Convolutional Networks trained on censored data.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published