QSAR

The pharmaceutical industry relies on quantitative structure-activity relationship (QSAR) models to predict a quantified biological response of a molecule from its descriptors, which are measured or computed properties of the molecule. Descriptors vary in complexity, ranging from simple measures such as molecular weight to complex geometric features. Drug discovery is a time-consuming and expensive process. A major purpose of QSAR models is to accelerate the discovery of drug candidates by reducing experimental work, and ultimately to bring a drug to market faster. With recent advances in machine learning and hardware, deep neural networks (DNNs) are a promising tool for predicting biological activity, such as receptor binding or enzyme inhibition, from molecular descriptors.

This project implements a DNN based on the architecture and parameters described in the following paper:

Ma, J., Sheridan, R.P., Liaw, A., Dahl, G.E. and Svetnik, V., 2015. Deep neural nets as a method for quantitative structure–activity relationships. Journal of chemical information and modeling, 55(2), pp.263-274.

Getting Started

The data used for training and evaluating the model can be downloaded from the paper's supplementary section.

Paper's supplementary page

In both the training and test data, each row represents a molecule. A single column called "Act" holds the biological activity to be predicted. The rest of the columns are molecular descriptors.
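The layout described above can be illustrated with a minimal sketch that separates the "Act" column from the descriptor columns. This is not the project's preprocessing code: the function name `split_activity`, the descriptor names `desc_a` and `desc_b`, and the sample values are hypothetical, and the sketch assumes a comma-separated file with a header row.

```python
import csv
import io


def split_activity(rows, target="Act"):
    """Split parsed CSV rows into descriptor dicts and activity values."""
    features, targets = [], []
    for row in rows:
        # The target column holds the activity to be predicted.
        targets.append(float(row[target]))
        # Every other column is treated as a molecular descriptor.
        features.append({k: float(v) for k, v in row.items() if k != target})
    return features, targets


# Made-up two-molecule sample; real files have many more descriptor columns.
sample = "Act,desc_a,desc_b\n6.5,1,0\n7.2,0,3\n"
rows = list(csv.DictReader(io.StringIO(sample)))
features, targets = split_activity(rows)
print(targets)
print(features[0])
```

Each entry in `features` corresponds to one molecule, and `targets` holds the matching activity values in the same order.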

Prerequisites

Docker
Pipenv

Docker Installation Documentation

Pipenv Installation Documentation

Installing

Install dependencies via Pipenv
Build Docker image based on Dockerfile

make build

Preparing the Data for Training

Specify the dataset of interest and its location. For example:

make preprocess DATASET=NK1 DATA=~/Documents/qsar/

Training the Model

Specify the dataset of interest and its location. The batch size and number of epochs set in the Makefile can be overridden on the command line. For example:

make train DATASET=NK1 DATA=~/Documents/qsar/ BATCH_SIZE=64 EPOCHS=128

Evaluating the Model

Specify the dataset of interest and its location. For example:

make evaluate DATASET=NK1 DATA=~/Documents/qsar/

The metric used to evaluate the model is R2, the squared Pearson correlation coefficient between predicted and observed activities. According to Ma et al., a model with an R2 as low as 0.30 is still useful, because QSAR is used to prioritize a large number of compounds and the accuracy of any single prediction matters less. The paper recommends setting the number of epochs as high as possible (within hardware limits) to increase R2. The trade-off is between training time and resources on one side and a higher R2 on the other.
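As an illustration of the metric, the following minimal sketch computes R2 under the assumption that it denotes the squared Pearson correlation between observed and predicted activities. The function name `squared_pearson` and the sample values are hypothetical, and this is not necessarily how evaluate.py computes the score.

```python
def squared_pearson(observed, predicted):
    """Return the squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Made-up activities for four molecules.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
r2 = squared_pearson(observed, predicted)
print(round(r2, 2))  # 0.64
```

Because the correlation is squared, the score depends only on how well the predictions track the ranking and linear trend of the observed activities, not on their absolute scale, which fits the prioritization use case described above.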

Testing

Tests are run with Pytest

make test

Linting

Flake8 is the chosen linter

make lint

Acknowledgments

Thank you to Ma, J. et al. for the clear description of the DNN architecture and for the supplementary data.

Future Work

NVIDIA Docker image for GPU based Training
Error handling if weights aren't available for a dataset
Tests around the Preprocessor
