affinity-sampling

Introduction

This repository contains the code for the paper focused on two aspects of designing AI-driven tools for early-stage drug discovery problems. First, it reports a novel graph neural network (GNN) architecture for predicting the affinity of a small-molecule ligand to a protein target. Second, it shows how naive application of commonly used performance evaluation strategies can yield overly optimistic performance metrics for a given ML model.

Installation

The code depends on a number of packages, and a specific combination of their versions helps it run fast. We recommend using the conda package manager to install these dependencies automatically. To do so, ensure you have conda installed (a minimal conda distribution should be enough), clone the repository, and run

conda env create -f environment.yml

This will create an environment called affgnn and install all necessary packages there. After that, use

conda activate affgnn

to activate the new environment. Note, however, that this configuration also requires CUDA to be usable on the system (the packages to be installed use CUDA 10.2).
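Before training, it can help to confirm that an NVIDIA driver is visible at all. The snippet below is only a rough sketch, not part of this repository: it assumes the `nvidia-smi` utility ships with the installed driver and is on `PATH`, and the helper name `nvidia_driver_visible` is hypothetical. It does not verify the CUDA 10.2 toolkit version used by the packages.

```python
import shutil
import subprocess


def nvidia_driver_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits successfully.

    Heuristic only: this checks the driver utility, not the CUDA
    toolkit version that the conda packages were built against.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver visible:", nvidia_driver_visible())
```

A `False` result suggests fixing the driver setup before creating the environment.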

Configuration and use

The code can be run as

python train.py

which will first train the model using the folds marked 0, 1, 2, 3, then make predictions on fold 4 (this fold id can be set through the argv_valFold and argv_testFold keys in affinity_module/config.py) and save the predicted affinities as text files.
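The fold logic described above can be illustrated with a small sketch. This is not the code from train.py: the function `split_by_fold`, the `fold` key, and the toy rows are assumptions made for illustration, and the example assumes both argv_valFold and argv_testFold point to fold 4, as the description suggests.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_fold(
    rows: List[Row], val_fold: int, test_fold: int, fold_key: str = "fold"
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into train/validation/test sets by their fold id.

    Rows whose fold is neither the validation nor the test fold are
    used for training.
    """
    train = [r for r in rows if r[fold_key] not in (val_fold, test_fold)]
    val = [r for r in rows if r[fold_key] == val_fold]
    test = [r for r in rows if r[fold_key] == test_fold]
    return train, val, test


# Toy data: ten records spread evenly over folds 0..4 (hypothetical).
rows = [{"id": i, "fold": i % 5} for i in range(10)]
train, val, test = split_by_fold(rows, val_fold=4, test_fold=4)
print(len(train), len(val), len(test))
```

Setting the two fold ids to different values would hold out two folds and leave three for training.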

The input data needs to be supplied in two forms: a .csv file defining the structures of the ligands, the PDB codes and UniProt IDs of the receptors, and the distribution of the data over five folds; and .dssp files defining the secondary structure elements of the target proteins.

The location of the input .csv file, as well as the location of the folder containing the .dssp files for receptors, should then be set by editing affinity_module/config.py (see the master_data_table and dssp_files_path keys, respectively).
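Since both inputs must agree, a quick consistency check before training may save time. The sketch below is an assumption-laden illustration, not part of the repository: the column name `pdb_code` and the `<PDB code>.dssp` file naming are hypothetical and should be adapted to the actual table and folder layout.

```python
import csv
from pathlib import Path
from typing import List


def missing_dssp_files(
    table_path: str, dssp_dir: str, pdb_column: str = "pdb_code"
) -> List[str]:
    """List PDB codes from the table that have no matching .dssp file.

    Assumes (hypothetically) one file per receptor named <code>.dssp.
    """
    folder = Path(dssp_dir)
    missing: List[str] = []
    seen = set()
    with open(table_path, newline="") as handle:
        for row in csv.DictReader(handle):
            code = row[pdb_column].strip()
            if code in seen:
                continue  # report each receptor only once
            seen.add(code)
            if not (folder / f"{code}.dssp").is_file():
                missing.append(code)
    return missing
```

Calling it with the values of master_data_table and dssp_files_path returns an empty list when every receptor in the table has a secondary-structure file.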

To prepare receptor data in .dssp format, the underlying receptor structures, either in PDB or (preferably) CIF format, should be processed with the DSSP program, e.g.,

mkdssp --output-format dssp protein.cif > protein.dssp
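For many receptors, the same command can be applied to a whole folder. The following is a minimal sketch that assumes `mkdssp` is on `PATH` and accepts the invocation shown above; the helper names `mkdssp_command` and `convert_folder` and the folder layout are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def mkdssp_command(structure_path: Path) -> List[str]:
    """Build the mkdssp call shown above for one structure file."""
    return ["mkdssp", "--output-format", "dssp", str(structure_path)]


def convert_folder(structures_dir: str, dssp_dir: str) -> List[Path]:
    """Run mkdssp on every .cif file and write <name>.dssp outputs."""
    out_dir = Path(dssp_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for structure in sorted(Path(structures_dir).glob("*.cif")):
        target = out_dir / f"{structure.stem}.dssp"
        # Redirect stdout to the target file, as `>` does in the shell.
        with open(target, "w") as handle:
            subprocess.run(mkdssp_command(structure), stdout=handle, check=True)
        written.append(target)
    return written
```

With `check=True`, a failing structure raises `subprocess.CalledProcessError`, so problematic inputs are noticed immediately rather than producing empty .dssp files silently.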
