MORLD

MORLD is a molecule optimization method based on reinforcement learning and docking. This repository provides the source code of the main part of the MORLD software and its usage.

To run a demo, you need to prepare the environment described below.

Alternatively, go to the MORLD web service (http://morld.kaist.ac.kr) and see the Tutorial page. The demo prepared on the MORLD web service takes 1 to 2 days to produce a result.

Prepare

Environment setting

MolDQN and its requirements (RL framework): https://github.com/google-research/google-research/tree/master/mol_dqn

MORLD is based on MolDQN. Therefore, to run MORLD standalone, you must have an environment in which MolDQN runs. The usage of MORLD is also similar to that of MolDQN.

rdkit (QED score and molecule modification): https://www.rdkit.org/docs/Install.html
gym-molecule library (SA score): https://github.com/bowenliu16/rl_graph_generation/tree/master/gym-molecule
QuickVina2 (docking score): https://github.com/QVina/qvina
open babel (converting file types of a molecule): https://openbabel.org/docs/dev/Installation/install.html
mgltools for linux (preprocessing a target protein): http://mgltools.scripps.edu/downloads

Verified dependencies

MORLD has been verified to work with the versions below.

python: 3.7.6
MolDQN: Latest github version (https://github.com/aksub99/MolDQN-pytorch)
rdkit: 2018.09.1
mgltools: mgltools_Linux-x86_64_1.5.7 (https://ccsb.scripps.edu/mgltools/downloads/)
gym-molecule: Latest github version (https://github.com/bowenliu16/rl_graph_generation/tree/master/gym-molecule)
QuickVina2: Latest github version (https://github.com/QVina/qvina)
open babel: 2.4.1
pandas: 1.0.1
baselines: Latest github version (https://github.com/openai/baselines#installation)
absl-py: 0.9.0
networkx: 2.4
numpy: 1.18.1
tensorflow: 1.14.0
gast: 0.3.3 (recommended to suppress warnings)

Preprocessing of a target protein

To run QuickVina2, the PDB file must contain no ligand molecules. Remove the ligand molecules with a tool such as pymol before docking. The PDB file should also be appropriately protonated; you can use the PDB2PQR server to protonate it.
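The ligand-removal step can also be approximated in plain Python when pymol is not at hand. The sketch below is a minimal illustration and not part of MORLD: the helper name strip_hetero_records is hypothetical, and it assumes that every HETATM record should be dropped, which also removes waters, ions, and cofactors. It does not protonate the structure.

```python
def strip_hetero_records(pdb_text):
    """Return PDB text without HETATM records and CONECT bond records.

    Assumption: all hetero atoms (ligands, waters, ions, cofactors) are
    unwanted. Inspect the structure first if a cofactor must be kept.
    """
    kept = []
    for line in pdb_text.splitlines():
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        if record in ("HETATM", "CONECT"):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    example = (
        "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n"
        "HETATM  900  C1  LIG A 500      10.000  10.000  10.000  1.00 30.00           C\n"
        "END\n"
    )
    print(strip_hetero_records(example), end="")
```

The result should still be checked visually, because modified residues are sometimes stored as HETATM records and would be removed as well.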

The target protein must also be given in pdbqt file format. Please follow the instructions at one of the links below to convert a pdb file to a pdbqt file.

http://autodock.scripps.edu/faqs-help/how-to/how-to-prepare-a-receptor-file-for-autodock4 or https://bioinformaticsreview.com/20200716/prepare-receptor-and-ligand-files-for-docking-using-python-scripts/

For the demo, this repository provides an example pdbqt file of the protein DDR1 (discoidin domain receptor 1): 3zosA_prepared.pdbqt.
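For orientation, the pdb to pdbqt conversion described in the links above is commonly done with the MGLTools script prepare_receptor4.py, using its -r (input receptor) and -o (output pdbqt) options. The sketch below only assembles that command. The helper name build_prepare_receptor_cmd is hypothetical, and the locations of pythonsh (the MGLTools interpreter) and of the script are assumptions that depend on your installation.

```python
import subprocess


def build_prepare_receptor_cmd(pdb_path, pdbqt_path,
                               pythonsh="pythonsh",
                               script="prepare_receptor4.py"):
    """Build the argument list for converting a receptor pdb to pdbqt.

    Assumption: `pythonsh` and `script` resolve on this machine; adjust
    both to the paths of your MGLTools installation.
    """
    return [pythonsh, script, "-r", pdb_path, "-o", pdbqt_path]


if __name__ == "__main__":
    cmd = build_prepare_receptor_cmd("3zosA_prepared.pdb", "3zosA_prepared.pdbqt")
    print(" ".join(cmd))
    # Uncomment to run the conversion once the paths are correct:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.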

Configuration file for docking

To run QuickVina2, you need a configuration file. Create a configuration file named "config.txt" that looks like the one below.

receptor = receptor.pdbqt
ligand = ligand.pdbqt

#binding_pocket
center_x = ###
center_y = ###
center_z = ###

size_x = ###
size_y = ###
size_z = ###

Replace the receptor placeholder with the file name of your receptor. (You do not need to change the name of the ligand file.) Fill in the binding pocket information with the center coordinates and the size of the grid box in Angstrom (Å).

An example configuration file for the demo is also provided as config.txt in this repository.
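If you prepare several targets, the configuration file can be generated rather than edited by hand. The following is a minimal sketch that renders the same keys shown in the template above. The function name render_vina_config is hypothetical, and the coordinates in the usage example are placeholders, not the DDR1 pocket.

```python
def render_vina_config(receptor, center, size, ligand="ligand.pdbqt"):
    """Render a docking configuration with the keys used in config.txt.

    `center` and `size` are (x, y, z) tuples in Angstrom.
    """
    lines = [
        "receptor = {}".format(receptor),
        "ligand = {}".format(ligand),
        "",
        "#binding_pocket",
    ]
    for axis, value in zip("xyz", center):
        lines.append("center_{} = {}".format(axis, value))
    lines.append("")
    for axis, value in zip("xyz", size):
        lines.append("size_{} = {}".format(axis, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder pocket values for illustration only.
    text = render_vina_config("receptor.pdbqt", (1.0, 2.0, 3.0), (20, 20, 20))
    print(text, end="")
    # To save it: open("config.txt", "w").write(text)
```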

Place the required files

MORLD works inside MolDQN. Place the files below into the mol_dqn/chemgraph/ directory.

optimize_BE.py file
3zos.pdbqt, the receptor file in pdbqt format
config.txt file
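Before starting a long run, it can help to confirm that the files listed above are actually in place. The sketch below is a small, hypothetical helper (missing_files is not part of MORLD) that reports which of the expected file names are absent from a directory. The file names in the usage example are taken from this README and should be adjusted to your receptor.

```python
import os


def missing_files(directory, required):
    """Return the names from `required` that are not files in `directory`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    needed = ["optimize_BE.py", "3zos.pdbqt", "config.txt"]
    absent = missing_files("mol_dqn/chemgraph", needed)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All required files are in place.")
```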

Usage

Choose the output directory

export OUTPUT_DIR="./save"

Set the initial molecule (lead molecule)

export INIT_MOL="C1CC2=CC=CC=C2N(C1)C(=O)CN3CCC(CC3)NC4=NC(=CC(=O)N4)C(F)(F)F"

Set your own initial molecule as a SMILES string. The example SMILES is ZINC12114041, which was found by virtual screening against the protein DDR1 (3zos).

Set the hyperparameters

The mol_dqn/configs/ directory contains json files for the hyperparameters. You can change those hyperparameters as you wish.
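Instead of editing a shipped json file in place, you can derive a custom one from it. The sketch below is one way to do that, assuming the hyperparameter file is a flat json object. The function name write_custom_hparams is hypothetical, and the key num_episodes in the usage example is an assumption: check the json file you start from for the keys it really contains.

```python
import json


def write_custom_hparams(base_path, out_path, overrides):
    """Copy a hyperparameter json file, replacing selected values.

    Keys that are absent from the base file are rejected so that a
    misspelled key does not pass silently.
    """
    with open(base_path) as handle:
        hparams = json.load(handle)
    unknown = sorted(set(overrides) - set(hparams))
    if unknown:
        raise KeyError("keys not present in base file: {}".format(unknown))
    hparams.update(overrides)
    with open(out_path, "w") as handle:
        json.dump(hparams, handle, indent=2, sort_keys=True)
    return hparams


# Example call (the key name is an assumption, see above):
# write_custom_hparams("./configs/bootstrap_dqn_step1.json",
#                      "./configs/my_hparams.json",
#                      {"num_episodes": 100})
```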

Optimization of binding affinity

python optimize_BE.py --model_dir="${OUTPUT_DIR}" --start_molecule="${INIT_MOL}" --hparams="./configs/bootstrap_dqn_step1.json"

hparams can also point to a custom json file.
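The same run can be launched from Python, which avoids shell quoting of the SMILES string. This is a sketch only: build_optimize_cmd is a hypothetical helper, it uses just the three flags shown in the command above, and it assumes the script is started from the directory that contains optimize_BE.py.

```python
import subprocess
import sys


def build_optimize_cmd(model_dir, start_molecule, hparams,
                       script="optimize_BE.py"):
    """Build the argument list for one optimization run."""
    return [
        sys.executable,
        script,
        "--model_dir=" + model_dir,
        "--start_molecule=" + start_molecule,
        "--hparams=" + hparams,
    ]


if __name__ == "__main__":
    smiles = "C1CC2=CC=CC=C2N(C1)C(=O)CN3CCC(CC3)NC4=NC(=CC(=O)N4)C(F)(F)F"
    cmd = build_optimize_cmd("./save", smiles,
                             "./configs/bootstrap_dqn_step1.json")
    print(cmd)
    # Uncomment to start the run:
    # subprocess.run(cmd, check=True)
```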

Output

The output file optimized_result_total.txt lists the optimized molecules with their SMILES, docking score, SA score, and QED score, separated by tabs.

In addition, the MORLD web server (http://morld.kaist.ac.kr) provides docking pose files for each optimized molecule.
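To inspect the results, the tab-delimited file can be read with the standard csv module. The sketch below assumes the column order given above (SMILES, docking score, SA score, QED score) and skips any line whose score fields are not numeric, such as a possible header. Whether the file has a header is not stated here, so that handling is an assumption. The function name load_results is hypothetical.

```python
import csv


def load_results(path):
    """Read optimized molecules and sort them by docking score.

    Lower (more negative) docking scores indicate stronger predicted
    binding, so the best candidates come first.
    """
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if len(fields) < 4:
                continue  # blank or incomplete line
            try:
                docking, sa, qed = (float(v) for v in fields[1:4])
            except ValueError:
                continue  # header or malformed line
            rows.append({"smiles": fields[0], "docking": docking,
                         "sa": sa, "qed": qed})
    return sorted(rows, key=lambda row: row["docking"])


# Example call:
# for row in load_results("optimized_result_total.txt")[:10]:
#     print(row["smiles"], row["docking"], row["sa"], row["qed"])
```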
