Davezqq/FOMO

FOMO: A Fragment-based Objective Molecule Optimization Framework

This is the official code for FOMO.

We are cleaning the code and will release it soon.

Requirements

Operating system: Ubuntu 20.04.4 LTS

  • python==3.10.6
  • pytorch==1.12.1
  • rdkit==2022.9.1
  • dgl-cuda11.3==0.9.1
  • pandas==1.5.3

Installation

Users can download the code by running the following command:

git clone https://github.com/Davezqq/FOMO

Downloading may take seconds to minutes, depending on network speed.

Data Organization

Train_set

The training data was randomly sampled from ZINC and consists of 2 files:

Molecule_dataset (src/data/train_set/Molecule_dataset.txt): This text file contains the SMILES representations of the molecules, one SMILES string per row.

Vocabulary (src/data/train_set/vocabulary.txt): This text file contains the fragments that are treated as nodes. In each row, the first field is the SMILES of the fragment and the second field is the number of times the fragment appears in the dataset.
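The two training files can be read with a few lines of Python. This is a minimal sketch, not code from the repository: the function names `load_molecules` and `load_vocabulary` are hypothetical, and the vocabulary rows are assumed to separate the fragment SMILES and its count with whitespace, which the description above does not specify.

```python
from pathlib import Path


def load_molecules(path):
    """Return the SMILES strings in the file, one per non-empty row."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_vocabulary(path):
    """Return a dict mapping fragment SMILES to its occurrence count.

    Assumption: the two fields of each row are separated by whitespace.
    """
    vocab = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank rows
        smiles, count = line.split()[:2]
        vocab[smiles] = int(count)
    return vocab
```

For example, `load_vocabulary("src/data/train_set/vocabulary.txt")` would return a dictionary keyed by fragment SMILES, provided the separator assumption holds for the released file.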

Test_set

The datasets used to test performance on different optimization problems were published by Modof. There are 4 tasks:

  • Improve the DRD2 score (src/data/test_set/drd2)
  • Improve the penalized LogP score (src/data/test_set/plogp)
  • Improve the QED score (src/data/test_set/qed)
  • Improve the QED and DRD2 scores at the same time (src/data/test_set/drd2_25_qed6)

Training

We designed two models, GAPM and GFPM; both are used during molecule optimization.
The script to train GFPM is located at src/train.py.
You can train GFPM with the following command:
python3 train.py -ep 200 -bs 128 -nw 5 -ne 5 -feat 64 -nh 4 -nmul 4 -lr 1e-2 -data [the path of molecule data]
The meaning of the arguments is as follows:

  • ep: number of epochs
  • bs: training batch size
  • nw: number of workers used when loading the data
  • ne: number of edge types
  • feat: dimension of the feature vectors
  • nh: number of heads in RelMAL layers
  • nmul: number of RelMAL layers
  • lr: learning rate
  • data: path of the molecule data file
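The flags listed above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the parser in src/train.py: the flag names come from the command above, while the types and defaults are assumptions taken from the example values, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser():
    # Hypothetical parser: flag names mirror the README command;
    # types and defaults are assumptions, not the repository's code.
    p = argparse.ArgumentParser(description="Train GFPM (sketch)")
    p.add_argument("-ep", type=int, default=200, help="number of epochs")
    p.add_argument("-bs", type=int, default=128, help="training batch size")
    p.add_argument("-nw", type=int, default=5, help="data-loading workers")
    p.add_argument("-ne", type=int, default=5, help="number of edge types")
    p.add_argument("-feat", type=int, default=64, help="feature dimension")
    p.add_argument("-nh", type=int, default=4, help="heads per RelMAL layer")
    p.add_argument("-nmul", type=int, default=4, help="number of RelMAL layers")
    p.add_argument("-lr", type=float, default=1e-2, help="learning rate")
    p.add_argument("-data", type=str, required=True,
                   help="path of the molecule data file")
    return p


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["-ep", "10", "-data", "Molecule_dataset.txt"])
```

With this sketch, any flag that is omitted falls back to the value shown in the example command, and only `-data` is mandatory.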

The script to train GAPM is located at src/PositionModel/train.py.
You can train GAPM with the following command:
python3 train.py -ep 200 -bs 128 -nw 5 -ne 5 -feat 64 -nh 4 -nmul 4 -lr 1e-2 -data [the path of molecule data]

Optimizing

The script for molecule optimization is located at src/batch_evaluate.py.
You can optimize molecules with the following command:
python3 batch_evaluate.py -dir [path of result files] -smi [path of test files] -ng 5 -oracle qed -gen 5 -pop 5 -sim 0.6 -clo_path [the path of pretrained GFPM] -pos_path [the path of pretrained GAPM]
The meaning of the arguments is as follows:

  • dir: path of the result files
  • smi: path of the test files
  • ng: number of parallel groups
  • oracle: the molecular property to optimize
  • gen: number of iterations
  • pop: number of molecules kept after each iteration
  • sim: similarity constraint
  • clo_path: path of the pretrained GFPM model
  • pos_path: path of the pretrained GAPM model
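To make the roles of gen, pop, and sim concrete, the sketch below shows a generic iterate, filter, and select loop. It is an illustration under stated assumptions, not FOMO's implementation: `optimize`, `propose`, `oracle`, and `similarity` are hypothetical names, the similarity is assumed to be measured against the starting molecule, and `propose` is a placeholder for whatever candidate generation the pretrained GFPM and GAPM models perform.

```python
def optimize(seed, propose, oracle, similarity, gen=5, pop=5, sim=0.6):
    """Generic iterate-filter-select loop (illustrative, not FOMO's code).

    seed:       starting molecule
    propose:    callable returning candidate modifications of a molecule
    oracle:     callable scoring a molecule (higher is better)
    similarity: callable giving the similarity of two molecules
    """
    population = [seed]
    for _ in range(gen):
        # Collect the current molecules plus every proposed modification.
        candidates = set(population)
        for mol in population:
            candidates.update(propose(mol))
        # Assumption: the constraint compares each candidate with the seed.
        feasible = [m for m in candidates if similarity(seed, m) >= sim]
        # Keep the `pop` best-scoring molecules for the next iteration.
        population = sorted(feasible, key=oracle, reverse=True)[:pop]
    return population
```

In this sketch a larger gen gives the search more rounds, a larger pop keeps more molecules alive between rounds, and a higher sim restricts the results to molecules closer to the input.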
