
FOMO: A Fragment-based Objective Molecule Optimization Framework

This is the official code for FOMO.

We are cleaning the code and will release it soon.


Operating system: Ubuntu 20.04.4 LTS

Dependencies:

  • python==3.10.6
  • pytorch==1.12.1
  • rdkit==2022.9.1
  • dgl-cuda11.3==0.9.1
  • pandas==1.5.3


Users can download the code by running the following command:

git clone

Downloading may take from a few seconds to a few minutes, depending on network speed.

Data Organization


The training data were randomly collected from ZINC and consist of two files:

Molecule_dataset (src/data/train_set/Molecule_dataset.txt): a text file containing the SMILES strings of the molecules, one SMILES string per line.

Vocabulary (src/data/train_set/vocabulary.txt): a text file listing the fragments that serve as graph nodes. Each line contains the SMILES of a fragment followed by the number of times that fragment occurs in the dataset.
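The following is a minimal sketch of how the two training files could be read. It assumes plain UTF-8 text and, for the vocabulary, a fragment SMILES and an integer count separated by whitespace on each line. The exact separator is an assumption. The function names are hypothetical and are not part of the FOMO code base.

```python
from collections import Counter


def parse_molecules(lines):
    """Return one SMILES string per non-empty line."""
    return [line.strip() for line in lines if line.strip()]


def parse_vocabulary(lines):
    """Return a Counter mapping fragment SMILES -> occurrence count.

    Assumption: each line is '<fragment SMILES> <count>' separated by whitespace.
    """
    vocab = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab[parts[0]] = int(parts[1])
    return vocab


def load_molecules(path):
    """Read Molecule_dataset.txt (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return parse_molecules(f)


def load_vocabulary(path):
    """Read vocabulary.txt (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return parse_vocabulary(f)
```

Under these assumptions, calling load_vocabulary("src/data/train_set/vocabulary.txt").most_common(10) would list the ten most frequent fragments.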


The test datasets for the different optimization tasks were published by Modof and cover four tasks:

  • Improve the DRD2 score (src/data/test_set/drd2)
  • Improve the penalized LogP score (src/data/test_set/plogp)
  • Improve the QED score (src/data/test_set/qed)
  • Improve the DRD2 and QED scores simultaneously (src/data/test_set/drd2_25_qed6)


We designed two models, GAPM and GFPM, both of which are used during molecule optimization.
The script to train GFPM is located in src/.
To train GFPM, run the following command:
python3 -ep 200 -bs 128 -nw 5 -ne 5 -feat 64 -nh 4 -nmul 4 -lr 1e-2 -data [the path of molecule data]
The meaning of the arguments is as follows:

  • ep: number of training epochs
  • bs: training batch size
  • nw: number of workers used for data loading
  • ne: number of edge types
  • feat: dimension of the feature vectors
  • nh: number of heads in each RelMAL layer
  • nmul: number of RelMAL layers
  • lr: learning rate
  • data: path to the molecule data file
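The flags above can be mirrored with Python's argparse, which is useful when writing wrapper scripts or checking argument types. This is a hypothetical sketch. The defaults are copied from the example command rather than from the actual training script, and the types (int for counts, float for the learning rate) are assumptions.

```python
import argparse


def build_train_parser():
    """Hypothetical parser mirroring the documented GFPM/GAPM training flags.

    Defaults come from the example command in this README, not from the
    real training scripts.
    """
    p = argparse.ArgumentParser(description="Train GFPM or GAPM (sketch)")
    p.add_argument("-ep", type=int, default=200, help="number of training epochs")
    p.add_argument("-bs", type=int, default=128, help="training batch size")
    p.add_argument("-nw", type=int, default=5, help="number of data-loading workers")
    p.add_argument("-ne", type=int, default=5, help="number of edge types")
    p.add_argument("-feat", type=int, default=64, help="feature vector dimension")
    p.add_argument("-nh", type=int, default=4, help="number of heads per RelMAL layer")
    p.add_argument("-nmul", type=int, default=4, help="number of RelMAL layers")
    p.add_argument("-lr", type=float, default=1e-2, help="learning rate")
    p.add_argument("-data", type=str, required=True, help="path to the molecule data file")
    return p
```

Single-dash multi-letter options such as -ep are accepted by argparse when each one is registered explicitly, as in this sketch.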

The script to train GAPM is located in src/PositionModel/.
To train GAPM, run the following command:
python3 -ep 200 -bs 128 -nw 5 -ne 5 -feat 64 -nh 4 -nmul 4 -lr 1e-2 -data [the path of molecule data]


The script for molecule optimization is located in src/.
To optimize molecules, run the following command:
python3 -dir [path of result files] -smi [path of test files] -ng 5 -oracle qed -gen 5 -pop 5 -sim 0.6 -clo_path [the path of pretrained GFPM] -pos_path [the path of pretrained GAPM]
The meaning of the arguments is as follows:

  • dir: path to the result files
  • smi: path to the test files
  • ng: number of parallel groups
  • oracle: the molecular property to optimize (e.g., qed)
  • gen: number of iterations
  • pop: number of molecules kept after each iteration
  • sim: similarity constraint threshold
  • clo_path: path to the pretrained GFPM model
  • pos_path: path to the pretrained GAPM model
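The roles of pop and sim can be illustrated with a population-selection step of the kind used in iterative optimization. This sketch makes three assumptions. First, sim is a minimum Tanimoto similarity to the starting molecule, which is a common convention in molecule optimization benchmarks. Second, a higher oracle score is better. Third, fingerprints are represented as Python sets of on-bits so the example runs without RDKit. It is not FOMO's actual selection procedure, and select_population is a hypothetical name.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def select_population(source_fp, candidates, pop=5, sim=0.6):
    """Keep up to `pop` best-scoring candidates that satisfy the similarity constraint.

    candidates: iterable of (smiles, score, fingerprint_bits) tuples.
    Assumption: the constraint is tanimoto(source, candidate) >= sim.
    """
    seen = set()
    feasible = []
    for smiles, score, fp in candidates:
        if smiles in seen:
            continue  # drop duplicate molecules, keeping the first occurrence
        seen.add(smiles)
        if tanimoto(source_fp, fp) >= sim:
            feasible.append((score, smiles))
    # Higher oracle score is assumed to be better.
    feasible.sort(key=lambda t: t[0], reverse=True)
    return [smiles for _, smiles in feasible[:pop]]
```

With RDKit installed, the on-bit sets could be replaced by Morgan fingerprints (e.g. AllChem.GetMorganFingerprintAsBitVect) compared via DataStructs.TanimotoSimilarity.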


