Novel molecules from a reference shape!

playmolecule/ligdream

LigDream: Shape-Based Compound Generation

THIS PROJECT IS NO LONGER ACTIVE. IT IS MADE AVAILABLE WITHOUT ANY SUPPORT.

Citing

If you use content from this repository, please consider citing the following work:

@article{skalic2019shape,
  title={Shape-Based Generative Modeling for de-novo Drug Design},
  author={Skalic, Miha and Jim{\'e}nez Luna, Jos{\'e} and Sabbadin, Davide and De Fabritiis, Gianni},
  journal={Journal of Chemical Information and Modeling},
  doi={10.1021/acs.jcim.8b00706},
  publisher={ACS Publications}
}

Requirements

Model training is written in pytorch==0.3.1 and uses keras==2.2.2 for data loaders. RDKit==2017.09.2.0 and HTMD==1.13.9 are needed for molecule manipulation.
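Because the pinned versions are old, it can help to confirm what is actually installed before training. The following is a minimal sketch, not part of the repository: it assumes the import names are `torch`, `keras`, `rdkit` and `htmd`, and that each module exposes a `__version__` attribute. The helper names `installed_version` and `find_mismatches` are hypothetical.

```python
import importlib

# Versions pinned in the Requirements paragraph. The import names
# (torch, keras, rdkit, htmd) are an assumption of this sketch.
PINNED = {
    "torch": "0.3.1",
    "keras": "2.2.2",
    "rdkit": "2017.09.2.0",
    "htmd": "1.13.9",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to a marker string if the module has no __version__ attribute.
    return getattr(module, "__version__", "unknown")


def find_mismatches(pinned, lookup=installed_version):
    """Map each module whose reported version differs from the pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def report(pinned=PINNED):
    """Print one line per mismatch; prints nothing when everything matches."""
    for name, (wanted, found) in find_mismatches(pinned).items():
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

Calling `report()` lists every package whose reported version string differs from the pin. A module may report its version in a slightly different form than the package manager does, so treat a mismatch as a hint to look closer rather than as a definite error.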

Add the repo to your PYTHONPATH:

  export PYTHONPATH=/path/to/ligdream/repo/:$PYTHONPATH
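In a notebook or script where exporting an environment variable is inconvenient, the same effect can be had from Python. This is a small sketch; the helper name `add_repo_to_path` is hypothetical and the path is the placeholder from the command above.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Prepend repo_dir to sys.path (only once) and return its absolute path."""
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Placeholder path copied from the export command; replace with the real checkout.
add_repo_to_path("/path/to/ligdream/repo/")
```

Unlike the `export` line, this only affects the current Python process.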

Before starting

Training requires a .smi file. We used the drug-like subset of the ZINC15 dataset. The same cleaned dataset can be retrieved with the getDataset.sh script, which downloads the .smi file required for training (see Training).

  bash getDataset.sh

After the script finishes, the traindataset folder will contain zinc15_druglike_clean_canonical_max60.smi, the file required for the training step (see Training).
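A quick look at the downloaded file can catch a truncated or empty download before a multi-day training run. The sketch below is not part of the repository. It assumes the usual .smi layout of one SMILES string per line, optionally followed by whitespace and an identifier, and it does not validate the chemistry. The function names are hypothetical.

```python
def read_smiles(path):
    """Yield the first whitespace-separated token of each non-empty line."""
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                yield fields[0]


def summarize(path):
    """Return (number of entries, length of the longest SMILES string)."""
    count = 0
    longest = 0
    for smiles in read_smiles(path):
        count += 1
        longest = max(longest, len(smiles))
    return count, longest
```

For example, `summarize("traindataset/zinc15_druglike_clean_canonical_max60.smi")` would report how many entries were read, assuming the script is run from the repository root.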

The generation stage requires the model files. You can use the ones produced by the training step, or download the ones we have already generated with the following script:

  bash getWeights.sh

The modelweights folder will then contain three model files:

  • decoder-210000.pkl
  • encoder-210000.pkl
  • vae-210000.pkl
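To confirm the download completed, one can check that all three files are present. A minimal sketch, assuming the modelweights folder sits in the current working directory; the helper `missing_weights` is hypothetical.

```python
from pathlib import Path

# File names as listed above; the default folder name comes from the README text.
EXPECTED_WEIGHTS = ("decoder-210000.pkl", "encoder-210000.pkl", "vae-210000.pkl")


def missing_weights(folder="modelweights"):
    """Return the expected weight files that are absent from folder."""
    folder = Path(folder)
    return [name for name in EXPECTED_WEIGHTS if not (folder / name).is_file()]
```

An empty list from `missing_weights()` means all three files were found. This checks presence only, not file integrity.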

Training

Note that training runs on a GPU and it will take several days to complete.

First construct a set of training molecules:

$ python prepare_data.py -i "./path/to/my/smiles.smi" -o "./path/to/my/smiles.npy"

Secondly, execute the training of a model:

$ python train.py -i "./path/to/my/smiles.npy" -o "./path/to/models"
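The two steps above can also be chained from a single Python driver, so that training starts only if data preparation succeeds. This is a sketch under the assumption that it is run from the repository root, where prepare_data.py and train.py live. The `-i` and `-o` flags are the ones shown above; the function names are hypothetical.

```python
import subprocess
import sys


def build_commands(smiles_file, array_file, model_dir):
    """Return the two command lines shown above, using the current interpreter."""
    return [
        [sys.executable, "prepare_data.py", "-i", smiles_file, "-o", array_file],
        [sys.executable, "train.py", "-i", array_file, "-o", model_dir],
    ]


def run_pipeline(smiles_file, array_file, model_dir):
    """Run preparation, then training; stop at the first failing step."""
    for command in build_commands(smiles_file, array_file, model_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

A call such as `run_pipeline("./path/to/my/smiles.smi", "./path/to/my/smiles.npy", "./path/to/models")` mirrors the placeholder paths used in the commands above.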

Generation

Web-based compound generation is available at https://playmolecule.org/LigDream/.

For an example of generating novel compounds locally, see the notebook generate.ipynb.

License

Code is released under the GNU Affero General Public License.