SMILES Transformer

SMILES Transformer extracts molecular fingerprints from string representations of chemical molecules.
Through an autoencoding task, the Transformer learns latent representations that are useful for various downstream tasks.
This repository is the original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.

Requirements

This project requires the following libraries:

  • NumPy
  • Pandas
  • PyTorch > 1.2
  • tqdm
  • RDKit
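
One way to install them (this exact sequence is an assumption, not the repository's documented setup; the package names are the standard PyPI/conda ones, and RDKit has traditionally been easiest to install via conda):

$ pip install numpy pandas tqdm torch
$ conda install -c conda-forge rdkit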

Dataset

The canonical SMILES of 1.7 million molecules, each no more than 100 characters long, were taken from the ChEMBL 24 dataset.
These canonical SMILES were randomly transformed every epoch using the SMILES enumeration technique of E. J. Bjerrum.
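
For illustration, a minimal sketch of such SMILES enumeration with RDKit (this uses RDKit's doRandom option as one way to randomize SMILES; it is not necessarily the exact code used in this repository):

from rdkit import Chem

def randomize_smiles(smiles):
    # Parse the molecule, then write it back in a random atom order.
    # doRandom=True makes RDKit emit one of the many valid SMILES
    # strings for the same molecule.
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)

print(randomize_smiles("CCO"))  # e.g. "OCC" or "C(O)C"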

Pre-training

After preparing the SMILES corpus for pre-training, run:

$ python pretrain_trfm.py

The pre-trained model is available here.
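
As a rough sketch of how such a checkpoint might be used to extract fingerprints (the module paths, class and method names, constructor arguments, and file names below are assumptions for illustration, not a documented API):

import torch
from smiles_transformer.pretrain_trfm import TrfmSeq2seq  # assumed module path
from smiles_transformer.build_vocab import WordVocab      # assumed module path

vocab = WordVocab.load_vocab('vocab.pkl')                 # assumed vocabulary file
model = TrfmSeq2seq(len(vocab), 256, len(vocab), 4)       # assumed hyperparameters
model.load_state_dict(torch.load('trfm.pkl', map_location='cpu'))  # assumed checkpoint
model.eval()

# Dummy (seq_len, batch) tensor standing in for tokenized SMILES;
# an encode() method returning fingerprint vectors is likewise assumed.
token_ids = torch.randint(0, len(vocab), (100, 1))
with torch.no_grad():
    fingerprints = model.encode(token_ids)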

Downstream Tasks

See experiments/ for example code.
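
Once fingerprints have been extracted, low-data downstream tasks can be tackled with any lightweight classifier. A minimal sketch using scikit-learn (not a listed dependency; both the file names and the choice of scikit-learn are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical precomputed inputs: fingerprint matrix and binary labels.
X = np.load('fingerprints.npy')  # shape: (n_molecules, fingerprint_dim)
y = np.load('labels.npy')        # shape: (n_molecules,)

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean())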

Cite

@article{honda2019smiles,
    title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
    author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
    year={2019},
    eprint={1911.04738},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}