Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.

SMILES Transformer

SMILES Transformer extracts molecular fingerprints from string representations of chemical molecules.
The transformer learns a latent representation, through an autoencoding task, that is useful for various downstream tasks.
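The pre-training objective can be sketched as a sequence-to-sequence autoencoder: the encoder compresses a SMILES string into a latent representation, and the decoder reconstructs the string from it. The module below is an illustrative sketch in PyTorch, not the repository's actual architecture; all class names, layer sizes, and the mean-pooling fingerprint are assumptions.

```python
import torch
import torch.nn as nn

class SmilesAutoencoder(nn.Module):
    """Illustrative SMILES autoencoder; sizes and pooling are assumptions,
    not the repository's actual model."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        # Autoencoding: predict the tokens of the input SMILES itself.
        h = self.transformer(self.embed(src), self.embed(tgt))
        return self.out(h)

    def fingerprint(self, src):
        # Mean-pool the encoder output into a fixed-size molecular fingerprint.
        memory = self.transformer.encoder(self.embed(src))
        return memory.mean(dim=1)
```

After pre-training, only `fingerprint` is needed for downstream tasks; the decoder exists solely to supply the reconstruction loss.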


This project requires the following libraries.

  • NumPy
  • Pandas
  • PyTorch > 1.2
  • tqdm
  • RDKit
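A possible way to install the dependencies above; the package names are assumptions, and RDKit is generally easiest to install from conda-forge.

```shell
# Assumed package names; adjust the PyTorch build to your CUDA version.
conda install -c conda-forge rdkit
pip install numpy pandas tqdm
pip install torch
```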


Canonical SMILES of 1.7 million molecules with no more than 100 characters were taken from the ChEMBL24 dataset.
These canonical SMILES were randomly transformed at every epoch with SMILES-enumeration by E. J. Bjerrum.
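The length filter described above can be sketched as a small corpus-building helper; the function name and the whitespace handling are assumptions, not the repository's actual preprocessing code.

```python
def build_corpus(smiles_iter, max_len=100):
    """Keep canonical SMILES of at most max_len characters,
    mirroring the ChEMBL24 filtering described above (illustrative)."""
    return [s.strip() for s in smiles_iter if 0 < len(s.strip()) <= max_len]
```

The per-epoch randomization step can then be done with RDKit, e.g. `Chem.MolToSmiles(mol, canonical=False, doRandom=True)`, which produces a different valid SMILES for the same molecule on each call.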


After preparing the SMILES corpus for pre-training, run:

$ python

The pre-trained model is available here.

Downstream Tasks

See experiments/ for example code.
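Once fingerprints have been extracted, they can feed any downstream model. As a minimal illustration, not taken from the repository's experiments, the snippet below does fingerprint-based similarity search with NumPy; the array shapes and function name are assumptions.

```python
import numpy as np

def nearest_molecules(query_fp, fingerprints, k=3):
    """Rank molecules by cosine similarity of their fingerprints.
    fingerprints: (n_molecules, dim) array; query_fp: (dim,) array."""
    fps = fingerprints / np.linalg.norm(fingerprints, axis=1, keepdims=True)
    q = query_fp / np.linalg.norm(query_fp)
    sims = fps @ q                      # cosine similarity to the query
    return np.argsort(-sims)[:k]       # indices of the k most similar
```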

Citation

If you use this work, please cite:

    @article{honda2019smiles,
      title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
      author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
      journal={arXiv preprint arXiv:1911.04738},
      year={2019}
    }