Data processing scripts

make_fingerprints.py

Utility script to generate fingerprints for a set of SMILES strings. Precomputing the fingerprints for all the fragments in our dataset speeds up training.

To use, pass in the moad.h5 file and specify the fingerprint type and output path.

Supported fingerprints:

  • rdk: RDKFingerprint (2048 bits)
  • rdk10: RDKFingerprint (path size 10) (2048 bits)
  • morgan: Morgan fingerprint (r=2) (2048 bits)
  • gobbi2d: Gobbi 2D pharmacophore fingerprint (folded to 2048 bits)
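The "folded to 2048 bits" note for gobbi2d can be illustrated with a minimal sketch. It assumes folding is done by taking each on-bit index modulo the target length, which is a common convention but is not confirmed to be what make_fingerprints.py does. The function name `fold_bits` and the example indices are hypothetical.

```python
from typing import Iterable, List


def fold_bits(on_bits: Iterable[int], n_bits: int = 2048) -> List[int]:
    """Fold sparse on-bit indices into a fixed-length 0/1 vector.

    Assumption: folding maps index i to i % n_bits. Distinct indices
    can collide on the same folded position, so information is lost.
    """
    folded = [0] * n_bits
    for idx in on_bits:
        folded[idx % n_bits] = 1
    return folded


# Hypothetical on-bit indices from a large sparse fingerprint.
# 5 and 2053 collide on position 5; 40000 lands on position 1088.
example = fold_bits([5, 2053, 40000])
print(len(example), sum(example))
```

Collisions are the cost of a fixed-length vector: the three input bits above occupy only two positions after folding.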

Usage:

usage: make_fingerprints.py [-h] -f FRAGMENTS -fp {rdk,rdk10,morgan,gobbi2d}
                            [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -f FRAGMENTS, --fragments FRAGMENTS
                        Path to fragments.h5 containing "frag_smiles" array
  -fp {rdk,rdk10,morgan,gobbi2d}, --fingerprint {rdk,rdk10,morgan,gobbi2d}
                        Which fingerprint type to generate
  -o OUTPUT, --output OUTPUT
                        Output file path (.h5)
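To make the argument handling concrete, here is a sketch of an argparse parser that accepts the same flags as the usage text above. It is reconstructed from the help output only and is not the script's actual source. The output file name `fp_rdk10.h5` is a hypothetical example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line flags (a sketch)."""
    parser = argparse.ArgumentParser(prog="make_fingerprints.py")
    parser.add_argument(
        "-f", "--fragments", required=True,
        help='Path to .h5 file containing "frag_smiles" array',
    )
    parser.add_argument(
        "-fp", "--fingerprint", required=True,
        choices=["rdk", "rdk10", "morgan", "gobbi2d"],
        help="Which fingerprint type to generate",
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="Output file path (.h5)",
    )
    return parser


# Example invocation: moad.h5 as input, rdk10 fingerprints,
# written to a hypothetical output path.
args = build_parser().parse_args(
    ["-f", "moad.h5", "-fp", "rdk10", "-o", "fp_rdk10.h5"]
)
print(args.fragments, args.fingerprint, args.output)
```

Note that -f and -fp are required even though argparse lists them under "optional arguments"; only -o can be omitted.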

MOAD Dataset

For instructions on working with MOAD data, see README_MOAD.md.