## Molecule featurizers

```python
from chemprop.featurizers.molecule import MorganBinaryFeaturizer, MorganCountFeaturizer
```

These are example molecules to featurize: the straight-chain alkanes from methane (`C`) to decane (`CCCCCCCCCC`).

```python
from chemprop.utils import make_mol

# SMILES for the linear alkanes C1 to C10; hydrogens are left implicit
smis = ["C" * i for i in range(1, 11)]
mols = [make_mol(smi, keep_h=False, add_h=False) for smi in smis]
```

### Molecule vs molgraph featurizers

Both molecule and [molgraph](./molgraph_molecule_featurizer.ipynb) featurizers take `rdkit.Chem.Mol` objects as input. Molgraph featurizers produce a `MolGraph`, which is used in message passing. Molecule featurizers produce a 1D NumPy array of features that can be used as [extra datapoint descriptors](../data/datapoints.ipynb).

```python
from chemprop.data import MoleculeDatapoint

molecule_featurizer = MorganBinaryFeaturizer()

# Attach each molecule's fingerprint as extra datapoint descriptors (x_d)
datapoints = [MoleculeDatapoint(mol, x_d=molecule_featurizer(mol)) for mol in mols]

molecule_featurizer(mols[0])
```

```
array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
```
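Every molecule featurizer returns a fixed-length 1D array, so the descriptors for a whole dataset can be viewed as a 2D matrix with one row per molecule and `len(featurizer)` columns. The sketch below shows this with NumPy. So that it runs without RDKit, it uses a hypothetical stand-in featurizer, `CarbonCountFeaturizer`, which counts `C` characters in a SMILES string rather than taking an `rdkit.Chem.Mol`.

```python
import numpy as np


class CarbonCountFeaturizer:
    """Hypothetical stand-in featurizer (not part of Chemprop).

    It takes a SMILES string instead of an rdkit.Chem.Mol so that this sketch
    runs without RDKit. It returns a single feature: the number of 'C' characters.
    """

    def __len__(self) -> int:
        return 1

    def __call__(self, smi: str) -> np.ndarray:
        return np.array([smi.count("C")])


smis = ["C" * i for i in range(1, 11)]
featurizer = CarbonCountFeaturizer()

# One fixed-length 1D vector per molecule
rows = [featurizer(smi) for smi in smis]
assert all(row.shape == (len(featurizer),) for row in rows)

# Stacking the rows gives an (n_molecules, len(featurizer)) matrix
X = np.stack(rows)
print(X.shape)  # (10, 1)
```

Applied to the ten `mols` above with `MorganBinaryFeaturizer()`, the same stacking should give a `(10, 2048)` array, given the default fingerprint length of 2048.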

### Morgan fingerprint featurizers

Morgan fingerprints can use either a binary or a count representation of molecular substructures. A binary fingerprint records whether each hashed substructure is present, and a count fingerprint records how many times it occurs. The substructure radius, the fingerprint length, and whether to include chirality can all be customized. By default, the radius is 2, the length is 2048, and chirality is included.
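The following toy sketch makes the binary/count distinction concrete. It folds a list of already-hashed substructure identifiers into a fixed-length vector. The identifiers are made up, and the modulo folding is a simplification for illustration, not RDKit's exact Morgan algorithm. The dtypes match the outputs shown in this notebook: `uint8` for binary and `int32` for counts.

```python
import numpy as np


def fold_fingerprint(feature_ids, length, count=False):
    """Fold hashed substructure IDs into a fixed-length vector.

    Toy illustration only: the IDs are assumed to be pre-hashed integers,
    and folding uses a plain modulo rather than RDKit's internals.
    """
    dtype = np.int32 if count else np.uint8
    fp = np.zeros(length, dtype=dtype)
    for fid in feature_ids:
        idx = fid % length
        if count:
            fp[idx] += 1  # count representation: tally every occurrence
        else:
            fp[idx] = 1  # binary representation: presence/absence only
    return fp


# Made-up hashed IDs; 98765 appears twice (e.g. a repeated CH2 environment)
ids = [12345, 98765, 98765, 4242]
binary = fold_fingerprint(ids, length=1024)
counts = fold_fingerprint(ids, length=1024, count=True)

print(np.flatnonzero(binary), binary[np.flatnonzero(binary)])
print(np.flatnonzero(counts), counts[np.flatnonzero(counts)])
```

In this toy, the binary vector is exactly `counts > 0`. The count vector therefore keeps strictly more information, in this case that one environment occurs twice.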

```python
# Count fingerprint with a larger radius, a shorter length, and no chirality
mf = MorganCountFeaturizer(radius=3, length=1024, include_chirality=False)
morgan_fp = mf(mols[0])
morgan_fp.shape, morgan_fp
```

```
((1024,), array([0, 0, 0, ..., 0, 0, 0], dtype=int32))
```

### RDKit molecule featurizers

```python
# Coming soon
```

### Custom

Any class can be used as a molecule featurizer if it implements `__len__` (returning the number of features) and `__call__` (taking an `rdkit.Chem.Mol` and returning a 1D NumPy array of that length).
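This contract can be checked before featurizing a whole dataset. The helper below, `check_featurizer`, is a hypothetical sketch and not part of Chemprop. It calls a featurizer on one molecule, then verifies that the result is 1D and that its length matches `len(featurizer)`.

```python
import numpy as np


def check_featurizer(featurizer, mol) -> np.ndarray:
    """Hypothetical helper (not a Chemprop API): validate the molecule featurizer contract.

    Raises ValueError if the output is not 1D or if its length differs from len(featurizer).
    """
    x = np.asarray(featurizer(mol))
    if x.ndim != 1:
        raise ValueError(f"expected a 1D array, got shape {x.shape}")
    if x.shape[0] != len(featurizer):
        raise ValueError(
            f"len(featurizer) is {len(featurizer)}, but the output has {x.shape[0]} features"
        )
    return x
```

For example, `check_featurizer(MorganBinaryFeaturizer(), mols[0])` should return the same 2048-element array as calling the featurizer directly.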

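```python
import numpy as np
from rdkit import Chem


class MyMoleculeFeaturizer:
    """Toy featurizer with a single feature: the number of atoms in the molecule."""

    def __len__(self) -> int:
        # Number of features returned by __call__
        return 1

    def __call__(self, mol: Chem.Mol) -> np.ndarray:
        # GetNumAtoms() counts atoms in the graph; implicit hydrogens are not included
        total_atoms = mol.GetNumAtoms()
        return np.array([total_atoms])
```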

```python
mf = MyMoleculeFeaturizer()
mf(mols[0])  # methane: one carbon, hydrogens implicit
```

```
array([1])
```
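The contract is small, so featurizers can also be composed. The sketch below defines a hypothetical `ConcatFeaturizer` (not a Chemprop class). Its length is the sum of the lengths of its parts, and its output concatenates their arrays. Each output is cast to `float` so that, for example, a `uint8` binary fingerprint and an integer count can share one array.

```python
import numpy as np


class ConcatFeaturizer:
    """Hypothetical composite molecule featurizer (not part of Chemprop).

    Concatenates the 1D outputs of several featurizers into one vector.
    """

    def __init__(self, *featurizers):
        self.featurizers = featurizers

    def __len__(self) -> int:
        # Total number of features across all wrapped featurizers
        return sum(len(f) for f in self.featurizers)

    def __call__(self, mol) -> np.ndarray:
        # Cast to float so that mixed dtypes (e.g. uint8 and int32) combine cleanly
        return np.concatenate([np.asarray(f(mol), dtype=float) for f in self.featurizers])
```

Assuming the defaults described above, `ConcatFeaturizer(MorganBinaryFeaturizer(), MyMoleculeFeaturizer())` would have length 2048 + 1 = 2049. Its output could be passed as `x_d` to `MoleculeDatapoint`, like the output of any other molecule featurizer.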