## Molecule featurizers

In [1]:
from rdkit import Chem
from chemprop import utils
from chemprop.featurizers.molecule import MorganBinaryFeaturizer, MorganCountFeaturizer

### Molecule vs molgraph featurizers

Both molecule and [molgraph](./molgraph_molecule_featurizer.ipynb) featurizers take `rdkit.Chem.Mol` objects as input. Molgraph featurizers produce a `MolGraph`, which is used in message passing. Molecule featurizers produce a vector of features, which can be used as [extra datapoint descriptors](../data/datapoints.ipynb).

### Morgan fingerprint featurizers

Morgan fingerprints can use either a binary or a count representation of molecular substructures. The radius of the substructures, the length of the fingerprint, and whether to include chirality can all be customized. By default, the radius is 2, the length is 2048, and chirality is included.

In [2]:
import numpy as np
from rdkit import Chem
from chemprop.data import MoleculeDatapoint, MoleculeDataset

smis = ["C" * i for i in range(1, 11)]
ys = np.random.rand(len(smis), 1)

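molecule_featurizer = MorganBinaryFeaturizer()
# Featurize each molecule in turn (use `smi`, not `smis[0]`, so every datapoint gets its own fingerprint)
extra_datapoint_descriptors = [
    molecule_featurizer(utils.make_mol(smi, keep_h=False, add_h=False)) for smi in smis
]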
datapoints = [
    MoleculeDatapoint.from_smi(smi, y, x_d=x_d)
    for smi, y, x_d in zip(smis, ys, extra_datapoint_descriptors)
]

In [3]:
molecule_featurizer = MorganCountFeaturizer(radius=3, length=1024, include_chirality=False)
morgan_fp = molecule_featurizer(Chem.MolFromSmiles(smis[0]))
morgan_fp.shape, morgan_fp

((1024,), array([0, 0, 0, ..., 0, 0, 0], dtype=int32))
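
The difference between the binary and count representations can be pictured with a small NumPy sketch. This is a simplified illustration, not RDKit's or chemprop's actual implementation: it assumes each substructure has already been hashed to an integer identifier and that identifiers are folded into the fingerprint by taking them modulo the length. The function `fold_identifiers` and the identifier values are hypothetical.

```python
import numpy as np


def fold_identifiers(identifiers, length, use_counts):
    """Fold integer substructure identifiers into a fixed-length vector.

    Hypothetical helper for illustration only; real Morgan fingerprints
    are generated by RDKit.
    """
    fp = np.zeros(length, dtype=np.int32)
    for ident in identifiers:
        idx = ident % length  # map the identifier onto a fingerprint position
        if use_counts:
            fp[idx] += 1  # count representation: record every occurrence
        else:
            fp[idx] = 1  # binary representation: record presence only
    return fp


# Made-up identifiers: the substructure with id 7 occurs three times, id 12 once
identifiers = [7, 12, 7, 7]

binary_fp = fold_identifiers(identifiers, length=8, use_counts=False)
count_fp = fold_identifiers(identifiers, length=8, use_counts=True)

print(binary_fp)  # position 7 holds 1 regardless of how often id 7 occurs
print(count_fp)   # position 7 holds 3, the number of occurrences
```

In this picture, the binary vector discards how many times a substructure appears, while the count vector keeps it. A shorter length makes it more likely that different identifiers land on the same position.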

### RDKit molecule featurizers

In [4]:
# Coming soon

### Custom

Any class that defines `__len__` and returns a NumPy array when called with an `rdkit.Chem.Mol` can be used as a molecule featurizer.

In [5]:
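class MyMoleculeFeaturizer:
    """Toy featurizer whose only feature is the number of atoms in the molecule."""

    def __len__(self) -> int:
        # Number of features this featurizer produces
        return 1

    def __call__(self, mol: Chem.Mol) -> np.ndarray:
        # GetNumAtoms() counts the atoms present in the Mol (hydrogens only if explicit)
        total_atoms = mol.GetNumAtoms()
        return np.array([total_atoms])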

In [6]:
mf = MyMoleculeFeaturizer()
mf(utils.make_mol(smis[0], keep_h=False, add_h=False))

array([1])
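
When writing a custom featurizer, it can help to check that the reported length and the returned array agree. The sketch below assumes that `len(featurizer)` is meant to equal the number of entries in the returned 1-D array, which the example above satisfies but which is not stated explicitly here. The names `check_molecule_featurizer`, `StandInMol`, `AtomCountFeaturizer`, and `MismatchedFeaturizer` are hypothetical, and `StandInMol` only mimics `GetNumAtoms()` so that the sketch runs without RDKit.

```python
import numpy as np


def check_molecule_featurizer(featurizer, mol) -> np.ndarray:
    """Call `featurizer` on `mol` and check the output against len(featurizer).

    Hypothetical helper, not part of chemprop.
    """
    features = np.asarray(featurizer(mol))
    if features.ndim != 1:
        raise ValueError(f"expected a 1-D array, got shape {features.shape}")
    if features.shape[0] != len(featurizer):
        raise ValueError(
            f"len(featurizer) is {len(featurizer)} but the output has "
            f"{features.shape[0]} entries"
        )
    return features


class StandInMol:
    """Hypothetical stand-in for rdkit.Chem.Mol exposing only GetNumAtoms()."""

    def __init__(self, num_atoms: int):
        self._num_atoms = num_atoms

    def GetNumAtoms(self) -> int:
        return self._num_atoms


class AtomCountFeaturizer:
    """Same behavior as the custom featurizer above: one feature, the atom count."""

    def __len__(self) -> int:
        return 1

    def __call__(self, mol) -> np.ndarray:
        return np.array([mol.GetNumAtoms()])


class MismatchedFeaturizer:
    """Deliberately inconsistent: reports two features but returns one."""

    def __len__(self) -> int:
        return 2

    def __call__(self, mol) -> np.ndarray:
        return np.array([mol.GetNumAtoms()])


features = check_molecule_featurizer(AtomCountFeaturizer(), StandInMol(3))
print(features)

try:
    check_molecule_featurizer(MismatchedFeaturizer(), StandInMol(3))
except ValueError as err:
    print(f"rejected: {err}")
```

With RDKit installed, the same check should work on a real molecule, for example by passing `mf` and `utils.make_mol(smis[0], keep_h=False, add_h=False)` from the cells above.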