## Molecule MolGraph featurizers

In [1]:
from chemprop.featurizers.molgraph.molecule import SimpleMoleculeMolGraphFeaturizer

The examples below featurize ethane (SMILES `CC`).

In [2]:
from rdkit import Chem

mol_to_featurize = Chem.MolFromSmiles("CC")

### Simple molgraph featurizer

A `MolGraph` is the graph featurization of a molecule. It holds atom features (`V`), bond features (`E`), and the mapping between atoms and bonds (`edge_index` and `rev_edge_index`). `SimpleMoleculeMolGraphFeaturizer` creates a `MolGraph` from an RDKit molecule.
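To make the atom-bond mapping concrete, here is a minimal plain-Python sketch. It assumes each bond is stored as two directed edges (one per direction) and that `rev_edge_index[i]` gives the position of the edge running opposite to edge `i`. The helper `directed_edges` is hypothetical and only illustrates the idea; it is not chemprop's implementation.

```python
# Illustrative only: a plain-Python stand-in for the atom-bond mapping,
# not chemprop's actual implementation.

def directed_edges(bonds):
    """Expand undirected bonds (u, v) into directed edges.

    Returns (edge_index, rev_edge_index), where edge_index is
    [sources, targets] and rev_edge_index[i] is the position of the
    edge that runs opposite to edge i.
    """
    sources, targets = [], []
    for u, v in bonds:
        # Each bond contributes two directed edges: u -> v and v -> u.
        sources.extend([u, v])
        targets.extend([v, u])
    # Edges are stored in pairs, so edge 2k and edge 2k + 1 reverse each other.
    rev_edge_index = [i + 1 if i % 2 == 0 else i - 1 for i in range(len(sources))]
    return [sources, targets], rev_edge_index


# Ethane ("CC") has two heavy atoms joined by one bond.
edge_index, rev_edge_index = directed_edges([(0, 1)])
print(edge_index)      # [[0, 1], [1, 0]]
print(rev_edge_index)  # [1, 0]
```

Under this assumption, a molecule with `n` bonds has `2 * n` directed edges, so ethane's single bond produces two.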

In [3]:
featurizer = SimpleMoleculeMolGraphFeaturizer()
featurizer(mol_to_featurize)

MolGraph(V=array([[0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        1.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        1.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
        0.     , 0.12011],
       [0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.  

### Custom atom and bond featurizers

The [atom](./atom_featurizers.ipynb) and [bond](./bond_featurizers.ipynb) featurizers used by the molgraph featurizer are customizable.

In [4]:
from chemprop.featurizers import MultiHotAtomFeaturizer, MultiHotBondFeaturizer

atom_featurizer = MultiHotAtomFeaturizer.organic()
bond_featurizer = MultiHotBondFeaturizer(stereos=[0, 1, 2, 3, 4])
featurizer = SimpleMoleculeMolGraphFeaturizer(
    atom_featurizer=atom_featurizer, bond_featurizer=bond_featurizer
)
featurizer(mol_to_featurize)

MolGraph(V=array([[0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 1.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
        0.     , 0.12011],
       [0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 1.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,
        0.     , 0.12011]], dtype=float32), E=array([[0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
   

### Extra atom and bond features

If your [datapoints](../data/datapoints.ipynb) have extra atom or bond features, pass their lengths to the molgraph featurizer when you create it. The featurizer needs these lengths to featurize molecules without heavy atoms (such as molecular hydrogen) correctly and to give the bond feature array the correct shape.

In [5]:
n_extra_atom_features = 3
n_extra_bond_features = 4
featurizer = SimpleMoleculeMolGraphFeaturizer(
    extra_atom_fdim=n_extra_atom_features, extra_bond_fdim=n_extra_bond_features
)

The [dataset](../data/datasets.ipynb) is given this custom featurizer and automatically handles the featurization, including passing the extra atom and bond features for each datapoint.
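The following plain-Python sketch shows why the extra feature lengths have to be fixed up front. It assumes extra features are appended to the end of each base feature row. The helpers `expected_fdim` and `append_extra` are hypothetical and are not part of chemprop; the base length of 72 is simply the row length of `V` in the default output shown earlier.

```python
# Illustrative only: hypothetical helpers, not part of chemprop.

def expected_fdim(base_fdim, extra_fdim=0):
    """Total feature length once extra features are appended."""
    return base_fdim + extra_fdim


def append_extra(features, extras, extra_fdim):
    """Append one row of extra features to each base feature row."""
    if len(extras) != len(features):
        raise ValueError(
            f"expected {len(features)} rows of extra features, got {len(extras)}"
        )
    for row in extras:
        if len(row) != extra_fdim:
            raise ValueError(
                f"expected {extra_fdim} extra features per row, got {len(row)}"
            )
    return [list(f) + list(x) for f, x in zip(features, extras)]


base_atom_fdim = 72       # row length of V in the default output above
n_extra_atom_features = 3

# Two atoms (ethane), each with three extra features.
base_rows = [[0.0] * base_atom_fdim for _ in range(2)]
extra_rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
rows = append_extra(base_rows, extra_rows, n_extra_atom_features)

print(len(rows[0]))                                          # 75
print(expected_fdim(base_atom_fdim, n_extra_atom_features))  # 75

# With no rows at all, the width cannot be read off the data,
# so it has to come from the declared lengths instead.
print(append_extra([], [], 4))  # []
```

The last call is the case the text describes: when there are no rows to inspect, only the declared length tells the featurizer how wide the array should be.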