## Polymer MolGraph Featurizers

In [4]:
from chemprop.featurizers.molgraph.polymer import PolymerMolGraphFeaturizer
from chemprop.data.datapoints import PolymerDatapoint

This is an example polymer to featurize.

In [8]:
polymer_to_featurize = "[*:1]c1cc(F)c([*:2])cc1F.[*:3]c1c(O)cc(O)c([*:4])c1O|0.5|0.5|<1-3:0.5:0.5<1-4:0.5:0.5<2-3:0.5:0.5<2-4:0.5:0.5"
# A PolymerDatapoint must first be initialised to generate the Mol object
polymer = PolymerDatapoint.from_smi(polymer_to_featurize, y=1)
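
To make the layout of the polymer string easier to read, the sketch below splits it into its visible parts using only the standard library. The helper `split_polymer_string` is hypothetical and not part of chemprop. The meaning given to each field (monomer fractions, then connection entries carrying two weights) is an assumption read off the example string, not a statement of how `PolymerDatapoint.from_smi` parses it.

```python
# Hypothetical helper (not a chemprop function): split the example string
# into monomers, "|"-separated numeric fields, and "<"-prefixed entries.
def split_polymer_string(polymer_string):
    # Everything before the first "<" holds the monomers and numeric fields.
    head, *rule_strings = polymer_string.split("<")
    fields = [f for f in head.split("|") if f]
    monomers = fields[0].split(".")
    fractions = [float(f) for f in fields[1:]]
    rules = []
    for rule in rule_strings:
        # Assumed entry layout: "<start>-<end>:<weight>:<weight>"
        pair, weight_a, weight_b = rule.split(":")
        start, end = pair.split("-")
        rules.append((int(start), int(end), float(weight_a), float(weight_b)))
    return monomers, fractions, rules


example = (
    "[*:1]c1cc(F)c([*:2])cc1F.[*:3]c1c(O)cc(O)c([*:4])c1O"
    "|0.5|0.5|<1-3:0.5:0.5<1-4:0.5:0.5<2-3:0.5:0.5<2-4:0.5:0.5"
)
monomers, fractions, rules = split_polymer_string(example)
for rule in rules:
    print(rule)
```

For the example, this yields two monomers, two numeric fields of `0.5`, and four connection entries between the numbered attachment points `[*:1]` to `[*:4]`.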

### Polymer Featurizer

A `PolymerMolGraph` represents the graph featurization of a polymer. It consists of atom features (`V`), bond features (`E`), atom weights (`V_w`), bond weights (`E_w`), a mapping between atoms and bonds (`edge_index` and `rev_edge_index`), and the degree of polymerisation (`degree_of_poly`) in the form `1+log(Xn)`. It is created by `PolymerMolGraphFeaturizer`.
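
As a rough illustration of what `edge_index` and `rev_edge_index` encode, the NumPy sketch below builds both arrays for a toy three-atom chain. It does not call chemprop. The edge ordering (the two directions of a bond stored next to each other) is an assumption made for the sketch.

```python
import numpy as np

# Toy chain of three atoms 0-1-2; each bond becomes two directed edges.
bonds = [(0, 1), (1, 2)]
src, dst = [], []
for u, v in bonds:
    src += [u, v]
    dst += [v, u]

# Row 0 holds source atoms, row 1 holds destination atoms.
edge_index = np.array([src, dst])

# Assumed layout: directed edges 2k and 2k+1 are reverses of each other,
# so the reverse-edge lookup swaps each adjacent pair of indices.
n_edges = edge_index.shape[1]
rev_edge_index = np.arange(n_edges).reshape(-1, 2)[:, ::-1].ravel()

print(edge_index)
print(rev_edge_index)
```

Indexing the columns of `edge_index` with `rev_edge_index` swaps source and destination for every edge, which is the property a reverse-edge lookup is meant to provide.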

In [9]:
featurizer = PolymerMolGraphFeaturizer()
featurizer(polymer)

MolGraph(V=array([[0.     , 0.     , 0.     , ..., 0.     , 1.     , 0.12011],
       [0.     , 0.     , 0.     , ..., 0.     , 1.     , 0.12011],
       [0.     , 0.     , 0.     , ..., 0.     , 1.     , 0.12011],
       ...,
       [0.     , 0.     , 0.     , ..., 0.     , 1.     , 0.12011],
       [0.     , 0.     , 0.     , ..., 0.     , 1.     , 0.12011],
       [0.     , 0.     , 0.     , ..., 0.     , 0.     , 0.15999]],
      dtype=float32), E=array([[0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 

Additional polymer properties, such as the atom weights, the bond weights, and the degree of polymerisation, can be accessed as attributes of the featurized graph.

In [10]:
# Atom weights
featurizer(polymer).V_w

array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5], dtype=float32)

In [11]:
# Bond weights
featurizer(polymer).E_w

array([1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ,
       1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ,
       1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5])
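
The sizes of the two weight arrays can be checked by hand against the example polymer. The sketch below rebuilds arrays of the same length and values with NumPy alone. It assumes that the attachment-point atoms (`[*:n]`) are dropped from the graph, that every atom takes the fraction of its monomer as its weight, and that each connection entry adds two directed edges. These assumptions are consistent with the outputs above but are not confirmed by the source.

```python
import numpy as np

# Non-wildcard atoms and the bonds between them, counted from the SMILES:
# monomer 1 has 8 atoms and 8 bonds, monomer 2 has 9 atoms and 9 bonds.
atoms_per_monomer = [8, 9]
bonds_per_monomer = [8, 9]
monomer_fractions = [0.5, 0.5]

# Assumption: each atom is weighted by the fraction of its monomer.
expected_V_w = np.repeat(monomer_fractions, atoms_per_monomer)

# Assumption: bonds inside a monomer have weight 1 in both directions, and
# each of the four connection entries contributes its two listed weights.
connection_weights = [0.5, 0.5] * 4
expected_E_w = np.concatenate(
    [np.ones(2 * sum(bonds_per_monomer)), connection_weights]
)

print(expected_V_w.shape, expected_E_w.shape)
```

Under these assumptions the arrays have 17 and 42 entries, matching the lengths of `V_w` and `E_w` shown above.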

In [12]:
# Degree of polymerisation
featurizer(polymer).degree_of_poly

1.0
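
The value `1.0` is what the form `1+log(Xn)` gives for `Xn = 1`, whatever the base of the logarithm. The sketch below is a standalone illustration, not the library's code. The function name and the `base` parameter are hypothetical, and the source does not state which base is used or which `Xn` is assumed when the polymer string does not specify one.

```python
import math


# Hypothetical helper: evaluate 1 + log(Xn) for a chosen logarithm base.
def degree_of_poly_feature(xn, base=math.e):
    return 1.0 + math.log(xn, base)


# With Xn = 1 the logarithm vanishes for any base.
value_natural = degree_of_poly_feature(1.0)
value_base10 = degree_of_poly_feature(1.0, base=10)
print(value_natural, value_base10)
```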

### Custom atom and bond featurizers

The [atom](./atom_featurizers.ipynb) and [bond](./bond_featurizers.ipynb) featurizers used by the polymer featurizer are customizable.

In [14]:
from chemprop.featurizers import MultiHotAtomFeaturizer, MultiHotBondFeaturizer

atom_featurizer = MultiHotAtomFeaturizer.organic()
bond_featurizer = MultiHotBondFeaturizer(stereos=[0, 1, 2, 3, 4])
featurizer = PolymerMolGraphFeaturizer(
    atom_featurizer=atom_featurizer, bond_featurizer=bond_featurizer
)
featurizer(polymer)

MolGraph(V=array([[0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 1.     , 0.     , 0.     ,
        1.     , 0.12011],
       [0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 1.     , 0.     , 0.     ,
        0.     , 0.     , 0.     , 0.     , 1.     , 0.     , 0.     ,
        1.     , 0.12011],
       [0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,
        0.  

### Extra atom and bond features

If your [datapoints](../data/datapoints.ipynb) have extra atom or bond features, the `PolymerMolGraphFeaturizer` needs to know the length of the extra features when it is created. This ensures that molecules without heavy atoms (molecular hydrogen) are featurized correctly and that the bond feature array has the correct shape.

In [15]:
n_extra_atom_features = 3
n_extra_bond_features = 4
featurizer = PolymerMolGraphFeaturizer(
    extra_atom_fdim=n_extra_atom_features, extra_bond_fdim=n_extra_bond_features
)
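
To show why the featurizer needs these lengths, the NumPy sketch below mimics the resulting array shapes without calling chemprop. It assumes the extra features are appended along the feature axis. The base atom feature width `base_atom_fdim` is a placeholder value, while the bond width of 14 and the row counts (17 atoms, 42 directed bonds) are taken from the example outputs above.

```python
import numpy as np

n_atoms, n_directed_bonds = 17, 42  # sizes seen for the example polymer
base_atom_fdim = 72                 # placeholder, not taken from the source
base_bond_fdim = 14                 # number of columns of E in the example
n_extra_atom_features = 3
n_extra_bond_features = 4

# Stand-ins for the featurizer's own arrays and the user-supplied extras.
V = np.zeros((n_atoms, base_atom_fdim), dtype=np.float32)
E = np.zeros((n_directed_bonds, base_bond_fdim), dtype=np.float32)
extra_V = np.ones((n_atoms, n_extra_atom_features), dtype=np.float32)
extra_E = np.ones((n_directed_bonds, n_extra_bond_features), dtype=np.float32)

# Assumed combination: concatenate along the feature axis.
V_full = np.hstack([V, extra_V])
E_full = np.hstack([E, extra_E])

print(V_full.shape, E_full.shape)
```

The widths of the combined arrays are the base widths plus `extra_atom_fdim` and `extra_bond_fdim`, so the featurizer has to know these values up front to allocate arrays of the correct shape.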

The [dataset](../data/datasets.ipynb) is given this custom featurizer and automatically handles the featurization, including passing the extra atom and bond features for each datapoint.