In [1]:
import torch
from chemprop.nn.predictors import (
    RegressionFFN,
    BinaryClassificationFFN,
    MulticlassClassificationFFN,
)

The predictor takes the output of aggregation as its input. The cell below creates a random tensor of the right shape to stand in for that output. See the aggregation notebook for more details.

In [2]:
n_datapoints_in_batch = 2
hidden_dim = 300
example_aggregation_output = torch.randn(n_datapoints_in_batch, hidden_dim)

### Predictors

The molecule representation produced by message passing and aggregation is fed through a feed-forward network (FFN) to make the final property prediction. The three basic predictors differ in the prediction task they are used for: regression, binary classification, and multiclass classification. The multiclass predictor must be told how many classes each target can take.

In [3]:
regression_ffn = RegressionFFN()
binary_class_ffn = BinaryClassificationFFN()
multi_class_ffn = MulticlassClassificationFFN(n_classes=3)

### Input dimension

The input dimension of the predictor defaults to the default dimension of the message passing hidden representation. If your message passing hidden dimension is different, or if you have additional datapoint descriptors, you need to change the predictor's input dimension.

In [4]:
ffn = RegressionFFN()
ffn(example_aggregation_output)

tensor([[-0.2016],
        [-0.1791]], grad_fn=<AddmmBackward0>)

In [5]:
shorter_hidden_rep = torch.randn(n_datapoints_in_batch, 3)
example_datapoint_descriptors = torch.randn(n_datapoints_in_batch, 12)

input_dim = shorter_hidden_rep.shape[1] + example_datapoint_descriptors.shape[1]

ffn = RegressionFFN(input_dim=input_dim)
ffn(torch.cat([shorter_hidden_rep, example_datapoint_descriptors], dim=1))

tensor([[0.1014],
        [0.0284]], grad_fn=<AddmmBackward0>)

### Output dimension

The number of tasks defaults to 1 but can be adjusted. Predictors that need to predict multiple values per task, like multiclass classification, will automatically adjust the output dimension.

In [6]:
ffn = RegressionFFN(n_tasks=4)
ffn(example_aggregation_output).shape

torch.Size([2, 4])

In [7]:
ffn = MulticlassClassificationFFN(n_tasks=4, n_classes=3)
ffn(example_aggregation_output).shape

torch.Size([2, 4, 3])
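
The two shapes above can be reproduced with a little arithmetic. The sketch below is plain Python and does not call Chemprop. The helper names are hypothetical, and it assumes (this is not confirmed by the notebook) that the multiclass predictor emits `n_tasks * n_classes` values per datapoint and reshapes them into a trailing class axis.

```python
def predictor_output_shape(batch_size, n_tasks=1, n_classes=None):
    """Expected shape of a predictor call (hypothetical helper, not a Chemprop API)."""
    if n_classes is None:
        # Regression and binary classification: one value per task.
        return (batch_size, n_tasks)
    # Multiclass classification: one value per class, per task.
    return (batch_size, n_tasks, n_classes)


def values_per_datapoint(n_tasks=1, n_classes=None):
    """Number of values the final layer must emit per datapoint (assumed layout)."""
    return n_tasks * (n_classes if n_classes is not None else 1)


print(predictor_output_shape(2, n_tasks=4))               # (2, 4)
print(predictor_output_shape(2, n_tasks=4, n_classes=3))  # (2, 4, 3)
print(values_per_datapoint(n_tasks=4, n_classes=3))       # 12
```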

### Customization

The following hyperparameters of the predictor are customizable:
 - the hidden dimension between layers, default: 300
 - the number of layers, default: 1
 - the dropout probability, default: 0.0 (i.e. no dropout)
 - the activation function, default: ReLU

In [8]:
custom_ffn = RegressionFFN(hidden_dim=600, n_layers=3, dropout=0.1, activation="tanh")
custom_ffn(example_aggregation_output)

tensor([[-0.0897],
        [-0.0634]], grad_fn=<AddmmBackward0>)

Intermediate hidden representations can also be extracted with `encode`. Each predictor layer consists of an activation, followed by dropout, followed by a linear layer. The first predictor layer is the exception: it contains only the linear layer.

In [9]:
layer = 2
custom_ffn.encode(example_aggregation_output, i=layer).shape

torch.Size([2, 600])

In [10]:
custom_ffn

RegressionFFN(
  (ffn): MLP(
    (0): Sequential(
      (0): Linear(in_features=300, out_features=600, bias=True)
    )
    (1): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=600, bias=True)
    )
    (2): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=600, bias=True)
    )
    (3): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=1, bias=True)
    )
  )
  (criterion): MSELoss(task_weights=[[1.0]])
  (output_transform): Identity()
)
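
The printed module shows how `n_layers` and `hidden_dim` translate into linear layers: one input layer, `n_layers - 1` further hidden layers, and one output layer. The sketch below reproduces that layout and counts the trainable parameters in plain Python. The helper names are hypothetical, and the layout rule is inferred from the printout above rather than taken from the Chemprop source.

```python
def mlp_layout(input_dim=300, hidden_dim=300, n_layers=1, output_dim=1):
    """List (in_features, out_features) for each linear layer in the stack."""
    dims = [input_dim] + [hidden_dim] * n_layers + [output_dim]
    return list(zip(dims[:-1], dims[1:]))


def n_parameters(layout):
    """Weights plus biases over all linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layout)


layout = mlp_layout(hidden_dim=600, n_layers=3)
print(layout)                # [(300, 600), (600, 600), (600, 600), (600, 1)]
print(n_parameters(layout))  # 902401
```

The four pairs match the four `Linear` entries in the printout. Activation and dropout add no parameters, so they do not appear in the count.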

### Criterion

Each predictor has a criterion that is used as the loss function during training. The default criterion for a predictor is defined in the predictor class.

In [11]:
print(RegressionFFN._T_default_criterion)
print(BinaryClassificationFFN._T_default_criterion)
print(MulticlassClassificationFFN._T_default_criterion)

<class 'chemprop.nn.loss.MSELoss'>
<class 'chemprop.nn.loss.BCELoss'>
<class 'chemprop.nn.loss.CrossEntropyLoss'>


A custom criterion can be given to the predictor. See the loss function notebook for more details.

In [12]:
from chemprop.nn import MSELoss

criterion = MSELoss(task_weights=torch.tensor([0.5, 1.0]))
ffn = RegressionFFN(n_tasks=2, criterion=criterion)

/home/knathan/anaconda3/envs/chemprop/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:199: Attribute 'criterion' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['criterion'])`.
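
To make the effect of `task_weights` concrete, here is a minimal plain-Python sketch of a task-weighted mean squared error. It is an illustration, not Chemprop's implementation: the function name is hypothetical, and dividing by the total number of entries is an assumption. The library's loss may normalize differently and also handles details such as missing targets and per-datapoint weights.

```python
def task_weighted_mse(preds, targets, task_weights):
    """Mean of w_t * (pred - target)^2 over all datapoints and tasks (sketch)."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(preds, targets):
        for p, y, w in zip(pred_row, target_row, task_weights):
            total += w * (p - y) ** 2
            count += 1
    return total / count


# Two datapoints, two tasks; the numbers are made up for illustration.
preds = [[1.0, 2.0], [3.0, 4.0]]
targets = [[0.0, 2.0], [3.0, 2.0]]

print(task_weighted_mse(preds, targets, [1.0, 1.0]))  # 1.25
print(task_weighted_mse(preds, targets, [0.5, 1.0]))  # 1.125
```

Halving the weight of the first task halves the contribution of its errors, so the loss drops from 1.25 to 1.125.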


### Regression vs. classification

In addition to using different loss functions, regression and classification predictors also differ in the transforms they apply to the model outputs during inference. Regression can use a scaler transform if target normalization was used during training (see the scaling notebook for more details). Classification uses a sigmoid (binary classification) or a softmax (multiclass) transform to keep class probability predictions between 0 and 1.

In [13]:
import numpy as np
from chemprop.data import MoleculeDatapoint, MoleculeDataset
from chemprop.nn import UnscaleTransform

smis = ["C" * i for i in range(1, 4)]
ys = np.random.rand(len(smis), 1)
dataset = MoleculeDataset([MoleculeDatapoint.from_smi(smi, y) for smi, y in zip(smis, ys)])

output_scaler = dataset.normalize_targets()
output_transform = UnscaleTransform.from_standard_scaler(output_scaler)

unscaled_ffn = RegressionFFN()
scaled_ffn = RegressionFFN(output_transform=output_transform)

/home/knathan/anaconda3/envs/chemprop/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:199: Attribute 'output_transform' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['output_transform'])`.


In [14]:
unscaled_ffn(example_aggregation_output)

tensor([[-0.0452],
        [-0.5311]], grad_fn=<AddmmBackward0>)

In [15]:
scaled_ffn(example_aggregation_output)

tensor([[ 0.0404],
        [-0.1165]], grad_fn=<AddmmBackward0>)
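
As a rough picture of what an unscale transform does, the sketch below inverts a standard scaling in plain Python. It assumes the transform is the usual inverse of standardization, `y = y_scaled * scale + mean`. The function name and the numbers are made up for illustration and are not taken from Chemprop.

```python
def unscale(scaled_preds, mean, scale):
    """Map predictions from normalized space back to the original target units."""
    return [p * scale + mean for p in scaled_preds]


# Hypothetical statistics of the training targets.
mean, scale = 2.0, 0.5

print(unscale([0.0, 1.0, -1.0], mean, scale))  # [2.0, 2.5, 1.5]
```

A normalized prediction of 0 maps back to the training mean, and one unit in normalized space corresponds to one standard deviation in the original units.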

In [16]:
probs = binary_class_ffn(example_aggregation_output)
(0 < probs).all() and (probs < 1).all()

tensor(True)
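
The check above holds because of how the two classification transforms are defined. The sketch below writes both out in plain Python with hypothetical helper names. It illustrates the standard sigmoid and softmax formulas and is not Chemprop's own code.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow for large negative inputs.
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Map a list of logits to class probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))  # 0.5
class_probs = softmax([2.0, 0.0, -1.0])
print(all(0.0 < p < 1.0 for p in class_probs))  # True
```

In the multiclass case the softmax would be applied over the class axis, separately for each task, so each task's class probabilities sum to 1.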

### Other predictors coming soon

Beta versions of predictors for uncertainty and spectral tasks can already be imported and will be finalized in v2.1.

In [17]:
from chemprop.nn.predictors import (
    MveFFN,
    EvidentialFFN,
    BinaryDirichletFFN,
    MulticlassDirichletFFN,
    SpectralFFN,
)