## Predictors

In [1]:
import torch
from chemprop.nn.predictors import (
    RegressionFFN,
    BinaryClassificationFFN,
    MulticlassClassificationFFN,
)

The tensor below stands in for the output of [aggregation](./aggregation.ipynb), which is the input to the predictor.

In [2]:
n_datapoints_in_batch = 2
hidden_dim = 300
example_aggregation_output = torch.randn(n_datapoints_in_batch, hidden_dim)

### Feed forward network

The learned representation from message passing and aggregation is a fixed-length vector, much like a precomputed fixed representation. Other models, such as a random forest, could make the final prediction from this vector, but Chemprop uses a feed-forward network (FFN) because it allows end-to-end training. The three basic Chemprop FFNs differ in the prediction task they are used for. Note that multiclass classification needs to know the number of classes.

In [3]:
regression_ffn = RegressionFFN()
binary_class_ffn = BinaryClassificationFFN()
multi_class_ffn = MulticlassClassificationFFN(n_classes=3)

### Input dimension

The input dimension of the predictor defaults to the default dimension of the message passing hidden representation. If your message passing hidden dimension is different, or if you have additional datapoint descriptors, you need to change the predictor's input dimension.

In [4]:
ffn = RegressionFFN()
ffn(example_aggregation_output)

tensor([[-0.2416],
        [-0.1840]], grad_fn=<AddmmBackward0>)

In [5]:
shorter_hidden_rep = torch.randn(n_datapoints_in_batch, 3)
example_datapoint_descriptors = torch.randn(n_datapoints_in_batch, 12)

input_dim = shorter_hidden_rep.shape[1] + example_datapoint_descriptors.shape[1]

ffn = RegressionFFN(input_dim=input_dim)
ffn(torch.cat([shorter_hidden_rep, example_datapoint_descriptors], dim=1))

tensor([[-0.0159],
        [ 0.3497]], grad_fn=<AddmmBackward0>)

### Output dimension

The number of tasks defaults to 1 but can be adjusted. Predictors that need to predict multiple values per task, like multiclass classification, will automatically adjust the output dimension.

In [6]:
ffn = RegressionFFN(n_tasks=4)
ffn(example_aggregation_output).shape

torch.Size([2, 4])

In [7]:
ffn = MulticlassClassificationFFN(n_tasks=4, n_classes=3)
ffn(example_aggregation_output).shape

torch.Size([2, 4, 3])
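
The extra trailing dimension can be pictured with plain PyTorch. The sketch below is only an illustration of one way to get a `(batch, n_tasks, n_classes)` output, assuming a single linear head whose flat output is reshaped; the names `head`, `flat_scores`, and `per_task_scores` are hypothetical and are not taken from Chemprop.

```python
import torch
from torch import nn

n_tasks, n_classes, hidden_dim = 4, 3, 300

# One linear head emits n_tasks * n_classes scores per datapoint ...
head = nn.Linear(hidden_dim, n_tasks * n_classes)
flat_scores = head(torch.randn(2, hidden_dim))

# ... which are then viewed as one row of class scores per task.
per_task_scores = flat_scores.reshape(-1, n_tasks, n_classes)

print(flat_scores.shape)
print(per_task_scores.shape)
```

With two datapoints, the flat output has 12 columns and the reshaped output has the same `[2, 4, 3]` shape as the cell above.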

### Customization

The following hyperparameters of the predictor are customizable:

 - the hidden dimension between layers, default: 300
 - the number of layers, default: 1
 - the dropout probability, default: 0.0 (i.e. no dropout)
 - the activation function, default: ReLU

In [8]:
custom_ffn = RegressionFFN(hidden_dim=600, n_layers=3, dropout=0.1, activation="tanh")
custom_ffn(example_aggregation_output)

tensor([[ 0.1102],
        [-0.1430]], grad_fn=<AddmmBackward0>)

Intermediate hidden representations can also be extracted. Note that each predictor layer consists of an activation layer, followed by dropout, followed by a linear layer. The first predictor layer only has the linear layer.

In [9]:
layer = 2
custom_ffn.encode(example_aggregation_output, i=layer).shape

torch.Size([2, 600])

In [10]:
custom_ffn

RegressionFFN(
  (ffn): MLP(
    (0): Sequential(
      (0): Linear(in_features=300, out_features=600, bias=True)
    )
    (1): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=600, bias=True)
    )
    (2): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=600, bias=True)
    )
    (3): Sequential(
      (0): Tanh()
      (1): Dropout(p=0.1, inplace=False)
      (2): Linear(in_features=600, out_features=1, bias=True)
    )
  )
  (criterion): MSELoss(task_weights=[[1.0]])
  (output_transform): Identity()
)
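
The printed layout can be mirrored in plain PyTorch to make the layer ordering concrete. This is a rough sketch rather than Chemprop's own `MLP` class: `build_mlp_sketch` is a hypothetical helper, and it only reproduces the block structure shown above (a linear-only first block, then activation, dropout, and linear in each later block).

```python
import torch
from torch import nn


def build_mlp_sketch(input_dim=300, hidden_dim=600, output_dim=1, n_layers=3, dropout=0.1):
    """Plain-PyTorch stand-in for the printed layout; not Chemprop's implementation."""
    # First block: linear layer only.
    blocks = [nn.Sequential(nn.Linear(input_dim, hidden_dim))]
    # Every later block: activation -> dropout -> linear.
    dims = [hidden_dim] * n_layers + [output_dim]
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        blocks.append(nn.Sequential(nn.Tanh(), nn.Dropout(dropout), nn.Linear(d_in, d_out)))
    return nn.Sequential(*blocks)


sketch = build_mlp_sketch()
x = torch.randn(2, 300)

# Running only the first two blocks yields a 600-dimensional hidden representation.
hidden = sketch[:2](x)
print(hidden.shape)
print(sketch(x).shape)
```

With `n_layers=3` the sketch has four blocks in total, matching the four `Sequential` entries in the printed model.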

### Criterion

Each predictor has a criterion that is used as the [loss function](../loss_functions.ipynb) during training. The default criterion for a predictor is defined in the predictor class.

In [11]:
print(RegressionFFN._T_default_criterion)
print(BinaryClassificationFFN._T_default_criterion)
print(MulticlassClassificationFFN._T_default_criterion)

<class 'chemprop.nn.loss.MSELoss'>
<class 'chemprop.nn.loss.BCELoss'>
<class 'chemprop.nn.loss.CrossEntropyLoss'>


A custom criterion can be given to the predictor.

In [12]:
from chemprop.nn import MSELoss

criterion = MSELoss(task_weights=torch.tensor([0.5, 1.0]))
ffn = RegressionFFN(n_tasks=2, criterion=criterion)
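
To see what per-task weights do numerically, here is a small plain-PyTorch illustration. It assumes the weights simply scale each task's squared error before averaging; the exact reduction Chemprop applies (for example, how masks or datapoint weights enter) may differ, and the tensors below are made-up values.

```python
import torch

# Made-up predictions and targets for 2 datapoints and 2 tasks.
preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[0.0, 2.0], [1.0, 1.0]])
task_weights = torch.tensor([0.5, 1.0])

# Squared error per datapoint and task, shape (2, 2).
squared_error = (preds - targets) ** 2
# Broadcasting scales column 0 (task 0) by 0.5 and column 1 (task 1) by 1.0.
weighted = squared_error * task_weights
loss = weighted.mean()

print(loss)
```

Errors on the first task count half as much as errors on the second, so the second task dominates this loss.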

### Regression vs. classification

In addition to using different loss functions, regression and classification predictors also differ in the transforms they apply to the model outputs during inference.

Regression should use a [scaler transform](../scaling.ipynb) if target normalization is used during training.
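
The reason a transform is needed can be shown with a few lines of arithmetic. The sketch below assumes standard (mean and standard deviation) normalization and uses made-up target values; it is plain PyTorch and does not use Chemprop's scaling utilities.

```python
import torch

# Hypothetical training targets for a single task.
train_targets = torch.tensor([[10.0], [20.0], [30.0]])
mean = train_targets.mean(dim=0)
std = train_targets.std(dim=0, unbiased=False)

# The model is trained on standardized targets ...
scaled_targets = (train_targets - mean) / std

# ... so a raw model output lives in the scaled space and must be mapped back.
scaled_prediction = torch.tensor([[0.5]])
prediction = scaled_prediction * std + mean

print(prediction)
```

Without the final step, a prediction of `0.5` would be reported in standardized units rather than in the units of the original targets.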

Classification uses a sigmoid (for binary classification) or a softmax (for multiclass) transform to keep class probability predictions between 0 and 1. 

In [13]:
probs = binary_class_ffn(example_aggregation_output)
(0 < probs).all() and (probs < 1).all()

tensor(True)
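
The two transforms can also be tried directly on raw scores. This is a small standalone illustration using `torch.sigmoid` and `torch.softmax` on made-up logits, independent of the predictor classes.

```python
import torch

logits = torch.tensor([[-3.0, 0.0, 3.0]])

# Sigmoid squashes each score independently into (0, 1).
binary_probs = torch.sigmoid(logits)

# Softmax normalizes across the last dimension, so the class probabilities sum to 1.
class_probs = torch.softmax(logits, dim=-1)

print(binary_probs)
print(class_probs.sum(dim=-1))
```

Sigmoid treats every score on its own, which suits independent binary tasks, while softmax makes the classes compete for a total probability of 1.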

### Other predictors coming soon

Beta versions of predictors for uncertainty and spectral tasks will be finalized in v2.1.

In [14]:
from chemprop.nn.predictors import (
    MveFFN,
    EvidentialFFN,
    BinaryDirichletFFN,
    MulticlassDirichletFFN,
    SpectralFFN,
)