## Aggregation

In [1]:
import torch
from chemprop.nn.agg import MeanAggregation, SumAggregation, NormAggregation, AttentiveAggregation

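The tensors below stand in for the output of [message passing](./message_passing.ipynb), which is the input to aggregation: one hidden vector per atom, plus an index that assigns each atom to a molecule.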

In [2]:
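n_atoms_in_batch = 7
hidden_dim = 3
# One hidden vector per atom, as message passing would produce.
example_message_passing_output = torch.randn(n_atoms_in_batch, hidden_dim)
# Entry i is the index of the molecule that atom i belongs to (2, 4, and 1 atoms here).
which_atoms_in_which_molecule = torch.tensor([0, 0, 1, 1, 1, 1, 2]).long()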

### Combine nodes

The aggregation layer combines the node-level representations into a graph-level representation (usually atoms -> molecule).
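
To make the atoms -> molecule mapping concrete, the following sketch inspects a batch index tensor with plain PyTorch. It reuses the index values from the example input; the variable names `n_molecules`, `atoms_per_molecule`, and `atom_rows` are illustrative and not part of chemprop, and the sketch assumes molecule indices are 0-based and contiguous.

```python
import torch

# Entry i is the index of the molecule that atom i belongs to.
batch = torch.tensor([0, 0, 1, 1, 1, 1, 2])

# Assumption: indices are 0-based and contiguous, so the count is max + 1.
n_molecules = int(batch.max()) + 1

# Number of atoms that belong to each molecule.
atoms_per_molecule = torch.bincount(batch, minlength=n_molecules)

# Row indices (into the atom-level tensor) of the atoms in each molecule.
atom_rows = [torch.nonzero(batch == m).flatten().tolist() for m in range(n_molecules)]

print(atoms_per_molecule.tolist())  # [2, 4, 1]
print(atom_rows)                    # [[0, 1], [2, 3, 4, 5], [6]]
```

Every aggregation combines the rows listed for one molecule into a single row, so the result has one row per molecule.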

### Mean and sum aggregation 

Mean aggregation is recommended when the property to predict is intensive, meaning it does not depend on the number of atoms in the molecule. Sum aggregation is recommended when the property is extensive, though norm aggregation is usually the better choice in that case.

In [3]:
mean_agg = MeanAggregation()
sum_agg = SumAggregation()

In [4]:
mean_agg(H=example_message_passing_output, batch=which_atoms_in_which_molecule)

tensor([[-0.4593, -0.1808, -0.3459],
        [ 0.9343, -0.1746,  0.7430],
        [-0.4747, -0.9394, -0.3877]])

In [5]:
sum_agg(H=example_message_passing_output, batch=which_atoms_in_which_molecule)

tensor([[-0.9187, -0.3616, -0.6917],
        [ 3.7373, -0.6986,  2.9720],
        [-0.4747, -0.9394, -0.3877]])
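
As a rough illustration of what these two layers compute, sum and mean pooling can be written by hand in plain PyTorch. This is a minimal sketch on a small made-up tensor, not chemprop's implementation; `H`, `batch`, `sums`, `counts`, and `means` are names chosen for the example.

```python
import torch

# Three atoms with two hidden features each; atoms 0 and 1 form molecule 0, atom 2 forms molecule 1.
H = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
batch = torch.tensor([0, 0, 1])
n_molecules = int(batch.max()) + 1

# Sum pooling: add each atom's row into the row of its molecule.
sums = torch.zeros(n_molecules, H.shape[1]).index_add_(0, batch, H)

# Mean pooling: divide each molecule's sum by its atom count.
counts = torch.bincount(batch, minlength=n_molecules).unsqueeze(1)
means = sums / counts

print(sums)   # tensor([[4., 6.], [5., 6.]])
print(means)  # tensor([[2., 3.], [5., 6.]])
```

The sum grows with the number of atoms while the mean does not, which is why the choice follows whether the target property is extensive or intensive. A single-atom molecule gives the same row under both, as in the third row of the two outputs above.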

### Norm aggregation

Norm aggregation can be better than sum aggregation for large molecules, because it is best to keep the hidden representation values on the order of 1 (though this matters less when batch normalization is used). It divides the summed representation by a normalization constant, which can be customized (defaults to 100.0).

In [6]:
norm_agg = NormAggregation()
big_norm = NormAggregation(norm=1000.0)

In [7]:
norm_agg(H=example_message_passing_output, batch=which_atoms_in_which_molecule)

tensor([[-0.0092, -0.0036, -0.0069],
        [ 0.0374, -0.0070,  0.0297],
        [-0.0047, -0.0094, -0.0039]])

In [8]:
big_norm(H=example_message_passing_output, batch=which_atoms_in_which_molecule)

tensor([[-0.0009, -0.0004, -0.0007],
        [ 0.0037, -0.0007,  0.0030],
        [-0.0005, -0.0009, -0.0004]])
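
Each row in the two outputs above equals the corresponding sum aggregation row divided by the constant (100.0 and 1000.0 respectively). A minimal hand-written sketch of that behavior, using a made-up tensor and illustrative names rather than chemprop's own code, might look like this:

```python
import torch

# Five atoms with all-ones features: four atoms in molecule 0, one atom in molecule 1.
H = torch.ones(5, 2)
batch = torch.tensor([0, 0, 0, 0, 1])
n_molecules = int(batch.max()) + 1

norm = 100.0  # same value as the default constant mentioned above

# Sum pooling followed by division by a fixed constant.
sums = torch.zeros(n_molecules, H.shape[1]).index_add_(0, batch, H)
normed = sums / norm

print(sums)    # tensor([[4., 4.], [1., 1.]])
print(normed)  # rows are approximately [0.04, 0.04] and [0.01, 0.01]
```

Because the constant is the same for every molecule, the result still scales with the number of atoms (the first row stays four times the second), so the extensive character of sum aggregation is kept while the magnitudes shrink.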

### Attentive aggregation 

Attentive aggregation uses a learned weighted average to combine atom representations within a molecule graph. It needs to be told the size of the hidden dimension because it computes each atom's weight from that atom's hidden representation.

In [9]:
att_agg = AttentiveAggregation(output_size=hidden_dim)

In [10]:
att_agg(H=example_message_passing_output, batch=which_atoms_in_which_molecule)

tensor([[-0.4551, -0.1791, -0.3438],
        [ 0.9370,  0.1375,  0.3714],
        [-0.4747, -0.9394, -0.3877]], grad_fn=<ScatterReduceBackward0>)
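
One way to picture a learned weighted average is the sketch below, written in plain PyTorch. It is an assumption-laden illustration rather than chemprop's implementation: `score_layer` is a hypothetical linear layer that maps each atom's hidden vector to one score, and the scores are normalized within each molecule so that the weights sum to 1.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_dim = 3
H = torch.randn(7, hidden_dim)
batch = torch.tensor([0, 0, 1, 1, 1, 1, 2])
n_molecules = int(batch.max()) + 1

# Hypothetical scoring layer: one scalar score per atom, computed from its hidden vector.
score_layer = nn.Linear(hidden_dim, 1)

# Positive, unnormalized scores with shape (n_atoms, 1).
scores = score_layer(H).exp()

# Per-molecule score totals, then per-atom weights that sum to 1 within each molecule.
totals = torch.zeros(n_molecules, 1).index_add(0, batch, scores)
weights = scores / totals[batch]

# Weighted sum of atom vectors within each molecule, shape (n_molecules, hidden_dim).
pooled = torch.zeros(n_molecules, hidden_dim).index_add(0, batch, weights * H)
```

In this sketch a molecule with a single atom always gets weight 1 for that atom, so its pooled row is the atom's own hidden vector. That is consistent with the third row of the output above matching the third row of the mean and sum outputs. The `grad_fn` in the output reflects that the weights come from trainable parameters.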