# Group invariant neural networks via restricted weights

## Introduction

In this notebook we use the approach from *Jason Hartford, Devon Graham, Kevin Leyton-Brown, Siamak Ravanbakhsh: “Deep Models of Interactions Across Sets”* for group invariant machine learning. (This is a generalisation of *Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola: “Deep Sets”* and a special case of *Haggai Maron, Or Litany, Gal Chechik, Ethan Fetaya: “On Learning Sets of Symmetric Elements”*.) We apply this to the dataset from *P.S. Green, T. Hubsch and C. A. Lutken: “All Hodge Numbers of All Complete Intersection Calabi-Yau Manifolds”*. Namely, we learn the first Hodge number of each manifold. Random row and column permutations were applied to the input matrices.

## Description of the group invariant neural network architecture

The approach works for functions whose inputs are matrices. Given a linear map `f` and an input matrix `M`, a permutation `s` of the rows and columns can act either on the input `M` or on the output `f(M)`. The map `f` is called equivariant if `f(s·M)=s·f(M)` for every such permutation `s`.

How many linear maps are there satisfying this property? Surprisingly, up to linear combinations the answer is **four**, and this does not depend on the size of the matrix. The four linear maps are:

1. Multiply every entry of `M` by some fixed number
2. Take the average of every row, repeat it across the columns to get a matrix of the original size, and multiply the result by some fixed number
3. Do the same for the columns: take the average of every column and repeat it across the rows
4. Take the average of all matrix elements, repeat that number to get a matrix of the original size, and multiply it by some fixed number
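
The four maps listed above can be written down directly in NumPy. The following is a minimal single-channel sketch, not the notebook's Keras implementation; the helper names `basis_maps`, `equivariant_linear` and `permute` are made up for illustration, and the weights are arbitrary. It checks numerically that permuting rows and columns before or after applying the map gives the same result.

```python
import numpy as np

def basis_maps(M):
    """Return the four basis maps applied to M (each has the shape of M)."""
    identity = M
    # Row averages, repeated across the columns.
    row_avg = np.broadcast_to(M.mean(axis=1, keepdims=True), M.shape)
    # Column averages, repeated across the rows.
    col_avg = np.broadcast_to(M.mean(axis=0, keepdims=True), M.shape)
    # Average of all entries, repeated everywhere.
    total_avg = np.full(M.shape, M.mean())
    return identity, row_avg, col_avg, total_avg

def equivariant_linear(M, a, b, c, d):
    """Linear combination of the four maps with scalar weights a, b, c, d."""
    m1, m2, m3, m4 = basis_maps(M)
    return a * m1 + b * m2 + c * m3 + d * m4

def permute(M, row_perm, col_perm):
    """Act on M by permuting its rows and its columns."""
    return M[row_perm][:, col_perm]

rng = np.random.default_rng(0)
M = rng.normal(size=(12, 15))
row_perm, col_perm = rng.permutation(12), rng.permutation(15)
weights = (0.5, -1.0, 2.0, 0.3)  # arbitrary illustrative values

lhs = equivariant_linear(permute(M, row_perm, col_perm), *weights)  # f(s·M)
rhs = permute(equivariant_linear(M, *weights), row_perm, col_perm)  # s·f(M)
print(np.allclose(lhs, rhs))
```

The comparison uses `np.allclose` rather than exact equality because averaging the same numbers in a different order can change the last floating-point digits.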

These maps can be composed one after another, with element-wise non-linearities in between, and also applied in parallel (in “channels”) to create a neural network with many parameters. The result is an equivariant function. An invariant function is obtained from it by applying a pooling operation at the end, for example the sum over all elements.
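
As a small illustration of this construction, the sketch below stacks three single-channel equivariant layers with a ReLU in between and then sums over all entries. It is an assumption-laden toy version (one channel, random weights, hypothetical function names `layer` and `invariant_function`), meant only to show that the pooled output does not change when the rows and columns of the input are permuted.

```python
import numpy as np

def layer(M, a, b, c, d):
    """One equivariant linear map followed by an element-wise ReLU."""
    out = (a * M
           + b * M.mean(axis=1, keepdims=True)  # row averages, broadcast across columns
           + c * M.mean(axis=0, keepdims=True)  # column averages, broadcast across rows
           + d * M.mean())                      # overall average, broadcast everywhere
    return np.maximum(out, 0.0)

def invariant_function(M, params):
    """Compose the equivariant layers, then sum-pool over all entries."""
    for a, b, c, d in params:
        M = layer(M, a, b, c, d)
    return M.sum()

rng = np.random.default_rng(1)
params = rng.normal(size=(3, 4))  # three layers, four weights each (illustrative)
M = rng.normal(size=(12, 15))
M_permuted = M[rng.permutation(12)][:, rng.permutation(15)]

print(np.isclose(invariant_function(M, params),
                 invariant_function(M_permuted, params)))
```

The element-wise ReLU does not break equivariance because it treats every matrix entry in the same way, so only the final pooling step turns the equivariant stack into an invariant function.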


## Data loading

In [1]:
import numpy as np
X = np.load('data/matrices_permuted.npy')
y = np.load('data/hodge_numbers.npy')[:,0]

## Definition of the neural network architecture

In [2]:
from keras import layers, models, optimizers
from sklearn.model_selection import train_test_split


def equivariant_layer(inp, number_of_channels_in, number_of_channels_out):
    # Four weight-sharing branches, for an input of shape 12x15 with several channels:
    # (1) Multiply every element of the matrix by a parameter.
    # (2) Take the average of every row, which gives a 12x1 matrix. Repeat it
    #     15 times side by side to get a 12x15 matrix. Multiply the result by a parameter.
    # (3) Same for columns.
    # (4) Take the average of all matrix elements, which gives a 1x1 matrix. Repeat
    #     that number to get a 12x15 matrix. Multiply the result by a parameter.
    # With several channels, each "parameter" becomes a 1x1 convolution that mixes channels.
    # Note: number_of_channels_in is not used below; Keras infers it from inp.

    # ---(1)---
    out1 = layers.Conv2D(number_of_channels_out, (1,1), strides=(1, 1), padding='valid', use_bias=False, activation='relu')(inp)

    # ---(2)---
    out2 = layers.AveragePooling2D((1, 15), strides=(1, 1), padding='valid')(inp)
    repeated2 = [out2 for _ in range(15)]
    out2 = layers.Concatenate(axis=2)(repeated2)
    out2 = layers.Conv2D(number_of_channels_out, (1,1), strides=(1, 1), padding='valid', use_bias=False, activation='relu')(out2)

    # ---(3)---
    out3 = layers.AveragePooling2D((12, 1), strides=(1, 1), padding='valid')(inp)
    repeated3 = [out3 for _ in range(12)]
    out3 = layers.Concatenate(axis=1)(repeated3)
    out3 = layers.Conv2D(number_of_channels_out, (1,1), strides=(1, 1), padding='valid', use_bias=False, activation='relu')(out3)

    # ---(4)---
    out4 = layers.AveragePooling2D((12, 15), strides=(1, 1), padding='valid')(inp)
    repeated4 = [out4 for _ in range(12)]
    out4 = layers.Concatenate(axis=1)(repeated4)
    repeated4 = [out4 for _ in range(15)]
    out4 = layers.Concatenate(axis=2)(repeated4)
    out4 = layers.Conv2D(number_of_channels_out, (1,1), strides=(1, 1), padding='valid', use_bias=True, activation='relu')(out4)

    return layers.Add()([out1,out2,out3,out4])



In [3]:
import tensorflow as tf

def soft_acc(y_true, y_pred):
    '''Given two vectors, round both of them element-wise and return the fraction of
    elements that are equal.'''
    y_pred = tf.cast(tf.round(y_pred), tf.float32)
    y_true = tf.cast(y_true, tf.float32)
    return tf.reduce_mean(tf.cast(tf.equal(tf.round(y_true), tf.round(y_pred)), tf.float32))

def get_hartford_network(pooling='sum'):
    '''This constructs the neural network. The architecture is:
    3 equivariant layers, followed by pooling, followed by one hidden fully
    connected layer.'''
    number_of_channels = 32
    inp = layers.Input(shape=(12,15,1))
    inp_list = [inp for _ in range(number_of_channels)]
    inp_duplicated = layers.Concatenate(axis=3)(inp_list)
    e1 = equivariant_layer(inp_duplicated, number_of_channels, number_of_channels)
    e2 = equivariant_layer(e1, number_of_channels, number_of_channels)
    e3 = equivariant_layer(e2, number_of_channels, number_of_channels)

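    if pooling=='sum':
        # Average pooling over the full 12x15 grid, i.e. the sum divided by 180.
        p1 = layers.AveragePooling2D((12, 15), strides=(1, 1), padding='valid')(e3)
    else:
        # Any other value of `pooling` selects max pooling over the full grid.
        p1 = layers.MaxPooling2D((12, 15), strides=(1, 1), padding='valid')(e3)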
    p2 = layers.Reshape((number_of_channels,))(p1)
    fc1 = layers.Dense(32, activation='relu')(p2)
    out = layers.Dense(1, activation='linear')(fc1)

    model = models.Model(inputs=inp, outputs=out)
    model.compile(
        loss='mean_squared_error',
        optimizer=optimizers.Adam(0.001),
        metrics=[soft_acc],
    )
    return model
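
To make the `soft_acc` metric concrete, here is a NumPy re-statement of the same idea on a tiny hand-made example. The function name `soft_acc_numpy` and the sample values are invented for illustration; the Keras metric above is what the notebook actually uses.

```python
import numpy as np

def soft_acc_numpy(y_true, y_pred):
    """Fraction of entries whose rounded prediction equals the rounded target."""
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    return float(np.mean(np.round(y_true) == np.round(y_pred)))

y_true = np.array([1.0, 2.0, 5.0, 0.0])
y_pred = np.array([1.2, 2.7, 4.6, 0.4])  # rounds to 1, 3, 5, 0

print(soft_acc_numpy(y_true, y_pred))  # 0.75: three of the four rounded values match
```

Since the network is trained as a regression with a mean squared error loss, this rounding step is what turns its real-valued output into an integer prediction that can be counted as right or wrong.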

## Training the neural network

In [4]:
def train_hartford_network(X_train, y_train, X_test, y_test):
    model = get_hartford_network()
    history = model.fit(
        X_train, y_train,
        epochs=4,
        validation_data=(X_test, y_test),
        batch_size=1
    )
    return history.history['val_soft_acc'][-1]


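# Note: the same arrays are passed as training and validation data here, so the
# value printed below is measured on the training set, not on held-out data.
print(f'Test Accuracy of Hartford Neural Network after one run: {train_hartford_network(X, y, X, y)}')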


Epoch 1/4
7890/7890 ━━━━━━━━━━━━━━━━━━━━ 9s 1ms/step - loss: 3.4541 - soft_acc: 0.2254 - val_loss: 2.9639 - val_soft_acc: 0.4676
Epoch 2/4
7890/7890 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 2.5091 - soft_acc: 0.3000 - val_loss: 2.1288 - val_soft_acc: 0.3894
Epoch 3/4
7890/7890 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 2.2002 - soft_acc: 0.3222 - val_loss: 1.8726 - val_soft_acc: 0.3670
Epoch 4/4
7890/7890 ━━━━━━━━━━━━━━━━━━━━ 8s 1ms/step - loss: 2.1187 - soft_acc: 0.3784 - val_loss: 2.4097 - val_soft_acc: 0.5312
Test Accuracy of Hartford Neural Network after one run: 0.5311787128448486
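
The run above calls `train_hartford_network(X, y, X, y)`, so the validation data coincide with the training data. One possible way to obtain a held-out estimate instead is sketched below with plain NumPy. The helper `holdout_split`, the 20% test fraction and the stand-in arrays are assumptions made for illustration; `train_test_split` from scikit-learn, which the notebook already imports, could serve the same purpose.

```python
import numpy as np

def holdout_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the sample indices once and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Stand-in arrays with the shapes suggested by the notebook
# (7890 samples, each a 12x15 matrix with one channel); not the real data.
X_demo = np.zeros((7890, 12, 15, 1))
y_demo = np.arange(7890)

X_train, y_train, X_test, y_test = holdout_split(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```

With the real arrays, passing the four returned pieces to `train_hartford_network(X_train, y_train, X_test, y_test)` would make the reported `val_soft_acc` refer to matrices the network has not seen during training.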
