# Introduction

This notebook is inspired by:

Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. _Nature_ 323, 533–536 (1986). __[https://doi.org/10.1038/323533a0](https://doi.org/10.1038/323533a0)__

In [1]:
# imports
import numpy as np
import keras
import tensorflow as tf
from enum import Enum
from sklearn.model_selection import train_test_split

# Mirror Symmetry

The first example in the paper is a neural network that tests for mirror symmetry in vectors. A vector like \[1, 3, 2, 2, 3, 1\] has mirror symmetry, while a vector like \[1, 3, 2, 1, 3, 2\] does not.
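As a quick sanity check of that definition, here is a minimal sketch that tests the two vectors above directly. The helper name `is_mirror_symmetric` is my own and is not part of the paper or of the training code; it just compares a vector with its reverse.

```python
import numpy as np

def is_mirror_symmetric(vector):
    """Return True if the vector reads the same forwards and backwards."""
    v = np.asarray(vector)
    return bool(np.array_equal(v, v[::-1]))

print(is_mirror_symmetric([1, 3, 2, 2, 3, 1]))  # True
print(is_mirror_symmetric([1, 3, 2, 1, 3, 2]))  # False
```

The network has to learn this rule from labelled examples alone, without ever being given the reversal explicitly.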

In [2]:
# Generate symmetric and non-symmetric vectors
def generate_data(samples, length):
    X, y = [], []
    for _ in range(samples // 2):
        half = np.random.rand(length // 2)
        symmetric = np.concatenate([half, half[::-1]])
        non_symmetric = np.random.rand(length)
        X.append(symmetric)
        y.append(1.0)
        X.append(non_symmetric)
        y.append(0.0)
    return np.array(X), np.array(y)

# These values are taken straight from the paper
samples = 64
length = 6
epochs = 1425
epsilon = 0.1
alpha = 0.9

X, y = generate_data(samples, length)

# Note that the model has a single hidden layer with two nodes, and a single output
model = keras.Sequential()
model.add(keras.Input(shape=(length,)))
model.add(keras.layers.Dense(2))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.add(keras.layers.Dense(1))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=epsilon, momentum=alpha),
    loss=keras.losses.MeanSquaredError(),
    metrics=[
        'binary_accuracy'
    ],
)
model.fit(X, y, epochs=epochs, batch_size=samples, verbose=2)

Epoch 1/1425
1/1 - 0s - 132ms/step - binary_accuracy: 0.5000 - loss: 0.2506
Epoch 2/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5000 - loss: 0.2506
Epoch 3/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5000 - loss: 0.2505
Epoch 4/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5000 - loss: 0.2504
Epoch 5/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.4844 - loss: 0.2503
Epoch 6/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.4688 - loss: 0.2502
Epoch 7/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.4688 - loss: 0.2501
Epoch 8/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.4844 - loss: 0.2500
Epoch 9/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5156 - loss: 0.2499
Epoch 10/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5000 - loss: 0.2498
Epoch 11/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.4844 - loss: 0.2497
Epoch 12/1425
1/1 - 0s - 10ms/step - binary_accuracy: 0.5156 - loss: 0.2497
Epoch 13/1425
1/1 - 0s - 11ms/step - binary_accuracy: 0.5156 - loss: 0.2496
Epoch 14/1425
1/1 - 

<keras.src.callbacks.history.History at 0x3258eff50>

In [3]:
X_test, y_test = generate_data(samples // 2, length)

model.evaluate(X_test, y_test)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step - binary_accuracy: 0.5938 - loss: 0.2813


[0.2813083231449127, 0.59375]

# Family Relationships

The second example in the paper models family relationships (father, mother, husband, wife, daughter, son, aunt, uncle, niece, nephew, brother, sister). 100 examples were used to train the model, and the remaining 4 were used as test data. Note that the family structure in the paper is quite simple and doesn't account for things like divorce, same-sex marriage, having children out of wedlock, non-binary gender, and remarriage. If things look "wrong" in my family-tree code, it's because I tried to stay faithful to the model in the paper.
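To make the input format concrete: each example is a (person, relationship) pair, and the network sees it as a 36-wide vector with exactly two ones, one for the person and one for the relationship. The sketch below builds such a vector by hand with NumPy. `encode_pair` and the two constants are hypothetical names used only for illustration; the notebook itself relies on the Keras `CategoryEncoding` layer with `output_mode='multi_hot'` to do this.

```python
import numpy as np

NUM_NAMES = 24          # indices 0..23, matching the Name enum in this notebook
NUM_RELATIONSHIPS = 12  # indices 24..35, matching the Relationship enum

def encode_pair(name_index, relationship_index):
    """Return a 36-wide vector with ones at the two given indices."""
    vector = np.zeros(NUM_NAMES + NUM_RELATIONSHIPS, dtype=np.float32)
    vector[name_index] = 1.0
    vector[relationship_index] = 1.0
    return vector

# COLIN = 10 and MOTHER = 25 in the enums used by this notebook
encoded = encode_pair(10, 25)
print(encoded.shape)            # (36,)
print(np.flatnonzero(encoded))  # [10 25]
```

Giving the relationships indices 24 to 35 keeps them from colliding with the 24 name indices, so a single multi-hot vector can carry both parts of the question.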

In [4]:
# Classes to store a family tree in

class Name(Enum):
    CHRISTOPHER = 0
    PENELOPE = 1
    ANDREW = 2
    CHRISTINE = 3
    MARGARET = 4
    ARTHUR = 5
    VICTORIA = 6
    JAMES = 7
    JENNIFER = 8
    CHARLES = 9
    COLIN = 10
    CHARLOTTE = 11
    ROBERTO = 12
    MARIA = 13
    PIERRO = 14
    FRANCESCA = 15
    GINA = 16
    EMILIO = 17
    LUCIA = 18
    MARCO = 19
    ANGELA = 20
    TOMASO = 21
    ALFONSO = 22
    SOPHIA = 23

class Relationship(Enum):
    FATHER = 24
    MOTHER = 25
    HUSBAND = 26
    WIFE = 27
    SON = 28
    DAUGHTER = 29
    UNCLE = 30
    AUNT = 31
    BROTHER = 32
    SISTER = 33
    NEPHEW = 34
    NIECE = 35

first_family = [
    (Name.CHRISTOPHER, Relationship.WIFE, Name.PENELOPE),
    (Name.PENELOPE, Relationship.HUSBAND, Name.CHRISTOPHER),
    (Name.CHRISTOPHER, Relationship.SON, Name.ARTHUR),
    (Name.ARTHUR, Relationship.FATHER, Name.CHRISTOPHER),
    (Name.CHRISTOPHER, Relationship.DAUGHTER, Name.VICTORIA),
    (Name.VICTORIA, Relationship.FATHER, Name.CHRISTOPHER),
    (Name.PENELOPE, Relationship.SON, Name.ARTHUR),
    (Name.ARTHUR, Relationship.MOTHER, Name.PENELOPE),
    (Name.PENELOPE, Relationship.DAUGHTER, Name.VICTORIA),
    (Name.VICTORIA, Relationship.MOTHER, Name.PENELOPE),
    (Name.ANDREW, Relationship.WIFE, Name.CHRISTINE),
    (Name.CHRISTINE, Relationship.HUSBAND, Name.ANDREW),
    (Name.ANDREW, Relationship.SON, Name.JAMES),
    (Name.JAMES, Relationship.FATHER, Name.ANDREW),
    (Name.ANDREW, Relationship.DAUGHTER, Name.JENNIFER),
    (Name.JENNIFER, Relationship.FATHER, Name.ANDREW),
    (Name.CHRISTINE, Relationship.SON, Name.JAMES),
    (Name.JAMES, Relationship.MOTHER, Name.CHRISTINE),
    (Name.CHRISTINE, Relationship.DAUGHTER, Name.JENNIFER),
    (Name.JENNIFER, Relationship.MOTHER, Name.CHRISTINE),
    (Name.MARGARET, Relationship.HUSBAND, Name.ARTHUR),
    (Name.ARTHUR, Relationship.WIFE, Name.MARGARET),
    (Name.MARGARET, Relationship.NEPHEW, Name.COLIN),
    (Name.COLIN, Relationship.AUNT, Name.MARGARET),
    (Name.MARGARET, Relationship.NIECE, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.AUNT, Name.MARGARET),
    (Name.ARTHUR, Relationship.NEPHEW, Name.COLIN),
    (Name.COLIN, Relationship.UNCLE, Name.ARTHUR),
    (Name.ARTHUR, Relationship.NIECE, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.UNCLE, Name.ARTHUR),
    (Name.ARTHUR, Relationship.SISTER, Name.VICTORIA),
    (Name.VICTORIA, Relationship.BROTHER, Name.ARTHUR),
    (Name.VICTORIA, Relationship.HUSBAND, Name.JAMES),
    (Name.JAMES, Relationship.WIFE, Name.VICTORIA),
    (Name.VICTORIA, Relationship.SON, Name.COLIN),
    (Name.COLIN, Relationship.MOTHER, Name.VICTORIA),
    (Name.VICTORIA, Relationship.DAUGHTER, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.MOTHER, Name.VICTORIA),
    (Name.JAMES, Relationship.SON, Name.COLIN),
    (Name.COLIN, Relationship.FATHER, Name.JAMES),
    (Name.JAMES, Relationship.DAUGHTER, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.FATHER, Name.JAMES),
    (Name.JAMES, Relationship.SISTER, Name.JENNIFER),
    (Name.JENNIFER, Relationship.BROTHER, Name.JAMES),
    (Name.JENNIFER, Relationship.HUSBAND, Name.CHARLES),
    (Name.CHARLES, Relationship.WIFE, Name.JENNIFER),
    (Name.JENNIFER, Relationship.NEPHEW, Name.COLIN),
    (Name.COLIN, Relationship.AUNT, Name.JENNIFER),
    (Name.JENNIFER, Relationship.NIECE, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.AUNT, Name.JENNIFER),
    (Name.CHARLES, Relationship.NEPHEW, Name.COLIN),
    (Name.COLIN, Relationship.UNCLE, Name.CHARLES),
    (Name.CHARLES, Relationship.NIECE, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.UNCLE, Name.CHARLES),
    (Name.COLIN, Relationship.SISTER, Name.CHARLOTTE),
    (Name.CHARLOTTE, Relationship.BROTHER, Name.COLIN),
]
second_family = [(Name(l.value + 12), m, Name(r.value + 12)) for (l, m, r) in first_family]
relationships = first_family + second_family

X_y = np.array([(l.value, m.value, r.value) for (l, m, r) in relationships])
X = X_y[:, 0:2]
y = keras.utils.to_categorical(X_y[:, 2:], num_classes=24)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.03)


# Note that the model has 5 layers. 
# The 36 inputs in the first layer correspond to (name, relationship) pairs.
# The 24 outputs are all name values.
inputs = len(Name) + len(Relationship)
outputs = len(Name)

model = keras.Sequential()
model.add(keras.Input(shape=(2,)))
model.add(keras.layers.CategoryEncoding(num_tokens=inputs, output_mode='multi_hot'))
model.add(keras.layers.Dense(6))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.add(keras.layers.Dense(12))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.add(keras.layers.Dense(12))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.add(keras.layers.Dense(outputs))
model.add(keras.layers.Activation(keras.activations.sigmoid))


# In the paper the learning rate and momentum change after 20 epochs from
# (0.005, 0.5) to (0.01, 0.9)
initial_epsilon = 0.005
initial_alpha = 0.5
initial_epochs = 20
final_epsilon = 0.01
final_alpha = 0.9
weight_decay = 0.002
epochs = 1500
samples = np.shape(X_train)[0]
correct_true = 0.8
correct_false = 0.2

sgd = keras.optimizers.SGD(learning_rate=initial_epsilon, momentum=initial_alpha, weight_decay=weight_decay)

# This is used to update the gradient descent parameters
class UpdateSGDHyperparameters(keras.callbacks.Callback):
    def __init__(self, optimizer):
        super().__init__()
        self.optimizer = optimizer
        self.epoch_threshold = initial_epochs
        self.new_lr = final_epsilon
        self.new_momentum = final_alpha

    def on_epoch_begin(self, epoch, logs=None):
        if epoch == self.epoch_threshold:
            print(f"Updating optimizer hyperparameters at epoch {epoch}")
            self.optimizer.learning_rate = self.new_lr
            self.optimizer.momentum = self.new_momentum

callback = UpdateSGDHyperparameters(sgd)

# In the paper, if a prediction is bigger than 0.8 and the desired value is 1.0,
# or if a prediction is smaller than 0.2 and the desired value is 0.0, the error
# gets set to 0.
def custom_loss_with_mse(y_true, y_pred):
    correct_on = tf.greater_equal(y_pred, correct_true) & tf.equal(y_true, 1)
    correct_off = tf.less_equal(y_pred, correct_false) & tf.equal(y_true, 0)
    incorrect_mask = tf.cast(~(correct_on | correct_off), tf.float32)
    return tf.reduce_mean(tf.square(y_pred - y_true) * incorrect_mask)

model.compile(
    optimizer=sgd,
    loss=custom_loss_with_mse,
    metrics=[
        'binary_accuracy'
    ],
)
model.fit(X_train, y_train, epochs=epochs, batch_size=samples, callbacks=[callback], verbose=2)

Epoch 1/1500
1/1 - 0s - 225ms/step - binary_accuracy: 0.5772 - loss: 0.2323
Epoch 2/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5772 - loss: 0.2322
Epoch 3/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5772 - loss: 0.2322
Epoch 4/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5772 - loss: 0.2321
Epoch 5/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5772 - loss: 0.2320
Epoch 6/1500
1/1 - 0s - 10ms/step - binary_accuracy: 0.5772 - loss: 0.2320
Epoch 7/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5779 - loss: 0.2319
Epoch 8/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5795 - loss: 0.2318
Epoch 9/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5802 - loss: 0.2317
Epoch 10/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5818 - loss: 0.2316
Epoch 11/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5826 - loss: 0.2316
Epoch 12/1500
1/1 - 0s - 10ms/step - binary_accuracy: 0.5845 - loss: 0.2315
Epoch 13/1500
1/1 - 0s - 11ms/step - binary_accuracy: 0.5876 - loss: 0.2314
Epoch 14/1500
1/1 - 

<keras.src.callbacks.history.History at 0x328777770>

In [5]:
model.evaluate(X_test, y_test)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 58ms/step - binary_accuracy: 0.9167 - loss: 0.1427


[0.14269614219665527, 0.9166666865348816]
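The loss value reported for the family model is not a plain mean squared error, because outputs that are already close enough to their targets contribute nothing. The following is a small NumPy sketch of that masking rule on four hand-picked values. `thresholded_mse` is a hypothetical stand-in I wrote for illustration, mirroring the logic of `custom_loss_with_mse` under the assumption that the thresholds are 0.8 and 0.2 as in the training cell.

```python
import numpy as np

def thresholded_mse(y_true, y_pred, on=0.8, off=0.2):
    """Mean squared error that ignores outputs already within tolerance."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Target 1 and prediction at or above `on` counts as correct
    correct_on = (y_pred >= on) & (y_true == 1)
    # Target 0 and prediction at or below `off` counts as correct
    correct_off = (y_pred <= off) & (y_true == 0)
    # Only the remaining outputs contribute to the error
    mask = ~(correct_on | correct_off)
    return float(np.mean(np.square(y_pred - y_true) * mask))

y_true = [1.0, 1.0, 0.0, 0.0]
y_pred = [0.9, 0.5, 0.1, 0.5]
print(thresholded_mse(y_true, y_pred))  # 0.125
```

The first and third outputs fall inside the tolerance band and are masked out, so only the two 0.5 predictions contribute: (0.25 + 0.25) / 4 = 0.125. A plain MSE on the same values would be slightly higher, since it would also count the 0.9 and 0.1 predictions.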