In [58]:
import autokeras as ak
import numpy as np
import pandas as pd
import reber
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import Dense, LSTM, Dropout, Embedding, Activation
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.utils import plot_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

RANDOM_STATE = 42
PADDING_IDX = 0

The data generator creates four types of strings: valid embedded Reber strings and three types of invalid embedded Reber strings. See reber.py for a full description of each invalid type. For now, we don't differentiate between the invalid types; they all get the same "0" label.
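For readers who haven't met the grammar before, here is a minimal sketch of what a valid embedded Reber string looks like. It assumes the standard textbook Reber transition table; reber.py is not shown here, so its actual table and the names below (`REBER_TRANSITIONS`, `make_embedded_reber`, `is_embedded_reber`) are illustrative assumptions rather than the generator's real API. It also does not enforce `max_length`.

```python
import random

# Assumed standard Reber grammar: state -> list of (symbol, next_state).
# This is NOT taken from reber.py, whose table may differ.
REBER_TRANSITIONS = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}
FINAL_STATE = 5


def make_reber(rng):
    """Random walk through the grammar from B until the final state, then E."""
    chars = ["B"]
    state = 0
    while state != FINAL_STATE:
        symbol, state = rng.choice(REBER_TRANSITIONS[state])
        chars.append(symbol)
    chars.append("E")
    return "".join(chars)


def make_embedded_reber(rng):
    """Wrap a Reber string as B + w + <reber> + w + E, with the same w (T or P) twice."""
    wrapper = rng.choice("TP")
    return "B" + wrapper + make_reber(rng) + wrapper + "E"


def is_reber(s):
    """Check a plain Reber string by replaying it against the transition table."""
    if len(s) < 2 or s[0] != "B" or s[-1] != "E":
        return False
    state = 0
    for ch in s[1:-1]:
        next_state = dict(REBER_TRANSITIONS.get(state, [])).get(ch)
        if next_state is None:
            return False
        state = next_state
    return state == FINAL_STATE


def is_embedded_reber(s):
    """Check the outer B/E frame, the matching wrapper symbols, and the inner string."""
    return (
        len(s) >= 4
        and s[0] == "B"
        and s[-1] == "E"
        and s[1] in ("T", "P")
        and s[1] == s[-2]
        and is_reber(s[2:-2])
    )


rng = random.Random(42)
samples = [make_embedded_reber(rng) for _ in range(5)]
```

The long-range dependency is the matching wrapper symbol: the second and second-to-last characters must agree, no matter how long the inner string is.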

In [38]:
r = reber.ReberGenerator(max_length=15)
X, y = r.make_data(
    50000,
    valid=50,
    symmetry_disturbed=40,
    random=5,
    perturbed=5,  # perturbed is probably too low, given later testing
)
_, word_len = X.shape
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=RANDOM_STATE
)

I've represented the strings as ordinal numbers. Rather than a character's place in the English alphabet, each number is its position in the *Reber* alphabet (which consists only of BEPSTVX). 0 is the padding character, so B=1, E=2, etc.
In the summary below, you can see that the minimum length of a Reber string is 8 characters: position 8 (counting from 0) is the first column in which the 0 padding character appears.
You can also see that, because of the Monte Carlo-esque way I generated the strings (walking through the grammar until it reaches the end), the number of strings reaching a given length begins to drop off after 8.
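The encoding described above can be sketched in a few lines. This is a stand-alone illustration, not the code in reber.py: `encode_padded` and `decode_padded` are hypothetical helper names, and the alphabet order BEPSTVX and padding value 0 are taken from the description above.

```python
REBER_ALPHABET = "BEPSTVX"
PADDING_IDX = 0

# B=1, E=2, P=3, S=4, T=5, V=6, X=7; 0 is reserved for padding.
CHAR_TO_INT = {ch: i for i, ch in enumerate(REBER_ALPHABET, start=1)}
INT_TO_CHAR = {i: ch for ch, i in CHAR_TO_INT.items()}


def encode_padded(s, max_length=15):
    """Map each character to its ordinal and right-pad with zeros."""
    if len(s) > max_length:
        raise ValueError(f"string of length {len(s)} exceeds max_length={max_length}")
    codes = [CHAR_TO_INT[ch] for ch in s]
    return codes + [PADDING_IDX] * (max_length - len(s))


def decode_padded(codes):
    """Inverse of encode_padded: drop padding and map ordinals back to characters."""
    return "".join(INT_TO_CHAR[c] for c in codes if c != PADDING_IDX)


encoded = encode_padded("BPBTSSXXVVEPE")
# encoded == [1, 3, 1, 5, 4, 4, 7, 7, 6, 6, 2, 3, 2, 0, 0]
```

A 13-character string therefore fills columns 0 through 12 and leaves zeros in columns 13 and 14, which is why the later columns of the summary table are dominated by zeros.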

In [25]:
X.describe()

statistic,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
count,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0,120000.0
mean,1.167342,4.001592,1.165125,3.974617,5.467817,5.190667,4.334725,4.184042,3.561342,2.449292,1.6654,1.104875,0.623558,0.370683,0.10195
std,0.83549,1.082135,0.828863,1.079428,1.229692,1.455466,1.86847,1.651977,1.71917,2.098686,2.006475,1.727532,1.289127,1.075308,0.4422
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,3.0,1.0,3.0,5.0,4.0,2.0,3.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,4.0,1.0,3.0,6.0,6.0,4.0,4.0,3.0,2.0,0.0,0.0,0.0,0.0,0.0
75%,1.0,5.0,1.0,5.0,7.0,6.0,6.0,6.0,5.0,4.0,3.0,2.0,0.0,0.0,0.0
max,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0


Now I'm going to use autokeras to create a model that does the classification for me.
First, it embeds my ordinal vectors into a higher-dimensional space (which it learns).
Then, it automatically determines how many RNN layers it needs to do the classification.
You can see in the logs that it tries several different hyperparameter settings (presumably using some sort of Bayesian search method).
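To make the embedding step concrete: an embedding is a lookup table with one row per token, and each ordinal in the input selects a row. The sketch below uses NumPy with random rows and a hypothetical `embedding_dim` of 4; in the real model the rows are learned during training and the dimension is chosen by autokeras, so treat the numbers as placeholders.

```python
import numpy as np

num_features = 8   # 7 Reber letters + 1 padding token
embedding_dim = 4  # hypothetical; autokeras picks this during its search

# Random stand-in for the learned table: one row per token id.
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(num_features, embedding_dim))

# One encoded string ("BPBTSSXXVVEPE" padded to length 15), as a batch of size 1.
batch = np.array([[1, 3, 1, 5, 4, 4, 7, 7, 6, 6, 2, 3, 2, 0, 0]])

# Integer indexing performs the lookup: (batch, time) -> (batch, time, embedding_dim).
embedded = embedding_matrix[batch]
```

The RNN then consumes `embedded` one time step at a time, so every occurrence of the same letter starts from the same vector regardless of its position.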

In [28]:
num_features = len(r._reber_letters) + 1  # include padding token

input_node = ak.Input()
output_node = ak.Embedding(
    max_features=num_features, pretraining="random", dropout_rate=0
)(input_node)
output_node = ak.RNNBlock(bidirectional=False, layer_type="lstm")(output_node)
output_node = ak.ClassificationHead()(output_node)
clf = ak.AutoModel(inputs=input_node, outputs=output_node, max_trials=8)
clf.fit(X_train.to_numpy(), y_train.to_numpy(), verbose=2, epochs=10)

INFO:tensorflow:Reloading Oracle from existing project ./auto_model/oracle.json
INFO:tensorflow:Reloading Tuner from ./auto_model/tuner0.json
Train for 1063 steps, validate for 266 steps
Epoch 1/10
1063/1063 - 82s - loss: 0.6699 - accuracy: 0.5347 - val_loss: 0.6551 - val_accuracy: 0.5585
Epoch 2/10
1063/1063 - 83s - loss: 0.6583 - accuracy: 0.5480 - val_loss: 0.6588 - val_accuracy: 0.5531
Epoch 3/10
1063/1063 - 70s - loss: 0.6597 - accuracy: 0.5448 - val_loss: 0.6587 - val_accuracy: 0.5529
Epoch 4/10
1063/1063 - 64s - loss: 0.6596 - accuracy: 0.5468 - val_loss: 0.6579 - val_accuracy: 0.5542
Epoch 5/10
1063/1063 - 64s - loss: 0.6593 - accuracy: 0.5468 - val_loss: 0.6585 - val_accuracy: 0.5533
Epoch 6/10
1063/1063 - 63s - loss: 0.6616 - accuracy: 0.5454 - val_loss: 0.6604 - val_accuracy: 0.5521
Epoch 7/10
1063/1063 - 61s - loss: 0.6539 - accuracy: 0.5556 - val_loss: 0.6428 - val_accuracy: 0.5751
Epoch 8/10
1063/1063 - 63s - loss: 0.6427 - accuracy: 0.5678 - val_loss: 0.6426 - val_accura

Train for 1063 steps, validate for 266 steps
Epoch 1/10
1063/1063 - 73s - loss: 0.6606 - accuracy: 0.5478 - val_loss: 0.5455 - val_accuracy: 0.7114
Epoch 2/10
1063/1063 - 66s - loss: 0.1213 - accuracy: 0.9622 - val_loss: 0.0533 - val_accuracy: 0.9858
Epoch 3/10
1063/1063 - 68s - loss: 0.0680 - accuracy: 0.9836 - val_loss: 0.0605 - val_accuracy: 0.9852
Epoch 4/10
1063/1063 - 69s - loss: 0.0844 - accuracy: 0.9784 - val_loss: 0.0515 - val_accuracy: 0.9885
Epoch 5/10
1063/1063 - 63s - loss: 0.0675 - accuracy: 0.9836 - val_loss: 0.0487 - val_accuracy: 0.9887
Epoch 6/10
1063/1063 - 62s - loss: 0.0649 - accuracy: 0.9836 - val_loss: 0.0353 - val_accuracy: 0.9915
Epoch 7/10
1063/1063 - 63s - loss: 0.0458 - accuracy: 0.9885 - val_loss: 0.0431 - val_accuracy: 0.9907
Epoch 8/10
1063/1063 - 62s - loss: 0.0333 - accuracy: 0.9929 - val_loss: 0.0275 - val_accuracy: 0.9946
Epoch 9/10
1063/1063 - 59s - loss: 0.0384 - accuracy: 0.9910 - val_loss: 0.0367 - val_accuracy: 0.9905
Epoch 10/10
1063/1063 - 65s 

INFO:tensorflow:Oracle triggered exit
Train for 1329 steps, validate for 266 steps
Epoch 1/10
1329/1329 - 89s - loss: 0.6610 - accuracy: 0.5492 - val_loss: 0.6530 - val_accuracy: 0.5631
Epoch 2/10
1329/1329 - 77s - loss: 0.6515 - accuracy: 0.5590 - val_loss: 0.6461 - val_accuracy: 0.5701
Epoch 3/10
1329/1329 - 82s - loss: 0.6486 - accuracy: 0.5625 - val_loss: 0.6467 - val_accuracy: 0.5686
Epoch 4/10
1329/1329 - 78s - loss: 0.2888 - accuracy: 0.8532 - val_loss: 0.1026 - val_accuracy: 0.9739
Epoch 5/10
1329/1329 - 77s - loss: 0.0819 - accuracy: 0.9797 - val_loss: 0.0889 - val_accuracy: 0.9785
Epoch 6/10
1329/1329 - 77s - loss: 0.0654 - accuracy: 0.9842 - val_loss: 0.0418 - val_accuracy: 0.9908
Epoch 7/10
1329/1329 - 76s - loss: 0.0536 - accuracy: 0.9873 - val_loss: 0.0388 - val_accuracy: 0.9922
Epoch 8/10
1329/1329 - 83s - loss: 0.0350 - accuracy: 0.9921 - val_loss: 0.0255 - val_accuracy: 0.9952
Epoch 9/10
1329/1329 - 81s - loss: 0.0332 - accuracy: 0.9921 - val_loss: 0.0108 - val_accurac

In [31]:
model = clf.export_model()
model.save('test_model')

In [44]:
x = np.array(
    [
        r.encode_as_padded_ints(s)
        for s in ["BPBTSSXXVVEPE", "BPBTSSXXVVEPPE", "BPBTSSXXVVEPEE"]
    ]
)
model.predict(x)

array([[9.9877208e-01],
       [2.0613670e-04],
       [9.7213304e-01]], dtype=float32)

The model recognizes that two Ps cannot occur together near the end of the string (score ≈ 0.0002), but not that two Es cannot (score ≈ 0.97, nearly as confident as for the valid string). The training data needs some augmentation.
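One possible form of augmentation is to generate "hard negatives" by duplicating one of the last two characters of a valid string, which reproduces exactly the two failure probes above. This is a sketch under assumptions: `tail_duplication_negatives` is a hypothetical helper, not part of reber.py, and it relies on the input being a valid embedded Reber string (in which case doubling the closing wrapper symbol or the final E always yields an invalid string).

```python
def tail_duplication_negatives(valid_string, max_length=15):
    """Return invalid variants made by doubling one of the last two characters.

    Assumes valid_string is a valid embedded Reber string. Candidates longer
    than max_length are skipped so they still fit the padded encoding.
    """
    if len(valid_string) < 2:
        raise ValueError("need at least two characters")
    negatives = []
    n = len(valid_string)
    for i in (n - 2, n - 1):
        # Insert a second copy of the character at position i right after it.
        candidate = valid_string[: i + 1] + valid_string[i] + valid_string[i + 1 :]
        if len(candidate) <= max_length:
            negatives.append(candidate)
    return negatives


negatives = tail_duplication_negatives("BPBTSSXXVVEPE")
# negatives == ["BPBTSSXXVVEPPE", "BPBTSSXXVVEPEE"]
```

Each returned string would be labeled 0 and mixed into the training set; how many to add relative to the other invalid types is a tuning choice this sketch does not address.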