## Estid Lozano
## David Herrera

In [None]:
# imports
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
from keras.losses import BinaryCrossentropy
from random import randint, uniform
from tqdm.keras import TqdmCallback

# Exercise 1

Familiarize yourself with the Keras machine learning library and work through the following tutorial, which shows how to use the TimeDistributed layer to implement a many-to-many LSTM network: https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/.

We now want to use this architecture to predict, for every item of a sequence, whether it is larger than a given threshold.

For example, we have a sequence x = (1, 4, 2, 5, 1, 1, 6) and a threshold of 3. Then the output sequence should be y = (0, 1, 0, 1, 0, 0, 1).
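The labeling rule is an element-wise comparison, which NumPy expresses in one line. A minimal sketch reproducing the example above with a strict `>` comparison; the solution further below uses `>=`, which gives the same labels here because no element equals 3.

```python
import numpy as np

# Sequence and threshold from the example above
x = np.array([1, 4, 2, 5, 1, 1, 6])
threshold = 3

# Element-wise comparison gives a boolean mask; cast it to 0/1 labels
y = (x > threshold).astype(int)
print(y)  # [0 1 0 1 0 0 1]
```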

**1.1.** Create a recurrent neural network in Keras with one LSTM layer containing a single unit. Make sure that the network is not fixed to sequences of a particular length but can receive sequences of arbitrary length.

Since we perform binary classification at every time step of a sequence, use **binary_crossentropy** as the loss function.

Hint: Check the possibility to leave the time dimension in the input shape as **None**.
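For reference, binary cross-entropy averages $-[y \log p + (1-y)\log(1-p)]$ over all time steps, where $p$ is the predicted probability of label 1. A NumPy sketch of that formula; the clipping constant `eps` is an assumption added here to avoid `log(0)`, and the function is an illustration rather than Keras' own implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all time steps (NumPy sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never receives exactly 0 or 1
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

# A completely uncertain prediction (p = 0.5) costs ln 2 per time step
print(binary_crossentropy([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # about 0.6931
```

Confident correct predictions drive the loss towards 0, while confident wrong ones are penalized heavily, which is why it suits per-step 0/1 targets.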

In [None]:
n_neurons = 1

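model = Sequential()
# Time dimension left as None so sequences of any length are accepted;
# return_sequences=True emits one output per time step
model.add(LSTM(n_neurons, input_shape=(None, 1), return_sequences=True))
# Apply the same sigmoid output unit to every time step
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
# Binary classification at each time step
model.compile(loss='binary_crossentropy', optimizer='adam')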
model.summary()
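Why can the time dimension be `None`? The LSTM applies the same weights at every time step, so the number of parameters does not depend on the sequence length. The following NumPy sketch of the forward pass of a one-unit LSTM, followed by a per-step sigmoid output standing in for `TimeDistributed(Dense(1))`, illustrates this. The random weights and the gate ordering are assumptions for illustration, not Keras' internal layout; the parameter count (12 for the LSTM plus 2 for the dense unit) matches what `model.summary()` reports for this architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units = 1, 1

# One (W, U, b) triple per gate: input, forget, candidate, output
W = rng.normal(size=(4, n_in, n_units))     # input weights
U = rng.normal(size=(4, n_units, n_units))  # recurrent weights
b = np.zeros((4, n_units))                  # biases
w_out = rng.normal(size=(n_units, 1))       # per-step output layer weight
b_out = np.zeros(1)                         # per-step output layer bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(seq):
    """seq has shape (timesteps, n_in); returns (timesteps, 1) probabilities."""
    h = np.zeros(n_units)
    c = np.zeros(n_units)
    outputs = []
    for x_t in seq:  # the same weights are reused at every time step
        pre = [x_t @ W[k] + h @ U[k] + b[k] for k in range(4)]
        i, f, o = sigmoid(pre[0]), sigmoid(pre[1]), sigmoid(pre[3])
        c = f * c + i * np.tanh(pre[2])
        h = o * np.tanh(c)
        outputs.append(sigmoid(h @ w_out + b_out))
    return np.array(outputs)

n_params = W.size + U.size + b.size + w_out.size + b_out.size
print(n_params)  # 14
print(forward(rng.uniform(0, 10, size=(5, 1))).shape)   # (5, 1)
print(forward(rng.uniform(0, 10, size=(30, 1))).shape)  # (30, 1)
```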

**1.2.** Write a *generator* function (see the **yield** statement) that produces 100 training examples of random length between 20 and 50 (all sequences within the same batch must have the same length). Each training example $x_i$ is a sequence of (float) values between 0 and 10, and the corresponding label sequence $y_i$ indicates whether each number is at least the threshold (3 in the example above and in the solution below). That is, $y_i$ is a boolean vector of the same length as $x_i$.

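Use **model.fit_generator** to train the model (choose an appropriate number of epochs). In recent Keras versions **fit_generator** is deprecated and **model.fit** accepts generators directly, which is what the solution below uses.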

In [None]:
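def gen(n=100, min_len=20, max_len=50, min_val=0, max_val=10, threshold=3):
    # Pre-generate n sequences of random length with uniform float values
    X = [np.array([uniform(min_val, max_val) for _ in range(randint(min_len, max_len))]) for _ in range(n)]
    # Label each element 1 if it is at least the threshold, else 0
    y = [(x >= threshold).astype(int) for x in X]
    # Reshape to the (batch, timesteps, features) layout with batch size 1
    X = [x.reshape(1, len(x), 1) for x in X]
    y = [t.reshape(1, len(t), 1) for t in y]
    # Cycle over the examples forever, as expected from a training generator
    while True:
        for i in range(n):
            yield X[i], y[i]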
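The generator above cycles over a fixed set of 100 examples with batch size one. If you want larger batches, all sequences in a batch must share one length, which can be drawn once per batch. A sketch under that assumption; `batch_gen`, `batch_size` and `seed` are hypothetical names, not part of the exercise, and this version draws fresh data on every call instead of reusing a fixed set.

```python
import numpy as np

def batch_gen(batch_size=10, min_len=20, max_len=50, min_val=0, max_val=10,
              threshold=3, seed=None):
    """Hypothetical generator: yields fresh batches forever, one length per batch."""
    rng = np.random.default_rng(seed)
    while True:
        # One random length (inclusive bounds) shared by every sequence in the batch
        length = rng.integers(min_len, max_len + 1)
        X = rng.uniform(min_val, max_val, size=(batch_size, length, 1))
        y = (X >= threshold).astype(int)
        yield X, y

g = batch_gen(batch_size=4, seed=0)
X, y = next(g)
print(X.shape, y.shape)  # both (4, L, 1) with 20 <= L <= 50
```

It could be passed to `model.fit` like `gen`, with `steps_per_epoch` set to the number of batches you want per epoch.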

In [None]:
n_epoch = 20
n_train = 100

train_gen = gen(n_train)
model.fit(train_gen, epochs=n_epoch, steps_per_epoch=n_train, verbose=0, callbacks=[TqdmCallback(verbose=1)])

Test your model on a number of randomly chosen sequences. Is it able to make correct predictions?

In [None]:
n_test = 10
test_gen = gen(n_test)
total_mistakes = 0
total_predicts = 0

for i in range(n_test):
    xi, yi = next(test_gen)
    yi = yi[0].T[0]
    yi_hat = np.round(model.predict(xi)[0].T[0], 0).astype(int)
    mistakes = sum(yi != yi_hat)
    # print("Expected:",yi,"Predicted",yi_hat,"Mistakes:",mistakes,"/",len(yi))
    total_mistakes += mistakes
    total_predicts += len(yi)
acc = round(1 - total_mistakes / total_predicts, 3)
print("Accuracy:", acc)
print("The model is " + ("" if acc >= 0.9 else "not ") + "able to make correct predictions.")
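The loop above pools mistakes over all time steps of all test sequences. The same computation can be written as a small vectorized helper; `pooled_accuracy` is a hypothetical name, and the strict `> 0.5` cut-off is chosen to agree with `np.round` on probabilities in [0, 1] (which rounds exactly 0.5 down to 0).

```python
import numpy as np

def pooled_accuracy(y_true_list, y_prob_list, threshold=0.5):
    """Fraction of correctly classified time steps over all sequences."""
    # Flatten variable-length sequences into one long vector each
    y_true = np.concatenate([np.ravel(t) for t in y_true_list])
    y_pred = (np.concatenate([np.ravel(p) for p in y_prob_list]) > threshold).astype(int)
    return float(np.mean(y_true == y_pred))

# Two sequences of different lengths: 4 of 5 steps are correct
acc = pooled_accuracy([[0, 1, 1], [1, 0]], [[0.2, 0.9, 0.4], [0.7, 0.1]])
print(acc)  # 0.8
```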

# Exercise 2

In this exercise, we reproduce the experiments on the Reber grammar.

**2.1.** Write a function that converts a symbolic sequence database into a binary one. Use one attribute for each possible symbol: encode each x-sequence as a sequence of one-hot binary vectors (exactly one 1 per time step, marking the current symbol), and each y-sequence as a sequence of binary vectors with a 1 at the position of every possible next symbol (listed with | in the csv files).

Apply this function to the **reber1** and **reber2** datasets to produce binary sequence datasets. Each time step should then be a 7-vector, for both x-sequences and y-sequences. If in doubt, see the example in the book.
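A sketch of the encoding for a single sequence. It assumes the seven Reber symbols are ordered as `BTSXPVE` and that each y-entry lists the possible next symbols separated by `|` (for example `T|P`); both are assumptions about the csv format, so adapt the symbol order and parsing to the actual files and to the book's example. `encode_sequence` is a hypothetical helper name.

```python
import numpy as np

SYMBOLS = "BTSXPVE"  # assumed symbol order; one attribute per symbol
INDEX = {s: i for i, s in enumerate(SYMBOLS)}

def encode_sequence(x_symbols, y_entries):
    """Encode one sequence as (T, 7) one-hot inputs and (T, 7) multi-hot targets."""
    X = np.zeros((len(x_symbols), len(SYMBOLS)), dtype=int)
    Y = np.zeros((len(y_entries), len(SYMBOLS)), dtype=int)
    for t, s in enumerate(x_symbols):
        X[t, INDEX[s]] = 1          # exactly one 1 per time step
    for t, entry in enumerate(y_entries):
        for s in entry.split("|"):  # assumed format, e.g. "T|P"
            if s:                   # an empty entry gives an all-zero target
                Y[t, INDEX[s]] = 1  # a 1 for every possible next symbol
    return X, Y

X, Y = encode_sequence("BTXSE", ["T|P", "S|X", "S|X", "E", ""])
print(X.shape, Y.shape)  # (5, 7) (5, 7)
```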

**2.2.** Create a generator that yields batches of size one, each containing a single example from the binary dataset created in this way.

Hint: You may want to use the modulo operator % together with an index variable, so that the generator keeps cycling over the whole dataset.
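Following the hint, a sketch of a generator that cycles through a dataset with an index variable and the modulo operator. It assumes the dataset is a list of `(X, Y)` pairs, each of shape `(T, 7)`; `cycle_gen` is a hypothetical name.

```python
import numpy as np

def cycle_gen(dataset):
    """Yield (1, T, 7) batches forever, wrapping around at the end of the data."""
    i = 0
    while True:
        X, Y = dataset[i % len(dataset)]
        # Add a leading batch axis of size one
        yield X[np.newaxis], Y[np.newaxis]
        i += 1

# Toy dataset of two sequences with different lengths
toy = [(np.zeros((3, 7)), np.ones((3, 7))), (np.zeros((5, 7)), np.ones((5, 7)))]
g = cycle_gen(toy)
print([next(g)[0].shape for _ in range(3)])  # [(1, 3, 7), (1, 5, 7), (1, 3, 7)]
```

Because each batch holds a single sequence, sequences of different lengths never need padding.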

**2.3.** What are your results when using an RNN layer or an LSTM layer, respectively, with 20 units, training only on the first 400 examples and validating on the last 100 sequences? Can you learn to perfectly predict the possible next symbols?
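To decide whether the network predicts the possible next symbols *perfectly*, one option is a strict criterion: threshold the 7 outputs at 0.5 and count a time step as correct only if the resulting binary vector equals the target exactly. A NumPy sketch of that metric; the 0.5 threshold and the name `exact_step_accuracy` are assumptions. The split could be taken as `data[:400]` for training and `data[-100:]` for validation, assuming `data` is the list of encoded sequences from 2.1.

```python
import numpy as np

def exact_step_accuracy(Y_true_list, Y_prob_list, threshold=0.5):
    """Share of time steps whose thresholded 7-vector equals the target exactly."""
    correct = total = 0
    for Y_true, Y_prob in zip(Y_true_list, Y_prob_list):
        Y_pred = (np.asarray(Y_prob) > threshold).astype(int)
        # A step counts only if all 7 entries match
        correct += int(np.all(Y_pred == np.asarray(Y_true), axis=-1).sum())
        total += len(Y_true)
    return correct / total

# One sequence of two steps: the second step misses one possible symbol
Y_true = [np.array([[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0, 0]])]
Y_prob = [np.array([[0.1, 0.8, 0.2, 0.1, 0.7, 0.0, 0.0],
                    [0.0, 0.1, 0.9, 0.3, 0.1, 0.0, 0.0]])]
print(exact_step_accuracy(Y_true, Y_prob))  # 0.5
```

With a trained Keras model, `Y_prob_list` could hold `model.predict(X[np.newaxis])[0]` for each validation sequence; a score of 1.0 would mean perfect prediction under this criterion.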