## Deep learning experiments

This notebook collects deep learning experiments on all benchmark datasets that standard reservoir memory machines can handle, namely the latch, copy, repeat copy, and signal copy tasks. We test a standard gated recurrent unit (GRU) as well as a GRU with an explicit memory mechanism like that of the reservoir memory machine (GRU-MM).
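To make the baseline concrete, here is a minimal NumPy sketch of a GRU forward pass with a linear readout. We assume this is roughly the kind of model `dmm.GRUInterface(m, n, L)` wraps (m hidden neurons, n input dimensions, L outputs), but it is an illustration only, not the `rmm2` code. The helper names `init_params` and `gru_forward` and the uniform initialization are hypothetical, and the gates loosely follow the convention documented for `torch.nn.GRU`, with a single bias per gate.

```python
import numpy as np

# Hypothetical sketch of a GRU with linear readout; not the rmm2 implementation.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(m, n, L, seed=0):
    """Randomly initialize a GRU with m hidden units, n inputs, and a
    linear readout to L outputs (hypothetical initialization scheme)."""
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(m)
    p = {}
    for g in ('r', 'z', 'h'):
        p['W_' + g] = rng.uniform(-s, s, (m, n))  # input weights of gate g
        p['U_' + g] = rng.uniform(-s, s, (m, m))  # recurrent weights of gate g
        p['b_' + g] = np.zeros(m)                 # bias of gate g
    p['V'] = rng.uniform(-s, s, (L, m))           # readout weights
    p['c'] = np.zeros(L)                          # readout bias
    return p

def gru_forward(X, p):
    """Run the GRU over a T x n input sequence X and return T x L outputs."""
    h = np.zeros(p['U_r'].shape[0])
    Y = []
    for x in X:
        r = sigmoid(p['W_r'] @ x + p['U_r'] @ h + p['b_r'])              # reset gate
        z = sigmoid(p['W_z'] @ x + p['U_z'] @ h + p['b_z'])              # update gate
        h_cand = np.tanh(p['W_h'] @ x + r * (p['U_h'] @ h) + p['b_h'])  # candidate state
        h = (1.0 - z) * h_cand + z * h                                   # blend old and new state
        Y.append(p['V'] @ h + p['c'])                                    # linear readout
    return np.array(Y)
```

With the latch dimensions from the setup cell (m = 64, n = 1, L = 1), `gru_forward(X, init_params(64, 1, 1))` maps a 20 x 1 input sequence to a 20 x 1 output sequence, one prediction per time step.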

*Note:* Executing this notebook may take a very long time (more than a day), especially for the signal copy task.

```python
# experimental meta-parameters that are shared across all datasets

# the number of training time series
N = 90
# the number of test time series
N_test = 10
# the number of repeats for the experiments
R = 3
# the names of the tasks to be performed
tasks = ['latch', 'copy', 'repeat_copy', 'signal_copy']
# the number of hidden neurons for each task
ms = [64, 256, 256, 64]
# the number of input dimensions for each task
ns = [1, 9, 9, 2]
# the memory size for each task (only used by GRU-MM)
Ks = [2, 21, 11, 2]
# the output size for each task
Ls = [1, 8, 8, 1]

# hyper-parameters for training

# the maximum number of training epochs; note that each "epoch" in the training
# loop below is a single minibatch gradient step, not a full pass over the data
num_epochs = 1000
# loss threshold for early stopping
loss_threshold = 1E-3
# minibatch size
minibatch_size = 32
# the learning rate
lr = 1E-3
# the weight decay factor
weight_decay = 1E-8

# variables for error reporting
# update factor for the moving average over the loss
avg_factor = 0.1
# number of epochs between printouts of the moving average loss
print_step = 50

# model names
models = ['GRU', 'GRU-MM']
```
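The parameters `avg_factor` and `loss_threshold` work together in the training loop: after every minibatch, the per-sample loss is blended into an exponential moving average, and training stops early once that average falls below `loss_threshold`. The following sketch isolates this logic as a pure function; the name `train_until_threshold` is hypothetical, and it replays a given sequence of losses instead of training a model.

```python
def train_until_threshold(losses, avg_factor=0.1, loss_threshold=1e-3):
    """Replay a sequence of per-sample minibatch losses through the moving
    average used in the notebook (hypothetical helper).
    Returns (number of steps taken, final moving average)."""
    loss_avg = None
    steps = 0
    for steps, loss in enumerate(losses, start=1):
        if loss_avg is None:
            # the first minibatch initializes the average
            loss_avg = loss
        else:
            # exponential moving average: the newest loss gets weight avg_factor
            loss_avg = avg_factor * loss + (1.0 - avg_factor) * loss_avg
        if loss_avg < loss_threshold:
            break  # early stopping
    return steps, loss_avg

# loss drops to zero right after the first minibatch
steps, loss_avg = train_until_threshold([1.0] + [0.0] * 100)
```

The example shows the lag this introduces: even if the loss dropped to exactly zero after the first minibatch, with `avg_factor = 0.1` the average only shrinks by a factor of 0.9 per step, so 66 further steps pass before it falls below `1E-3` (67 steps in total).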

## Experiment

With all hyperparameters set up, we can now iterate over all tasks and
perform the experiments. Be advised that completing all cells may take roughly a day or longer.
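Within each repeat, the training loop draws minibatches by walking through a random permutation of the N training series and drawing a fresh permutation once the current one is used up. A standalone sketch of that index logic, written as a hypothetical generator `minibatch_indices`, makes it easy to check that no series is used twice within one permutation. The sketch uses the strict test `j + minibatch_size > N`, so a permutation is only discarded when it genuinely cannot fill another full minibatch; any leftover indices are skipped rather than wrapped around.

```python
import numpy as np

def minibatch_indices(N, minibatch_size, num_batches, rng=None):
    """Yield num_batches index arrays of length minibatch_size, walking
    through random permutations of range(N) (hypothetical helper)."""
    if minibatch_size > N:
        raise ValueError('minibatch_size must not exceed N')
    rng = np.random.default_rng() if rng is None else rng
    j = N       # forces a permutation to be drawn in the first step
    pi = None
    for _ in range(num_batches):
        if j + minibatch_size > N:
            # the current permutation cannot fill a full minibatch: reshuffle
            pi = rng.permutation(N)
            j = 0
        yield pi[j:j + minibatch_size]
        j += minibatch_size
```

With the notebook's settings (N = 90, `minibatch_size = 32`), each permutation yields two minibatches and the remaining 26 indices are skipped before reshuffling.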

```python
import time

import numpy as np
import torch

import rmm2.deep_memory_machine as dmm
from dataset_generators import generate_data

# iterate over all tasks
for task_idx in range(len(tasks)):
    task = tasks[task_idx]
    print('------ Task %d of %d: %s -----' % (task_idx+1, len(tasks), task))
    # load task hyperparameters
    m = ms[task_idx]
    n = ns[task_idx]
    K = Ks[task_idx]
    L = Ls[task_idx]
    # initialize error and runtime arrays (models x repeats)
    errors   = np.zeros((len(models), R))
    runtimes = np.zeros((len(models), R))
    # iterate over all experimental repeats
    for r in range(R):
        print('--- repeat %d of %d ---' % (r+1, R))
        # sample training and test data
        Xs, Qs, Ys = generate_data(N, task)
        Xs_test, Qs_test, Ys_test = generate_data(N_test, task)
        # now iterate over all models
        for model_idx in range(len(models)):
            model = models[model_idx]
            print('-- model: %s --' % model)
            # set up the model
            start_time = time.time()
            if model == 'GRU':
                net = dmm.GRUInterface(m, n, L)
            elif model == 'GRU-MM':
                net = dmm.DeepMemoryMachine(m, n, K, L)
            else:
                raise ValueError('unknown model: %s' % model)
            # set up an optimizer
            optim = torch.optim.Adam(net.parameters(), lr = lr, weight_decay = weight_decay)
            # set up aux variables; j = N forces a permutation to be drawn in the first step
            loss_avg = None
            j = N
            # start training; each "epoch" is a single minibatch gradient step
            for epoch in range(num_epochs):
                optim.zero_grad()
                # draw a new permutation once the current one cannot fill a full minibatch
                if j + minibatch_size > N:
                    pi = np.random.permutation(N)
                    j = 0
                # accumulate the loss over the minibatch
                minibatch_loss = torch.zeros(1)
                for i in range(minibatch_size):
                    X, Q, Y = Xs[pi[j]], Qs[pi[j]], Ys[pi[j]]
                    j += 1
                    # compute the loss
                    if model == 'GRU':
                        Ypred = net(X)
                        loss  = torch.nn.functional.mse_loss(Ypred, torch.tensor(Y, dtype=torch.float))
                    else:
                        # GRU-MM is trained with its teacher forcing loss, which also uses Q
                        loss  = net.compute_teacher_forcing_loss(X, Q, Y)
                    # add to minibatch
                    minibatch_loss = minibatch_loss + loss
                # compute gradient
                minibatch_loss.backward()
                # perform optimization step
                optim.step()
                # record the per-sample loss in an exponential moving average
                if loss_avg is None:
                    loss_avg = minibatch_loss.item() / minibatch_size
                else:
                    loss_avg = avg_factor * minibatch_loss.item() / minibatch_size + (1. - avg_factor) * loss_avg
                if (epoch + 1) % print_step == 0:
                    print('moving average loss in epoch %d: %g' % (epoch+1, loss_avg))
                if loss_avg < loss_threshold:
                    print('ended training already after %d epochs because moving average loss %g was below loss threshold.' % (epoch + 1, loss_avg))
                    break
            # measure the RMSE on the test data (no gradients needed)
            mse = 0.
            with torch.no_grad():
                for i in range(N_test):
                    Ypred = net(Xs_test[i])
                    mse   += np.mean((Ypred.detach().numpy() - Ys_test[i]) ** 2)
            rmse = np.sqrt(mse / N_test)
            # the runtime covers both training and testing
            runtimes[model_idx, r] = time.time() - start_time
            errors[model_idx, r] = rmse
    # print results
    for model_idx in range(len(models)):
        print('%s: %g +- %g (took %g seconds)' % (models[model_idx], np.mean(errors[model_idx, :]), np.std(errors[model_idx, :]), np.mean(runtimes[model_idx, :])))
    # write results to file
    np.savetxt('%s_deep_errors.csv' % task, errors.T, delimiter='\t', header='\t'.join(models), comments='')
    np.savetxt('%s_deep_runtimes.csv' % task, runtimes.T, delimiter='\t', header='\t'.join(models), comments='')
```

```
------ Task 1 of 5: latch -----
--- repeat 1 of 3 ---
-- model: GRU --
moving average loss in epoch 50: 0.241303
moving average loss in epoch 100: 0.235658
moving average loss in epoch 150: 0.230335
moving average loss in epoch 200: 0.226912
moving average loss in epoch 250: 0.217274
moving average loss in epoch 300: 0.180054
moving average loss in epoch 350: 0.159764
moving average loss in epoch 400: 0.152201
moving average loss in epoch 450: 0.147697
moving average loss in epoch 500: 0.13902
moving average loss in epoch 550: 0.139724
moving average loss in epoch 600: 0.137675
moving average loss in epoch 650: 0.133143
moving average loss in epoch 700: 0.130034
moving average loss in epoch 750: 0.127991
moving average loss in epoch 800: 0.111145
moving average loss in epoch 850: 0.0237343
moving average loss in epoch 900: 0.00442437
moving average loss in epoch 950: 0.00250686
moving average loss in epoch 1000: 0.00184261
-- model: GRU-MM --
moving average loss in epoch 50: 0.961258
...
moving average loss in epoch 900: 0.0389793
moving average loss in epoch 950: 0.032572
moving average loss in epoch 1000: 0.0291979
-- model: GRU-MM --
moving average loss in epoch 50: 1.81226
moving average loss in epoch 100: 0.660042
moving average loss in epoch 150: 0.287556
moving average loss in epoch 200: 0.126725
moving average loss in epoch 250: 0.0554335
moving average loss in epoch 300: 0.0285062
moving average loss in epoch 350: 0.0181614
moving average loss in epoch 400: 0.0128996
moving average loss in epoch 450: 0.00973865
moving average loss in epoch 500: 0.00763889
moving average loss in epoch 550: 0.00609728
moving average loss in epoch 600: 0.00511588
moving average loss in epoch 650: 0.00428265
moving average loss in epoch 700: 0.00371517
moving average loss in epoch 750: 0.00324978
moving average loss in epoch 800: 0.00284412
moving average loss in epoch 850: 0.0025184
moving average loss in epoch 900: 0.00226947
moving average loss in epoch 950: 0.0020428
...
moving average loss in epoch 800: 0.0020335
moving average loss in epoch 850: 0.0017985
moving average loss in epoch 900: 0.00161357
moving average loss in epoch 950: 0.00144574
moving average loss in epoch 1000: 0.00130937
GRU: 0.445747 +- 0.0150524 (took 1063.18 seconds)
GRU-MM: 0.0201486 +- 0.00102987 (took 869.846 seconds)
------ Task 4 of 5: signal_copy -----
--- repeat 1 of 3 ---
-- model: GRU --
moving average loss in epoch 50: 886.771
moving average loss in epoch 100: 740.045
moving average loss in epoch 150: 726.217
moving average loss in epoch 200: 505
moving average loss in epoch 250: 727.621
moving average loss in epoch 300: 767.261
moving average loss in epoch 350: 651.56
moving average loss in epoch 400: 393.723
moving average loss in epoch 450: 488.027
moving average loss in epoch 500: 650.681
moving average loss in epoch 550: 901.012
moving average loss in epoch 600: 739.086
moving average loss in epoch 650: 897.329
moving average loss in epoch 700: 766.959
...
```
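The summary lines in the log (e.g. `GRU-MM: 0.0201486 +- 0.00102987`) report the mean and the population standard deviation (`np.std` with its default `ddof=0`) over the R repeats of a test RMSE that is computed as the square root of the average per-series MSE. Because each series is averaged on its own first, a long test series gets no more weight than a short one. The sketch below, with the hypothetical helpers `series_rmse`, `pooled_rmse`, and `summarize` and made-up toy arrays, contrasts this with an RMSE pooled over all time steps.

```python
import numpy as np

def series_rmse(preds, targets):
    """RMSE as computed in the notebook: average the per-series MSE
    over all test series, then take the square root."""
    mse = 0.
    for Ypred, Y in zip(preds, targets):
        mse += np.mean((Ypred - Y) ** 2)
    return np.sqrt(mse / len(targets))

def pooled_rmse(preds, targets):
    """Alternative for comparison: RMSE over all time steps of all series."""
    sq = np.concatenate([((Ypred - Y) ** 2).ravel() for Ypred, Y in zip(preds, targets)])
    return np.sqrt(np.mean(sq))

def summarize(name, errors):
    """Format mean +- population standard deviation over repeats,
    mirroring the notebook's print statement (without the runtime)."""
    return '%s: %g +- %g' % (name, np.mean(errors), np.std(errors))

# toy data: one series of length 1 with error 1, one series of length 3 with error 0
preds = [np.array([[1.0]]), np.zeros((3, 1))]
targets = [np.array([[0.0]]), np.zeros((3, 1))]
print(series_rmse(preds, targets))  # sqrt(0.5), about 0.707
print(pooled_rmse(preds, targets))  # 0.5
```

On this toy data the two definitions differ noticeably, because pooling lets the three error-free time steps of the longer series dilute the single error of the short one.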