## Deep learning experiments

This notebook collects deep learning experiments for all benchmark datasets that standard reservoir memory machines can handle, namely the latch, copy, repeat copy, and signal copy tasks. Here, we test a standard gated recurrent unit (GRU) as well as a GRU extended with a memory mechanism like that of the reservoir memory machine.

*Note:* Executing this notebook may take a very long time, especially for the signal copy task.
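As a point of reference for the GRU baseline, here is a minimal NumPy sketch of a standard GRU with a linear readout. It is hypothetical and is not the implementation of `dmm.GRUInterface`. The class name `TinyGRU`, the uniform initialization, and the gate convention are assumptions. This sketch uses `h = (1 - z) * h + z * h_cand`, whereas other references, including PyTorch's `torch.nn.GRU`, let `z` weight the old state instead. The sketch maps an input series of shape `(T, n)` to an output series of shape `(T, L)` with `m` hidden neurons, matching the notation of the meta-parameters in this notebook.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRU:
    """Hypothetical minimal GRU with a linear readout (not dmm.GRUInterface).

    Maps inputs of shape (T, n) to outputs of shape (T, L) via m hidden neurons.
    """

    def __init__(self, m, n, L, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(m)
        # input-to-hidden weights for the update (0), reset (1) and candidate (2) paths
        self.W = rng.uniform(-scale, scale, size=(3, m, n))
        # hidden-to-hidden weights for the same three paths
        self.U = rng.uniform(-scale, scale, size=(3, m, m))
        self.b = np.zeros((3, m))
        # linear readout from the hidden state to the output
        self.V = rng.uniform(-scale, scale, size=(L, m))
        self.c = np.zeros(L)
        self.m = m

    def forward(self, X):
        h = np.zeros(self.m)  # initial hidden state
        Y = []
        for x in X:  # iterate over time steps
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
            # candidate state; the reset gate masks the previous state
            h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            # interpolate between the previous state and the candidate state
            h = (1.0 - z) * h + z * h_cand
            Y.append(self.V @ h + self.c)
        return np.stack(Y)
```

For example, `TinyGRU(m=64, n=1, L=1).forward(X)` with `X` of shape `(20, 1)` returns an array of shape `(20, 1)`. Training such a model, as done below with `torch.optim.Adam`, additionally requires automatic differentiation, which this sketch omits.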

```python
# in this first cell we set some experimental meta-parameters that are used across all
# datasets

# the number of training time series
N = 90
# the number of test time series
N_test = 10
# the number of repeats for the experiments
R = 3
# the names of the tasks to be performed
tasks = ['latch', 'copy', 'repeat_copy', 'signal_copy']
# the number of hidden neurons for each task
ms = [64, 256, 256, 64]
# the number of input dimensions for each task
ns = [1, 9, 9, 2]
# the memory size for each task
Ks = [2, 21, 11, 2]
# the output size for each task
Ls = [1, 8, 8, 1]

# hyper-parameters for training

# the maximum number of training steps; each step (called an "epoch" in the
# log output) processes a single minibatch, not a full pass over the data
num_epochs = 1000
# threshold on the moving average loss for early stopping
loss_threshold = 1E-3
# minibatch size
minibatch_size = 32
# the learning rate
lr = 1E-3
# the weight decay factor
weight_decay = 1E-8

# variables for loss reporting
# update factor for the moving average over the loss
avg_factor = 0.1
# number of steps between loss printouts
print_step = 50

# model names
models = ['GRU', 'GRU-MM']
```
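The moving-average early-stopping rule controlled by `avg_factor` and `loss_threshold` can be replayed in isolation. The sketch below uses a hypothetical helper, `replay_early_stopping` (not part of the notebook), that feeds a list of per-step minibatch losses through the same update as the training loop. The first loss initializes the average, each later loss enters with weight `avg_factor`, and training stops as soon as the average drops below `loss_threshold`.

```python
def replay_early_stopping(losses, avg_factor=0.1, loss_threshold=1e-3):
    """Hypothetical helper: return (steps taken, final moving average loss).

    `losses` stands in for the per-step (per-minibatch) losses of training.
    """
    loss_avg = None
    steps = 0
    for steps, loss in enumerate(losses, start=1):
        if loss_avg is None:
            # the first observed loss initializes the average
            loss_avg = loss
        else:
            # exponential moving average: the newest loss gets weight avg_factor
            loss_avg = avg_factor * loss + (1.0 - avg_factor) * loss_avg
        if loss_avg < loss_threshold:
            # early stopping, as in the training loop
            break
    return steps, loss_avg
```

With `avg_factor = 0.1`, roughly the last `1 / avg_factor = 10` steps dominate the average, so a single lucky minibatch cannot trigger early stopping on its own. For example, after an initial loss of 1 followed only by perfect minibatches, `replay_early_stopping([1.0] + [0.0] * 100)` stops after 67 steps, because `0.9 ** 66` is the first power of 0.9 below `1E-3`.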

## Experiment

With the hyperparameters set up above, we can now iterate over all tasks and run the actual experiment: for each repeat, we sample fresh training and test data, train both models, and record their test RMSE and runtime.
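The training loop does not iterate over the data in full passes. Instead, it walks through a random permutation of the `N` training series in chunks of `minibatch_size` and draws a new permutation once too few unused series remain. The generator below isolates this scheme. `minibatch_indices` is a hypothetical helper, and it uses NumPy's `default_rng` instead of the global `np.random` state only so that the sketch is reproducible. It uses the comparison `j + minibatch_size > N`, so a permutation is discarded only once it can no longer fill a complete minibatch.

```python
import numpy as np


def minibatch_indices(N, minibatch_size, num_steps, seed=0):
    """Hypothetical helper: yield `num_steps` minibatches of indices into range(N).

    Walks through a random permutation and draws a fresh one whenever fewer
    than `minibatch_size` unused indices remain (the leftovers are skipped).
    """
    if minibatch_size > N:
        raise ValueError('minibatch_size must not exceed N')
    rng = np.random.default_rng(seed)
    pi = None
    j = N  # forces a fresh permutation in the first step
    for _ in range(num_steps):
        if j + minibatch_size > N:
            pi = rng.permutation(N)
            j = 0
        yield pi[j:j + minibatch_size]
        j += minibatch_size
```

With the notebook's values `N = 90` and `minibatch_size = 32`, each permutation yields two minibatches and the remaining 26 series of that permutation are skipped. Since every permutation is drawn uniformly at random, each series is still used equally often in expectation.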

```python
import numpy as np
import torch
import rmm2.deep_memory_machine as dmm
import time
from dataset_generators import generate_data

# iterate over all tasks
for task_idx in range(len(tasks)):
    task = tasks[task_idx]
    print('------ Task %d of %d: %s -----' % (task_idx+1, len(tasks), task))
    # load task hyperparameters
    m = ms[task_idx]
    n = ns[task_idx]
    K = Ks[task_idx]
    L = Ls[task_idx]
    # initialize error and runtime arrays
    errors   = np.zeros((len(models), R))
    runtimes = np.zeros((len(models), R))
    # iterate over all experimental repeats
    for r in range(R):
        print('--- repeat %d of %d ---' % (r+1, R))
        # sample training and test data
        Xs, Qs, Ys = generate_data(N, task)
        Xs_test, Qs_test, Ys_test = generate_data(N_test, task)
        # now iterate over all models
        for model_idx in range(len(models)):
            model = models[model_idx]
            print('-- model: %s --' % model)
            # set up the model
            start_time = time.time()
            if model == 'GRU':
                net = dmm.GRUInterface(m, n, L)
            elif model == 'GRU-MM':
                net = dmm.DeepMemoryMachine(m, n, K, L)
            else:
                raise ValueError('unknown model: %s' % model)
            # set up an optimizer
            optim = torch.optim.Adam(net.parameters(), lr = lr, weight_decay = weight_decay)
            # set up aux variables; j = N forces a fresh permutation in the first step
            loss_avg = None
            j = N
            # start training; each "epoch" processes a single minibatch
            for epoch in range(num_epochs):
                optim.zero_grad()
                # draw a new permutation once the current one cannot fill a full minibatch
                if j + minibatch_size > N:
                    pi = np.random.permutation(N)
                    j = 0
                # accumulate the loss over a minibatch of time series
                minibatch_loss = torch.zeros(1)
                for i in range(minibatch_size):
                    X, Q, Y = Xs[pi[j]], Qs[pi[j]], Ys[pi[j]]
                    j += 1
                    # compute the loss
                    if model == 'GRU':
                        Ypred = net(X)
                        loss  = torch.nn.functional.mse_loss(Ypred, torch.tensor(Y, dtype=torch.float))
                    else:
                        loss  = net.compute_teacher_forcing_loss(X, Q, Y)
                    # add to minibatch
                    minibatch_loss = minibatch_loss + loss
                # compute gradient
                minibatch_loss.backward()
                # perform optimization step
                optim.step()
                # update the moving average of the per-series loss
                if loss_avg is None:
                    loss_avg = minibatch_loss.item() / minibatch_size
                else:
                    loss_avg = avg_factor * minibatch_loss.item() / minibatch_size + (1. - avg_factor) * loss_avg
                if (epoch + 1) % print_step == 0:
                    print('moving average loss in epoch %d: %g' % (epoch+1, loss_avg))
                if loss_avg < loss_threshold:
                    print('ended training already after %d epochs because moving average loss %g was below loss threshold.' % (epoch + 1, loss_avg))
                    break
            # measure the RMSE on the test data (no gradients needed here)
            mse = 0.
            with torch.no_grad():
                for i in range(N_test):
                    Ypred = net(Xs_test[i])
                    mse   += np.mean((Ypred.detach().numpy() - Ys_test[i]) ** 2)
            rmse = np.sqrt(mse / N_test)
            runtimes[model_idx, r] = time.time() - start_time
            errors[model_idx, r] = rmse
    # print mean test RMSE +- standard deviation over repeats, and the mean runtime
    for model_idx in range(len(models)):
        print('%s: %g +- %g (took %g seconds)' % (models[model_idx], np.mean(errors[model_idx, :]), np.std(errors[model_idx, :]), np.mean(runtimes[model_idx, :])))
    # write results to tab-separated files, one column per model
    np.savetxt('%s_deep_errors.csv' % task, errors.T, delimiter='\t', header='\t'.join(models), comments='')
    np.savetxt('%s_deep_runtimes.csv' % task, runtimes.T, delimiter='\t', header='\t'.join(models), comments='')
```

------ Task 4 of 4: signal_copy -----
--- repeat 1 of 2 ---
-- model: GRU --
moving average loss in epoch 50: 4371.52
moving average loss in epoch 100: 4537.75
moving average loss in epoch 150: 3546.32
moving average loss in epoch 200: 4063.7
moving average loss in epoch 250: 3967.71
moving average loss in epoch 300: 5051.39
moving average loss in epoch 350: 5258.5
moving average loss in epoch 400: 5569.77
moving average loss in epoch 450: 3458.72
moving average loss in epoch 500: 6627.35
moving average loss in epoch 550: 3168.82
moving average loss in epoch 600: 4934.2
moving average loss in epoch 650: 5721
moving average loss in epoch 700: 3906.8
moving average loss in epoch 750: 5287.53
moving average loss in epoch 800: 6130.22
moving average loss in epoch 850: 6855.02
moving average loss in epoch 900: 4567.7
moving average loss in epoch 950: 3454.58
moving average loss in epoch 1000: 6276.74
-- model: GRU-MM --
moving average loss in epoch 50: 5077.46
moving average loss in epoch 1