Simple example #8

allchemist · 2015-03-10T13:01:21Z

Hello!

Can you please provide a simple example of usage?

Thanks!

siemanko · 2015-03-10T15:42:42Z

Yes


mheilman · 2015-03-10T22:14:50Z

Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the examples in the README to make something that will run.

#!/usr/bin/env python3

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells)


def main():
    # Make a dataset where the network should learn, at each time step, whether the running sum of the absolute
    # values of the inputs seen so far exceeds 5.  This probably isn't really a good example use case for an LSTM,
    # but it's simple.
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)

    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size] * num_hidden_layers,
                         activation=T.tanh,
                         celltype=LSTM)

    # Make the connections from the input to the first layer have linear activations.
    model.layers[0].in_gate2.activation = lambda x: x

    # Add an output layer to predict the labels for each time step.
    output_layer = Layer(hidden_layer_size, 1, T.nnet.sigmoid)
    model.layers.append(output_layer)

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens=prev_hiddens)
        return activations

    input_vec = T.vector('input_vec')
    input_mat = input_vec.dimshuffle((0, 'x'))

    result, _ = theano.scan(fn=step,
                            sequences=[input_mat],
                            outputs_info=([dict(initial=hidden_layer.initial_hidden_state, taps=[-1])
                                           for hidden_layer in model.layers[:-1]] +
                                          [dict(initial=T.zeros_like(model.layers[-1].bias_matrix), taps=[-1])]))

    target = T.vector('target')
    prediction = result[-1].T[0]

    cost = T.nnet.binary_crossentropy(prediction, target).mean()

    updates, _, _, _, _ = create_optimization_updates(cost, model.params)

    update_func = theano.function([input_vec, target], cost, updates=updates, allow_input_downcast=True)
    predict_func = theano.function([input_vec], prediction, allow_input_downcast=True)

    for cur_iter in range(num_iterations):
        for i, (example, label) in enumerate(zip(examples, labels)):
            c = update_func(example, label)
            if i % 100 == 0:
                print(".", end="")
        print()

    test_cases = [np.array([-1, 1, 0, 1, -2, 0, 1, 0, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([-2, -2, -2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 0, 0, 0, 2, 0, 0, 0, 0, -2, 0, 0, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX)]


    for example in test_cases:
        print("input", "output", sep="\t")
        for x, pred in zip(example, predict_func(example)):
            print(x, "{:.3f}".format(pred), sep="\t")
        print()

if __name__ == "__main__":
    main()
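
To make the labelling rule in the script concrete: the label at each time step is 1 once the running sum of absolute input values exceeds 5. Below is a NumPy-only sketch of that rule. The helper name `running_abs_sum_labels` is hypothetical and not part of theano_lstm; it is only meant to show what the training targets look like.

```python
import numpy as np


def running_abs_sum_labels(examples, threshold=5):
    """Label each time step 1.0 once the running sum of |x| exceeds threshold."""
    running = np.cumsum(np.abs(examples), axis=1)
    return (running > threshold).astype(np.float32)


example = np.array([[2, 2, 2, 0, 0, 0]], dtype=np.float32)
labels = running_abs_sum_labels(example)
print(labels)  # [[0. 0. 1. 1. 1. 1.]]
```

The running sum for this input is 2, 4, 6, 6, 6, 6, so the label switches to 1 at the third step and stays there.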

allchemist · 2015-03-10T22:24:15Z

Big thanks!

JonathanRaiman · 2015-03-11T02:48:26Z

Thanks !!!

allchemist · 2015-03-11T10:24:53Z

Sorry to bother you, but please help me understand training data preparation.

As I understand it, an LSTM block receives one value and returns one value. So to make it learn a sequence, we should prepare a known sequence 'seq' as pairs [seq[1], seq[2]], [seq[2], seq[3]], [seq[3], seq[4]], etc. (the first element is the training input, the second is the training target).

I tried different training data, but didn't manage to achieve the recurrence effect: 'predict_func' returns the same result on several repeated calls with the same input. It looks like I failed in data preparation. Or does it just reset its state after each 'predict_func' call?

If it isn't too much trouble, can you show an example with sequence forecasting?
Currently I use LSTM in PyBrain, but I'm very interested in the Theano implementation.
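
A minimal sketch of the pair construction described above, assuming one-step-ahead forecasting on a single 1-D sequence. The helper `make_forecast_pairs` is a hypothetical name used only for illustration. Rather than building a list of pairs, it returns the sequence shifted by one step, which carries the same information.

```python
import numpy as np


def make_forecast_pairs(seq):
    """Return (inputs, targets) where targets[t] is the value following inputs[t]."""
    seq = np.asarray(seq, dtype=np.float32)
    return seq[:-1], seq[1:]


inputs, targets = make_forecast_pairs([1, 2, 3, 4, 5])
print(inputs)   # [1. 2. 3. 4.]
print(targets)  # [2. 3. 4. 5.]
```

On the repeated-call question: in the script posted earlier in this thread, the scan is started from each layer's `initial_hidden_state` on every call, so calling `predict_func` twice on the same input would be expected to give the same output.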

stephenjia · 2015-03-12T16:12:46Z

@mheilman Thanks for your code. Can I ask how to extend it to the minibatch case? I tried to modify your code for minibatches, but I ran into problems with the input. Can you give me some help? Thanks.
...
input_mat = T.matrix('input_mat')
input_tensor = input_mat.dimshuffle((1,'x',0))
...

mheilman · 2015-03-12T16:37:32Z

Sorry @stephenjia, but I don't have a good enough understanding of the best way to do that at the moment. One idea is to make another scan operation that iterates over the examples in a minibatch and makes updates for each (using the existing scan op). I'm not sure that's a good approach, though.

stephenjia · 2015-03-12T16:45:12Z

@mheilman Thanks all the same. I also tried the method in the README file, but I still got errors. I will try to find where I went wrong.

JonathanRaiman · 2015-03-12T17:58:44Z

@stephenjia @mheilman Here is one way to get the minibatch case to work with LSTMs in a sequence forecasting setting:

Suppose you have n = 30 symbols to predict:

 n = 30

Build some layers out of LSTMs:

import numpy as np
import theano
import theano.tensor as T
from theano_lstm import create_optimization_updates, LSTM, StackedCells, Layer, masked_loss
hidden = 20
model = StackedCells(n, layers=[hidden, hidden], activation=T.tanh, celltype=LSTM)

Add a classifier:
model.layers.append(Layer(hidden, n, lambda x: T.nnet.softmax(x)[0]))

Construct the recurrent prediction:

def step(x, *prev_hiddens):
      new_states = model.forward(x, prev_hiddens[:-1], 0.0)
      return new_states

Store your minibatch as one big matrix with one row per example (if the examples don't all have the same length, pad them with 0s or something else):

 observations = T.matrix()

 result, updates = theano.scan(step,
                             observations[:,:-1],
                             outputs_info=[
     dict(initial=layer.initial_hidden_state, taps=[-1])
     for layer in model.layers if hasattr(layer, 'initial_hidden_state')
                             ] + [None] )
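
The padding step mentioned above can be sketched in plain NumPy. This is only an illustration under the assumption that sequences are padded on the right with zeros; `pad_sequences` is a hypothetical helper, not a theano_lstm function. It also returns the true lengths, which are needed later for masking.

```python
import numpy as np


def pad_sequences(sequences, pad_value=0.0):
    """Right-pad variable-length sequences into one (n_examples, max_len) matrix."""
    lengths = np.array([len(s) for s in sequences], dtype=np.int32)
    batch = np.full((len(sequences), lengths.max()), pad_value, dtype=np.float32)
    for row, s in enumerate(sequences):
        batch[row, :len(s)] = s
    return batch, lengths


batch, lengths = pad_sequences([[1, 2, 3], [4], [5, 6]])
print(batch.shape)  # (3, 3)
print(lengths)      # [3 1 2]
```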

The error is computed as follows: we apply KL divergence at each timestep between the prediction and the observation at the next timestep (replace targets with whatever you want, but it must have the same dimensions as result[-1], i.e. the softmaxes from the LSTM stacks above). Also create an array with the lengths of the observations (if the lengths are not all equal, this says: "sequence 1 is 1 long, sequence 2 is 2 long, sequence 3 is 5 long, etc."):
np.array([1, 2, 5, 2, 1, 3])

and tell the error where the sequences start:

observation_starts = np.zeros(6, dtype=np.int32)  # one start index per sequence

Tell the system what sequence needs to be forecasted (here we're just forecasting ourselves 1 step ahead):

targets = observations[:,1:]
observation_lengths = np.array([1, 2, 5, 2, 1, 3]) # etc...
# one start index per sequence, so the two arrays have the same length
observation_starts = np.zeros(len(observation_lengths), dtype=np.int32)
error = masked_loss(result[-1],
                    targets,
                    observation_lengths,
                    observation_starts)

error = error.sum()
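
The idea behind masking with lengths and start positions can be illustrated in NumPy. This is a sketch of the general technique only, not the theano_lstm implementation of `masked_loss`. The assumption here is that each row contributes the steps in the half-open range from its start to start plus length; `masked_sum` is a hypothetical helper name.

```python
import numpy as np


def masked_sum(per_step_loss, lengths, starts):
    """Sum per-step losses, counting only steps in [start, start + length) of each row."""
    steps = np.arange(per_step_loss.shape[1])[None, :]
    mask = (steps >= starts[:, None]) & (steps < (starts + lengths)[:, None])
    return float((per_step_loss * mask).sum()), mask


per_step_loss = np.ones((3, 4), dtype=np.float32)  # pretend every step costs 1
lengths = np.array([1, 2, 4], dtype=np.int32)
starts = np.zeros(len(lengths), dtype=np.int32)
total, mask = masked_sum(per_step_loss, lengths, starts)
print(total)  # 7.0
```

Padded steps fall outside the mask and contribute nothing, so only 1 + 2 + 4 = 7 steps are counted.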

Compute gradient descent:

# `error` is the summed masked loss defined above
updates, _, _, _, _ = create_optimization_updates(error, model.params)

update_func = theano.function(
      [observations],
      error,
      updates=updates,
      allow_input_downcast=True)

stephenjia · 2015-03-12T19:45:24Z

@JonathanRaiman Thanks for your detailed reply. I modified @mheilman's example based on your suggestion; for now I'm only working with the forward propagation part. However, there is a problem even before the scan operation: it says 'IndexError: tuple index out of range'.

# Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the
# examples in the README to make something that will run.
from __future__ import print_function

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells, masked_loss)

import random
def get_minibatches_idx(n, minibatch_size, shuffle=False):
    """
    Used to shuffle the dataset at each iteration.
    """
    idx_list = np.arange(n, dtype="int32")

    if shuffle:
        random.shuffle(idx_list)

    minibatches = []
    minibatch_start = 0
    for i in range(n // minibatch_size):
        minibatches.append(idx_list[minibatch_start:
                                    minibatch_start + minibatch_size])
        minibatch_start += minibatch_size

    if (minibatch_start != n):
        # Make a minibatch out of what is left
        minibatches.append(idx_list[minibatch_start:])

    return zip(range(len(minibatches)), minibatches)

def main():
    # Make a dataset where the network should learn, at each time step, whether the running sum of the absolute
    # values of the inputs seen so far exceeds 5. This probably isn't really a good example use case for an LSTM,
    # but it's simple.
    import pdb; pdb.set_trace()
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)

    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size,hidden_layer_size],
                         activation=T.tanh,
                         celltype=LSTM)

    # Add an output layer to predict the labels for each time step.
    model.layers.append(Layer(hidden_layer_size, input_length, lambda x: T.nnet.sigmoid(x)[0]))

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens[:-1])
        return activations

    initial_obs = T.matrix('')
    #timesteps = T.iscalar('timesteps')

    result, _ = theano.scan(step,
                            initial_obs[:,:-1],
                            outputs_info=[dict(initial=layer.initial_hidden_state, taps=[-1]) for layer in model.layers if hasattr(layer, 'initial_hidden_state')] + [None])

    prediction = result[-1]

    predict_func = theano.function(initial_obs, prediction, allow_input_downcast=True)

    # get minibatches
    batches_idx = get_minibatches_idx(examples.shape[0], 5, shuffle=False)

    for cur_iter in range(num_iterations):
        for _, batch_idx in batches_idx:
            batch_example = examples[batch_idx,:]
            batch_label =labels[batch_idx,:]
            output_all = predict_func(batch_example)

if __name__ == "__main__":
    main()

JonathanRaiman · 2015-03-13T07:28:58Z

@stephenjia Sounds like I made a typo somewhere. I'll have a look this weekend and send you a revised version

stephenjia · 2015-03-13T09:04:18Z

@JonathanRaiman Thanks a lot.

stephenjia · 2015-03-13T18:04:47Z

@JonathanRaiman, @mheilman
I found my problem. The initial state must have the same shape as each timestep's input; that is, I should build the initial state by repeating layer.initial_hidden_state n_sample times (so the initial state has ndim 2 instead of 1).
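
The shape change described here can be illustrated with NumPy. This is only a sketch of the shapes involved, with made-up sizes (`hidden_size`, `n_samples`); in the actual Theano graph the repetition would have to be done symbolically, for example with something like `T.repeat`, which is an assumption and not code taken from this thread.

```python
import numpy as np

hidden_size = 4  # illustrative size, not taken from the thread
n_samples = 3    # number of examples in the minibatch

# A single initial state is a vector (ndim == 1).
initial_hidden_state = np.zeros(hidden_size, dtype=np.float32)

# Repeat it once per example so that it becomes a matrix (ndim == 2).
batched_initial_state = np.tile(initial_hidden_state, (n_samples, 1))
print(initial_hidden_state.ndim, batched_initial_state.shape)  # 1 (3, 4)
```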

JonathanRaiman · 2015-03-19T08:47:36Z

@mheilman @stephenjia Here's a better example for sequence forecasting that runs (no typos this time) with some comments on what everything does.

mheilman · 2015-03-19T15:53:33Z

nice!

stephenjia · 2015-03-19T16:02:06Z

@JonathanRaiman @mheilman thx!

JonathanRaiman closed this as completed Mar 19, 2015
