Simple example #8

Closed
allchemist opened this issue Mar 10, 2015 · 16 comments

Comments

@allchemist

Hello!

Can you please provide a simple example of usage?

Thanks!

@siemanko
Collaborator

Yes


@mheilman
Contributor

Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the examples in the README to make something that will run.

#!/usr/bin/env python3

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells)


def main():
    # Make a dataset where the network should learn, at each time step, whether the running sum of the absolute
    # input values seen so far exceeds 5.  This probably isn't really a good example use case for an LSTM, but it's simple.
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)

    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size] * num_hidden_layers,
                         activation=T.tanh,
                         celltype=LSTM)

    # Make the connections from the input to the first layer have linear activations.
    model.layers[0].in_gate2.activation = lambda x: x

    # Add an output layer to predict the labels for each time step.
    output_layer = Layer(hidden_layer_size, 1, T.nnet.sigmoid)
    model.layers.append(output_layer)

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens=prev_hiddens)
        return activations

    input_vec = T.vector('input_vec')
    input_mat = input_vec.dimshuffle((0, 'x'))

    result, _ = theano.scan(fn=step,
                            sequences=[input_mat],
                            outputs_info=([dict(initial=hidden_layer.initial_hidden_state, taps=[-1])
                                           for hidden_layer in model.layers[:-1]] +
                                          [dict(initial=T.zeros_like(model.layers[-1].bias_matrix), taps=[-1])]))

    target = T.vector('target')
    prediction = result[-1].T[0]

    cost = T.nnet.binary_crossentropy(prediction, target).mean()

    updates, _, _, _, _ = create_optimization_updates(cost, model.params)

    update_func = theano.function([input_vec, target], cost, updates=updates, allow_input_downcast=True)
    predict_func = theano.function([input_vec], prediction, allow_input_downcast=True)

    for cur_iter in range(num_iterations):
        for i, (example, label) in enumerate(zip(examples, labels)):
            c = update_func(example, label)
            if i % 100 == 0:
                print(".", end="")
        print()

    test_cases = [np.array([-1, 1, 0, 1, -2, 0, 1, 0, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([-2, -2, -2, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX),
                  np.array([2, 0, 0, 0, 2, 0, 0, 0, 0, -2, 0, 0, 0, 0, 0], dtype=theano.config.floatX),
                  np.array([2, 2, 2, 0, 0, 0, 2, 2, 2, 0], dtype=theano.config.floatX)]


    for example in test_cases:
        print("input", "output", sep="\t")
        for x, pred in zip(example, predict_func(example)):
            print(x, "{:.3f}".format(pred), sep="\t")
        print()

if __name__ == "__main__":
    main()


@allchemist
Author

Big thanks!

@JonathanRaiman
Owner

Thanks !!!

@allchemist
Author

Sorry to bother you, but please help me understand how to prepare the training data.

As I understand it, an LSTM block receives one value and returns one value. So to make it learn a sequence, we should prepare a known sequence 'seq' as pairs [seq[1], seq[2]], [seq[2], seq[3]], [seq[3], seq[4]], etc. (the first element is the training input, the second is the training target).

I tried different training data but didn't manage to achieve the recurrence effect: 'predict_func' returns the same result on repeated calls with the same input. It looks like I failed at data preparation. Or does it just reset its state after each 'predict_func' call?

If it isn't too much trouble, can you show an example of sequence forecasting?
I currently use the LSTM in PyBrain, but I am very interested in the Theano implementation.
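
The input/target pairing described in this comment can be sketched in plain Python, assuming one-step-ahead forecasting of a single sequence. The helper name `make_next_step_pairs` is hypothetical and is not part of theano_lstm; plain lists stand in for arrays so the sketch runs without Theano.

```python
def make_next_step_pairs(seq):
    """Return (inputs, targets) where targets[i] is the element that follows inputs[i]."""
    if len(seq) < 2:
        raise ValueError("need at least two elements to form a pair")
    inputs = list(seq[:-1])   # every element except the last
    targets = list(seq[1:])   # the same sequence shifted by one step
    return inputs, targets


seq = [3, 1, 4, 1, 5]
inputs, targets = make_next_step_pairs(seq)
pairs = list(zip(inputs, targets))  # (input, target) pairs in time order
```

In the example posted earlier in the thread, every call to `predict_func` starts the scan from `initial_hidden_state`, so state only carries over between the time steps inside one call. Under that reading, the whole `inputs` list would be passed as one sequence rather than one pair per call.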

@stephenjia

@mheilman Thanks for your code. Can I ask how to extend it to the minibatch case? I tried to modify your code for minibatches, but I ran into problems with the input. Can you give me some help? Thanks.
...
input_mat = T.matrix('input_mat')
input_tensor = input_mat.dimshuffle((1,'x',0))
...

@mheilman
Contributor

Sorry @stephenjia, but I don't have a good enough understanding of the best way to do that at the moment. One idea is to make another scan operation that iterates over the examples in a minibatch and makes updates for each (using the existing scan op). I'm not sure that's a good approach, though.

@stephenjia

@mheilman Thanks all the same. I also tried the method in the README file, but I still got errors. I will try to find where I went wrong.

@JonathanRaiman
Owner

@stephenjia @mheilman Here is one way to get the minibatch case working with LSTMs in a sequence forecasting setting:

Suppose you have n = 30 symbols to predict:

 n = 30

Build some layers out of LSTMs:

import numpy as np
import theano
import theano.tensor as T
from theano_lstm import create_optimization_updates, LSTM, StackedCells, Layer, masked_loss
hidden = 20
model = StackedCells(n, layers=[hidden, hidden], activation=T.tanh, celltype=LSTM)

Add a classifier:
model.layers.append(Layer(hidden, n, lambda x: T.nnet.softmax(x)[0]))

Construct the recurrent prediction:

def step(x, *prev_hiddens):
    new_states = model.forward(x, prev_hiddens[:-1], 0.0)
    return new_states

Store your minibatch as one big matrix with one row per example (if the examples don't all have the same length, pad them with 0s or something else):

observations = T.matrix()

result, updates = theano.scan(
    step,
    observations[:, :-1],
    outputs_info=[
        dict(initial=layer.initial_hidden_state, taps=[-1])
        for layer in model.layers if hasattr(layer, 'initial_hidden_state')
    ] + [None])
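
The padding step mentioned in this comment could look like the following plain-Python sketch. It assumes right-padding with a fill value of 0 and also records the original lengths, which are needed later for masking. The helper name `pad_sequences` is hypothetical and not part of theano_lstm.

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad variable-length sequences to a common length.

    Returns the padded rows together with the original length of each row.
    """
    lengths = [len(s) for s in sequences]
    max_len = max(lengths) if lengths else 0
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    return padded, lengths


batch = [[4], [7, 2], [1, 9, 3, 8, 5]]
padded, lengths = pad_sequences(batch)
```

Each row of `padded` has the same width, so the result can be converted to a single matrix, while `lengths` says how many leading entries of each row are real data.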

Now for the error: we apply KL divergence at each timestep between a prediction and the next timestep's observation. Replace targets with whatever you want, but it must have the same dimensions as result[-1] (e.g. the softmaxes from the LSTM stacks above). Also create an array of observation lengths (if the lengths are not all equal, this says: "sequence 1 is 1 long, sequence 2 is 2 long, sequence 3 is 5 long, etc."):

np.array([1, 2, 5, 2, 1, 3])

and tell the error where the sequences start (one entry per sequence):

observation_starts = np.zeros(6, dtype=np.int32)

Tell the system what sequence needs to be forecasted (here we're just forecasting ourselves 1 step ahead):

targets = observations[:, 1:]
observation_lengths = np.array([1, 2, 5, 2, 1, 3])  # etc...
observation_starts = np.zeros(len(observation_lengths), dtype=np.int32)
error = masked_loss(result[-1],
                    targets,
                    observation_lengths,
                    observation_starts)

error = error.sum()
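
To make the role of the lengths and starts concrete, here is a plain-Python sketch of one plausible reading of a length/start mask. It is not the library's `masked_loss` implementation: the function name `masked_nll` and its argument layout are assumptions for illustration. It uses negative log-probability, which is what the KL divergence reduces to when the target is a single class index.

```python
import math


def masked_nll(probs, targets, lengths, starts):
    """Sum -log p(target) over the unmasked timesteps of each sequence.

    probs[i][t] is a list of class probabilities for example i at step t,
    targets[i][t] is the index of the correct class at that step.
    Only steps t with starts[i] <= t < starts[i] + lengths[i] contribute,
    so padded positions are ignored.
    """
    total = 0.0
    for seq_probs, seq_targets, length, start in zip(probs, targets, lengths, starts):
        stop = min(start + length, len(seq_targets))
        for t in range(start, stop):
            total -= math.log(seq_probs[t][seq_targets[t]])
    return total


probs = [
    [[0.5, 0.5], [0.9, 0.1]],    # second step of this example is padding
    [[0.25, 0.75], [0.5, 0.5]],
]
targets = [[0, 0], [1, 1]]
loss = masked_nll(probs, targets, lengths=[1, 2], starts=[0, 0])
```

With `lengths=[1, 2]`, the padded second step of the first example contributes nothing to `loss`, which is the point of passing the lengths alongside the padded matrix.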

Compute gradient descent:

updates, _, _, _, _ = create_optimization_updates(error, model.params)

update_func = theano.function(
    [observations],
    error,
    updates=updates,
    allow_input_downcast=True)

@stephenjia

@JonathanRaiman Thanks for your detailed reply. I modified @mheilman's example based on your suggestion; for now I am only working on the forward propagation part. However, there is a problem even before the scan operation: it raises 'IndexError: tuple index out of range'.

# Here's a very, very simple example that might help. It doesn't do anything useful, but it expands on the
# examples in the README to make something that will run.
from __future__ import print_function

import numpy as np
import theano
import theano.tensor as T

from theano_lstm import (create_optimization_updates, Layer, LSTM, StackedCells, masked_loss)

import random


def get_minibatches_idx(n, minibatch_size, shuffle=False):
    """
    Used to shuffle the dataset at each iteration.
    """
    idx_list = np.arange(n, dtype="int32")

    if shuffle:
        random.shuffle(idx_list)

    minibatches = []
    minibatch_start = 0
    for i in range(n // minibatch_size):
        minibatches.append(idx_list[minibatch_start:
                                    minibatch_start + minibatch_size])
        minibatch_start += minibatch_size

    if (minibatch_start != n):
        # Make a minibatch out of what is left
        minibatches.append(idx_list[minibatch_start:])

    return zip(range(len(minibatches)), minibatches)


def main():
    # Make a dataset where the network should learn, at each time step, whether the running sum of the absolute
    # input values seen so far exceeds 5. This probably isn't really a good example use case for an LSTM, but it's simple.
    import pdb; pdb.set_trace()
    rng = np.random.RandomState(123456789)
    input_size = 1
    input_length = 10
    sample_size = 500
    num_iterations = 3
    examples = rng.choice([-2, -1, 0, 1, 2], (sample_size, input_length)).astype(theano.config.floatX)
    labels = np.array([[1 if np.sum(np.abs(x[:y + 1])) > 5 else 0 for y in range(len(x))]
                       for x in examples],
                      dtype=theano.config.floatX)

    hidden_layer_size = 10
    num_hidden_layers = 2

    model = StackedCells(input_size,
                         layers=[hidden_layer_size, hidden_layer_size],
                         activation=T.tanh,
                         celltype=LSTM)

    # Add an output layer to predict the labels for each time step.
    model.layers.append(Layer(hidden_layer_size, input_length, lambda x: T.nnet.sigmoid(x)[0]))

    def step(x, *prev_hiddens):
        activations = model.forward(x, prev_hiddens[:-1])
        return activations

    initial_obs = T.matrix('')
    # timesteps = T.iscalar('timesteps')

    result, _ = theano.scan(step,
                            initial_obs[:, :-1],
                            outputs_info=[dict(initial=layer.initial_hidden_state, taps=[-1])
                                          for layer in model.layers
                                          if hasattr(layer, 'initial_hidden_state')] + [None])

    prediction = result[-1]

    predict_func = theano.function(initial_obs, prediction, allow_input_downcast=True)

    # get minibatches
    batches_idx = get_minibatches_idx(examples.shape[0], 5, shuffle=False)

    for cur_iter in range(num_iterations):
        for _, batch_idx in batches_idx:
            batch_example = examples[batch_idx, :]
            batch_label = labels[batch_idx, :]
            output_all = predict_func(batch_example)


if __name__ == "__main__":
    main()

@JonathanRaiman
Owner

@stephenjia Sounds like I made a typo somewhere. I'll have a look this weekend and send you a revised version.

@stephenjia

@JonathanRaiman Thanks a lot.

@stephenjia

@JonathanRaiman, @mheilman
I found my problem. The initial state must be a variable with the same shape as each timestep's input; that is, I should build the initial state by repeating layer.initial_hidden_state n_sample times (so the initial state has ndim 2 instead of 1).
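
The shape change described here can be illustrated in plain Python, with nested lists standing in for tensors. This is only a sketch of the idea: the helper name `tile_initial_state` and the example sizes are hypothetical, and in the Theano graph the same repetition would be done symbolically on `layer.initial_hidden_state`.

```python
def tile_initial_state(initial_state, n_samples):
    """Repeat a 1-D initial hidden state once per example in the minibatch.

    The result has shape (n_samples, state_size), i.e. two dimensions
    instead of one, with an independent copy of the state in every row.
    """
    return [list(initial_state) for _ in range(n_samples)]


initial_hidden_state = [0.0, 0.0, 0.0]  # hypothetical state of size 3
batch_state = tile_initial_state(initial_hidden_state, n_samples=5)
```

Here `batch_state` has one row per example, matching a minibatch input in which every time step is a matrix with one row per example.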

@JonathanRaiman
Owner

@mheilman @stephenjia Here's a better example for sequence forecasting that runs (no typos this time) with some comments on what everything does.

@mheilman
Contributor

nice!

@stephenjia

@JonathanRaiman @mheilman thx!
