
allow symbolic in reshape #287

Merged (1 commit, Jun 8, 2015)
Conversation

skaae (Member) commented Jun 5, 2015

Allow symbolic variables in the shape specification when using the reshape layer.

It's discussed here: https://groups.google.com/forum/#!topic/lasagne-users/eA995V73K8I

One use case is to allow both a variable batch size and a variable sequence length when using recurrent nets.

It passes the tests, but I'm not sure whether it has any undesired consequences.

benanne (Member) commented Jun 5, 2015

Maybe we should check that the TensorVariable is a scalar as well? As it stands, I think you could pass in any tensor and it wouldn't complain until runtime.

Then we can also get rid of that pass statement which looks a bit odd :)

f0k (Member) commented Jun 5, 2015

Your test is missing get_output_shape_for(). All the tests I wrote first check layer.output_shape and then check layer.get_output_for(inputdata).eval(). Add that and you will see you need to adapt ReshapeLayer.get_output_shape_for() as well. Specifically, you need a second for loop after the one taking care of [i]:

# Secondly, replace all symbolic shapes with `None`, as we cannot
# infer their size here.
for dim, o in enumerate(output_shape):
    if isinstance(o, TensorVariable):
        output_shape[dim] = None
        masked_output_shape[dim] = None

You can also merge it with the first for loop if you find a good way of documenting it, otherwise just copy and paste my suggestion as is.
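For reference, a rough sketch of the two-step test pattern described above (first check layer.output_shape, then evaluate get_output_for()), assuming this PR's behavior; the test name, shapes and values are made up for illustration:

import numpy as np
import theano.tensor as T
from lasagne.layers import InputLayer, ReshapeLayer

def test_reshape_with_symbolic_dim():
    n = T.iscalar()                      # symbolic size used in the shape spec
    l_in = InputLayer((None, None, 10))
    l_rs = ReshapeLayer(l_in, (n, -1, 10))
    # 1) static shape inference: symbolic dimensions should come out as None
    assert l_rs.output_shape == (None, None, 10)
    # 2) the actual output, evaluated on concrete data
    x = np.zeros((4, 3, 10), dtype='float32')
    out = l_rs.get_output_for(T.constant(x)).eval({n: 2})
    assert out.shape == (2, 6, 10)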


Apart from that, I don't understand your use case. Why do you need a ReshapeLayer after the EmbeddingLayer? Shouldn't the embedding layer give you the correct shape already?

skaae (Member Author) commented Jun 6, 2015

Yeah, I see the test is not correct.

My use case is for recurrent nets. When you combine recurrent nets and feed-forward nets you need to reshape between the two layouts:

Recurrent: (batch_size, seq_len, num_units)  <->  Feedforward: (batch_size*seq_len, num_units)

If you want both seq_len and batch_size to be flexible, you need to let the reshape depend on a symbolic variable.
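For illustration only, a minimal sketch of this use case, assuming the symbolic-shape support from this PR (the sizes and layer names are made up):

import numpy as np
import theano.tensor as T
import lasagne
from lasagne.layers import InputLayer, ReshapeLayer

num_units = 12
x = T.tensor3()                     # (batch_size, seq_len, num_units)
batch_size, seq_len, _ = x.shape    # symbolic scalars, known only at run time

l_in = InputLayer((None, None, num_units))
# Recurrent -> feedforward: -1 is enough to collapse the two leading axes.
l_flat = ReshapeLayer(l_in, (-1, num_units))
# Feedforward -> recurrent: going back needs the symbolic sizes themselves.
l_seq = ReshapeLayer(l_flat, (batch_size, seq_len, num_units))

out = lasagne.layers.get_output(l_seq, x).eval(
    {x: np.ones((2, 5, num_units), dtype='float32')})
assert out.shape == (2, 5, num_units)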

skaae (Member Author) commented Jun 6, 2015

I addressed @f0k's and @benanne's comments.

elif isinstance(s, T.TensorVariable):
    if s.ndim != 0:
        raise ValueError(
            "The symbolic variable specifying shape must be a "
benanne (Member) commented on this code:

Maybe "A symbolic variable in a shape specification must be a scalar, but had %i dimensions"? The variable technically doesn't specify the shape, it only specifies one dimension.

benanne (Member) commented Jun 7, 2015

Couldn't resist a bit of nitpicking, sorry! :) Apart from that error message everything looks great. I'm okay with merging either way, as I said myself we shouldn't be nitpicking about docs / error messages too much just yet (see #279).

ReshapeLayer is getting pretty complicated... but I guess it already was before anyway.

skaae (Member Author) commented Jun 7, 2015

No worries. I think it's important to have meaningful errors, and your suggestion is more precise :)

Maybe I should also add an example?

skaae (Member Author) commented Jun 7, 2015

Updated with @benanne's error message and added an example with a symbolic dimension specification.

benanne (Member) commented Jun 7, 2015

Sweet, looks good to me. I'll leave this to @f0k to merge in case he has any further comments.

>>> l_in = InputLayer((None, None, 10))
>>> l1 = ReshapeLayer(l_in, (batch_size*seq_len, [2]))
>>> l1.output_shape
(None, 10)
f0k (Member) commented on this code:

That's not a good example, because you could just do ReshapeLayer(l_in, (-1, [2])) for that. Better to remove the example than to keep people wondering.

skaae (Member Author) replied:

True. Maybe we should remove the example from here and add an example in the recurrent docs?

f0k (Member) commented Jun 7, 2015

Minor comments, but the code and the tests look good! So the use case for this is going from (seqlen*bs, N) back to (bs, seqlen, N), right? Going from (bs, seqlen, N) to (seqlen*bs, N) doesn't need this PR, that's why I'm asking. I think we don't need to illustrate this in the docstring then; it gets too involved to construct a case that cannot be done without symbolic variables.

skaae (Member Author) commented Jun 7, 2015

I removed the blanks. I agree with your comment about the example; it is probably more confusing than helpful.
You are correct about the use case: you can go from (bs, seqlen, N) to (bs*seqlen, N) with (-1, N), but not in the opposite direction.

As I mentioned above, I think we should add an example in the recurrent docs. Do you agree?

f0k (Member) commented Jun 7, 2015

As I mentioned above, I think we should add an example in the recurrent docs. Do you agree?

Yes, that seems to be a good place! You'd need some example of how to use a custom subnetwork for the input-to-hidden or hidden-to-hidden connections anyway, and this will include the reshapes (unless it's hidden in the recurrent layer class, but I'm not sure if that'd be good).

skaae (Member Author) commented Jun 7, 2015

Yes. I'll create an issue in @craffel's repo so we won't forget.

skaae (Member Author) commented Jun 8, 2015

I removed the new example and removed the blanks.

benanne (Member) commented Jun 8, 2015

Nice work, merging!

benanne added a commit that referenced this pull request Jun 8, 2015
allow symbolic in reshape
benanne merged commit b5f9794 into Lasagne:master on Jun 8, 2015
skaae (Member Author) commented Jun 8, 2015

I'm writing the example with recurrent layers + dynamic reshape and I think I found a bug / undesired behavior.
My code is:

from lasagne.layers import *
import lasagne
from lasagne.nonlinearities import softmax
import theano.tensor as T
import numpy as np
x, y = T.tensor3(), T.matrix()
batchsize, seqlen, _ = x.shape

num_inputs, num_units, num_classes = 10, 12, 5
l_inp = InputLayer((batchsize, seqlen, 10))
l_lstm_fwd = LSTMLayer(l_inp, num_units=num_units)
l_lstm_bck = LSTMLayer(l_inp, num_units=num_units, backwards=True)
l_lstm = ConcatLayer([l_lstm_fwd, l_lstm_bck], axis=2)
l_shp1 = ReshapeLayer(l_lstm, (-1, 2*num_units)) #<<<<<<<<<problem
l_softmax = DenseLayer(l_shp1, num_units=num_classes, nonlinearity=softmax)
l_out = ReshapeLayer(l_softmax, (batchsize, seqlen, num_classes))

lasagne.layers.get_output(
    l_out, x).eval({x: np.ones((10, 5, num_inputs), dtype='float32')})

If I have -1 in the marked line I get an error from the sanity check in line 155. If I replace it with
batchsize*seqlen it runs.

I haven't fully figured out what the sanity check does, but thought I would mention it anyway.

I think that the error message in line 98 should be changed to something like:

raise ValueError("shape must be a tuple of int, [int], -1 or symbolic variables")

skaae (Member Author) commented Jun 8, 2015

The problem seems to be that line 130 ff. replaces TensorVariables with None:

        for dim, o in enumerate(output_shape):
            if isinstance(o, T.TensorVariable):
                output_shape[dim] = None
                masked_output_shape[dim] = None

which then causes output_size to become None in line 137:

        output_size = (None if any(x is None for x in masked_output_shape)
                       else np.prod(masked_output_shape))

I think it can be solved by checking for tensors:

        def has_tensor_input(lst):
            return any(map(lambda v: isinstance(v, T.TensorVariable), lst))

        has_tensor = (has_tensor_input(masked_input_shape)) or (
            has_tensor_input(masked_output_shape))
        del masked_input_shape, masked_output_shape
        # Finally, infer value for -1 if needed
        if -1 in output_shape:
            dim = output_shape.index(-1)
            if (input_size is None) or (output_size is None):
                output_shape[dim] = None
                output_size = None
            else:
                output_size *= -1
                output_shape[dim] = input_size // output_size
                output_size *= output_shape[dim]
        # Sanity check
        if (input_size is not None) and (output_size is not None) \
           and (input_size != output_size) and not has_tensor:
            raise ValueError("%s cannot be reshaped to specification %s. "
                             "The total size mismatches." %
                             (input_shape, self.shape))

skaae (Member Author) commented Jun 8, 2015

Sorry for making this so long :) But I figured that maybe I'm not supposed to give input sizes that are not int/[int] in the input layer? The reshape layer works if I use

l_inp = InputLayer((None, None, 10))

benanne (Member) commented Jun 8, 2015

Indeed, symbolic scalars are not supported in the shape specification of InputLayer. Maybe they should be? It'd be better to open a new issue for this though.

skaae (Member Author) commented Jun 8, 2015

OK. I guess it's old habit to pass anything called batchsize into the input layer.

f0k (Member) commented Jun 16, 2015

Indeed, symbolic scalars are not supported in the shape specification of InputLayer. Maybe they should be?

No, we want the shapes to be non-symbolic. InputLayer((None, None, 10)) is the correct solution if the first two sizes are only known at run-time. The only thing we could do is replace symbolic dimensions with None automatically in the constructor, but that would complicate the documentation.
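Putting this together with the snippet above, the working version would presumably look as follows (a sketch only; it assumes LSTMLayer and ConcatLayer are available exactly as used in the earlier snippet):

from lasagne.layers import *
import lasagne
from lasagne.nonlinearities import softmax
import theano.tensor as T
import numpy as np

x = T.tensor3()
batchsize, seqlen, _ = x.shape

num_inputs, num_units, num_classes = 10, 12, 5
# Non-symbolic shape in the InputLayer; unknown sizes are simply None.
l_inp = InputLayer((None, None, num_inputs))
l_lstm_fwd = LSTMLayer(l_inp, num_units=num_units)
l_lstm_bck = LSTMLayer(l_inp, num_units=num_units, backwards=True)
l_lstm = ConcatLayer([l_lstm_fwd, l_lstm_bck], axis=2)
l_shp1 = ReshapeLayer(l_lstm, (-1, 2*num_units))
l_softmax = DenseLayer(l_shp1, num_units=num_classes, nonlinearity=softmax)
# The symbolic sizes are only needed to reshape back to (bs, seqlen, classes).
l_out = ReshapeLayer(l_softmax, (batchsize, seqlen, num_classes))

lasagne.layers.get_output(
    l_out, x).eval({x: np.ones((10, 5, num_inputs), dtype='float32')})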

skaae deleted the reshape branch on June 16, 2015, 21:27