New Layer base class #678
Open · f0k opened this issue May 9, 2016 · 14 comments

f0k (Member) commented May 9, 2016

This is a proposal to resolve the long-standing discussions on multiple-output layers (#337) and merge layers accepting dictionaries (#537), both required for recurrent containers (#629). It was originally formulated in #537 (comment); I'll give some more detail here and hope to reignite the discussion.

Main idea: Create a new BaseLayer class, with Layer, InputLayer and MergeLayer subclassing it.

  • The BaseLayer would be quite universal: its incomings could be a single layer, a list of layers, a dictionary of layers, a list of dictionaries of layers, or anything along those lines. Similarly, its output shape could be a single tuple, a list of tuples, a dictionary of tuples, a list of dictionaries of tuples, and so on. Its output would match its output shape definition (i.e., it would produce a tensor, a list of tensors, a dictionary of tensors, you get it).
  • The Layer subclass would check that, after construction, its input shape is a single shape tuple.
  • The InputLayer subclass would further check that its input layer is None. It can subclass Layer.
  • The MergeLayer subclass would instead check that its input shape is a list of tuples. We can have another subclass, DictMergeLayer, that checks that its input shape is a dictionary of tuples.
  • lasagne.layers.get_output() would only need to know how to handle a BaseLayer and an InputLayer.

This change would require lasagne.layers.get_output() to become quite a bit more generic, as it needs to deal with nested dictionaries and lists. However, it would avoid further isinstance checks. The only special case would be for InputLayer, in case an input is passed to get_output() and we need to find the layer it should be linked to (and we don't want to link it to any free-floating layer we can find). While nested dictionaries and lists seem unnecessarily complex, they would seamlessly allow MergeLayers of multiple-output layers. But we don't necessarily have to allow them.

The gist of this is that any BaseLayer will just have to rely on its input_shapes property to see what to expect in its get_output_for() method. It doesn't have to care whether it is linked to a single layer producing multiple outputs, or multiple layers producing single outputs, or anything more complex.

Sketch of the code:

class BaseLayer(object):
    def __init__(self, incomings, name=None):
        # assign self.input_shapes and self.input_layers accordingly
        self.input_shapes = ...
        self.input_layers = ...
        self.name = name

class Layer(BaseLayer):
    def __init__(self, incoming, **kwargs):
        super(Layer, self).__init__(incoming, **kwargs)
        if not isinstance(self.input_shapes, tuple):
            raise TypeError("%s expects a single input only" % self.__class__.__name__)
    @property
    def input_shape(self):
        return self.input_shapes
    @property
    def input_layer(self):
        return self.input_layers

class InputLayer(Layer):
    def __init__(self, shape, input_var=None, **kwargs):
        super(InputLayer, self).__init__(shape, **kwargs)
        if self.input_layer is not None:
            raise TypeError("InputLayer shape must be a shape tuple, not a %r" % type(shape))")
        if input_var is None:
            ...
        self.input_var = input_var
    def get_output_for(self, input, **kwargs):
        assert input is None
        return self.input_var

class MergeLayer(BaseLayer):
    def __init__(self, incomings, **kwargs):
        super(MergeLayer, self).__init__(incomings, **kwargs)
        if not isinstance(self.input_shapes, list) or any(not isinstance(shape, tuple) for shape in self.input_shapes):
            raise TypeError("%s expects a list of inputs, got %r" % (self.__class__.__name__, self.input_shapes))
    def get_output_for(self, inputs, **kwargs):
        # the MergeLayer base class just returns a list of incoming tensors
        return inputs

class DictMergeLayer(BaseLayer):
    def __init__(self, incomings, **kwargs):
        super(DictMergeLayer, self).__init__(incomings, **kwargs)
        if not isinstance(self.input_shapes, dict) or any(not isinstance(shape, tuple) for shape in self.input_shapes.values()):
            raise TypeError("%s expects a dict of inputs, got %r" % (self.__class__.__name__, self.input_shapes))
    def get_output_for(self, inputs, **kwargs):
        # the DictMergeLayer base class just returns a dict of incoming tensors
        return inputs
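
For illustration, constructing layers under this sketch might look like the following (a hypothetical example with made-up shapes, using the classes sketched above):

l_in1 = InputLayer((None, 20))
l_in2 = InputLayer((None, 30))
l_list = MergeLayer([l_in1, l_in2])
# l_list.input_shapes == [(None, 20), (None, 30)]
l_dict = DictMergeLayer({'a': l_in1, 'b': l_in2})
# l_dict.input_shapes == {'a': (None, 20), 'b': (None, 30)}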

Note that this will not change anything for existing Layer and MergeLayer subclasses -- they can still access their incomings via self.input_shape or self.input_shapes and know they're of the expected form (a single shape or a list of shapes). New code can continue to use either of these base classes, or start with BaseLayer if they want to support wild combinations of multiple inputs.

If you're still with me, what do you think of this proposal?


Something more technical that's just relevant to the concrete implementation: For full backward compatibility, we would need to have MergeLayer subclass Layer -- user code may have isinstance(..., Layer) in place and expect to catch everything. This is annoying, as we can't do the "is it a single input tensor" check in the Layer constructor then, but would require a separate self.check_inputs method called from the constructor that subclasses can override. If we go that far, we can also turn things around, ditch the BaseLayer and have Layer implement all its functionality, with a new UniversalLayer that subclasses Layer just to override the check_inputs method with something more liberal (return True). Either that, or we tell users to replace isinstance(..., Layer) with isinstance(..., BaseLayer) as needed.
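
To make the check_inputs variant concrete, a rough sketch (following the naming in the paragraph above; the input handling in __init__ is elided just like in the first sketch):

class Layer(object):
    def __init__(self, incoming, name=None):
        # assign self.input_shapes and self.input_layers as before
        self.input_shapes = ...
        self.input_layers = ...
        self.name = name
        if not self.check_inputs():
            raise TypeError("%s cannot handle inputs of this form"
                            % self.__class__.__name__)

    def check_inputs(self):
        # default: a single input only, as Layer requires today
        return isinstance(self.input_shapes, tuple)

class UniversalLayer(Layer):
    def check_inputs(self):
        # accept any combination of inputs
        return True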


As another variation, MergeLayer could accept either a list or a dict, and DictMergeLayer / ListMergeLayer would restrict this further.

wuaalb (Contributor) commented May 10, 2016

Thank you for advancing this issue; I'm especially looking forward to multiple output layers.

If I correctly understood the backwards-compatibility issue with isinstance(..., Layer) as a catch-all in user code: would it maybe be possible to deprecate this usage pattern and encourage users to change it to isinstance(..., BaseLayer)?

A temporary work-around might be something like

import warnings

class LayerBackwardCompatMeta(type):
    def __instancecheck__(cls, instance):
        if type(instance).__name__ == 'BaseLayer':
            warnings.warn('isinstance(instance, Layer) is deprecated, use isinstance(instance, BaseLayer) instead')
        return type(instance).__name__ in ['BaseLayer', 'Layer']
    #def __subclasscheck__(cls, subclass):
    #    ...

class BaseLayer(object):
    pass

class Layer(BaseLayer):
    __metaclass__ = LayerBackwardCompatMeta  # Python 2 metaclass syntax

print('isinstance(Layer(), Layer):         {}'.format(isinstance(Layer(), Layer)))  # True
print('isinstance(Layer(), BaseLayer):     {}'.format(isinstance(Layer(), BaseLayer)))  # True

print('isinstance(BaseLayer(), Layer):     {}'.format(isinstance(BaseLayer(), Layer)))  # True, but warn
print('isinstance(BaseLayer(), BaseLayer): {}'.format(isinstance(BaseLayer(), BaseLayer)))  # True

There may be quite a few cases where it is not so clear whether a merge layer should accept only lists or only dicts. For those cases, would the variant in your code sketch require deriving from both MergeLayer and DictMergeLayer? I think I prefer the solution at the bottom of your post.

f0k (Member, Author) commented May 10, 2016

There may be quite a few cases where it is not so clear whether a merge layer should accept only lists or only dicts.

True, the ElemwiseSumLayer could accept both, for instance.

For those cases, would the variant in your code sketch require deriving from both MergeLayer and DictMergeLayer?

No, it would require deriving from BaseLayer directly and checking that self.input_shapes is not just a single shape tuple. If we allow nested dicts/lists, it may also need to check that the input shapes are not nested.
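
E.g., roughly like this (a sketch only, following the proposal above; the actual merging logic is omitted):

class ElemwiseSumLayer(BaseLayer):
    def __init__(self, incomings, **kwargs):
        super(ElemwiseSumLayer, self).__init__(incomings, **kwargs)
        # accept a list or a dict, but not a single shape tuple
        if isinstance(self.input_shapes, tuple):
            raise TypeError("%s expects multiple inputs"
                            % self.__class__.__name__)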

I think I prefer the solution at the bottom of your post.

It's just that existing MergeLayer subclasses could be passed dictionaries then. It wouldn't break anyone's code, though.

benanne (Member) commented May 13, 2016

I've only glanced over the proposal so far, but it looks good to me. It complicates the code quite a bit, unfortunately, but I think the use cases for this are becoming too important and we need to adapt. I'm not a huge fan of the names Layer and BaseLayer. It's weird to have Layer derive from BaseLayer because the former is the most "generic" name.

Also, how common is isinstance(..., Layer) in user code, really? I can see it being useful in situations where both layers and shapes are accepted, but how often does that really come up? I would imagine things like isinstance(..., SomeSpecificLayerSubclass) are much more common, and that would still work as before. Backwards compatibility is very important, but I think this might be relatively uncommon, in which case it shouldn't hold us back.

f0k (Member, Author) commented May 14, 2016

It complicates the code quite a bit, unfortunately

If we only allow plain inputs, lists of inputs and dicts of inputs (no nesting), it's fairly easy. Otherwise it involves some recursive function in the base class constructor and in get_output().
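
For illustration, the recursive helper could be something along these lines (a sketch; the name map_structure is made up):

def map_structure(fn, structure):
    """Apply fn to every leaf of an arbitrarily nested list/dict structure
    (e.g., to turn nested incomings into nested input shapes)."""
    if isinstance(structure, dict):
        return {key: map_structure(fn, value)
                for key, value in structure.items()}
    elif isinstance(structure, list):
        return [map_structure(fn, value) for value in structure]
    else:
        return fn(structure)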

It's weird to have Layer derive from BaseLayer because the former is the most "generic" name.

In a perfect world, Layer would be called SimpleLayer or BasicLayer and all single-input layers would subclass it. We can't rename Layer now, so we either need a new name for the base class, or Layer needs to support multiple inputs and then constrain them by default in a way that subclasses can relax. We can also try to find a better name for BaseLayer.

Also, how common is isinstance(..., Layer) in user code, really?

Maybe that's really something we shouldn't care too much about.

benanne (Member) commented May 15, 2016

Maybe that's really something we shouldn't care too much about.

If we decide not to worry about it, that would mean we are free to rename Layer, right? Or is there something else I'm missing?

f0k (Member, Author) commented May 15, 2016

If we decide not to worry about it, that would mean we are free to rename Layer, right?

If we decide not to worry about isinstance checks, we can use a different name for the layer base class (i.e., the one subclassing object). But we don't only have isinstance checks, we also have subclassing -- and I think we should worry about not breaking class MyLayer(Layer). This means that whatever we do, Layer should remain a class for a layer accepting a single input only, providing input_layer and input_shape attributes, and feeding a single layer/shape to get_output_for() / get_output_shape_for().
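
For example, a typical (hypothetical) user subclass like this one relies on exactly that contract and must keep working unchanged:

class DoubleLayer(Layer):
    # relies on Layer's single-input contract
    def get_output_shape_for(self, input_shape):
        return input_shape    # a single shape tuple in, a single one out
    def get_output_for(self, input, **kwargs):
        return input * 2      # a single tensor in, a single tensor out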

benanne (Member) commented May 15, 2016

Right, makes sense. Bummer :)

justheuristic commented May 30, 2016

Also, how common is isinstance(..., Layer) in user code, really?

It also occurs frequently for anyone trying to develop libraries on top of Lasagne, when trying to assert anything or figure out whether the user passed a single layer or several.

p.s. Since we're talking about a layer redesign, is there any way to include a mechanism that quickly creates a duplicate of a layer with the same shared params? This would be useful whenever one has to use the same embedding or convolution for different network inputs.

p.p.s.
We recently had to implement a DictMergeLayer that had to pass instance checks; so far it looks like this:
https://github.com/yandexdataschool/AgentNet/blob/master/agentnet/utils/layers.py#L51

benanne (Member) commented May 31, 2016

p.s. Since we're talking about a layer redesign, is there any way to include a mechanism that quickly creates a duplicate of a layer with the same shared params? This would be useful whenever one has to use the same embedding or convolution for different network inputs.

That sounds like something that should be doable, although it could get complicated because the Layer base class doesn't necessarily know what extra arguments are required to create a new instance of a given subclass. So instantiating a new instance in the usual way and then changing its parameter variables might not be straightforward.
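
For reference, what users can already do manually today is pass an existing layer's shared variables to the constructor of a second layer (the usual Lasagne way of tying parameters, if I'm not mistaken; layer names made up):

from lasagne.layers import InputLayer, DenseLayer

l_in1 = InputLayer((None, 100))
l_in2 = InputLayer((None, 100))
l_dense1 = DenseLayer(l_in1, num_units=50)
# reuse the same shared variables, so both layers stay tied:
l_dense2 = DenseLayer(l_in2, num_units=50, W=l_dense1.W, b=l_dense1.b)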

f0k (Member, Author) commented Jan 24, 2017

Some more thoughts. There are more options that wouldn't require adding a new base class, but would keep Layer as the base:

  • Add a constructor argument accept={'plain'} to Layer that can be overridden by subclasses with accept={'list'} or accept={'list', 'dict'}, for example. This would place all checking code in Layer.__init__, with a way for subclasses to influence it. Disadvantage: Since all existing classes pass on **kwargs, this would allow users to write DenseLayer([l], num_units=123, accept={'list'}), which would fail.
  • Add a class attribute accepts = {'plain'} to Layer that can be overridden by subclasses. Again, this would place all checking code in Layer.__init__, to be overridden by MergeLayer and DictMergeLayer. Disadvantage: It's inelegant and non-obvious to influence the base class constructor with class attributes.
  • Already mentioned in the first post: Add a check_inputs method to Layer that can be overridden by subclasses. Advantage: Directly understandable code (isinstance(self.input_shapes, dict)) instead of an indirection (accept={'dict'}), and easy to customize. Disadvantage: An additional method, and two places to put more specific input checks (constructor and check_inputs), like verifying the input shape or dimensionality.

An advantage of placing all checking code in the base class is that it can easily be shared. Note that proper checking needs more than just verifying whether we have a tuple, list or dict:

if isinstance(self.input_shapes, tuple) and ('plain' in self.accepts):
    ...  # verify that it's a valid shape (only nonnegative int or None)
elif isinstance(self.input_shapes, list) and ('list' in self.accepts):
    ...  # verify that all items are valid shapes
elif isinstance(self.input_shapes, dict) and ('dict' in self.accepts):
    ...  # verify that all keys are strings, and all values are valid shapes
else:
    ...  # raise an error

Some layers may want to accept both lists and dicts, or all options.
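
For instance, with the class-attribute variant from the list above, the specializations could be as small as this (a rough sketch):

class MergeLayer(Layer):
    accepts = {'list'}

class DictMergeLayer(Layer):
    accepts = {'dict'}

class ElemwiseSumLayer(MergeLayer):
    # this one could accept either form
    accepts = {'list', 'dict'}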

The advantage of keeping Layer the base class is full backwards compatibility. The disadvantage is that with Layer as the base class, either all of Layer, MergeLayer and DictMergeLayer will have self.input_shape, or none of them will. Giving MergeLayer both input_shape and input_shapes attributes is utterly confusing. But actually, the same holds for Layer; we wouldn't really want it to have both input_shape and input_shapes if we introduce a new layer base class.

Was there any reason not to simply extend MergeLayer to allow both dicts and lists as incomings, have DictMergeLayer and ListMergeLayer as specializations, and allow all layers (including Layer, excluding InputLayer) to produce multiple outputs? Compared to all other proposals, this would require more code in get_output and would not permit a layer that accepts both single tensors and tensor lists/dicts, but it would avoid changing the class hierarchy.

botev (Contributor) commented Mar 11, 2017

So, just to throw out something I did over the weekend, whose main goal was restructuring the Lasagne recurrent implementation. The root of all evil in making a single "Step" layer was the fact that some layers have a single input/output and some have multiple.

So what I did was remove MergeLayer and make Layer have only input_shapes and input_layers (plurals). E.g., if you pass a single thing, it converts it to a tuple. Additionally, all layers now have get_output_shapes_for and get_outputs_for (again plurals), which always return lists/tuples. This way everything is uniform across the board -- everything accepts and outputs multiple things. I added an extra flag max_inputs to Layer, and the base __init__ checks that there are not more input_shapes than that, raising an error otherwise. Then, for backwards compatibility, get_output_shape returns get_output_shapes()[0] if the length is 1 and otherwise throws an error. Same for get_output. All layers had to change their returned shapes and outputs to tuples, which is just adding an extra comma. I had to modify some of the tests to accommodate the changes in the Mock objects passed, but now all tests pass. Everything works, the recurrence layer is close to the CustomRecurrent layer but much easier to extend with StepLayers, and I have something like this:

# (excerpt from the fork; AbstractStepLayer and add_init_param are defined there)
import theano.tensor as T
from lasagne import init, nonlinearities

class GRUStep(AbstractStepLayer):
    def __init__(self, incoming, num_units,
                 nonlinearity=nonlinearities.tanh,
                 gates_function=nonlinearities.sigmoid,
                 W=init.Orthogonal(),
                 h_init=init.Constant(0.),
                 **kwargs):
        super(GRUStep, self).__init__(incoming, 3 * num_units, **kwargs)
        if len(self.input_shapes[0]) != 2:
            raise ValueError("GRUStep accepts only 2D inputs")
        self.W = self.add_param(W, (num_units, 3 * num_units),
                                name="W")
        self.add_init_param(h_init, (None, num_units), name="h_init")
        self.f = nonlinearity
        self.g = gates_function

    def get_output_shapes_for(self, input_shapes):
        x_shape = input_shapes[0]
        return (x_shape[0], self.num_units // 3),

    def get_outputs_for(self, inputs, **kwargs):
        x = inputs[0]
        h = inputs[1]
        n = self.num_units // 3
        # If we precompute the output this is only dot(h, w)
        a = T.dot(h, self.W)
        ru = self.g(a[:, :2*n] + x[:, :2*n])
        r = ru[:, :n]
        u = ru[:, n:]
        c_in = a[:, 2*n:]
        c = self.f(c_in * r + x[:, 2*n:])
        return u * c + (1 - u) * h,

I'm pretty sure you probably don't want such a radical change, but hey, I really enjoyed it, and now things are much simpler to manipulate. And the whole recurrent module with 2 more layers implemented is 500 LOC.

f0k (Member, Author) commented Mar 12, 2017

Thanks for sharing!

I'm pretty sure you probably don't want such a radical change

Exactly -- any changes that break existing Layer subclasses or MergeLayer subclasses should be avoided, as this would break user code and we're not Keras. Furthermore, the simple case of a single-input, single-output layer should stay as easy as it is now, in accordance with our design principles (specifically the fourth and fifth).

And the whole recurrent module with 2 more layers implemented is 500 LOC.

That's also the goal of #629, still blocked by this very issue (to allow multiple outputs), which in turn is blocked by milestone 0.2, for which I'd appreciate additional contributors!

botev (Contributor) commented Mar 12, 2017

Exactly -- any changes that break existing Layer subclasses or MergeLayer subclasses should be avoided, as this would break user code and we're not Keras.

So it does not break any user code, unless the user has their own classes extending MergeLayer, since that has been removed. Anything else is backwards compatible (in theory).

Nevertheless, I totally understand why you guys don't want such a break. Potentially, I could help with the RNN module, but I have to implement a few more things in our fork, like the variational stuff, before I have time.

f0k (Member, Author) commented Mar 12, 2017

So it does not break any user code, unless the user has their own classes extending MergeLayer, since that has been removed. Anything else is backwards compatible (in theory).

From your description, you renamed get_output_for to get_outputs_for and added a default implementation of get_output_for in the base class. An existing Layer subclass will only overwrite get_output_for, and then it doesn't follow your new API. You'd need to provide a default implementation of get_outputs_for in the base class instead. Requiring single-output layers (which are the most common type) to return a tuple of tensors adds some unnecessary cognitive overhead. Only implementing get_output_for for single-output layers means they will end up with both get_output_for and get_outputs_for (and the same for the shapes), which at the very least would be mildly confusing to new users. So I think renaming these methods is not a good idea.
