Arbitrary expressions as Layer parameters #11
Currently, layer parameters have to be Theano shared variables. However, it would be cool if arbitrary Theano expressions involving shared variables could be used as well.
Supporting this requires some modifications to how parameters are handled, though, because for training we need access to the underlying shared variables.
Given an arbitrary Theano expression, it is fairly easy to get a list of all shared variables that occur in it by traversing the Theano graph. This isn't very 'clean', I suppose, but it's definitely possible.
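A minimal sketch of that traversal, assuming Theano's `theano.gof.graph.inputs` (which returns the leaf variables of a graph); the helper name `collect_shared` is made up for illustration:

```python
import numpy as np
import theano
import theano.tensor as T

def collect_shared(expression):
    """Collect all shared variables an expression depends on, by walking
    the Theano graph down to its leaf (input) variables."""
    return [v for v in theano.gof.graph.inputs([expression])
            if isinstance(v, theano.compile.SharedVariable)]

# Example: a weight matrix factored into two smaller shared matrices.
Q = theano.shared(np.random.randn(100, 10).astype('float32'), name='Q')
R = theano.shared(np.random.randn(10, 200).astype('float32'), name='R')
W = T.dot(Q, R)           # a 'virtual' parameter: an expression, not a variable
print(collect_shared(W))  # -> [Q, R]
```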
However, that just gives us a list of all shared variables, and we have no way of knowing if all of those contain learnable parameters. Perhaps some of them should not be touched by the learning.
We could assume that all shared variables represent learnable parameters by default. This would usually be the case. But then we need to provide a way for the user to specify that a given variable is not to be touched (for example, this could be a variable that contains a binary mask that restricts some of the parameters to be zero). Perhaps an extra attribute on the shared variable could be used to mark it as such.
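One hedged way to realize that 'extra attribute' idea would be the free-form `tag` scratchpad every Theano variable carries; the `learnable` attribute below is a hypothetical convention, not an existing Theano or Lasagne feature (reusing `collect_shared`, `W` from the sketch above):

```python
# A binary mask that fixes some weights to zero -- present in the graph,
# but not meant to be learned.
mask = theano.shared(np.random.binomial(1, 0.5, (100, 200)).astype('float32'),
                     name='mask')
mask.tag.learnable = False   # hypothetical convention via Theano's .tag scratchpad

W_masked = W * mask          # W from the sketch above

# Collect only the variables that have not been opted out of learning:
learnable = [v for v in collect_shared(W_masked)
             if getattr(v.tag, 'learnable', True)]
print(learnable)             # -> [Q, R]; the mask is excluded
```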
Should we support arbitrary expressions as layer parameters?
What do you guys think?
As we discussed via email before, my only concern is that allowing arbitrary expressions will make it more difficult to store/load models in HDF5 format (but still possible if I don't insist on restoring the original configuration of variables, just an equivalent network with respect to the forward pass). That's a minor disadvantage compared to the flexibility we would gain in defining and training models without writing any additional classes.
Regarding the added complexity, I think it would be confined to the parameter-handling code and could stay hidden from users who never pass expressions as parameters.
Another problem is regularization. Currently, the regularization code assumes that every parameter it retrieves from a layer is a shared variable that can be penalized directly.
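A simplified sketch of why that assumption matters (hypothetical helper; the real regularization code differs in detail):

```python
import theano.tensor as T

def l2_penalty(layer):
    # Assumes every entry of layer.get_params() is a shared variable
    # holding the weights themselves.
    return sum(T.sum(p ** 2) for p in layer.get_params())

# If get_params() could return an expression such as W = T.dot(Q, R), this
# penalty would be applied to the *product*, not to Q and R individually --
# which may or may not be what the user intended.
```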
The regularization thing was an afterthought :) So that could probably be revamped completely. I agree that if possible, the stuff in the regularization module should be adapted to whatever we decide here, rather than holding the feature back.
I guess the current idea is that we are not supporting this - in other words, it is safe to assume that layer parameters are always shared variables in the code (and indeed, this assumption is already made in many parts of the code anyway). If users want to reparameterize a layer, the preferred method is to subclass it.
Reopening as this issue came up again on the mailing list: https://groups.google.com/forum/#!topic/lasagne-users/ABiUmAIT-ho
To paraphrase, there are two assumptions in our code and/or user code, one of which we would have to break to support this. The assumptions are: a) every parameter attribute of a layer (e.g., `layer.W`) is a shared variable, and b) everything returned by `get_params()` is a shared variable that can be updated during training.
I'd suggest to break the first one, such that when a user supplies a custom expression for a constructor parameter like `W`, the layer stores that expression and uses it directly in its output computation.
Furthermore, both our code and user code rely on `get_params()` returning shared variables, so for an expression parameter it should return the shared variables the expression depends on, rather than the expression itself.
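As a usage sketch of what breaking assumption a) would enable (hypothetical at this point, assuming `DenseLayer` accepted an expression for `W`), tied autoencoder weights become a one-liner:

```python
import lasagne

l_in = lasagne.layers.InputLayer((None, 784))
l_enc = lasagne.layers.DenseLayer(l_in, num_units=50)
# Tied weights: the decoder's W is an expression, not a shared variable.
l_dec = lasagne.layers.DenseLayer(l_enc, num_units=784, W=l_enc.W.T)

# Under the proposal, this would return the single underlying shared
# variable (l_enc.W) only once, ready for gradient updates.
params = lasagne.layers.get_all_params([l_dec])
```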
Sander had the objection that if an expression depends on multiple shared variables, it's unclear whether all of them are meant to be trained, and said:
My suggestion would be to not bother with this, and just have all shared variables involved in an expression be collected by `get_params()`.
I'd strongly vote for having some way to control, per underlying shared variable, which tags (e.g., `trainable`) apply to it.
I disagree. Layers give tags to 'virtual' parameters, and those should just apply to all 'real' variables contained within. Everything else is unnecessarily complex for now. (And as I said, there's always the option of subclassing the layer if you need finer control.)
/edit: There also was the suggestion of only allowing parameter expressions that depend on exactly one shared variable, but I'm not actually sure how that would help.
/edit edit: If we allow arbitrary expressions, it's also possible that a 'virtual' parameter doesn't have any 'real' parameter at all -- e.g., because it is a constant, or random.
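For instance (sketch, reusing `collect_shared` and the imports from the first snippet):

```python
# A 'virtual' parameter with no 'real' parameter behind it at all:
W_fixed = T.constant(np.eye(100, 200, dtype='float32'))
print(collect_shared(W_fixed))  # -> []: nothing to train
```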
But then how would you change their values during training? I might not want to perform gradient updates on a given variable, but that doesn't mean I don't want to change it at all :)
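For example, a variable opted out of gradient updates can still be set directly between training steps (sketch, reusing `mask` from above):

```python
# Not touched by the gradient updates, but still mutable between steps:
mask.set_value(np.random.binomial(1, 0.5, (100, 200)).astype('float32'))
```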
Agreed, I could live with that.
I think we need to reflect on nomenclature here. If we are going to refer to both real and virtual parameters as 'params', we are going to hopelessly confuse people. I agree that letting tags on a 'virtual' parameter apply to all its 'real' variables is the simplest option, but we should settle on clear names for the two concepts.
Also let's keep in mind that we want to shield any users who do not wish to use this feature (i.e. the majority of them, most likely) from ever having to deal with any of this. From what's discussed so far I think this would be the case, but let's just keep it in mind nevertheless.
Fair enough, except for the naming question, which we still need to settle.
Yes... my idea was that only very few people would need to deal with that. Most people won't use the feature, and the ones that do will know what they're getting into.
Referring to the latter as 'parameter expressions', and reserving 'params' for the shared variables, would keep the distinction clear.
I think that's totally fine if we document it properly.
You are probably right. This is a less common use case, and keeping everything as 'params' shouldn't confuse too many people.
The one situation I'm still concerned about is when people don't use the helper functions, but access parameter attributes like `layer.W` directly and expect a shared variable they can read out or update.
Maybe we should start drafting a PR for this, to see if any other hurdles come up that we've missed.
Looking into the code, I think we should probably even change `add_param()` (and `utils.create_param()`) to accept arbitrary expressions, so that all existing layers pick up the feature automatically.
One thing came to my mind: the recurrent layers already support taking a Theano expression for the initial hidden state, but without meaning to include all shared variables of that expression in training. The recurrent layers do have a flag (`learn_init`) telling whether to learn the initial hidden state, though, so at least for training their behaviour wouldn't change if we officially allowed Theano expressions for parameters. We'd need to check what it means for `get_params()` and the parameter tags, though.
But any expression involving tensors is a `TensorVariable`, isn't it?
The question is whether the expression could be the output of a neural network, for example (i.e., whether it could depend on some shared variables). If it's always just a symbolic input variable (i.e., its `owner` is `None`), nothing needs to change; if it can depend on shared variables, we'd have to decide whether those should be collected for training.
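The distinction is easy to check mechanically (sketch, reusing `collect_shared` and the imports from the first snippet; `x` and `V` are made-up names):

```python
x = T.matrix('x')
V = theano.shared(np.random.randn(784, 64).astype('float32'), name='V')

hid0_input = T.matrix('hid0')    # pure symbolic input: hid0_input.owner is None
hid0_expr = T.tanh(T.dot(x, V))  # network output: depends on the shared V

print(collect_shared(hid0_input))  # -> []
print(collect_shared(hid0_expr))   # -> [V]
```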
But in that use case, what kind of TensorVariable would that be? Just a plain symbolic input the user feeds values for? Then we could just as well require a layer there.
Would be doable, given that the recurrent layers are `MergeLayer` subclasses already, so they can take additional input layers.
Yes. But couldn't we remove the support for `hid_init` being a tensor variable and instead allow `hid_init` to be a layer?
Sure, if you want it to be a basic tensor variable, you'd need to set it to an `InputLayer` instance then, which has that variable (or expression) as its `input_var`.
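A sketch of that suggestion (hypothetical at this point: `RecurrentLayer` accepting a `Layer` for `hid_init` is the proposed extension, not existing behaviour):

```python
import lasagne
import theano.tensor as T

hid0_var = T.matrix('hid0')
l_hid0 = lasagne.layers.InputLayer((None, 64), input_var=hid0_var)
l_seq = lasagne.layers.InputLayer((None, None, 784))
# Proposed: pass a layer instead of a tensor variable for the initial state.
l_rec = lasagne.layers.RecurrentLayer(l_seq, num_units=64, hid_init=l_hid0)
```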
The point of this Issue (#11) is to allow all network parameters to be Theano expressions, so that would naturally include the case of it being a plain `TensorVariable`.
That would be a good idea then, as it allows both use cases to be expressed uniformly.