Simple "reapply" functionality #720
Interesting to see that this works -- so basically,
There's one solution that you haven't mentioned: construct your network such that it processes both parts at once, and split them up afterwards. I.e., instead of:

```python
out1 = lasagne.layers.get_output(my_nn, {my_nn_input: input1})
out2 = lasagne.layers.get_output(my_nn, {my_nn_input: input2})
```

do:

```python
input = T.concatenate((input1, input2), axis=0)
...
l_out1 = lasagne.layers.SliceLayer(my_nn, slice(None, len(input1)), axis=0)
l_out2 = lasagne.layers.SliceLayer(my_nn, slice(len(input1), None), axis=0)
```

Now you can continue constructing your network, and you don't have to worry about collecting parameters from the different fragments. This will also result in a more efficient solution, since Theano can process both input parts in a single minibatch. However, it won't work well for recurrent networks if the two input parts have very different sequence lengths (the shorter one would have to be padded).
It would be nice to have such functionality, but we'll have to investigate whether

Thinking a little more, another solution would be a special layer class that embeds the application of a network to its input(s):

```python
import itertools
import lasagne
from lasagne.layers import MergeLayer

class MacroLayer(MergeLayer):
    def __init__(self, incomings, network, input_layers=None, **kwargs):
        super(MacroLayer, self).__init__(incomings, **kwargs)
        self.network = network
        all_layers = lasagne.layers.get_all_layers(network)
        if input_layers is None:
            # by default, use every layer without an incoming layer as an input slot
            input_layers = [layer for layer in all_layers
                            if getattr(layer, 'input_layer', None) is None]
        self.input_layers = input_layers
        # expose the parameters of all embedded layers as this layer's own
        self.params = dict(itertools.chain.from_iterable(
                layer.params.items() for layer in all_layers))

    def get_output_shape_for(self, input_shapes):
        return lasagne.layers.get_output_shape(
                self.network, dict(zip(self.input_layers, input_shapes)))

    def get_output_for(self, inputs, **kwargs):
        return lasagne.layers.get_output(
                self.network, dict(zip(self.input_layers, inputs)), **kwargs)
```

Each branch of your network would become a MacroLayer embedding the shared sub-network.
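A hypothetical usage sketch for the MacroLayer above (the sub-network and branch names are made up for illustration):

```python
from lasagne.layers import InputLayer, DenseLayer

# build the shared sub-network once...
sub_in = InputLayer((None, 64))
sub_out = DenseLayer(sub_in, num_units=32)

# ...then embed it at two spots of the outer network; both spots share its parameters
branch1 = InputLayer((None, 64))
branch2 = InputLayer((None, 64))
out1 = MacroLayer([branch1], sub_out, input_layers=[sub_in])
out2 = MacroLayer([branch2], sub_out, input_layers=[sub_in])
```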
I myself was surprised, but the cloned networks seem to work perfectly - I even lost a bet (for a sandwich) when I promised to find a way to break the code :)
That's awesome! You have probably just saved one of my friends 10h of compilation time in his High Energy Physics research :)
On one hand, that's much more code if you just want to copy a single layer.
That piece of code has no dependencies apart from the "check_list" helper that converts things (tuple, set) to a list, which must have some analogue in the Lasagne utils.
Definitely, the recurrent container does that as well, and in both cases we need to figure out what the best API is for linking the embedded network to the outer inputs.
How can we know that

Looking at your implementation, I'm worried about the following:
If we can make the implementation and API a little nicer and we are sure it cannot break, I'd be fine with including it. @benanne, what's your opinion?
We should probably document this trick somewhere, it's super useful and it's something that keeps coming up on the mailing list as well.
Sure, sounds good!
Sorry for disappearing
Will now try to assemble the thing.
Hi, I have a similar problem, i.e. I want to apply the same set of convolutions (shared weights) to lots of different 'color' channels. As suggested by @f0k, I slice the channels, create conv-layers that all share the same W/b variables, and merge the output later. This works quite fine in theory, however, when I try to compile the architecture into a Theano function, the compilation takes a lot of time. Looking at the Theano profiler, it seems that most of the compilation time is consumed by the optimizer. I tried to construct a minimal example:

```python
from lasagne.layers import (get_output, InputLayer, DenseLayer, SliceLayer,
                            ConcatLayer, Conv2DLayer, ReshapeLayer)
from lasagne.nonlinearities import softmax
import theano
import theano.tensor as T
import time

theano.config.profile = True
theano.config.profile_optimizer = True

n_channel = 34  # <- replicates the conv-layer 34 times, processing the channels independently
n_classes = 3

# -----------------------------------------------------------------------------
input_var = T.tensor4('inputs')
in_layer = InputLayer((None, n_channel, 100, 100), input_var=input_var)

# ------- construct a prototype conv-layer, replicate it a couple of times -------
proto_params = {'name': 'conv1', 'num_filters': 512, 'filter_size': 3, 'pad': 'valid'}
dummy_in = InputLayer((None, 1, 100, 100))
prototype = Conv2DLayer(dummy_in, **proto_params)
conv1_layers = []
for i in range(n_channel):
    theslice = SliceLayer(in_layer, indices=slice(i, i + 1), axis=1, name='slice_' + str(i))
    dup = Conv2DLayer(theslice, W=prototype.W, b=prototype.b, **proto_params)  # duplicate with shared weights
    conv1_layers.append(dup)

# --------- merge their outputs ---------------------
merger = ConcatLayer(conv1_layers, axis=1, name='concat')
reshaper = ReshapeLayer(merger, shape=([0], -1))
dense_last = DenseLayer(reshaper, n_classes, name='fcEND', nonlinearity=softmax)
pred = get_output(dense_last)

# --------- compiling ---------------------
t = time.time()
theano.function([input_var], [pred])
print(time.time() - t)
```

Here's the relevant part of the Theano profiler:
On a side note, if I do the same thing but untie the weights (i.e. they are not shared across channels), compile/optimize time is down to 5 sec. Any idea what's going on? This is a small example (150 sec compilation time is OK), but if I scale this up (not a single convolution but a couple of them, i.e. conv-conv-maxpool-conv-conv-maxpool ...), this quickly goes beyond hours, which is not acceptable for me. I'm not an expert in Theano and I'm not sure how to exactly read the profiler output, so help is greatly appreciated :)
Sorry for the delay, it's probably not relevant any more now, but for that use case I wouldn't suggest slicing and doing multiple convolutions, but rather reshaping the tensor from (batchsize, channels, rows, cols) to (batchsize x channels, 1, rows, cols) and then doing a normal 2d convolution. This will use the same set of weights for all channels, since they're interpreted as different examples in a batch now. In the end, reshape back to (batchsize, ...) to restore the original grouping (note that if you want to do multiple such convolutions and only pool in between, you can keep the (batchsize x channels, ...) layout until the very end).
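A minimal sketch of that reshape trick, with sizes borrowed from the example above (the layer names are made up for illustration):

```python
from lasagne.layers import InputLayer, ReshapeLayer, Conv2DLayer

n_channel, n_rows, n_cols, n_filters = 34, 100, 100, 512
l_in = InputLayer((None, n_channel, n_rows, n_cols))
# fold the channel axis into the batch axis: (batchsize * channels, 1, rows, cols)
l_fold = ReshapeLayer(l_in, (-1, 1, n_rows, n_cols))
# one convolution now processes every channel with the same set of weights
l_conv = Conv2DLayer(l_fold, num_filters=n_filters, filter_size=3, pad='valid')
# restore the original grouping: (batchsize, channels * filters, out_rows, out_cols)
l_unfold = ReshapeLayer(l_conv, (-1, n_channel * n_filters, n_rows - 2, n_cols - 2))
```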
@f0k
equal to
@justheuristic's my_nn can carry a meaning that lets us think in concepts; in your case my_nn is just an implementation detail. When working on a really complicated network, thinking in concepts is important. In my example:

I would like to keep a transform_network instance which means 'transform the 64x1x1 input to a 64x1x1 output in a special way we trained'. I cannot accept keeping an instance of transform_network with its input_layer replaced by conv_network, because that means 'transform the 64x1x1 point that is the conv result of a 3x64x64 input to a 64x1x1 output in a special way we trained', and this meaning is useless for me. Generally, I would like to keep some Lasagne networks which have clear meanings and later use them as templates to compose some high-level concepts, and maybe use the high-level ones to compose some more complicated ones, and so on.
If you don't use batch normalization, results will be the same. The computation of the former will use more memory and be faster. However, it only works if the two input tensors are the same shape (except for the first dimension). So this is a useful recipe for Siamese networks, but not for all possible use cases.
Okay... so you want to keep an instance of transform_network, another of conv_network, and a third one that is a copy of the transform_network on top of the conv_network. For that purpose, reviving the clone functionality discussed in this thread or the MacroLayer (#720 (comment)) would be a possible solution. What was holding us back from merging the cloning code as it was were the API, the amount of code, and some doubts about how to handle parameter expressions. I'm still open to a clean
I agree clone_network is perfect for the case where params are not shared; let's focus on sharing params. An example: there is a ...

Now let's do this: ...

The ... The result I prefer is ... Currently, ...
Okay, I'm currently working on a much simpler way to do this. The simple version simply calls get_output:

The only problem is, it won't clone several layers at once.
Not exactly on the topic of the thread, but perhaps here is a way around it. I came across this thread when working in reinforcement learning. It was annoying having multiple copies of the same layer, one for "stepping" while interacting with the environment, the other for training on a minibatch. One solution is to specify different behaviors in

Seems to be working, although I haven't done a lot of learning with it yet... looks right, though? Another benefit is that the input data formats (and ndims) stay the same whether there is recurrence in the network or not, because the reshaping to (n_batch, n_step, *dims) happens only where it is needed, inside the layer.
Greetings!
It's probably a small issue in most image-CNN-related cases, but when dealing with text, multi-input NNs, reinforcement learning, or long-term memory networks, some layers should be applied at multiple spots with the same weights.
Some usage cases:
As far as I understood the docs, one natural way to do so in Lasagne is by passing another layer's params on creation, e.g.:
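A minimal sketch of this first approach; the layer names are made up for illustration:

```python
from lasagne.layers import InputLayer, DenseLayer

l_in1 = InputLayer((None, 100))
l_in2 = InputLayer((None, 100))
l_hid1 = DenseLayer(l_in1, num_units=50)
# reuse the first layer's parameter variables when creating the second layer
l_hid2 = DenseLayer(l_in2, num_units=50, W=l_hid1.W, b=l_hid1.b)
```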
Or alternatively, create several networks and use .get_output:
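A sketch of one way to read this second route, applying the same stack to different Theano input variables through get_output (names are again illustrative):

```python
import theano.tensor as T
from lasagne.layers import InputLayer, DenseLayer, get_output

x1, x2 = T.matrix('x1'), T.matrix('x2')
l_in = InputLayer((None, 100))
l_out = DenseLayer(l_in, num_units=50)
# the same layers (and thus the same weights) are applied to both inputs
out1 = get_output(l_out, {l_in: x1})
out2 = get_output(l_out, {l_in: x2})
```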
The problem with the first approach is that it takes immensely many lines of code for larger networks, especially with e.g. an LSTM layer that has a lot more parameters. Not to mention it's easy to make a mistake that way.
The problem with the second approach is that it forces you to break outside the Lasagne network, and makes routine tasks (e.g. getting all params, regularizing, applying flags) more complicated and verbose.
It may be a good idea to introduce some simpler method to do that.
For example, in Blocks, layers can be .apply-ed to as many spots as necessary with all params shared.
This complicates the code a lot, and I personally don't think this is the best solution, since most layers are only applied once.
Another approach is to add some .reapply() method for layers with params that would work like a cloning constructor reusing the original layer's weights, but that too makes cloning large nets complicated.
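Purely as an illustration of that idea, a hypothetical helper restricted to a plain DenseLayer (no such method exists in Lasagne):

```python
from lasagne.layers import DenseLayer

def reapply_dense(layer, new_incoming):
    """Hypothetical sketch: clone a DenseLayer on top of `new_incoming`,
    reusing its parameter variables so the weights stay shared."""
    return DenseLayer(new_incoming, num_units=layer.num_units,
                      W=layer.W, b=layer.b, nonlinearity=layer.nonlinearity)
```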
In one of our libraries for reinforcement learning we use a generic function to clone a lasagne network with or without keeping the params.
The source can be found here - https://github.com/yandexdataschool/AgentNet/blob/master/agentnet/utils/clone.py
Some usage examples - https://github.com/yandexdataschool/AgentNet/blob/master/tests/test_clone_and_targets.py#L45
The question is: is there any way to implement functionality of applying network parts multiple times in Lasagne?
P.S. If our clone_network fits the Lasagne spirit, we'd be glad to contribute the code (or you can just grab it at will; as far as I understand the license, the code has almost no dependencies apart from Theano and Lasagne).