
Is there interest for implementing flexible training loops, batch iteration schemes, etc. [offer of code] #756

Open
Britefury opened this issue Oct 21, 2016 · 17 comments


@Britefury
Contributor

Would the Lasagne community / developers be interested in adding code for training loops, basic DNN construction, etc?

I found that code for training loops with features such as early termination, etc. would involve a lot of copy+paste, so I put together a library that can be found at:

http://github.com/Britefury/britefury-lasagne-helpers

I find it to be very helpful when I need to throw a network together and train/use it.

I would like to know if the Lasagne developers would be interested in taking some of this code and incorporating it into Lasagne.

My code has no documentation beyond the doc-strings; this would need to be worked on.

What's there:

data_source module

Defines the batch_iterator function that extracts mini-batches from a data set, which can consist of NumPy arrays or other data sources; the protocol is defined in the doc-strings.
Code for iterating through a dataset looks like:

for batch_X, batch_y in data_source.batch_iterator([train_X, train_y], batchsize=128):
    ...

where train_X and train_y can be NumPy arrays or other objects that support __len__ and __getitem__; for more info, see the protocols defined in data_source.
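For illustration, a minimal sketch of a non-array data source (a hypothetical class, assuming __getitem__ receives an index array, as with NumPy fancy indexing):

import numpy as np

class SyntheticSource(object):
    """Anything with __len__ and __getitem__ that accepts an index array
    (NumPy-style fancy indexing) should satisfy the protocol."""
    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, indices):
        # generate samples on demand rather than holding them in memory
        return np.random.randn(len(indices), 3, 48, 48).astype(np.float32)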

trainer module

Defines the Trainer class that implements a reasonably flexible training loop. It monitors the validation score of the network and can terminate early if the score doesn't improve within a certain number of epochs. It also reports training progress and saves the state of the network when the validation score improves. Its train method accepts data sources and uses the data_source module to get batches. Setting up the trainer looks like:

train_batch = theano.function(...) # should update params and return train loss
eval_batch = theano.function(...) # should evaluate the performance and return loss/err rate

trainer = trainer.Trainer()

# Provide training function
trainer.train_with(train_batch_fn=train_batch)

# Provide evaluation function
trainer.evaluate_with(eval_batch_fn=eval_batch)

# Set the verbosity; can be VERBOSITY_NONE (nothing), VERBOSITY_MINIMAL (final performance only),
# VERBOSITY_EPOCH (report every epoch) or VERBOSITY_BATCH (report each batch, for large datasets)
trainer.report(verbosity=trainer.VERBOSITY_EPOCH)

# Have the trainer save the state of the network after each improvement in validation score
trainer.retain_best_scoring_state_of_network(final_layer)

# Train for a maximum of 100 epochs and a minimum of 50; if training proceeds
# for 35 epochs without any improvement in validation score, stop early
trainer.train_for(num_epochs=100, min_epochs=50, val_improve_num_epochs=35)

# Train it
trainer.train(train_set=[train_X, train_y], val_set=[val_X, val_y], test_set=[test_X, test_y], batchsize=128)

basic_dnn module

Defines the BasicDNN class, which represents a neural network that can have multiple inputs and multiple targets. It uses objectives defined in the dnn_objective module to provide targets to optimise and to generate the loss function. It also defines the functions classifier, simple_classifier, regressor and simple_regressor, so setting up an image classifier is as easy as:

# Define a function that builds the network according to the architecture that we need
def build_network(input_vars):
    in_var = input_vars[0] if input_vars is not None else None
    input_layer = lasagne.layers.InputLayer(shape=(None, 3, 48, 48), input_var=in_var)
    ...
    return final_layer

# Doing this will build the network, set up loss functions, optimisation, etc.
# (lr is a learning rate defined elsewhere)
clf = basic_dnn.simple_classifier(
    build_network,              # network builder function
    n_input_spatial_dims=2,     # images are two-dimensional
    n_target_spatial_dims=0,    # simple classifier, just one prediction per sample
    target_channel_index=None,  # if an index is given, the target variable must have shape (sample, channel), and this index selects the channel
    score='err',                # score measure; can also be 'jaccard', 'f1', etc.
    updates_fn=lambda cost, params: lasagne.updates.adam(cost, params, learning_rate=lr)  # updates function
)

clf.trainer.train_for(num_epochs=100, min_epochs=20, val_improve_num_epochs=35)

# train_X: (sample,channel,height,width)
# train_y: (sample,)
# similar for val_* and test_*
clf.trainer.train([train_X, train_y], [val_X, val_y], [test_X, test_y], batchsize=128)

simple_classifier is quite flexible; it can also be used to produce two-dimensional (e.g. pixel-wise) predictions in order to build something like an FCN segmentation network; change the n_target_spatial_dims parameter to 2:

fcn_clf = basic_dnn.simple_classifier(
    build_network,
    n_input_spatial_dims=2,
    n_target_spatial_dims=2,    # predict per pixel
    target_channel_index=None,
    score='err',
    updates_fn=lambda cost, params: lasagne.updates.adam(cost, params, learning_rate=lr)
)

fcn_clf.trainer.train_for(num_epochs=100, min_epochs=20, val_improve_num_epochs=35)

# train_X: (sample,channel,height,width)
# train_y: (sample,height,width)
# similar for val_* and test_*
fcn_clf.trainer.train([train_X, train_y], [val_X, val_y], [test_X, test_y], batchsize=128)

The BasicDNN class allows for some more flexibility, so that multiple training objectives can be set up, e.g. a combination of classification and regression targets for a Fast R-CNN-style localiser. The objectives are defined in the dnn_objective module.

image_window_extractor and tiling_scheme modules

This code is usable but slightly less developed so far. It provides classes for extracting patches from a set of images in a reasonably memory-efficient manner. ImageWindowExtractor objects support the __len__ and __getitem__ methods, so they can be passed as parts of datasets, e.g. as train_X or train_y to Trainer.train; the batch iteration system will draw data from them as-is.

work_pool module

This introduces a dependency on joblib and is probably beyond the scope of Lasagne. It provides functionality for spreading tasks among multiple processes, using joblib's worker pool, which moves data between processes quickly via shared memory. Its intended use case is generating mini-batches of data in separate processes when this is an expensive operation, e.g. due to having to load data from disk, perform data augmentation, etc.

I don't know how much of this is of interest to you guys, but I would be happy to collaborate in order to incorporate it into Lasagne if you're up for it.

pretrained package

Provides code for building VGG and ResNet-50 networks (so far), downloading weights, etc. The code is adapted from Lasagne/Recipes. The main use case is grabbing an existing network and using it in a transfer-learning scenario.

@Britefury Britefury changed the title Is there interest for flexible training loops, batch iteration schemes, etc Is there interest for implementing flexible training loops, batch iteration schemes, etc Oct 21, 2016
@Britefury Britefury changed the title Is there interest for implementing flexible training loops, batch iteration schemes, etc Is there interest for implementing flexible training loops, batch iteration schemes, etc. [offer of code] Oct 21, 2016
@f0k
Member

f0k commented Oct 21, 2016

Would the Lasagne community / developers be interested in adding code for training loops, basic DNN construction, etc?

Yes! This is actually among our earliest issues: #12
Thank you for pushing this!

I found that code for training loops with features such as early termination, etc would involve a lot of copy+paste, so I put together a library [...]
I would be happy to collaborate in order to incorporate it into Lasagne if you're up for it.

We probably all have our own experiment framework built around Lasagne to avoid copy/pasting code. If we are to integrate training loops into Lasagne, ideally, they should be both easy to use for beginners and flexible enough so our frameworks on top can use them (that's not really a requirement -- the frameworks work as they are -- but more like a gauge for generalizability). So we need to be quite careful about the interface. This being said, what you describe seems like a very good start, more general than #599.

Code for iterating though a dataset looks like:

for batch_X, batch_y in data_source.batch_iterator([train_X, train_y], batchsize=128):
    ...

where train_X and train_y can be NumPy arrays or other objects that support __len__ and __getitem__;

I haven't looked at the code, but it seems it supports an arbitrary number of inputs, right? (This is a strict requirement.) The interface of __len__ and __getitem__ seems good; I can easily wrap my own framework to be compatible with that.
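For example, a minimal adapter sketch (num_samples and get_examples are hypothetical names on the wrapped object):

class DatasetAdapter(object):
    """Adapt an existing framework's dataset object to the proposed
    __len__/__getitem__ protocol; the wrapped attribute names are made up."""
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return self.dataset.num_samples

    def __getitem__(self, indices):
        return self.dataset.get_examples(indices)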

Defines the Trainer class

Looking at your example, it's not entirely clear why this should be a class, not a function -- couldn't this just be a train() function with a bunch of parameters? Do you need to access the trainer after training? I'm not yet saying it shouldn't be a class, just trying to understand what we need.
The train_with, evaluate_with interface reads nicely, but shouldn't this all be part of the constructor (or function arguments, if reducing this to a function)? What happens in train if I omit one of these calls?
Furthermore, I'm not 100% sure about requiring the data and iteration function to be split up. Some users may have a generator function for the data that they would like to pass directly. Of course, this then requires some way to pass the epoch size (a feature I'd like to see in any case, as I often train on mini-epochs), and some way to restart the validation set iterator (or pass its size as well).
It should also be possible to perform some custom action after each epoch, like adjusting the learning rate. One way would be a callback function (in my framework, the callback gets passed the epoch count and the error history, and returns whether training should stop -- this allows for learning rate adjustment, custom validation, and early stopping). Another, possibly simpler one would be providing a function for the inner training loop, doing one epoch, and another function wrapping it. Then users can decide whether to use the simple wrapper, or write the outer loop themselves. This is simpler to grasp than callback functions. (We could also have both.)
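For concreteness, a sketch of that callback contract (the names and the policy are purely illustrative):

import numpy as np
import theano

learning_rate = theano.shared(np.float32(0.01))

def on_epoch(epoch, val_errors):
    """Receives the epoch count and the validation error history;
    returns True to stop training early."""
    if epoch == 50:
        # learning rate adjustment
        learning_rate.set_value(learning_rate.get_value() * np.float32(0.1))
    # early stopping: no improvement within the last 10 epochs
    return len(val_errors) > 10 and min(val_errors[-10:]) > min(val_errors[:-10])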

Defines the BasicDNN class that defines a neural network that can have multiple inputs and multiple targets.

It would be great to have a way to reduce the boilerplate for setting up the training and validation functions. But I think I don't like giving everything off to a class with a train() method -- this creates a lot of opacity. Have a look at https://github.com/fchollet/keras/blob/70ebb15/keras/models.py#L557 and see how much you need to read to see what's going on, and to begin understanding how you can modify it. Thinking about it, I guess the key to transparency is returning to user code as often as feasible. So I'd rather provide a function that sets up the training and validation functions and returns them. This makes it easy for users to mix and match what to use from Lasagne, and what to implement themselves, without having to write any custom classes.

Uses objectives defined in the dnn_objective module to provide targets to optimise and generate the loss function.

Anything we integrate into lasagne should probably just use lasagne.objectives.

image_window_extractor

That might be something to keep out of Lasagne. It's hard to draw a line otherwise -- what about data augmentation, tools for inputting text or audio instead of images, ...?

work_pool

Again, any kind of data processing should probably be kept out of Lasagne. One of the design goals is to "do one thing and do it well". The idea would be to have the training loop code in Lasagne be general enough to work with any kind of iteration, so your image patch iteration and multi-processing could live in its own independent module.

pretrained

It would be cool to turn Recipes into an installable module (Lasagne/Recipes#18), with an interface that makes it easy to access the model zoo. This could be a good basis.


So, to conclude: Yes, we're very interested, but we need to discuss the details of how to structure the implementation so it's generally useful. Alternatively, we can also discuss whether it needs to be generally useful, or just capture the most basic use cases. Let me know what you think about my comments above, I'm happy to discuss the interface in detail!

@Britefury
Contributor Author

Concerning data_source.batch_iterator:

I haven't looked at the code, but it seems it supports an arbitrary number of inputs, right?

It handles an arbitrary number of inputs, yes.

Furthermore, I'm not 100% sure about requiring the data and iteration function to be split up. Some users may have a generator function for the data that they would like to pass directly.

It first checks the type of the input dataset and responds accordingly (a rough sketch of this dispatch follows below):

  • a sequence of array-likes (e.g. NumPy arrays) => return a mini-batch iterator
  • has a batch_iterator(batchsize, shuffle_rng) -> iterator method => this method is called
  • is callable, assumed to have the signature callable(batchsize, shuffle_rng) -> iterator, effectively the same as the batch_iterator method; both options are there so you can pass either a callable or an object
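In outline, the dispatch looks roughly like this (a simplified sketch, not the exact implementation):

import numpy as np

def batch_iterator(dataset, batchsize, shuffle_rng=None):
    if callable(dataset):                   # plain callable
        return dataset(batchsize, shuffle_rng)
    if hasattr(dataset, 'batch_iterator'):  # object exposing the method
        return dataset.batch_iterator(batchsize, shuffle_rng)
    return _array_batches(dataset, batchsize, shuffle_rng)

def _array_batches(arrays, batchsize, shuffle_rng):
    # sequence of array-likes: index each one in lock-step
    n = len(arrays[0])
    order = shuffle_rng.permutation(n) if shuffle_rng is not None else np.arange(n)
    for start in range(0, n, batchsize):
        idx = order[start:start + batchsize]
        yield [a[idx] for a in arrays]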

Concerning the Trainer class:

Looking at your example, it's not entirely clear why this should be a class, not a function

and

The train_with, evaluate_with interface reads nice, but shouldn't this all be part of the constructor (or function arguments, if reducing this to a function)? What happens in train if I omit one of these calls?

I agree my API needs work. I propose having a constructor and a train method. The constructor would take values for all the arguments, and the train method would accept overrides for those arguments. In my BasicDNN class it's quite useful to have it mostly set up the trainer for you (batch train/eval functions, network state retention, etc.), only requiring you to specify how long you want it to train for, verbosity, datasets, etc. Even if the BasicDNN class doesn't make it over, I think that allowing people to initialise the trainer with sensible defaults in their own framework and then override where necessary later on is quite a nice way to work.

I'll get to work on a PR.

@Britefury Britefury mentioned this issue Oct 23, 2016
@Britefury
Contributor Author

It's in PR #759.

@f0k
Member

f0k commented Oct 27, 2016

It first checks that type of the input dataset and responds accordingly: [...]

But what if I just have a generator, not a function that returns a generator? I think this might be a common use case we shouldn't dismiss too easily.

I agree my API needs work. I propose having a constructor and a train method. The constructor should take values for all the arguments. The train function should accept overrides for those arguments; [...]

Again, why should it be a class? If the train method accepts all arguments needed to influence training, couldn't it just be a function?

I'll get to work on a PR.

Thank you! Sorry for the delay, but I won't be able to read and think this through before the ICLR deadline (Nov 4).

@christopher-beckham

What would be the difference between this and what's already implemented in nolearn? Is this meant to be a more 'lightweight' alternative?

@f0k
Member

f0k commented Oct 27, 2016

Is this meant to be a more 'lightweight' alternative?

Ideally, yes. It should avoid taking too much out of the users' hands, or at least also offer something that's between Lasagne's MNIST example (all the boilerplate code) and nolearn (no boilerplate code), to make it more obvious how to customize things (in contrast to Keras). That's also why I'd like to avoid a model or trainer class if possible.

@benanne
Member

benanne commented Oct 29, 2016

Thanks for this! I had a lot of ideas about what this should look like, but unfortunately I never managed to write any actual code for it :) This looks like a great start!

It should also be possible to perform some custom action after each epoch, like adjusting the learning rate. One way would be a callback function (in my framework, the callback gets passed the epoch count and the error history, and returns whether training should stop -- this allows for learning rate adjustment, custom validation, and early stopping). Another, possibly simpler one would be providing a function for the inner training loop, doing one epoch, and another function wrapping it. Then users can decide whether to use the simple wrapper, or write the outer loop themselves. This is simpler to grasp than callback functions. (We could also have both.)

Good call. I actually really like iterators and generators in Python -- to me it always made sense that a "train" function would actually be a generator that performs an epoch at each step (which in the limit could be a single gradient step, e.g. for "infinite" datasets this would be useful). Then anything you want to do / print / .. just becomes the body of a for loop, which reads very naturally. For data loading, supporting arbitrary generators is also quite natural.

I did actually start writing some code for this at some point, but never finished it. The interface consisted of such a "train()" generator, and a "fit()" function that wrapped train() with a loop with sensible defaults (i.e. printing some stuff, validating every N epochs, ...).

More generally, the idea is to layer complex API functions (e.g. fit()) on top of simpler, more flexible API functions (e.g. train()), so that people can choose from a number of different flexibility levels and easily switch between them. The code for the high-level functions would be pretty readable, with short functions, because they build on the lower-level API, so it would be easy to copy/paste and adapt. Those functions would basically do double duty as API functions and as examples.
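To make that concrete, a rough sketch of the layering (all names hypothetical; this is not the unfinished code itself):

def train(train_fn, batch_fn, num_epochs):
    """A generator that yields the mean training loss once per epoch, so
    per-epoch reporting/validation is just the body of a for loop."""
    for epoch in range(num_epochs):
        losses = [train_fn(*batch) for batch in batch_fn()]
        yield epoch, sum(losses) / len(losses)

def fit(train_fn, batch_fn, num_epochs=100):
    """Thin wrapper around train() with sensible defaults."""
    for epoch, loss in train(train_fn, batch_fn, num_epochs):
        print("epoch %d: training loss %.6f" % (epoch + 1, loss))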

Thinking about it, I guess the key to transparency is returning to user code as often as feasible. So I'd rather provide a function that sets up the training and validation functions and returns them. This makes it easy for users to mix and match what to use from Lasagne, and what to implement themselves, without having to write any custom classes.

Also a good argument in favour of generators!

@f0k
Member

f0k commented Oct 30, 2016

to me it always made sense that a "train" function would actually be a generator that performs an epoch at each step

That's a useful pattern if the train function needs to be stateful. Otherwise, having a for _ in range(num_epochs) or a while True as the outer loop and calling train in the body may be more flexible.

More generally, the idea is to layer complex API functions (e.g. fit()) on top of simpler, more flexible API functions (e.g. train()), so that people can choose from a number of different flexibility levels and easily switch between them. The code for the high-level functions would be pretty readable, with short functions, because they build on the lower-level API, so it would be easy to copy/paste and adapt. Those functions would basically do double duty as API functions and as examples.

This nails what I had in mind. If fit() does what you need, just use fit(), otherwise copy/paste the source code of fit() and adapt it. (And the same holds for the lower-level functions that fit() calls. If you expand everything, you would end up with what the MNIST example looks like.) This gets difficult when we start using classes, so we should try and see if they can be avoided.

@benanne
Member

benanne commented Oct 30, 2016

I expected that we'd probably see eye to eye on this ;) doesn't have to be just two levels by the way, I just wanted to give an example to explain my point of view. fit() should probably be the top level, but what sits below that is definitely up for debate as far as I'm concerned.

re: statefulness of train(), I suppose it usually isn't? In which case I agree that turning it into a generator might be overkill. I'll try to think of some use cases for stateful train().

@f0k
Member

f0k commented Oct 31, 2016

doesn't have to be just two levels by the way

Of course not!

I'll try to think of some use cases for stateful train().

Well, it depends on what level of abstraction train() is :) Something has to log the error history, to be able to do early stopping or implement some learning rate heuristics.

fit() should probably be the top level, but what sits below that is definitely up for debate as far as I'm concerned.

Yes, we need to think this through. I don't have a lot of time right now, so I can just offer some random thoughts:

  • It'd be nice to have something that you just pass your network to, along with two numpy arrays, and have it trained. This could be fit().
  • This would consist of compiling the training and possibly validation function, and running the training loop.
  • Compiling the training and validation functions involves setting up variables for the targets, getting the expression for the network output, setting up the loss and updates, and compiling the functions. This could be compile_functions() or something, and offer to compile any of the training, validation and prediction functions (see the sketch after this list).
  • Running the training loop involves running an inner training loop, computing the validation error, and possibly adapting some hyperparameters, for multiple epochs.
  • The inner training loop involves iterating over the training data, calling the compiled training function for a given number of steps, and returning the average training loss.
  • We might want to take care not to get carried away. It doesn't have to solve everything. But we may want to ensure we can extend it in the future until it can solve everything :)
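To illustrate the compile_functions() idea, here is a sketch for a cross-entropy classifier, mirroring the MNIST example (the helper itself is hypothetical):

import lasagne
import theano
import theano.tensor as T

def compile_functions(network, input_var, learning_rate=0.01):
    target_var = T.ivector('targets')
    # training function: stochastic loss, with parameter updates
    prediction = lasagne.layers.get_output(network)
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.adam(loss, params, learning_rate=learning_rate)
    train_fn = theano.function([input_var, target_var], loss, updates=updates)
    # validation function: deterministic pass, returns loss and accuracy
    test_prediction = lasagne.layers.get_output(network, deterministic=True)
    test_loss = lasagne.objectives.categorical_crossentropy(test_prediction, target_var).mean()
    test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                      dtype=theano.config.floatX)
    val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
    return train_fn, val_fn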

@nouiz

nouiz commented Oct 31, 2016

I can give you one: if the Theano function takes no input, it will have less overhead. OK, for a medium-sized model this is not significant. Also, it can be called many consecutive times in one call; the loop can happen in C code :)

To select the mini-batches, a shared variable gets incremented at each call.

I agree, not the most useful use case. But it is one.
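A minimal sketch of this pattern (the data and loss here are stand-ins; a real setup would use the network's loss and parameter updates):

import numpy as np
import theano
import theano.tensor as T

batchsize = 128
# the whole training set lives on the device as a shared variable
X_train = theano.shared(np.random.randn(1024, 784).astype(theano.config.floatX))
index = theano.shared(np.int64(0))  # which mini-batch the next call uses
n_batches = 1024 // batchsize

x = T.matrix('x')
loss = ((x - x.mean()) ** 2).mean()  # stand-in loss for illustration

train_fn = theano.function(
    [],  # no inputs: less per-call overhead
    loss,
    givens={x: X_train[index * batchsize:(index + 1) * batchsize]},
    updates=[(index, (index + 1) % n_batches)],  # advance to the next batch
)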


@f0k
Member

f0k commented Oct 31, 2016

I agree, not the most useful use case. But it is one.

Thanks for chiming in! However, what we meant by train() was not a Theano function, but a higher-level function encapsulating part of a training loop. The question was whether we'd do an API like the following, with a stateful function:

for loss in train(num_epochs, all_the_other_arguments):
    ...

or something like this, with a stateless function:

for _ in range(num_epochs):
    loss = train(all_the_other_arguments)
    ...

My argument was that the latter is a little more flexible and transparent since you control all of the loop, but it only works if train() does not need to keep any state between calls. Thus we're trying to find use cases for a stateful training loop function, to see if it's needed.

@Britefury
Contributor Author

I've replaced the Trainer class with a single function as suggested by @f0k. I realised that the use cases I had in mind for the constructor - namely providing default parameters that can be overridden by passing replacement values to train - can be handled by functools.partial.
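For example (the train signature here is only a stand-in for the real one in the PR):

from functools import partial

def train(train_set, num_epochs=10, batchsize=32, verbosity='epoch'):
    """Stand-in for the training loop function (hypothetical signature)."""
    print(num_epochs, batchsize, verbosity)

# bake project-wide defaults in once...
my_train = partial(train, batchsize=128, verbosity='batch')
# ...then override per call where needed
my_train(train_set=[], num_epochs=100)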

@benanne
Member

benanne commented Nov 1, 2016

Cool :)

The train() generator I was working on had a flexible interface, allowing for the model and training setup to be specified in a variety of ways and using sensible defaults where possible. I figured I'd give a few examples to get the idea across:

train(model=model, loss=loss, data=data):
model is a top-level Lasagne layer (or multiple layers), loss is a Theano expression for the loss, everything else is taken care of: parameters are obtained with get_all_params(trainable=True), updates are determined by obtaining the gradient of loss w.r.t. the parameters and using a default algorithm (could be Adam for example), the structure of the data is assumed to be (X, y) and can be either a pair of numpy arrays, or a generator returning batches. Theano functions are compiled as needed.

train(model=model, loss=loss, data=data, updates=lasagne.updates.momentum):
as above but the function used to compute updates from gradients is overridden

train(model=model, loss=loss, data=data, updates=updates_dict):
as above, but updates are given directly (so no gradients need to be computed inside train())

train(model=model, loss=loss, params=params, data=data):
as above but now the parameters are not obtained using get_all_params() because they are given explicitly (and these are then used to compute the updates).

train(model=model, loss=loss, data=data, batch_size=batch_size):
batch size is given; if data is a pair of numpy arrays, extract batches of this size from them for training. If it is a generator, the generator is now assumed to return "chunks" which the train() function further subdivides into batches of size batch_size.

train(model=model, loss=loss, data=data, monitor=[monitor_expr]):
one or more "monitor" expressions are specified: Theano expressions whose values should be computed by the train function and kept track of.

train(model=model, loss=loss, data=data, mapping=[input1_expr, input2_expr, ...]):
The data is not in the (X, y) format, so a mapping is specified (in the form of a list of Theano expressions) to specify how the data should be fed into the network.

... and various combinations of the above.

It was maybe a bit too general and I never got around to implementing a working version, but I think the key idea of having very flexible API functions that accept various different combinations of arguments could be very nice from a UX point of view (admittedly, a bit less nice from a code cleanliness / maintainability point of view). In 90% of cases, the first invocation, specifying only model, loss and data, is probably sufficient, and this greatly simplifies the code I think.

So in summary: the train() function somehow needs to know about three things: the model, the data and the updates to apply. There are many different ways to specify these three. It would be nice if handling all that boilerplate was abstracted away a bit, by relying on sensible defaults.
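A rough sketch of that defaulting logic (the helper itself is hypothetical; the lasagne calls are real):

import lasagne
import theano

def make_train_fn(model, loss, target_vars, params=None,
                  updates=lasagne.updates.adam):
    if params is None:
        # default: all trainable parameters of the model
        params = lasagne.layers.get_all_params(model, trainable=True)
    if not isinstance(updates, dict):
        # an update *function* was given: derive the updates from gradients
        updates = updates(loss, params)
    # collect the input variables of all InputLayers in the model
    inputs = [l.input_var for l in lasagne.layers.get_all_layers(model)
              if isinstance(l, lasagne.layers.InputLayer)]
    return theano.function(inputs + list(target_vars), loss, updates=updates)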

@Britefury
Contributor Author

Britefury commented Nov 2, 2016

@benanne The API you suggest sounds really rather nice: taking the model, loss, data and mappings, then doing the whole lot. I could see an issue if you wanted to implement a GAN, for example, as you have to alternate between updating the discriminator and the generator.

The code that I have written (in PR #759) doesn't go that far; it's just the training loop. It takes the data, selects mini-batches and hands them to batch training and validation functions that you have to provide. It handles early stopping, saving and restoring the state of the network parameters, reporting progress, etc. (Implementing slightly more complex models such as GANs is quite possible; as the batch training function you pass a Python function that calls the discriminator update, then the generator update.)
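For instance (a sketch; the two compiled Theano functions are assumed to exist):

def make_gan_train_fn(disc_train_fn, gen_train_fn):
    """Wrap two compiled Theano functions so the training loop sees a
    single batch training function."""
    def train_batch(batch_X):
        d_loss = disc_train_fn(batch_X)  # update the discriminator first...
        g_loss = gen_train_fn(batch_X)   # ...then the generator
        return [d_loss, g_loss]
    return train_batch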

I think that the API that you have proposed could be implemented on top of the training loop; the train function you suggest could compile the relevant Theano functions and then pass them to the training loop function.

What do you reckon?

@benanne
Member

benanne commented Nov 2, 2016

I could see an issue if you wanted to implement a GAN for example, as you have to alternate between updating the discriminator and generator.

True, if train() is a generator this would mean that you actually need to instantiate the iterators and call next() on them in an alternating fashion; this wouldn't work with a single for loop.

Alternatively, if train() is a function, you could just call it twice inside the for loop (once for each sub-model).

I think that the API that you have proposed could be implemented on top of the training loop; the train function you suggest could compile the relevant Theano functions and then pass them to the training loop function.

Not 100% clear on what you mean by this, to be honest -- but it sounds like it follows the modular API idea that we discussed earlier pretty well. It's probably easier to just look at some code and then discuss the issue :)

@Britefury
Contributor Author

Britefury commented Nov 2, 2016

We could move the discussion to PR #759 where you can find the code :) (maybe that's not what you meant... :) )

You can see the API usage in the mnist_trainer example that is part of the PR. Admittedly, in the case of MNIST it's not saving an awful lot of code, but the training loop is more useful when slightly more complex scenarios arise.
