Is there interest in implementing flexible training loops, batch iteration schemes, etc.? [offer of code] #756
Comments
Yes! This is actually among our earliest issues: #12
We probably all have our own experiment framework built around Lasagne to avoid copy/pasting code. If we are to integrate training loops into Lasagne, ideally, they should be both easy to use for beginners and flexible enough so our frameworks on top can use them (that's not really a requirement -- the frameworks work as they are -- but more like a gauge for generalizability). So we need to be quite careful about the interface. This being said, what you describe seems like a very good start, more general than #599.
I haven't looked at the code, but it seems it supports an arbitrary number of inputs, right? (This is a strict requirement.) The interface of
Looking at your example, it's not entirely clear why this should be a class, not a function -- couldn't this just be a function?
It would be great to have a way to reduce the boilerplate for setting up the training and validation functions. But I don't think I like handing everything off to a class with a `fit()` method.
Anything we integrate into lasagne should probably just use
That might be something to keep out of Lasagne. It's hard to draw a line otherwise -- what about data augmentation, tools for inputting text or audio instead of images, ...?
Again, any kind of data processing should probably be kept out of Lasagne. One of the design goals is to "do one thing and do it well". The idea would be to have the training loop code in Lasagne be general enough to work with any kind of iteration, so your image patch iteration and multi-processing could live in their own independent modules.
It would be cool to turn

So, to conclude: Yes, we're very interested, but we need to discuss the details of how to structure the implementation so it's generally useful. Alternatively, we can also discuss whether it needs to be generally useful, or just capture the most basic use cases. Let me know what you think about my comments above, I'm happy to discuss the interface in detail!
Concerning the code: it's in PR #759.
But what if I just have a generator, not a function that returns a generator? I think this might be a common use case we shouldn't dismiss too easily.
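For illustration, a minimal sketch of the distinction (names invented):

```python
import numpy as np

def make_batches(X, y, batch_size):
    # A generator *function*: each call returns a fresh generator,
    # so a training loop can restart iteration every epoch.
    for i in range(0, len(X), batch_size):
        yield X[i:i + batch_size], y[i:i + batch_size]

X, y = np.zeros((1000, 20)), np.zeros(1000, dtype='int32')
batches_fn = lambda: make_batches(X, y, 100)  # restartable every epoch
batches = make_batches(X, y, 100)             # a bare generator: one pass only
```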
Again, why should it be a class? If the
Thank you! Sorry for the delay, but I won't be able to read and think this through before the ICLR deadline (Nov 4).
What would be the difference between this and what's already implemented in nolearn? Is this meant to be a more 'lightweight' alternative?
Ideally, yes. It should avoid taking too much out of the users' hands, or at least also offer something that's between Lasagne's MNIST example (all the boilerplate code) and nolearn (no boilerplate code), to make it more obvious how to customize things (in contrast to Keras). That's also why I'd like to avoid a model or trainer class if possible.
Thanks for this! I had a lot of ideas about what this should look like, but unfortunately I never managed to write any actual code for it :) This looks like a great start!
Good call. I actually really like iterators and generators in Python -- to me it always made sense that a "train" function would actually be a generator that performs an epoch at each step (which in the limit could be a single gradient step, e.g. for "infinite" datasets this would be useful). Then anything you want to do / print / ... just becomes the body of a for loop, which reads very naturally. For data loading, supporting arbitrary generators is also quite natural.

I did actually start writing some code for this at some point, but never finished it. The interface consisted of such a "train()" generator, and a "fit()" function that wrapped train() with a loop with sensible defaults (i.e. printing some stuff, validating every N epochs, ...).

More generally, the idea is to layer complex API functions (e.g. fit()) on top of simpler, more flexible API functions (e.g. train()), so that people can choose from a number of different flexibility levels and easily switch between them. The code for the high-level functions would be pretty readable, with short functions, because they build on the lower-level API, so it would be easy to copy/paste and adapt. Those functions would basically do double duty as API functions and as examples.
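A rough sketch of what that layering could look like (hypothetical names and signatures, not an actual Lasagne API):

```python
import numpy as np

def train(train_fn, batches_fn):
    # Low-level API: an "infinite" generator that runs one epoch per step
    # and yields the mean training loss. train_fn is e.g. a compiled Theano
    # function; batches_fn returns a fresh generator of (X, y) batches.
    while True:
        yield np.mean([train_fn(Xb, yb) for Xb, yb in batches_fn()])

def fit(train_fn, batches_fn, num_epochs=100, validate_fn=None,
        validate_every=5):
    # High-level API: wraps train() in a loop with sensible defaults.
    for epoch, loss in enumerate(train(train_fn, batches_fn), start=1):
        print("epoch %d: training loss %.4f" % (epoch, loss))
        if validate_fn is not None and epoch % validate_every == 0:
            print("  validation: %r" % (validate_fn(),))
        if epoch >= num_epochs:
            break
```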
Also a good argument in favour of generators!
That's a useful pattern if the
This nails what I had in mind. If
I expected that we'd probably see eye to eye on this ;) It doesn't have to be just two levels, by the way; I just wanted to give an example to explain my point of view. Re: statefulness of `train()`:
Of course not!
Well, it depends on what level of abstraction
Yes, we need to think this through. I don't have a lot of time right now, so I can just offer some random thoughts:
I can give you one. If the Theano function takes no input, it will have less overhead. To select the mini-batches, a shared variable gets incremented at each call. I agree, it's not the most useful use case, but it is one.
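A sketch of that pattern, with a dummy softmax model standing in for the real network (not code from this thread):

```python
import numpy as np
import theano
import theano.tensor as T

# The whole dataset lives in shared variables, a shared index selects the
# current mini-batch via `givens`, and the index is advanced as part of the
# updates, so the compiled training function takes no inputs at all.
floatX = theano.config.floatX
data_X = theano.shared(np.random.randn(1000, 20).astype(floatX))
data_y = theano.shared(np.random.randint(0, 10, 1000).astype('int32'))
index = theano.shared(np.int64(0))
batch_size, n_batches = 100, 10

X, y = T.matrix('X'), T.ivector('y')
W = theano.shared(np.zeros((20, 10), dtype=floatX))
loss = T.nnet.categorical_crossentropy(T.nnet.softmax(T.dot(X, W)), y).mean()

train_fn = theano.function(
    [], loss,
    updates=[(W, W - 0.1 * T.grad(loss, W)),
             (index, (index + 1) % n_batches)],  # advance to the next batch
    givens={X: data_X[index * batch_size:(index + 1) * batch_size],
            y: data_y[index * batch_size:(index + 1) * batch_size]})

losses = [train_fn() for _ in range(n_batches)]  # one full epoch, no inputs
```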
Thanks for chiming in! However, what we meant by a stateful `train()` was more like:

```python
for loss in train(num_epochs, all_the_other_arguments):
    ...
```

or something like this, with a stateless function:

```python
for _ in range(num_epochs):
    loss = train(all_the_other_arguments)
    ...
```

My argument was that the latter is a little more flexible and transparent since you control all of the loop, but it only works if `train()` is stateless.
I've replaced the
Cool :) The
... and various combinations of the above. It was maybe a bit too general and I never got around to implementing a working version, but I think the key idea of having very flexible API functions that accept various different combinations of arguments could be very nice from a UX point of view (admittedly, a bit less nice from a code cleanliness / maintainability point of view). In 90% of cases, the first invocation, specifying only model, loss and data, is probably sufficient, and this greatly simplifies the code I think. So in summary: the
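Something like this, purely hypothetical, just to illustrate a single entry point accepting several kinds of `data`:

```python
def fit(step_fn, data, batch_size=128, num_epochs=10):
    # Hypothetical flexible entry point: `data` may be a pair of arrays,
    # a function returning a batch generator, or a bare generator.
    if callable(data):
        batches = data                   # a function returning fresh batches
    elif hasattr(data, '__next__'):
        batches = lambda: data           # a bare generator: one pass total
    else:
        X, y = data                      # plain arrays: slice them ourselves

        def batches():
            for i in range(0, len(X), batch_size):
                yield X[i:i + batch_size], y[i:i + batch_size]

    for epoch in range(1, num_epochs + 1):
        losses = [step_fn(Xb, yb) for Xb, yb in batches()]
        if not losses:
            break  # a bare generator was exhausted
        print("epoch %d: loss %.4f" % (epoch, sum(losses) / len(losses)))
```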
@benanne The API you suggest sounds really rather nice; taking the model, loss, data and mappings, then doing the whole lot. I could see an issue if you wanted to implement a GAN, for example, as you have to alternate between updating the discriminator and the generator. The code that I have written (in PR #759) doesn't go that far; it's just the training loop. It takes the data, selects mini-batches and hands them to batch training and validation functions that you have to provide. It handles early stopping, saving and restoring the state of the network parameters, reporting progress, etc. (Implementing slightly more complex models such as GANs is quite possible; for the batch training function you pass a Python function that calls the discriminator update and then the generator update.) I think that the API that you have proposed could be implemented on top of the training loop. What do you reckon?
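For example, the GAN case might look something like this under that scheme (assuming `disc_train_fn` and `gen_train_fn` are the user's compiled Theano update functions; the exact callback signature the PR expects may differ):

```python
def make_gan_batch_train_fn(disc_train_fn, gen_train_fn):
    # Build the batch-training callable handed to the training loop:
    # each mini-batch does a discriminator step, then a generator step.
    def train_batch(X_batch, Z_batch):
        d_loss = disc_train_fn(X_batch, Z_batch)
        g_loss = gen_train_fn(Z_batch)
        return d_loss, g_loss
    return train_batch
```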
True, if Alternatively, if
Not 100% clear on what you mean by this, to be honest -- but it sounds like it follows the modular API idea that we discussed earlier pretty well. It's probably easier to just look at some code and then discuss the issue :)
We could move the discussion to PR #759, where you can find the code :) (maybe that's not what you meant... :) ) You can see the API usage in the mnist_trainer example that is part of the PR. Admittedly, in the case of MNIST it's not saving an awful lot of code, but the training loop is more useful when slightly more complex scenarios arise.
Would the Lasagne community / developers be interested in adding code for training loops, basic DNN construction, etc?
I found that code for training loops with features such as early termination, etc. would involve a lot of copy/paste, so I put together a library that can be found at:
http://github.com/Britefury/britefury-lasagne-helpers
I find it to be very helpful when I need to throw a network together and train/use it.
I would like to know if the Lasagne developers would be interested in taking some of this code and incorporating it into Lasagne?
My code has no documentation beyond the doc-strings; this would need to be worked on.
What's there:
`data_source` module

Defines the `batch_iterator` function that extracts mini-batches from a data set, which can be NumPy arrays or other sources of data; the protocol is defined in the doc-strings. Code for iterating through a dataset looks like:
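Roughly along these lines (the exact signature is whatever the doc-strings define; this is illustrative):

```python
# Assumed signature, for illustration; see the data_source doc-strings.
for batch_X, batch_y in batch_iterator([train_X, train_y],
                                       batch_size=128, shuffle=True):
    train_fn(batch_X, batch_y)  # e.g. a compiled Theano training function
```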
where `train_X` and `train_y` can be NumPy arrays or other objects that support `__len__` and `__getitem__`; for more info, see the protocols defined in `data_source`.
`trainer` module

Defines the `Trainer` class that implements a reasonably flexible training loop. It monitors the validation score of the network and can terminate early if the score doesn't improve within a certain number of epochs. It also reports training progress and saves the state of the network when the validation score improves. Its `train` method accepts data sources and uses the `data_source` module to get batches. Setting up the trainer looks like:
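Perhaps along these lines (argument names invented for illustration; the real interface is in the `trainer` module's doc-strings):

```python
# Invented argument names, for illustration only.
trainer = Trainer(train_batch_fn=train_fn, eval_batch_fn=val_fn,
                  num_epochs=300, patience=25)  # early stopping patience
trainer.train([train_X, train_y], [val_X, val_y], batch_size=128)
```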
`basic_dnn` module

Defines the `BasicDNN` class, a neural network that can have multiple inputs and multiple targets. It uses objectives defined in the `dnn_objective` module to provide targets to optimise and generate the loss function. Also defines the functions `classifier`, `simple_classifier`, `regressor` and `simple_regressor`. So setting up an image classifier is as easy as:
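E.g. something like (a sketch; the exact arguments are in the doc-strings):

```python
# Sketch: build_network would construct and return the Lasagne layers;
# the simple_classifier arguments shown here are illustrative.
clf = simple_classifier(build_network, n_classes=10)
clf.trainer.train([train_X, train_y], [val_X, val_y], batch_size=128)
pred_y = clf.predict(test_X)
```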
`simple_classifier` is quite flexible; it can also be used to produce two-dimensional (e.g. pixel-wise) predictions in order to build something like an FCN segmentation network; change the `n_target_spatial_dims` parameter to 2:
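i.e., something like (illustrative):

```python
# Illustrative: per-pixel class predictions for an FCN-style network.
seg = simple_classifier(build_fcn_network, n_classes=21,
                        n_target_spatial_dims=2)
```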
The `BasicDNN` class allows for some more flexibility, so that multiple training objectives can be set up, e.g. a combination of classification and regression targets for a Fast R-CNN style localiser. The objectives are defined in the `dnn_objectives` module.

`image_window_extractor` and `tiling_scheme` modules

This code is usable but slightly less developed so far. It provides classes for extracting patches from a set of images in a reasonably memory-efficient manner.
`ImageWindowExtractor` objects support the `__len__` and `__getitem__` methods, so they can be passed as part of datasets, e.g. they can be passed to `Trainer.train` as `train_X` or `train_y`; the batch iteration system will draw data from them as is.

`work_pool` module
This introduces a dependency on `joblib` and is probably beyond the scope of Lasagne. It provides functionality for spreading tasks among multiple processes, using joblib's worker pool, which uses shared memory to move data between processes quickly. Its intended use case is generating mini-batches of data in separate processes, where this could be an expensive operation due to having to load data from disc, perform data augmentation, etc.

I don't know how much of this is of interest to you guys, but I would be happy to collaborate in order to incorporate it into Lasagne if you're up for it.
`pretrained` package

Provides code for building VGG and ResNet-50 (so far) networks, downloading weights, etc. The code is adapted from Lasagne-recipes. Its main use case is grabbing an existing network and using it in a transfer-learning scenario.