I tried to add some optimizers. As for the definition of model in src/model.jl line 8:
Optimizers are essentially models, which have the signature of
A definition of an optimizer looks like this:
and training a model with these optimizers (SGD with momentum and decay) looks like this:
Currently they work on the MXNet backend. What's your opinion of this approach?
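I can't paste the whole patch here, but as a rough illustration of the idea (all names and signatures below are my own guesses, not the actual code): an optimizer is itself a callable "model" mapping a parameter and its gradient to an update, so optimizers compose like any other model.

```julia
# Rough illustration only: optimizers as callable "models" that map
# (param, gradient) -> update, so they compose like any other model.
struct Descent
  η::Float64                  # learning rate
end

(o::Descent)(p, Δ) = o.η .* Δ

struct Decay
  λ::Float64                  # weight-decay coefficient
  inner                       # the optimizer being wrapped
end

# Weight decay folds λ·p into the gradient before the inner step runs.
(o::Decay)(p, Δ) = o.inner(p, Δ .+ o.λ .* p)

# A training step would then do, for each parameter:
#   p .-= opt(p, Δ)
```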
My first thought is, this is putting a fairly large burden on the backend to implement much of the optimisation process; I was expecting that a backend wouldn't have to do much more than forward
Being able to implement optimisers with
I think it would be a good idea to implement the most straightforward version of this that works in pure Julia mode; e.g. when calling
Hopefully you should find that straightforward to do, but let me know if it's not clear.
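As a strawman for the pure-Julia path (my own sketch, assuming we can enumerate a model's parameter arrays and the gradients computed for them), the update itself can be a plain in-place loop with no backend involvement:

```julia
# Strawman pure-Julia update: walk (parameter, gradient) pairs and
# apply a plain SGD step in place.
function sgd!(ps, gs; η = 0.01)
  for (p, g) in zip(ps, gs)
    p .-= η .* g              # mutate the parameter array in place
  end
  return ps
end
```

Usage would be something like `sgd!([W, b], [ΔW, Δb]; η = 0.1)`, where `W`, `b` and the `Δ`s are hypothetical parameter and gradient arrays.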
I'm not very clear about the semantics of
To address this, we need to clearly define what
If we go the first way, there should be a way to "turn off" the synchronization of params during training; otherwise transferring weights and grads on every batch is inevitable, which will make things very complex.
If we choose 2 or 3, that will make things much easier, since we can convert
As for where to put the state: yes, I can have the optimizer keep a
And for the
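For what it's worth, the dictionary-on-the-optimizer idea could look roughly like this in modern Julia (a sketch under my own naming assumptions; `IdDict` keys the state by parameter identity, so each parameter array gets its own slot):

```julia
# Sketch: momentum state kept on the optimizer in a dictionary keyed
# by parameter identity, so each parameter array has its own velocity.
mutable struct Momentum
  ρ::Float64                   # momentum coefficient
  velocity::IdDict{Any,Any}    # per-parameter state
end

Momentum(ρ = 0.9) = Momentum(ρ, IdDict())

function step!(o::Momentum, p, Δ)
  v = get!(o.velocity, p, zero(p))  # lazily initialise the velocity
  @. v = o.ρ * v + Δ                # update the stored velocity in place
  return v
end
```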
Great questions. I'll try to explain my current thinking on this as much as I can.
In general, functions of models have "wrapper semantics"; think
I don't think any of those options should make things significantly more or less complex, then; it's just a question of when
Keep the thoughts coming, especially if that's not clear.
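If it helps, here is my reading of "wrapper semantics" in code form (the types are purely hypothetical): a function of a model returns a new model that delegates to the one it wraps, layering extra behaviour around the call, and the wrapper can be used anywhere the original could.

```julia
# Hypothetical illustration of wrapper semantics: a function of a
# model returns another model that delegates to the one it wraps.
struct Affine
  W; b
end

(a::Affine)(x) = a.W * x .+ a.b

struct Logged
  model
  calls::Base.RefValue{Int}
end

logged(m) = Logged(m, Ref(0))

function (l::Logged)(x)
  l.calls[] += 1               # wrapper behaviour around the inner call
  return l.model(x)            # then delegate to the wrapped model
end
```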
Edited 2 times, most recently Jun 27, 2017.
Great, this looks much improved.
Although I started this out with a recursive `update!`, I'm wondering if it would just be cleaner to grab all params up front – like `params(Affine(10,5)) == [Param(10,5), Param(1,5)]` and `opt = SGD(params(model))`. Then you could call `update!(opt)` to carry out the update. What do you think?
That would also make it easier to get rid of the nested closures and just have an `SGD` object with the appropriate state, and an `update!` method.
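To make the suggestion concrete (sketch only; `Param` here is a stand-in for whatever parameter container Flux ends up with, holding a value and an accumulated gradient):

```julia
# Sketch of the params-up-front design: the optimiser owns the flat
# list of parameters and update!(opt) walks them. `Param` is a
# hypothetical holder with a value `x` and a gradient accumulator `Δx`.
mutable struct Param
  x
  Δx
end

struct SGD
  ps::Vector{Param}
  η::Float64
end

function update!(o::SGD)
  for p in o.ps
    p.x .-= o.η .* p.Δx       # apply the step
    fill!(p.Δx, 0)            # reset the gradient accumulator
  end
end
```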
I really like the way you can compose optimisers together, but it would be nice if that was built on top of the basic framework, rather than special cased at the bottom. For example, with the tweaks above:
```julia
struct Multi
  fs
end

update!(m::Multi) = foreach(update!, m.fs)
```
That would also avoid the need to repeat the decay stuff many times. If the user wants a decay they can easily just compose the basic optimiser with a decay themselves.
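Composed that way, a user-assembled SGD-with-weight-decay might read like this (a self-contained sketch, so it repeats a toy `Param`/`SGD`; all names are illustrative, and the decay runs first so its penalty lands in the gradient before SGD consumes it):

```julia
# Self-contained sketch of composing optimisers via Multi. Each piece
# implements update! over the same parameter list; Multi just chains them.
mutable struct Param
  x
  Δx
end

struct SGD
  ps::Vector{Param}
  η::Float64
end

update!(o::SGD) = foreach(p -> p.x .-= o.η .* p.Δx, o.ps)

struct WeightDecay
  ps::Vector{Param}
  λ::Float64
end

# Decay folds λ·x into the gradient, so it must run before SGD.
update!(o::WeightDecay) = foreach(p -> p.Δx .+= o.λ .* p.x, o.ps)

struct Multi
  fs
end

update!(m::Multi) = foreach(update!, m.fs)

# Usage: ps = params(model)
#        opt = Multi([WeightDecay(ps, 1e-4), SGD(ps, 0.1)])
#        update!(opt)
```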