I tried to add some optimizers. Going by the definition of a model in src/model.jl line 8:
Optimizers are essentially models, which have the signature of
A definition of an optimizer looks like this:
and training a model with optimizers (SGD with momentum and decay) looks like this:
Currently these work on the MXNet backend. What are your opinions of this approach?
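The snippets referenced above didn't survive extraction, so here is a hypothetical sketch of the idea being described — an optimizer as a model-like callable that maps a (parameter, gradient) pair to an updated parameter. The names `SGD` and `step` are my own, not the issue's actual code:

```julia
# Hypothetical sketch (not the issue's original code): an optimizer with
# the same calling convention as a model — inputs in, outputs out.
struct SGD
  lr::Float64
  decay::Float64   # weight decay, folded into the gradient term
end

# Calling the optimizer produces the updated parameter, the way calling
# a model produces its output.
(o::SGD)(p, g) = p .- o.lr .* (g .+ o.decay .* p)

# A training step then just threads each parameter/gradient pair through
# the optimizer, exactly as data is threaded through a model.
step(o, params, grads) = [o(p, g) for (p, g) in zip(params, grads)]
```

Under this framing, "optimizers are essentially models" means they share the call signature, so the same machinery that composes models can compose optimizers.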
My first thought is that this puts a fairly large burden on the backend to implement much of the optimisation process; I was expecting that a backend wouldn't have to do much more than forward
Being able to implement optimisers with
I think it would be a good idea to implement the most straightforward version of this that works in pure Julia mode; e.g. when calling
Hopefully you should find that straightforward to do, but let me know if it's not clear.
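To make the "most straightforward version in pure Julia mode" concrete, here is a hedged sketch under the assumption that parameters and gradients are plain Julia arrays — the backend only supplies forward/backward passes, and the update itself stays in ordinary Julia code (`update!` is a name I've chosen for illustration):

```julia
# Hedged sketch, assuming a pure-Julia mode where parameters and
# gradients are plain arrays. The backend computes gradients; the
# optimisation step itself needs nothing backend-specific.
function update!(params::Vector, grads::Vector, lr)
  for (p, g) in zip(params, grads)
    p .-= lr .* g        # in-place SGD step on a plain Julia array
  end
  params
end
```

Getting this working first, then teaching a backend like MXNet to mirror it, matches the suggestion above that backends shouldn't have to reimplement the optimisation process.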
I'm not very clear about the semantics of
To address this, we need to clearly define what
If we go the first way, there should be a way to "turn off" the synchronization of params during training; otherwise transferring weights and grads every batch is inevitable, which will make things very complex.
If we choose 2 or 3, that will make things much easier, since we can convert
As for where to put the state: yes, I can have the optimizer keep a
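One hypothetical way the optimizer could "keep" that state (the exact container discussed above was elided) is an `IdDict` of per-parameter buffers, created lazily on first use, so the model itself stays stateless. `Momentum` and its fields here are illustrative names, not the issue's code:

```julia
# Hedged sketch of optimizer-held state: momentum velocities stored in
# an IdDict keyed by parameter identity, so each parameter array gets
# its own buffer without the model carrying any optimizer state.
mutable struct Momentum
  lr::Float64
  ρ::Float64                  # momentum coefficient
  velocity::IdDict{Any,Any}   # per-parameter buffers, lazily created
end
Momentum(lr, ρ = 0.9) = Momentum(lr, ρ, IdDict{Any,Any}())

function (o::Momentum)(p, g)
  v = get!(o.velocity, p, zero(p))   # first use: a zero buffer for p
  v .= o.ρ .* v .- o.lr .* g         # update the buffer in place
  p .+ v
end
```

Keying on object identity (rather than value) matters here: two parameters that happen to hold equal values must still get separate momentum buffers.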
And for the
Great questions. I'll try to explain my current thinking on this as much as I can.
In general, functions of models have "wrapper semantics"; think
I don't think any of those options should make things significantly more or less complex, then; it's just a question of when
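The "wrapper semantics" mentioned above can be illustrated with a hedged, generic sketch (the concrete example in the original comment was elided; `Counted` is my own stand-in): a wrapper is itself a model — calling it calls the inner model — and any extra behaviour happens around that call rather than inside the backend.

```julia
# Hedged illustration of wrapper semantics: the wrapper forwards calls
# to the model it wraps, and its extra behaviour (here a call counter,
# standing in for e.g. a deferred parameter sync) wraps around the call.
mutable struct Counted{M}
  inner::M
  calls::Int
end
Counted(m) = Counted(m, 0)

function (w::Counted)(x)
  w.calls += 1       # side effect happens around the inner call
  w.inner(x)
end
```

This is why the choice between the options above mostly changes *when* the wrapped behaviour runs, not how complex the wrapper itself is.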
Keep the thoughts coming, especially if that's not clear.