
The case for splitting model class #417

Closed
piiswrong opened this issue Oct 28, 2015 · 7 comments

Comments

@piiswrong
Contributor

The current model class is specific to feedforward nets, but it contains common functionality that all models can use, such as save & load. Also, if we want per-parameter learning-rate multipliers, it is better to implement them once in a base class rather than in every custom training loop. I propose the following changes (a rough sketch of the split appears after the list):

  1. Split the model class into Model, FeedForwardModel (inherits from Model), and Trainer (or TrainingPlan? Solver? Executor?) with the following responsibilities:
    Model: handles saving, loading, etc. Should hold the parameters.
    FeedForwardModel: subclass of Model; handles the feedforward-specific parts.
    Trainer: implements the training loop and calls the optimizer. Should come in Trainer, ParallelTrainer, and DistributedTrainer variants. Learning-rate multipliers should be handled here.
  2. Merge the training-loop part of the LSTM example into LSTMModel.
  3. We can also have an autoencoder model and an RBM model, if anyone is still using those.
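
For illustration, here is a minimal Python sketch of what the split could look like. Every class and method name here is hypothetical, not an existing MXNet API; the point is that save/load lives once in Model, topology-specific code lives in FeedForwardModel, and per-parameter learning-rate multipliers live once in Trainer (ParallelTrainer and DistributedTrainer would subclass it):

```python
import pickle

class Model:
    """Base class: holds parameters and implements common I/O once."""
    def __init__(self, arg_params=None):
        self.arg_params = arg_params or {}   # parameter name -> array

    def save(self, fname):
        with open(fname, "wb") as f:
            pickle.dump(self.arg_params, f)

    @classmethod
    def load(cls, fname, **kwargs):
        with open(fname, "rb") as f:
            return cls(arg_params=pickle.load(f), **kwargs)

class FeedForwardModel(Model):
    """Subclass: only the feedforward-specific parts live here."""
    def __init__(self, symbol=None, arg_params=None):
        super().__init__(arg_params)
        self.symbol = symbol   # the network definition, e.g. an mxnet Symbol

class Trainer:
    """Owns the training loop; learning-rate multipliers are applied
    here once instead of in every custom training loop."""
    def __init__(self, model, base_lr=0.1, lr_mult=None):
        self.model = model
        self.base_lr = base_lr
        self.lr_mult = lr_mult or {}         # parameter name -> multiplier

    def step(self, grads):
        # Plain SGD update with a per-parameter learning-rate scale.
        for name, grad in grads.items():
            lr = self.base_lr * self.lr_mult.get(name, 1.0)
            self.model.arg_params[name] -= lr * grad
```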
@piiswrong piiswrong changed the title The case for refactoring model class The case for splitting model class Oct 28, 2015
@futurely

The corresponding concepts in Caffe are net, solver, and solvers. Caffe merged its parallel-training features in BVLC/caffe#2870 and BVLC/caffe#2903. Distributed training was published yesterday by @cque7 from Intel.

@antinucleon
Contributor

I think an explicit solver is unnecessary. In my view, the model class is for people who don't have any customization requirements; the symbolic interface is better for customized needs. Also, we are not building another Caffe; if there is a performance concern, we welcome benchmarks.

@tqchen
Member

tqchen commented Oct 29, 2015

I guess @piiswrong is talking about the potential reuse of some components of the model for other purposes. This requirement is reasonable. I can see two possible ways to do this:

  • Do this after the RNN model lands in mxnet/python, and look for common components afterwards.
  • @piiswrong can propose a PR on this, with an implementation of an autoencoder, to show there is a case for code reuse.

@antinucleon
Contributor

@piiswrong I think the headache for the RNN model is the hidden states. I will make an attention model tomorrow or the day after; then I believe we will have a better sense of how to reuse the current code.

@piiswrong
Contributor Author

@antinucleon we can debate whether solvers are necessary, but I think the FeedForwardModel/Model separation is a good idea. Without a base model class, you have to write save/load for every custom model, to say the least. There is also functionality that can be reused, such as listing and allocating parameters.

This will also be bad for creating a pretrained-model repo in the future, since you would have to send the weights along with the code needed to load them.
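
For illustration, continuing the hypothetical classes sketched earlier in the thread, a shared base-class format lets any model's weights round-trip without model-specific loading code:

```python
# Any Model subclass shares the same serialization, so a pretrained-model
# repo would only need to ship parameter files, not per-model load code.
m = FeedForwardModel(symbol=None, arg_params={"fc1_weight": 0.0})
m.save("pretrained.params")
restored = FeedForwardModel.load("pretrained.params")
print(sorted(restored.arg_params))   # ['fc1_weight']
```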

@pluskid
Contributor

pluskid commented Nov 23, 2015

Some code from FeedForward should definitely be refactored and moved into a superclass once we have another model type. I vote against having a separate trainer, because different models need different training logic. For example, RNN models have to maintain state transitions and cut off gradients over time.
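
To illustrate why, here is a generic NumPy sketch (not MXNet code; rnn_step, detach, and train_epoch are made-up names) of the two RNN-specific pieces a shared trainer would have to know about: carrying hidden state across steps, and cutting off gradients over time (truncated BPTT):

```python
import numpy as np

HIDDEN = 4

def rnn_step(state, x, w):
    # One recurrent step: new_state = tanh(w * (state + x)).
    return np.tanh(w * (state + x))

def detach(state):
    # Stand-in for cutting the autograd history at this point; with a
    # real autograd engine this stops gradients flowing further back.
    return state.copy()

def train_epoch(data, w, bptt_len=5):
    state = np.zeros(HIDDEN)              # hidden state carried across steps
    for t, x in enumerate(data):
        state = rnn_step(state, x, w)
        if (t + 1) % bptt_len == 0:
            state = detach(state)         # truncated BPTT: cut gradients here
    return state

# A feedforward loop has no state to carry and nothing to detach,
# which is why a single shared training loop is hard to get right.
train_epoch([np.ones(HIDDEN)] * 12, w=0.5)
```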

@tqchen
Member

tqchen commented Dec 15, 2015

Closing due to inactivity.

@tqchen tqchen closed this as completed Dec 15, 2015