
The case for splitting model class #417

Closed
piiswrong opened this issue Oct 28, 2015 · 7 comments

Comments

@piiswrong
Contributor

The current model class is specific to feedforward nets, but it contains common functionality that all models can use, such as save & load. Also, if we want per-parameter learning-rate multipliers, it is better to implement them once in a base class rather than in every custom training loop. I propose the following changes (a rough sketch of the split appears after the list):

  1. Split the model class into Model, FeedForwardModel (inherits from Model), and Trainer (or TrainingPlan? Solver? Executor?) with the following responsibilities:
    Model: handles saving, loading, etc. Should hold the parameters.
    FeedForwardModel: subclass of Model; handles the feedforward-specific parts.
    Trainer: implements the training loop and calls the optimizer. Should come in Trainer, ParallelTrainer, and DistributedTrainer variants. Learning-rate multipliers should be handled here.
  2. Merge the training-loop part of the LSTM example into LSTMModel.
  3. We can also have an autoencoder model and an RBM model, if anyone is still using those.
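
For illustration, here is a minimal Python sketch of what the split could look like. Every class and method name here is hypothetical, not an existing MXNet API; the point is that save/load lives once in Model, topology-specific code lives in FeedForwardModel, and per-parameter learning-rate multipliers live once in Trainer (ParallelTrainer and DistributedTrainer would subclass it):

```python
import pickle

class Model:
    """Base class: holds parameters and implements common I/O once."""
    def __init__(self, arg_params=None):
        self.arg_params = arg_params or {}   # parameter name -> array

    def save(self, fname):
        with open(fname, "wb") as f:
            pickle.dump(self.arg_params, f)

    @classmethod
    def load(cls, fname, **kwargs):
        with open(fname, "rb") as f:
            return cls(arg_params=pickle.load(f), **kwargs)

class FeedForwardModel(Model):
    """Subclass: only the feedforward-specific parts live here."""
    def __init__(self, symbol=None, arg_params=None):
        super().__init__(arg_params)
        self.symbol = symbol   # the network definition, e.g. an mxnet Symbol

class Trainer:
    """Owns the training loop; learning-rate multipliers are applied
    here once instead of in every custom training loop."""
    def __init__(self, model, base_lr=0.1, lr_mult=None):
        self.model = model
        self.base_lr = base_lr
        self.lr_mult = lr_mult or {}         # parameter name -> multiplier

    def step(self, grads):
        # Plain SGD update with a per-parameter learning-rate scale.
        for name, grad in grads.items():
            lr = self.base_lr * self.lr_mult.get(name, 1.0)
            self.model.arg_params[name] -= lr * grad
```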
@piiswrong piiswrong changed the title The case for refactoring model class The case for splitting model class Oct 28, 2015
@futurely

The corresponding concepts in Caffe are net, solver, and solvers. Caffe merged its parallel-training features in BVLC/caffe#2870 and BVLC/caffe#2903. Distributed training was published yesterday by @cque7 from Intel.

@antinucleon
Contributor

I think an explicit solver is unnecessary. In my view, the model class is for people who don't have any customization requirements; the symbolic interface is better for customized needs. Also, we are not building another Caffe; if there is a performance concern, we welcome benchmarks.

@tqchen
Member

tqchen commented Oct 29, 2015

I guess @piiswrong is talking about the potential reuse of some components of the model for other purposes. This requirement is reasonable. I can see two possible ways to do this:

  • Do this after the RNN model lands in mxnet/python, and look for common components afterwards.
  • @piiswrong can propose a PR on this, with an implementation of an autoencoder, to show there is a case for code reuse.

@antinucleon
Contributor

@piiswrong I think the headache for the RNN model is the hidden states. I will make an attention model tomorrow or the day after; then I believe we will have a better sense of how to reuse the current code.

@piiswrong
Contributor Author

@antinucleon we can debate whether solvers are necessary, but I think the FeedForwardModel/Model separation is a good idea. Without a base model class, you have to write save/load for every custom model, to say the least. There is also functionality that can be reused, such as listing and allocating parameters.

This will also be bad for creating a pretrained-model repo in the future, since you would have to send the weights along with the code needed to load them.
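
For illustration, continuing the hypothetical classes sketched earlier in the thread, a shared base-class format lets any model's weights round-trip without model-specific loading code:

```python
# Any Model subclass shares the same serialization, so a pretrained-model
# repo would only need to ship parameter files, not per-model load code.
m = FeedForwardModel(symbol=None, arg_params={"fc1_weight": 0.0})
m.save("pretrained.params")
restored = FeedForwardModel.load("pretrained.params")
print(sorted(restored.arg_params))   # ['fc1_weight']
```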

@pluskid
Contributor

pluskid commented Nov 23, 2015

Some code from FeedForward should definitely be refactored and moved into a superclass once we have another model type. I vote against having a separate trainer, because different models need different training logic. For example, RNN models have to maintain state transitions and cut off gradients over time.
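
To illustrate why, here is a generic NumPy sketch (not MXNet code; rnn_step, detach, and train_epoch are made-up names) of the two RNN-specific pieces a shared trainer would have to know about: carrying hidden state across steps, and cutting off gradients over time (truncated BPTT):

```python
import numpy as np

HIDDEN = 4

def rnn_step(state, x, w):
    # One recurrent step: new_state = tanh(w * (state + x)).
    return np.tanh(w * (state + x))

def detach(state):
    # Stand-in for cutting the autograd history at this point; with a
    # real autograd engine this stops gradients flowing further back.
    return state.copy()

def train_epoch(data, w, bptt_len=5):
    state = np.zeros(HIDDEN)              # hidden state carried across steps
    for t, x in enumerate(data):
        state = rnn_step(state, x, w)
        if (t + 1) % bptt_len == 0:
            state = detach(state)         # truncated BPTT: cut gradients here
    return state

# A feedforward loop has no state to carry and nothing to detach,
# which is why a single shared training loop is hard to get right.
train_epoch([np.ones(HIDDEN)] * 12, w=0.5)
```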

@tqchen
Member

tqchen commented Dec 15, 2015

Closing due to inactivity.

@tqchen tqchen closed this as completed Dec 15, 2015