The case for splitting model class #417
The corresponding concepts in Caffe are net, solver, and solvers. Caffe merged its parallelism features in BVLC/caffe#2870 and BVLC/caffe#2903, and distributed training was published by @cque7 from Intel yesterday.
I think an explicit solver is unnecessary. In my view, the model class is for people who don't have any customization requirements; the symbolic interface is better for customized needs. Also, we are not making another Caffe. If there is a performance concern, we welcome benchmarks.
I guess @piiswrong is talking about potential reuse of some components in model for other purposes. This requirement is reasonable. I can see two possible ways to do this:
@piiswrong I think the headache for RNN models is the hidden state. I will make an attention model tomorrow or the day after, and then I believe we will have a better sense of how to reuse the current code.
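A minimal sketch of why hidden state is a headache for a generic model interface: a feedforward model's `forward` depends only on its input, while an RNN cell must thread hidden state through every time step, so the two don't fit the same signature. All class and method names here are illustrative, not MXNet API.

```python
import numpy as np

class FeedForwardCell:
    """Stateless: output depends on the input alone."""
    def forward(self, x):
        return x * 2

class RNNCell:
    """Stateful: output depends on the input AND the previous hidden state."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def forward(self, x, h):
        # New hidden state mixes the input with the carried state,
        # so the caller must pass h in and receive the updated h back.
        h_new = np.tanh(x + h)
        return h_new, h_new

cell = RNNCell(4)
h = np.zeros(4)
for x in np.ones((3, 4)):      # unroll over 3 time steps
    out, h = cell.forward(x, h)
```

The training loop (or the model wrapper) has to own this extra `h` plumbing, which is exactly what a feedforward-only model class cannot express.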
@antinucleon we can debate whether solvers are necessary, but I think the FeedForwardModel/Model separation is a good idea. Without a base model class, you need to write save/load for every custom model, to say the least. There is also functionality like listing and allocating parameters that can be reused. The current design is also going to be bad for creating a pretrained model repo in the future, since you have to send weights along with the code to load them.
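A minimal sketch of the reuse argument: if save/load and parameter listing live once in a base `Model`, every custom subclass inherits them, and a pretrained-weights file can be loaded without the subclass's code. The names (`Model`, `FeedForwardModel`, `arg_params`) and the pickle serialization are illustrative assumptions, not MXNet's actual API.

```python
import os
import pickle
import tempfile

class Model:
    """Base class: holds parameters and implements generic save/load."""
    def __init__(self):
        self.arg_params = {}  # parameter name -> value

    def list_parameters(self):
        return sorted(self.arg_params)

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.arg_params, f)

    def load(self, path):
        with open(path, "rb") as f:
            self.arg_params = pickle.load(f)

class FeedForwardModel(Model):
    """Subclass adds feedforward-specific parts; save/load come for free."""
    def __init__(self):
        super().__init__()
        self.arg_params = {"fc1_weight": [1.0], "fc1_bias": [0.0]}

m = FeedForwardModel()
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
m.save(path)

m2 = Model()   # even a bare base Model can load the saved weights
m2.load(path)
```

The point of the last two lines is the repo scenario: the weights file round-trips through the base class alone, without shipping the subclass's code.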
Some code from
Closing due to inactivity.
The current model class is specific to feedforward nets, but it contains some common functionality that all models can use, such as save/load. Also, if we want per-parameter learning-rate multipliers, it's better to implement them once in a base class rather than in every custom training loop. I propose the following changes:
- Model: handles saving, loading, etc. Should hold the parameters.
- FeedForwardModel: subclass of Model; handles the feedforward-specific parts.
- Trainer: implements the training loop and calls the optimizer. There should be Trainer, ParallelTrainer, and DistributedTrainer. Learning-rate multipliers should be handled here.
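The proposed split can be sketched as follows, with the Trainer owning the training loop and the per-parameter learning-rate multipliers so the model stays a plain parameter container. All names here (Trainer, `lr_mult`, `SGDOptimizer`) are hypothetical, not the actual MXNet API.

```python
class SGDOptimizer:
    """Toy SGD on scalar parameters; lr_mult scales the step per parameter."""
    def __init__(self, lr):
        self.lr = lr

    def update(self, param, grad, lr_mult=1.0):
        return param - self.lr * lr_mult * grad

class Trainer:
    """Owns the loop and the multipliers, not the model internals."""
    def __init__(self, params, optimizer, lr_mult=None):
        self.params = params          # parameter name -> scalar value (toy)
        self.optimizer = optimizer
        self.lr_mult = lr_mult or {}  # per-parameter learning-rate multipliers

    def step(self, grads):
        for name, g in grads.items():
            mult = self.lr_mult.get(name, 1.0)
            self.params[name] = self.optimizer.update(self.params[name], g, mult)

params = {"w": 1.0, "b": 1.0}
# A multiplier of 0.0 freezes "b"; no change to the model class is needed.
trainer = Trainer(params, SGDOptimizer(lr=0.1), lr_mult={"b": 0.0})
trainer.step({"w": 1.0, "b": 1.0})
```

ParallelTrainer and DistributedTrainer would then override only how `step` distributes the updates, while save/load stays in the model base class.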