
Add pause/resume capability for the models trainer #20

Closed
iustinam90 opened this issue Mar 27, 2018 · 3 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · todo (This needs to be done) · wontfix (This will not be worked on)

Comments

@iustinam90 (Collaborator) commented Mar 27, 2018

Expected Result

Save the current trainer state (weights, costs, etc) to files on disk to be able to stop and resume training at a later time.

Actual Result

Currently, when the trainer is interrupted, no state is saved, so rerunning the trainer starts training from scratch.

@tiberiu44 added the enhancement, help wanted, and todo labels on Mar 28, 2018
@tiberiu44 (Contributor) commented Mar 28, 2018

Actually, we do save checkpoints after each epoch, but there is no resume capability. This should not be too hard: each network is able to load a pretrained model, so we only have to restructure the training entry point to check whether a model was provided via the command line and, if so, load it after the network is initialized.
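
A minimal sketch of what that restructuring could look like, assuming a hypothetical --resume command-line option exposed as params.resume (the flag name is an assumption, not an existing option); network.load is the per-network load method mentioned above:

import os

def maybe_resume(network, params):
    # If the user supplied a checkpoint path on the command line, load it
    # into the freshly initialized network before training starts.
    resume_path = getattr(params, 'resume', None)  # hypothetical option
    if resume_path is None:
        # Otherwise, fall back to the per-epoch checkpoint written next to output_base.
        candidate = params.output_base + '.last'
        if os.path.exists(candidate):
            resume_path = candidate
    if resume_path is not None:
        print('Resuming from checkpoint: {}'.format(resume_path))
        network.load(resume_path)
    return network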

@ruxandraburtica (Contributor) commented

We need to do the following:

  • Keep track of model metrics and save them to disk
  • Load the model from the path where it was saved (see the snippet below)
  • Load model metrics from disk

# `encodings`, `trainset`, `devset`, `testset` and `parser` are assumed to be
# built earlier in the training entry point, exactly as they are today.
embeddings = WordEmbeddings()
embeddings.read_from_file(params.embeddings, encodings.word_list)

# Instead of training the freshly constructed BDRNNParser from scratch,
# load the weights saved in the last per-epoch checkpoint.
path = params.output_base + '.last'
print('Loading model from path: {}'.format(path))
parser.load(path)

# Resume training with the restored weights.
trainer = ParserTrainer(parser, encodings, params.itters, trainset, devset, testset)
trainer.start_training(params.output_base, params.batch_size)
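
A possible sketch for the metrics part, assuming a plain JSON side file written next to the checkpoint (the file name and helper functions are illustrative, not part of the project's API):

import json

def save_metrics(output_base, epoch, best_dev_score, costs):
    # Persist the trainer bookkeeping next to the model checkpoint.
    state = {'epoch': epoch, 'best_dev_score': best_dev_score, 'costs': costs}
    with open(output_base + '.metrics.json', 'w') as f:
        json.dump(state, f)

def load_metrics(output_base):
    # Restore the bookkeeping, or start fresh if no previous run exists.
    try:
        with open(output_base + '.metrics.json', 'r') as f:
            return json.load(f)
    except (IOError, OSError):
        return {'epoch': 0, 'best_dev_score': 0.0, 'costs': []}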

@tiberiu44 added the wontfix label on Sep 24, 2018
@tiberiu44 (Contributor) commented

This is a rare use case. Unlike speech/voice processing, training here is much faster and does not require resume capability.
