
Add pause/resume capability for the models trainer #20

Closed
iustinam90 opened this issue Mar 27, 2018 · 3 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · todo (This needs to be done) · wontfix (This will not be worked on)

Comments

@iustinam90 (Collaborator) commented Mar 27, 2018

Expected Result

Save the current trainer state (weights, costs, etc) to files on disk to be able to stop and resume training at a later time.

Actual Result

Currently, when the trainer is interrupted, no state is saved, so rerunning the trainer starts training from scratch.

@tiberiu44 added the enhancement, help wanted, and todo labels on Mar 28, 2018
@tiberiu44 (Contributor) commented Mar 28, 2018

Actually, we do save checkpoints after each epoch, but there is no resume capability. This should not be too hard: each network is able to load a pretrained model, so we only have to restructure the training entry point to check whether a model was provided via the command line and, if so, load it after the network is initialized.
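
A minimal sketch of what that restructuring could look like, assuming a hypothetical --resume command-line option exposed as params.resume (the flag name is an assumption, not an existing option); network.load is the per-network load method mentioned above:

import os

def maybe_resume(network, params):
    # If the user supplied a checkpoint path on the command line, load it
    # into the freshly initialized network before training starts.
    resume_path = getattr(params, 'resume', None)  # hypothetical option
    if resume_path is None:
        # Otherwise, fall back to the per-epoch checkpoint written next to output_base.
        candidate = params.output_base + '.last'
        if os.path.exists(candidate):
            resume_path = candidate
    if resume_path is not None:
        print('Resuming from checkpoint: {}'.format(resume_path))
        network.load(resume_path)
    return network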

@ruxandraburtica (Contributor) commented

We need to do the following:

  • Keep track of model metrics and save them to disk
  • Load the model from the path where it was saved (see the snippet below)
  • Load model metrics from disk

# `encodings`, `trainset`, `devset`, `testset` and `parser` are assumed to be
# built earlier in the training entry point, exactly as they are today.
embeddings = WordEmbeddings()
embeddings.read_from_file(params.embeddings, encodings.word_list)

# Instead of training the freshly constructed BDRNNParser from scratch,
# load the weights saved in the last per-epoch checkpoint.
path = params.output_base + '.last'
print('Loading model from path: {}'.format(path))
parser.load(path)

# Resume training with the restored weights.
trainer = ParserTrainer(parser, encodings, params.itters, trainset, devset, testset)
trainer.start_training(params.output_base, params.batch_size)
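
A possible sketch for the metrics part, assuming a plain JSON side file written next to the checkpoint (the file name and helper functions are illustrative, not part of the project's API):

import json

def save_metrics(output_base, epoch, best_dev_score, costs):
    # Persist the trainer bookkeeping next to the model checkpoint.
    state = {'epoch': epoch, 'best_dev_score': best_dev_score, 'costs': costs}
    with open(output_base + '.metrics.json', 'w') as f:
        json.dump(state, f)

def load_metrics(output_base):
    # Restore the bookkeeping, or start fresh if no previous run exists.
    try:
        with open(output_base + '.metrics.json', 'r') as f:
            return json.load(f)
    except (IOError, OSError):
        return {'epoch': 0, 'best_dev_score': 0.0, 'costs': []}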

@tiberiu44 added the wontfix label on Sep 24, 2018
@tiberiu44 (Contributor) commented

This is a rare use case. Unlike speech/voice processing, training here is much faster and does not require resume capability.
