
Does Gluon-ts support early stopping? #555

Closed
kmosby1992 opened this issue Jan 9, 2020 · 10 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed) · question (Further information is requested)

Comments

@kmosby1992

Does Gluon-ts support early stopping (for example: stop training after the loss has failed to decrease for 50 epochs)? I see that we can have the learning rate reduced if the loss fails to go down, but I could not find any option for early stopping.

kmosby1992 added the question label Jan 9, 2020
@lostella (Contributor) commented Jan 9, 2020

@kmosby1992 you're right, there is no such option currently. One natural way of modifying the training loop to get that would be to have the learning rate reduction mechanism actually halt training once the minimum learning rate is hit and no progress has been made within the prescribed patience. What do you think?
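Roughly, the idea would look something like the sketch below (illustrative only; none of these names are actual GluonTS internals):

# illustrative sketch of the proposed behaviour, not GluonTS code
max_epochs = 200
patience = 10
learning_rate = 1e-3
decay_factor = 0.5
minimum_learning_rate = 5e-5

def run_one_epoch(lr):
    # hypothetical placeholder for one pass over the training data
    return 1.0

best_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(max_epochs):
    epoch_loss = run_one_epoch(learning_rate)

    if epoch_loss < best_loss:
        best_loss = epoch_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1

    if epochs_without_improvement > patience:
        if learning_rate > minimum_learning_rate:
            # current behaviour: reduce the learning rate and keep going
            learning_rate = max(learning_rate * decay_factor, minimum_learning_rate)
            epochs_without_improvement = 0
        else:
            # proposed addition: halt once the learning rate floor is hit
            # and no progress was made within the patience window
            break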

@jackdent commented Jan 18, 2020

How about adding an optional argument to the Trainer: a function that gets called at the end of each epoch and returns either True or False (whether to continue training)? This function could use a closure to keep track of the historical losses for each epoch, which would also be useful if you wanted to plot the loss curves after training, e.g.

epsilon = 1e-4  # minimum improvement required over the last 50 epochs

train_losses = []
validation_losses = []

def cb(training_loss, validation_loss):
    # record losses so the curves can also be plotted after training
    train_losses.append(training_loss)
    validation_losses.append(validation_loss)

    if len(train_losses) >= 50:
        # keep training only if the loss improved by more than epsilon
        # over the last 50 epochs
        return abs(train_losses[-50] - train_losses[-1]) > epsilon
    return True

trainer = Trainer(..., cb=cb)  # `cb` is the proposed new argument

lostella added the enhancement label Jan 19, 2020
@lostella (Contributor)

@jackdent I think a callback mechanism like that could be useful in general; I agree we could have that option in the trainer. But even before that, maybe the current learning rate scheduling mechanism should be adjusted so that once the patience is exceeded and the minimum step size is reached, the iterations stop.

@MaximilianPavon

I'd love to see a callback mechanism implemented for the Trainer, e.g. for external logging.

@lostella (Contributor)

@kmosby1992 the changes in #701 now allow for early stopping: essentially, the learning rate scheduler stops the training loop once the loss stops decreasing and the learning rate has reached the minimum.
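For reference, configuring this from the user side looks roughly like the sketch below; I'm assuming the Trainer arguments patience, learning_rate_decay_factor and minimum_learning_rate as in the current MX trainer (exact names and defaults may differ between versions):

from gluonts.mx.trainer import Trainer  # import path may differ between versions

trainer = Trainer(
    epochs=200,
    learning_rate=1e-3,
    patience=10,                      # epochs without improvement before the LR is decayed
    learning_rate_decay_factor=0.5,   # LR is halved each time patience is exceeded
    minimum_learning_rate=1e-4,       # once this floor is reached, training stops early
)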

@jackdent @MaximilianProll I'll open a separate issue for the callback mechanism

@MaximilianPavon

Thanks a lot, @lostella, can you link the issue here once it's created?

@lostella (Contributor)

@MaximilianProll see #706

@Arfea commented Jun 1, 2020

I'd also love to see an implementation of callbacks. Could you suggest where to start if I want to implement a callback mechanism?

@kaijennissen (Contributor)

How exactly does the early stopping work? Is there a way to stop training if the validation loss does not decrease for a given number of iterations? As far as I understand the code, the patience argument in Trainer controls the learning_rate_scheduler, not early stopping.

@jaheba (Contributor) commented Jul 30, 2020

There is a mechanism for early stopping in the learning rate scheduler of the trainer:

https://github.com/awslabs/gluon-ts/blob/e52864f7ee5d173dac38e7a984b9ea615397e2f2/src/gluonts/mx/trainer/learning_rate_scheduler.py#L124-L129

Once min_lr is reached, training stops. With the current default settings this is the case after five cycles of learning-rate reductions.
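As a rough back-of-the-envelope check, assuming defaults of learning_rate=1e-3, learning_rate_decay_factor=0.5 and minimum_learning_rate=5e-5 (verify against your installed version):

# assumed defaults, not guaranteed to match every GluonTS version
lr, decay_factor, min_lr = 1e-3, 0.5, 5e-5

reductions = 0
while lr > min_lr:
    lr *= decay_factor
    reductions += 1

print(reductions)  # 5 -> the fifth reduction drops the LR below the floor, ending training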

That said, there are probably more intuitive ways to set this behaviour.
