
Add early stopping and custom metrics #165

Merged: 10 commits merged into deepset-ai:master on Dec 9, 2019

Conversation

johann-petrak
Contributor

This should fix most of #160.
It allows setting up an EarlyStopping instance that knows which metric to use, where to save the best models, how much patience to have, etc., and passing it on to the trainer.
It also allows registering additional custom metrics functions.
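For context, a minimal sketch of how this could look in user code, assuming EarlyStopping is importable from farm.train and the Trainer accepts an early_stopping argument (both are assumptions to be checked against this PR's diff); the surrounding setup (model, optimizer, data_silo, device) is the usual one from examples/doc_classification.py and is not repeated here:

```python
# Hedged sketch, not taken verbatim from this PR. The parameter names (metric, mode,
# save_dir, patience, early_stopping) follow the PR description above and may differ
# slightly in the actual code.
from farm.train import Trainer, EarlyStopping

earlystopping = EarlyStopping(
    metric="f1_macro",                   # key from the first head's evaluation result
    mode="max",                          # stop when this metric stops increasing
    save_dir="saved_models/best_model",  # where the best checkpoint gets written
    patience=5,                          # evaluations without improvement before stopping
)

trainer = Trainer(
    optimizer=optimizer,                 # optimizer, data_silo, device etc. set up as in
    data_silo=data_silo,                 # examples/doc_classification.py (omitted here)
    epochs=10,
    n_gpu=1,
    evaluate_every=100,
    device=device,
    early_stopping=earlystopping,        # hand the early stopping config to the trainer
)
model = trainer.train(model)             # the best saved model is then used for the test eval
```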

There is a separation between using metrics and reporting, and probably some work is needed there which I have not looked into yet. It may be best to look at what we have now, merge and test if it looks good, or figure out how to improve it if there are still issues.

Allow specification of the metric; use the saved best model for any test evaluation.
Tested at least for text classification; did not test for NER.
Also, still need to check what exactly gets logged now and what else we should log if we do early stopping.
This can use a user-defined metrics function and any metric from the result for early stopping. The early stopping metric can either be a key from the first head's evaluation result or a function that calculates a value from the whole evaluation result.
User-defined metrics functions need to be registered under a string name so they can be accessed easily after storing and re-loading the processor.
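To make the registration part concrete, a hedged sketch follows; the helper name register_metrics, its module farm/metrics.py, and the (preds, labels) signature are assumptions based on the files touched in this PR:

```python
# Hedged sketch: register a custom metrics function under a string name so it can be
# looked up by name again after storing and re-loading the processor.
from sklearn.metrics import f1_score, matthews_corrcoef

from farm.metrics import register_metrics  # assumed helper added in this PR


def my_metrics(preds, labels):
    # Return a dict of named scores. Early stopping can then monitor one of these keys
    # (e.g. "f1_macro") or use a function that computes a value from the whole result.
    return {
        "f1_macro": f1_score(labels, preds, average="macro"),
        "mcc": matthews_corrcoef(labels, preds),
    }


# Register the function once under a string name ...
register_metrics("my_metrics", my_metrics)

# ... and refer to it by that name later, e.g. when creating the processor
# (hypothetical call, mirroring the existing `metric` argument):
# processor = TextClassificationProcessor(..., metric="my_metrics", ...)
```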
@tholor tholor changed the title Issue160 Add early stopping and custom metrics Dec 3, 2019
@tholor tholor self-requested a review December 3, 2019 13:52
@tholor tholor added the enhancement, part: evaluator and part: trainer labels on Dec 3, 2019
Member

@tholor tholor left a comment


Thanks a lot for your work @johann-petrak! Looks good!
I only added a few minor comments / questions.
After resolving the comments and some minor cleanups, we should be good to merge. 🙂

There is a separation between using metrics and reporting and probably some work is needed there [...]

I don't fully understand your comment. Could you please elaborate on what your concern is regarding the reporting? Do you mean that there's now support for custom metrics but not for custom reports?

Resolved review threads: farm/train.py (×2, outdated), farm/metrics.py, examples/doc_classification.py (outdated)
@johann-petrak
Contributor Author

OK, I pushed the changes that address your review.

Ignore my comment about the reporting. I was just confused by the fact that the "report" uses a different way to calculate F1 than what is predefined in the metrics (or could now get defined and registered), but I guess it is good to always have something included that is specific to the kind of head used.

@johann-petrak
Contributor Author

The Travis CI job is failing, but it does not look like it is related to my changes?

@tholor
Member

tholor commented Dec 9, 2019

Great! Thanks a lot for wrapping everything up!

The Travis CI is currently failing only due to low memory on the worker, so it is definitely unrelated to your changes. On a bigger worker all tests pass. I will merge it into master now.

@tholor tholor merged commit 3b0c521 into deepset-ai:master Dec 9, 2019