Decouple evaluation metrics from tuning objectives #384
Merged
Conversation
- Factor out metric parsing and validation into a separate function that is now used for both the "objectives" option and the new "metrics" option (a sketch of such a helper follows this list).
- Add several checks that raise warnings and errors depending on the task and on whether both "objectives" and "metrics" are specified.
- Pass "metrics" in to `run_configuration()`.
- Use the metrics in the `evaluate()` and `cross_validate()` tasks and return the additional values in a dictionary that is part of the task results.
- Change the argument name in `learning_curve()` from `objective` to `metric`, since that makes more sense (there is no tuning).
- Handle receiving the metrics and passing them to the appropriate Learner methods.
- Deal with the additional output dictionary when creating results files.
- For the learning curve task, even though the objectives are not actually used as objectives, we piggyback on that setup since it lets us parallelize all of the metrics.
- Make sure the metrics list is populated since it is needed later.
- Also update the actual CFG file from the example.
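As a rough illustration of the first bullet, here is a minimal sketch of what a shared parsing/validation helper could look like. The function name `_parse_and_validate_metrics`, its signature, and its messages are all hypothetical; this is not the actual code in `skll/config.py`.

```python
# Hypothetical sketch of a helper shared by the "objectives" and
# "metrics" options; names and messages are illustrative only.
import ast


def _parse_and_validate_metrics(value, option_name, logger):
    """Parse a config value that should hold a list of metric names."""
    # config values arrive as strings, e.g. "['accuracy', 'f1']"
    try:
        metrics = ast.literal_eval(value)
    except (SyntaxError, ValueError):
        raise ValueError('Could not parse "{}": {}'.format(option_name, value))
    if not isinstance(metrics, list) or \
            not all(isinstance(m, str) for m in metrics):
        raise ValueError('"{}" must be a list of metric '
                         'names'.format(option_name))
    if len(set(metrics)) != len(metrics):
        logger.warning('Duplicate entries in "%s" will only be '
                       'computed once.', option_name)
    return metrics
```

Funneling both options through one such helper is what removes the duplication mentioned in the first bullet.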
aoifecahill reviewed Nov 8, 2017
skll/config.py
Outdated
logger.warning("The \"objectives\" option "
               "is deprecated for the learning_curve "
               "task and will not be supported "
               "starting with the next release; please "
maybe "after" instead of "starting with"?
skll/config.py
Outdated
elif task in ['evaluate', 'cross_validate']:
    # for other appropriate tasks, if metrics and objectives have
    # some overlaps - we will assume that the user meant to include
    # use the metric for tuning _and_ evaluation, not just evaluation
use -->
@aoifecahill @mulhod @cml54 @benbuleong @Lguyogiro this PR is now ready for review.
aoifecahill approved these changes Nov 8, 2017
benbuleong approved these changes Nov 8, 2017
Lguyogiro approved these changes Nov 8, 2017
Merged
Right now, the only metrics computed for any SKLL experiment are the tuning objectives. However, one may want to compute additional evaluation metrics without tuning on them. This PR makes that possible.

- For `evaluate` and `cross_validate` tasks, one can now specify a `metrics` list in the Output section, and those metrics will be computed and saved at the end of the results file under an "Additional Evaluation Metrics" section (see the example config after this list). For `train` and `predict` tasks, `metrics` is ignored since it is not relevant.
- For `learning_curve` tasks, it doesn't actually make sense to have `objectives` anyway, so you can specify `metrics` instead. Using `objectives` is still supported for backward compatibility, but you will get a deprecation warning. (Note: as an implementation detail, we still use the "objectives" variable internally since we can then piggyback on it to parallelize the various jobs.)
- The parsing and validation of `objectives` and `metrics` is now factored out into a separate function to avoid code duplication.
- Update the documentation.
- Add several new tests and update existing tests to deal with the new additional metrics being produced in the outputs.
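For concreteness, here is a sketch of what a config using the new option might look like for an `evaluate` task. The experiment name, paths, learners, featuresets, and metric names below are made up for illustration; check the updated documentation for the exact syntax.

```ini
[General]
experiment_name = example_evaluate
task = evaluate

[Input]
train_directory = train
test_directory = test
learners = ["LogisticRegression"]
featuresets = [["example_features"]]

[Tuning]
grid_search = true
# metric(s) to tune the hyperparameters on
objectives = ["accuracy"]

[Output]
# evaluation-only metrics: computed and reported under
# "Additional Evaluation Metrics" but never tuned on
metrics = ["f1_score_macro", "unweighted_kappa"]
results = output
```

For a `learning_curve` task, the same `metrics` list in the Output section replaces the now-deprecated `objectives` option.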