Decouple evaluation metrics from tuning objectives #384
Merged
Conversation
- Factor out metric parsing and validation into a separate function that is now used for both the "objectives" option and the new "metrics" option (a sketch of such a helper follows this list).
- Add several checks that raise warnings and errors depending on the task and on whether both "objectives" and "metrics" are specified.
- Pass "metrics" in to `run_configuration()`.
- Use the metrics in the `evaluate()` and `cross_validate()` tasks and return the additional values in a dictionary that is part of the task results.
- Change the argument name in `learning_curve()` from `objective` to `metric`, since that makes more sense (there is no tuning).
- Handle receiving the metrics and passing them to the appropriate Learner methods.
- Deal with the additional output dictionary when creating results files.
- For the learning curve task, even though the objectives are not actually used as objectives, we piggyback on that setup since it lets us parallelize all of the metrics.
- Make sure the metrics list is populated since it is needed later.
- Also update the actual CFG file from the example.
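As a rough illustration of the first bullet, here is a minimal sketch of what a shared parsing/validation helper could look like. The function name `_parse_and_validate_metrics`, its signature, and its messages are all hypothetical; this is not the actual code in `skll/config.py`.

```python
# Hypothetical sketch of a helper shared by the "objectives" and
# "metrics" options; names and messages are illustrative only.
import ast


def _parse_and_validate_metrics(value, option_name, logger):
    """Parse a config value that should hold a list of metric names."""
    # config values arrive as strings, e.g. "['accuracy', 'f1']"
    try:
        metrics = ast.literal_eval(value)
    except (SyntaxError, ValueError):
        raise ValueError('Could not parse "{}": {}'.format(option_name, value))
    if not isinstance(metrics, list) or \
            not all(isinstance(m, str) for m in metrics):
        raise ValueError('"{}" must be a list of metric '
                         'names'.format(option_name))
    if len(set(metrics)) != len(metrics):
        logger.warning('Duplicate entries in "%s" will only be '
                       'computed once.', option_name)
    return metrics
```

Funneling both options through one such helper is what removes the duplication mentioned in the first bullet.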
aoifecahill reviewed Nov 8, 2017
skll/config.py
Outdated
logger.warning("The \"objectives\" option "
               "is deprecated for the learning_curve "
               "task and will not be supported "
               "starting with the next release; please "
maybe "after" instead of "starting with"?
skll/config.py
Outdated
elif task in ['evaluate', 'cross_validate']:
    # for other appropriate tasks, if metrics and objectives have
    # some overlaps - we will assume that the user meant to include
    # use the metric for tuning _and_ evaluation, not just evaluation
use -->
@aoifecahill @mulhod @cml54 @benbuleong @Lguyogiro this PR is now ready for review.
aoifecahill approved these changes Nov 8, 2017
benbuleong approved these changes Nov 8, 2017
Lguyogiro approved these changes Nov 8, 2017
Merged
Right now, the only metrics computed for any SKLL experiment are the tuning objectives. However, one may want to compute additional evaluation metrics without tuning on them. This PR makes that possible.

- For `evaluate` and `cross_validate` tasks, one can now specify a `metrics` list in the Output section, and those metrics will be computed and saved at the end of the results file under an "Additional Evaluation Metrics" section (see the example config after this list). For `train` and `predict` tasks, `metrics` is ignored since it is not relevant.
- For `learning_curve` tasks, it doesn't actually make sense to have `objectives` anyway, so you can specify `metrics` instead. Using `objectives` is still supported for backward compatibility, but you will get a deprecation warning. (Note: as an implementation detail, we still use the "objectives" variable internally since we can then piggyback on it to parallelize the various jobs.)
- The parsing and validation of `objectives` and `metrics` is now factored out into a separate function to avoid code duplication.
- Update the documentation.
- Add several new tests and update existing tests to deal with the new additional metrics being produced in the outputs.
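For concreteness, here is a sketch of what a config using the new option might look like for an `evaluate` task. The experiment name, paths, learners, featuresets, and metric names below are made up for illustration; check the updated documentation for the exact syntax.

```ini
[General]
experiment_name = example_evaluate
task = evaluate

[Input]
train_directory = train
test_directory = test
learners = ["LogisticRegression"]
featuresets = [["example_features"]]

[Tuning]
grid_search = true
# metric(s) to tune the hyperparameters on
objectives = ["accuracy"]

[Output]
# evaluation-only metrics: computed and reported under
# "Additional Evaluation Metrics" but never tuned on
metrics = ["f1_score_macro", "unweighted_kappa"]
results = output
```

For a `learning_curve` task, the same `metrics` list in the Output section replaces the now-deprecated `objectives` option.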