
Add learning curves #332

Merged: 23 commits merged from feature/add-learning-curves into master on Feb 8, 2017
Conversation

desilinguist (Member)

  • Addresses automatic learning curves #221.
  • The way that this works is by having a new task type called learning_curve. This ties in to a new learning_curve() method in the Learner class, which is adapted from the scikit-learn function sklearn.model_selection.learning_curve(). The reason I didn't just call the scikit-learn function directly is that it works with estimator objects and raw feature arrays. We want to apply the whole SKLL pipeline (feature selection, transformation, hashing, etc.) that the user has specified when computing the learning curve results, so we need to use the SKLL API.
  • The process of computing the curve is as follows: only a training set is required. For each point on the learning curve, the training set is split into two partitions, 80/20. The learner is trained on the subset of the 80% partition corresponding to that point of the curve and then evaluated on the 20% partition. This is repeated multiple times (using multiple different 80/20 partitions) and the results are averaged, which gives us the score for that point. The whole process is then repeated for each point on the curve. (A rough sketch of this procedure appears after the note below.)
  • I consider the learning curve task to be orthogonal to ablation and finding the right hyper-parameters. Therefore, ablation and grid search are not allowed. Just like for the cross-validation task, no models are saved for this task.
  • Users can specify the various training set sizes and the number of 80/20 partitions for each point in the curve (if they don't, there are reasonable defaults for both).
  • Users can also specify the number of cross-validation iterations to be used for averaging the results for a given training set size.
  • The output of the learning_curve task is a TSV file containing the training set size and the averaged scores for all combinations of featuresets, learners, and objectives. If pandas and seaborn are available, actual learning curves are generated as PNG files - one for each feature set. Each PNG file contains a faceted plot with objective functions on rows and learners on columns. Here's an example plot.
    [example faceted learning curve plot: foo_example_iris]

(Note: since grid search is disallowed, we don't really need to train the learner for each objective separately; we could simply train the learner once and then compute the scores using multiple functions. However, this doesn't fit into the current parallelization scheme that SKLL follows and so I didn't feel like changing that. The training jobs are run in parallel so it's not that big a deal anyway.)
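
For reviewers who want a quick mental model, here is a minimal sketch of the per-point computation using plain scikit-learn objects and a toy accuracy metric. This is not the actual SKLL implementation (which operates on FeatureSet objects and applies the configured featurization pipeline); the function name and default sizes below are purely illustrative.

import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ShuffleSplit

def sketch_learning_curve(estimator, X, y,
                          train_sizes=(0.1, 0.325, 0.55, 0.775, 1.0),
                          n_splits=10, random_state=42):
    # Repeatedly split the data 80/20; for each point on the curve, train on
    # the requested fraction of the 80% partition, score on the 20% partition,
    # and average the scores across the repeated splits.
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2,
                            random_state=random_state)
    mean_scores = []
    for fraction in train_sizes:
        scores = []
        for train_idx, test_idx in splitter.split(X):
            n_train = max(1, int(len(train_idx) * fraction))
            subset = train_idx[:n_train]
            model = clone(estimator).fit(X[subset], y[subset])
            scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
        mean_scores.append(np.mean(scores))
    return mean_scores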

desilinguist (Member, Author) commented Jan 23, 2017

Reviewers, please actually test this PR on your machines (on multiple datasets, if possible) to make sure things work as expected. Thanks! I have added an example config file for the Titanic example to the repository if you need a place to start and play around.

coveralls commented Jan 23, 2017

Coverage decreased (-1.5%) to 89.977% when pulling 8a86262 on feature/add-learning-curves into b3228e2 on master.

coveralls commented Jan 23, 2017

Coverage decreased (-1.8%) to 89.631% when pulling e7ea809 on feature/add-learning-curves into b3228e2 on master.

ghost commented Jan 23, 2017

I ran on a dataset and got this nice plot.
[screenshot: learning curve plot]

desilinguist (Member, Author) commented Jan 23, 2017

Thanks @bndgyawali! Did you use the default values for the learning curve CV folds? Do you mind sharing your config file here? Also, do you expect the Lasso curves to look like that?

ghost commented Jan 23, 2017

Here it is. I did use all the default values.

[General]
experiment_name=exp_name
task=learning_curve

[Input]
train_directory=path_to_feature_directory
featuresets = [['abc_feature'], ['def_feature']]
learners=['LinearRegression', 'LinearSVR', 'RandomForestRegressor', 'Lasso', 'Ridge', 'ElasticNet']
suffix=.csv
feature_scaling=both

[Tuning]
grid_search=true
objectives=[quadratic_weighted_kappa, linear_weighted_kappa, qwk_off_by_one]

[Output]
log=/tmp/skll_output
results=/tmp/skll_output

desilinguist (Member, Author)

From @bndgyawali's results, it looks like this works pretty well on other datasets and for a larger number of learners and objectives. I am currently trying to get the coverage issue resolved, which is a little annoying because we don't want to require matplotlib.

ghost commented Jan 23, 2017

You had mentioned that grid_search is disallowed; would it be better to show a warning message if grid_search=true?

coveralls commented Jan 24, 2017

Coverage increased (+0.2%) to 91.725% when pulling 389777b on feature/add-learning-curves into b3228e2 on master.

desilinguist (Member, Author) commented Jan 24, 2017

Okay @bndgyawali @aloukina @dan-blanchard @benbuleong @cml54 @mheilman @aoifecahill @bwriordan @mulhod @dmnapolitano , this PR can now be considered complete including changes to the documentation. You guys can now start reviewing and testing it on your own datasets to make sure things work as expected.

Don't worry about the coverage decrease. It's only 0.003% compared to the last build and compared to master, it's actually 0.2% higher :)

desilinguist (Member, Author)

@dan-blanchard will you have time to look at this PR this week?

mulhod (Contributor) commented Jan 30, 2017

[learning curve plot: call_henry_wer_bias (bias, length, wer)]

Unfortunately, if the training set sizes are a bit larger, the numbers get squished together on the same line in the graph.

Unrelated to this example, I noticed that the graph key is only on the first graph if you provide multiple learners and/or multiple objectives. This might be intentional, but I was thinking it might be better if the key were not part of the first graph and instead sat at the top/sides/bottom when there are multiple graphs, if possible.

This is the config I used in the example above:

[General]
experiment_name = call_henry_wer_bias
task = learning_curve

[Input]
train_directory = ~/skll_learning_curves/call_henry_wer_bias/features
id_col = id
label_col = y
featuresets = [["length", "bias", "wer"]]
learners = ['RescaledSVR']
ids_to_floats = False
suffix = .jsonlines
feature_scaling = both
fixed_parameters = [{"C": 100.0, "gamma": 0.001}]

[Tuning]
min_feature_count = 1
feature_scaling = none
objectives = ["unweighted_kappa"]

[Output]
log = ~/skll_learning_curves/call_henry_wer_bias/log
results = ~/skll_learning_curves/call_henry_wer_bias/results

desilinguist (Member, Author) commented Jan 30, 2017

@mulhod thanks for looking at the PR!

  1. Good call about the x-tick labels. I can rotate them so that they are easily visible.

  2. Yes, the legend is only on the first graph intentionally. I did try creating the legend outside of the plotting area, but it never really worked that well for me. I'll look at it again, but if it doesn't work, I think putting it in the first plot is a decent compromise. (See the sketch below.)
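
(For reference, the generic matplotlib approach for an outside legend looks roughly like the snippet below: take the handles from one subplot and attach a single figure-level legend. This is an illustrative sketch, not the SKLL plotting code, and the data and layout numbers are placeholders.)

import matplotlib
matplotlib.use("Agg")  # render to PNG without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 6), sharex=True)
for ax in axes.flat:
    ax.plot([100, 200, 300], [0.50, 0.60, 0.65], label="train")
    ax.plot([100, 200, 300], [0.40, 0.50, 0.55], label="test")

# Grab the handles/labels from any one subplot and place a single legend
# below all of the facets instead of inside the first one.
handles, labels = axes.flat[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="lower center", ncol=2)
fig.tight_layout(rect=[0, 0.06, 1, 1])  # leave room at the bottom for the legend
fig.savefig("example_learning_curve.png")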

mulhod (Contributor) commented Jan 30, 2017

The system I used to conduct the experiment I reported on above was Linux Ubuntu:

$ uname -a
Linux frigga 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

mulhod (Contributor) left a comment

I looked over all the changes and don't have much to say other than the comment I left separately (re: X axis labels). Looks good!

# We itereate over each model with an expected
# accuracy score. T est proves that the report
# We iterate over each model with an expected
# accuracy score. Test proves that the report
# written out at least as a correct format for

Contributor (inline comment on the diff above)

"as" is meant to be "has", right?

ghost left a comment

It looks good to me. I could not find anything to say.

aloukina (Collaborator) commented Feb 2, 2017

Some metrics, like r2, can be negative when model performance is really bad. The plots seem to set the minimum of the y-axis to 0.

aloukina (Collaborator) commented Feb 2, 2017

If one supplies learning_curve_train_sizes = [1.0, 100.0, 200.0], this raises an error since the numbers are interpreted as floats but are not within the (0, 1] range. This is how scikit-learn handles it, so we could either leave it as is or be user-friendly and add a checking function that converts such values to integers before passing them to the learning_curve functions. A sketch of such a check is below.
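
Something along these lines, say (a hypothetical check_train_sizes helper, not actual SKLL code; the name and the handling of mixed values are illustrative):

def check_train_sizes(train_sizes):
    # Hypothetical helper: if every requested size is >= 1, treat the values
    # as absolute training-set sizes and cast them to int so that they are
    # not rejected as out-of-range floats; otherwise keep them as fractions
    # in the (0, 1] range, which is what learning_curve() expects.
    # Note that a lone 1.0 is ambiguous (one example vs. 100% of the data),
    # so a real implementation would need to document its convention.
    if all(size >= 1 for size in train_sizes):
        return [int(size) for size in train_sizes]
    if all(0 < size <= 1 for size in train_sizes):
        return [float(size) for size in train_sizes]
    raise ValueError("Do not mix fractional sizes in (0, 1] with absolute "
                     "sizes: {}".format(train_sizes))

print(check_train_sizes([1.0, 100.0, 200.0]))  # [1, 100, 200]
print(check_train_sizes([0.1, 0.5, 1.0]))      # [0.1, 0.5, 1.0]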

aloukina (Collaborator) commented Feb 2, 2017

[learning curve plot showing squished x-axis tick labels and negative r2 scores]
Same as Matt's comment above: when the number of different train sizes is reasonably large, the x-axis gets squished. The plot also shows what happens with negative r2.

desilinguist (Member, Author)

Thanks @aloukina! Very useful comments. Will incorporate.

aloukina (Collaborator) left a comment

The only problems that need fixing are:

  • Negative metrics
  • X axis

The floats vs. int issue is just a suggestion.

.format(grid_objectives))

# check whether the right things are set for the given task
if (task == 'evaluate' or task == 'predict') and not test_path:
    raise ValueError('The test set must be set when task is evaluate or '
                     'predict.')
if (task == 'cross_validate' or task == 'train') and test_path:

Collaborator (inline comment on the diff above)

Why should these unused fields trigger a ValueError rather than a Warning?

desilinguist (Member, Author) replied

Because if the user set the test_path, maybe they wanted to do evaluate rather than cross_validate or train and forgot to change the task. The SKLL philosophy has always been to make no assumptions and to let the user fix the config file, lest they run a really long experiment that they didn't actually want.

desilinguist (Member, Author) commented Feb 7, 2017

My latest commits (a) automatically rotate the x-tick labels if any of the sizes is >= 1000 and (b) automatically generate the correct y-limits for the learning curves based on the metrics.
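
Roughly the kind of matplotlib logic involved, shown as a standalone sketch rather than the actual SKLL plotting code; the metric-to-range mapping and function name here are illustrative:

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Illustrative theoretical ranges; None means unbounded on that side.
METRIC_RANGES = {"unweighted_kappa": (-1, 1),
                 "r2": (None, 1),
                 "accuracy": (0, 1)}

def plot_curve_point_means(train_sizes, mean_scores, metric, output_path):
    fig, ax = plt.subplots()
    ax.plot(train_sizes, mean_scores, marker="o")
    ax.set_xticks(train_sizes)
    # (a) rotate the x-tick labels when the sizes are large enough that
    #     horizontal labels would get squished together
    rotation = 45 if any(size >= 1000 for size in train_sizes) else 0
    ax.set_xticklabels([str(size) for size in train_sizes], rotation=rotation)
    # (b) clamp the y-limits to the metric's theoretical range, but do not
    #     show empty space far below the observed minimum score
    lower, upper = METRIC_RANGES.get(metric, (None, None))
    if lower is not None:
        lower = max(lower, min(mean_scores) - 0.05)
    ax.set_ylim(bottom=lower, top=upper)
    ax.set_xlabel("Training set size")
    ax.set_ylabel(metric)
    fig.savefig(output_path)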

Note that if you use the mean_squared_error metric for the learning curve, the curve has negative values because scikit-learn internally turns mean_squared_error into -1 * mean_squared_error so that it can be optimized just like any other function where higher is better. I will be submitting a PR that renames mean_squared_error to neg_mean_squared_error soon.
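
To see that sign convention in isolation (a standalone scikit-learn snippet, not SKLL code):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# A scorer built with greater_is_better=False multiplies the metric by -1 so
# that "higher is better" holds uniformly during model selection.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)
scores = cross_val_score(LinearRegression(), X, y, scoring=neg_mse, cv=5)
print(scores)  # all values are <= 0; multiply by -1 to recover the raw MSE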

@aloukina and @mulhod can you please re-run your respective tests to make sure that the labels and y-limits are okay?

coveralls commented Feb 7, 2017

Coverage increased (+0.004%) to 91.48% when pulling edb9e72 on feature/add-learning-curves into b3228e2 on master.

coveralls commented Feb 7, 2017

Coverage increased (+0.3%) to 91.75% when pulling 3d863a6 on feature/add-learning-curves into b3228e2 on master.

desilinguist (Member, Author)

Okay, I fixed the coverage issue. @aloukina and @mulhod, I am now waiting on you to test the changes and if everything looks good, I will merge.

mulhod (Contributor) commented Feb 7, 2017

[updated learning curve plot: call_henry_wer_bias (bias, length, wer)]

This looks good!

One nitpick: I understand why it's -1 to 1 for kappa on the y-axis, but is there a way to not make the minimum y-axis value -1 if none of the values are below 0?

desilinguist (Member, Author)

Hmm, yeah I think we can do that. Let me see.

desilinguist (Member, Author)

Okay, I tweaked the plot generation code to hide unnecessary areas of the plot. Here's the new plot of your data, @mulhod. What do you think?
[new learning curve plot: call_henry_wer_bias (bias, length, wer)]

coveralls commented Feb 8, 2017

Coverage increased (+0.3%) to 91.801% when pulling 0f5b3c4 on feature/add-learning-curves into b3228e2 on master.

desilinguist merged commit 108b1a1 into master on Feb 8, 2017.
desilinguist deleted the feature/add-learning-curves branch on February 8, 2017.