automatic learning curves #221

mheilman · 2014-12-30T19:44:25Z

It would be nice to have functionality to automatically run experiments with training sets of different sizes in order to plot performance as a function of training sample size.

Perhaps this could be a separate experiment type (e.g., like "evaluate"). This probably does not make sense for cross-validation.

Perhaps there should be an option not to save all the models, etc., since there could be hundreds.

Possible configuration options:

- a list of sample sizes to consider (the default could be powers of 2 starting at 32 or 64)
- the number of replications per sample size (this could default to 1, but higher values would produce smoother learning curves)

(This was also briefly mentioned in the discussion of #212.)

desilinguist · 2016-02-23T01:04:09Z

This should use the learning curves feature from scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.learning_curve.learning_curve.html

desilinguist · 2016-05-13T15:35:40Z

@langep, has there been any progress on this issue?

desilinguist · 2017-01-13T15:08:39Z

I'd like to include this in the upcoming v1.3.

Here's how I am thinking about this:

- Have a new experiment type called learning_curves, which will compute the learning curve using scikit-learn's built-in method and save the output as a CSV in the results directory. It will also save the actual learning curve plot as a PNG in the results directory if matplotlib is available.
- The models for the various training sizes will not be saved.
- Users will be able to specify the various training sizes and the number of cross-validation iterations to be used for averaging.
- Because we generally want at least 10 folds of cross-validation to get a smooth learning curve, grid search will not be allowed within each fold, as that would make it too slow.
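
As a rough illustration of the plan above (not the code that was eventually merged), the core computation could look something like this. It calls `learning_curve` from `sklearn.model_selection`, which is where the function lives in newer scikit-learn releases. The helper names, the CSV column names, and the toy dataset are assumptions made up for this sketch. No models are saved and no grid search is run.

```python
import csv
import io

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve


def compute_learning_curve(estimator, X, y, train_sizes, cv_folds=10,
                           scoring="accuracy"):
    """Return one row per training size with scores averaged over folds."""
    sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, train_sizes=train_sizes, cv=cv_folds, scoring=scoring
    )
    rows = []
    # train_scores and test_scores have shape (n_sizes, n_folds).
    for size, train_row, test_row in zip(sizes, train_scores, test_scores):
        rows.append({
            "train_size": int(size),
            "train_mean": float(np.mean(train_row)),
            "train_std": float(np.std(train_row)),
            "test_mean": float(np.mean(test_row)),
            "test_std": float(np.std(test_row)),
        })
    return rows


def write_rows_as_csv(rows, handle):
    """Write the rows to an open text handle (hypothetical CSV layout)."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Toy data; a real experiment would load the configured training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rows = compute_learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=[32, 64, 128, 256], cv_folds=10,
)
buffer = io.StringIO()
write_rows_as_csv(rows, buffer)
```

The PNG step could then be wrapped in a `try`/`except ImportError` around the matplotlib import, so the CSV is still written when matplotlib is missing.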

Since this will be a new feature, I'd like to solicit input from all of you: @dan-blanchard @dmnapolitano @mheilman @bndgyawali @mulhod @benbuleong @aoifecahill @aloukina @cml54 @bwriordan.

Thanks!

desilinguist · 2017-02-08T19:28:51Z

Addressed by #332.

mheilman added the enhancement and low-priority labels Dec 30, 2014

desilinguist added this to the 1.2 milestone Jul 18, 2015

aoifecahill modified the milestones: 2.0, 1.2 Feb 12, 2016

aoifecahill assigned langep Feb 12, 2016

desilinguist self-assigned this Dec 2, 2016

desilinguist modified the milestones: 1.3, 2.0 Dec 2, 2016

desilinguist unassigned langep Dec 2, 2016

desilinguist mentioned this issue Jan 23, 2017

Add learning curves #332

Merged

desilinguist closed this as completed Feb 8, 2017
