Add new learners (mostly regressors) #377
Conversation
- Add imports, default parameter grids, and some keyword-argument fixes.
- Also update fixed parameters and param search grids to include learners that were previously missing.
- This way it's easier to tell which specific sub-test failed.
- There are too many parameters, and specifying all possible combinations slows down grid search too much.
skll/learner.py
Outdated
KNeighborsClassifier:
    [{'n_neighbors': [1, 5, 10, 100],
      'weights': ['uniform', 'distance']}],
KNeighborsRegressor:
    [{'n_neighbors': [1, 5, 10, 100],
      'weights': ['uniform', 'distance']}],
MLPClassifier:
    [{}],
I know `hidden_layer_sizes` takes too long to search over by default, but what about the other parameters like `activation`, `solver`, `learning_rate`, etc.?
I didn't include the other parameters either, because:

(a) Many of the hyper-parameters are conditionally dependent on each other. For example, `learning_rate` is only used when `solver='sgd'`, `learning_rate_init` is only used when `solver='sgd'` or `'adam'`, and `early_stopping` is only used when `solver='sgd'`. So it's very different from other learners in that sense, and there isn't an easy way to put together a product over all the parameters in the grid such that every combination is valid.

(b) Many of the hyper-parameters shouldn't just be blindly iterated over. For example, the `lbfgs` solver is not great when you have a lot of data; with a small amount of data it does better but is still quite slow. Whether or not to use early stopping also interacts with whether you have sufficient data in the first place.

IMO, these decisions about which sets of hyper-parameters should be searched together depend on the experiment and the data, and are best left to the user. However, if you have a suggestion for a default parameter grid that makes sense (without a lot of extra logic to deal with conditional dependence), then I am happy to try it out.
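That said, for anyone rolling their own grid: scikit-learn's grid search does accept a list of dicts, where each dict is an independent sub-grid, so the conditional dependence can at least be expressed by splitting the grid per solver. A rough sketch (the value ranges here are illustrative only, not a proposed default):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Each dict in the list is searched independently, so each sub-grid
# contains only settings that are valid together: 'learning_rate'
# appears only in the 'sgd' sub-grid.
param_grid = [
    {'solver': ['sgd'],
     'learning_rate': ['constant', 'invscaling', 'adaptive'],
     'learning_rate_init': [0.0001, 0.001, 0.01]},
    {'solver': ['adam'],
     'learning_rate_init': [0.0001, 0.001, 0.01]},
]
search = GridSearchCV(MLPClassifier(max_iter=500), param_grid, cv=3)
```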
Hmm, an option could be to have the `sgd` solver and a single hidden layer be the default for any MLP, and then have a grid iterate over `activation`, `learning_rate`, and `learning_rate_init`. In that case, if you want to do something different, you will have to change the solver via `fixed_parameters` and then also specify your own set of parameters to search via `param_grids`. Thoughts? @Lguyogiro @aoifecahill @mulhod ?
tests/test_classification.py
Outdated
# train an AdaBoostRegressor on the training data and evalute on the
# testing data
learner = Learner('MLPClassifier', model_kwargs={'solver': 'lbfgs'})
Should this comment say "train an MLPClassifier on the training data and evaluate on the testing data"?
Looks good to me. I'm looking forward to trying out some of these new learners.
- Also add some default fixed parameters that seem a little more reasonable.
- Also filter out convergence warnings, since we don't want them cluttering the test logs (see the sketch below).
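The warning filtering mentioned above is the standard `warnings` module approach; a minimal sketch of the kind of filter involved, assuming the noise comes from scikit-learn's `ConvergenceWarning`:

```python
import warnings
from sklearn.exceptions import ConvergenceWarning

# Ignore convergence warnings so they don't clutter the test logs;
# iterative learners like the MLPs emit these whenever max_iter is
# too small for the optimizer to converge.
warnings.filterwarnings('ignore', category=ConvergenceWarning)
```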
@aoifecahill @mulhod this is now ready for review. @Lguyogiro you can probably take another look since I added the MLP grid this time around.
from sklearn.linear_model.base import LinearModel
from sklearn.metrics import (accuracy_score,
                             confusion_matrix,
                             precision_recall_fscore_support,
                             SCORERS)
do we not need this any more?
It was imported but never used in that file so I removed it.
Ah, makes sense, thanks!
Add the following new regressors:
BayesianRidge
DummyRegressor
HuberRegressor
Lars
MLPRegressor
RANSACRegressor
TheilSenRegressor
Add the following new classifiers:
MLPClassifier
RidgeClassifier
Add default parameter grids where appropriate for these new learners. Learners that have no hyper-parameters worth tuning by default get blank parameter grids; that decision is better left to the user (see the usage sketch below).
Update documentation.
Update tests.
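For anyone who wants to try one of the new learners end to end, here's a minimal sketch using the `Learner` API; the `fs_train` and `fs_test` feature sets are placeholders standing in for your own data:

```python
from skll.learner import Learner

# fs_train and fs_test are placeholder FeatureSet objects; load your
# own data here. grid_search=False skips hyper-parameter tuning
# entirely, which is the natural choice for a learner shipped with a
# blank default parameter grid.
learner = Learner('HuberRegressor')
learner.train(fs_train, grid_search=False)
results = learner.evaluate(fs_test)
```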