Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scikit-learn to 0.17.0 #288

Merged
merged 8 commits into from
Feb 13, 2016
Merged

Conversation

desilinguist
Copy link
Member

The only change that was required was to patch up learner objects from older versions of SKLL (and scikit-learn) that we might load using the from_file() class method.

From `scikit-learn` 0.17.0 onwards, all scalers have the `scale_` attribute instead of the `std_` attribute. So, we need to patch up all old learners we load from files to be compatible with this.
@desilinguist
Copy link
Member Author

I also ran the unit tests with gridmap on python 2.7 and 3.3. Everything passes.

@dan-blanchard
Copy link
Contributor

👍 Looks good to merge to me.

@desilinguist
Copy link
Member Author

I looked through the release notes for scikit-learn 0.17.0 and made any appropriate changes. But please take a look to see if I have missed anything.

@dan-blanchard
Copy link
Contributor

We should verify that we don't pass 1D arrays anywhere:

Passing 1D data arrays as input to estimators is now deprecated as it caused confusion in how the array elements should be interpreted as features or as samples. All data arrays are now expected to be explicitly shaped (n_samples, n_features). By Vighnesh Birodkar.

We should rely on this new attribute instead of doing the subclass check we do now:

Classifier and Regressor models are now tagged as such using the _estimator_type attribute.

I'd have to take a closer look if this is a problem or not for us:

Cross-validation iterators allways provide indices into training and test set, not boolean masks.

Those are the only things that jumped out at me.

We do not need to check `issubclass(..., RegressorMixin)` anymore.
@desilinguist
Copy link
Member Author

Thanks @dan-blanchard!

  1. I don't see any 1D arrays being passed to any .fit() or .transform() methods for estimators.
  2. I replaced all the issubclass(..., RegressorMixin) to check for _estimator_type instead.
  3. We have three cross-validation iterators: LeaveOneLabelOut, StratifiedKFold, and KFold. All of them are used in learner.py and all of them are being used as iterators over indices.

Custom classifiers (regressors) need to inherit from both `BaseEstimator` as well as `ClassifierMixin` (`RegressorMixin`)  in order for `_estimator_type` checks to work. Update documentation to make this explicit.
@dan-blanchard dan-blanchard self-assigned this Feb 13, 2016
dan-blanchard added a commit that referenced this pull request Feb 13, 2016
…-scikit-learn

Update scikit-learn to 0.17.0
@dan-blanchard dan-blanchard merged commit 927c460 into master Feb 13, 2016
@dan-blanchard dan-blanchard deleted the feature/update-scikit-learn branch February 13, 2016 20:20
@AlJohri
Copy link

AlJohri commented Feb 19, 2016

Can you push a minor version upgrade with this change if 1.2 is far off? Looking to try skll in my pipeline. Installing HEAD from git for now. Thanks!

pip install "git+git://github.com/EducationalTestingService/skll.git#egg=skll"

@desilinguist
Copy link
Member Author

@AlJohri actually we are planning to release 1.2 early next week :)

@AlJohri
Copy link

AlJohri commented Feb 19, 2016

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants