Upgrade scikit-learn to v0.18.1 #330
Conversation
- scikit-learn has changed what the CV generators look like, so we have to adapt to the new interfaces (a sketch of the change follows this list).
- The numbers in the tests seem to be a little different under the new version as well.
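For context, here is a minimal sketch of the interface change (toy data, not SKLL code):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy feature matrix

# scikit-learn 0.17 (sklearn.cross_validation): generators were constructed
# from the data size and iterated over directly, e.g.
#     for train_idx, test_idx in KFold(n=10, n_folds=5): ...
#
# scikit-learn 0.18 (sklearn.model_selection): instantiate without the data,
# then call .split() to get the actual train/test indices:
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)
```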
Other than some questionable test-result changes, this looks good to me. I must admit I haven't been in the scikit-learn nitty-gritty for quite some time now, though.
                           not use_scaling else
-                          [0.94930875576036866, 0.93989071038251359])
+                          [0.65217391304347827, 0.70370370370370372])
These are pretty huge drops in f-measure. Should we be concerned about this? I honestly don't know.
Yeah, I was wondering about this myself. The changes in the other tests aren't as drastic, but I am not sure what we can do about this. Note also that the numbers look more reasonable to me now, both with and without feature hashing/feature scaling, and the direction continues to be what we'd expect (hashing/scaling leads to better performance). Perhaps the previous implementation had bugs?
@mheilman any thoughts?
Over the next few days, I am going to try and nail down what caused these differences. From what I can tell right now, it could be due to differences in:

- `make_classification_data`: perhaps the data being generated is actually different now?
- `SGDClassifier`: maybe they changed something in the implementation?
- `FeatureHasher`: maybe they changed something in the implementation?
It's looking like `make_classification()` generates different results in 0.18.1 than it used to with 0.17.1, even if we pass in the same random state. I have filed an issue with the scikit-learn folks. Let's see what they say.
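One quick way to check this (a sketch; the generator parameters are illustrative) is to fingerprint the generated data in each environment and compare:

```python
# Generate data with a fixed random_state and hash it; run this once under
# scikit-learn 0.17.1 and once under 0.18.1 and compare the printed digests.
import hashlib

import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
digest = hashlib.md5(np.ascontiguousarray(X).tobytes()).hexdigest()
print(digest)  # differs across versions if the generator's output changed
```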
Yeah, it looks like they changed the underlying function `sample_without_replacement()`, which causes the samples to be different for the same random state. So I think we can safely say that the differences in the results are because of that.
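For reference, that utility is importable directly, so the behavior change can be demonstrated in isolation (a sketch; the population and sample sizes are illustrative):

```python
# Draw 5 indices from a population of 100 with a fixed seed. The output
# differs between scikit-learn 0.17.1 and 0.18.1 despite the identical
# random_state, which is what shifts make_classification()'s samples.
from sklearn.utils.random import sample_without_replacement

print(sample_without_replacement(n_population=100, n_samples=5,
                                 random_state=1234))
```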
@dmnapolitano can you finish your review when you have time?
Okay, to double-check that this is indeed the cause of the extreme changes, I generated the test data in a conda environment with scikit-learn 0.17.1 and serialized it to disk. Then, in a second conda environment with scikit-learn v0.18.1, I loaded this serialized data and ran this specific test. The numbers match. So the differences are all because of changes to `make_classification()`.
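In outline, that check looks something like this (a sketch; file names and generator parameters are illustrative, not the actual test code):

```python
# Step 1 -- in the conda env with scikit-learn 0.17.1: freeze the data.
import pickle

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
with open("clf_data_0171.pkl", "wb") as f:
    pickle.dump((X, y), f)

# Step 2 -- in the conda env with scikit-learn 0.18.1: load the frozen data
# and run the test against it. Matching f-measures mean the estimators
# behave the same and only the freshly generated data differs.
with open("clf_data_0171.pkl", "rb") as f:
    X, y = pickle.load(f)
```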
Awesome. Thanks for doing that thorough investigation!
@@ -255,13 +255,13 @@ def check_scaling_features(use_feature_hashing=False, use_scaling=False):

     # these are the expected values of the f-measures, sorted
     if not use_feature_hashing:
-        expected_fmeasures = ([0.77319587628865982, 0.78640776699029125] if
+        expected_fmeasures = ([0.55276381909547745, 0.55721393034825872] if
So for changes like this, what do you think this means for the reproducibility of experiments? And what do we tell people? "It's a new version, deal," or should people expect to be able to reproduce prior experiments?

I'm sure this comes up with literally every skll release, so pardon my ignorance of any existing discussion of this. 😅
This should NOT affect people's own experiments significantly unless they are creating fake data using `make_classification()`.

Yes, some minor changes should be expected with a new version release. If people really want to reproduce older experiments exactly, they should be using pinned conda environments (a sketch follows).
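For instance, a pinned environment file might look like this (the file name and exact version pins are illustrative):

```yaml
# environment.yml -- pin everything that affects numeric results
name: skll-repro
dependencies:
  - python=2.7
  - numpy=1.11.2
  - scipy=0.18.1
  - scikit-learn=0.17.1
```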
The major change in this version is that most of the cross-validation and grid-search functionality has been moved to `sklearn.model_selection`, AND the interface of the cross-validation generators has changed. They are no longer iterable directly; rather, you have to instantiate them and then call the `.split()` method on them when you want the actual train/test indices. There are also some other major updates in the underlying algorithms themselves and in `make_classification()`, which required changing the expected values in several of the tests.

I also address #231 ("When scikit-learn 0.18 comes out, we need to update our F1 metrics in `__init__.py`") by updating the F1-measure metrics in `metrics.py` and `__init__.py`.
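As a hedged sketch (not necessarily SKLL's exact code), such F1 variants can be expressed as thin wrappers over scikit-learn's `f1_score`; the wrapper names below are hypothetical:

```python
# Illustrative only: F1-measure variants built on sklearn.metrics.f1_score.
# The `average` values are the standard ones f1_score supports.
from functools import partial

from sklearn.metrics import f1_score

f1_score_micro = partial(f1_score, average="micro")
f1_score_macro = partial(f1_score, average="macro")
f1_score_weighted = partial(f1_score, average="weighted")

# Usage: f1_score_macro(y_true, y_pred)
```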
Finally, there are some minor changes, like using `ruamel.yaml.safe_load()` instead of `ruamel.yaml.load()`, since the latter is no longer considered safe and raises warnings.
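A minimal sketch of that change (the config file name is hypothetical):

```python
# ruamel.yaml.load() can construct arbitrary Python objects from a YAML
# document and now emits a warning; safe_load() only builds plain Python
# types (dicts, lists, scalars).
import ruamel.yaml

with open("experiment.yaml") as f:      # hypothetical file name
    config = ruamel.yaml.safe_load(f)   # was: ruamel.yaml.load(f)
```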