
Upgrade scikit-learn to v0.18.1 #330

Merged — 8 commits, Jan 12, 2017

Conversation

@desilinguist (Member) commented Dec 9, 2016

  • The major change in this version is that most of the cross-validation and grid-search functionality has moved to sklearn.model_selection, and the interface of the cross-validation generators has changed: they are no longer directly iterable. Instead, you instantiate them and then call their .split() method when you want the actual train/test indices (see the first sketch after this list).

  • There are also some other major updates in the underlying algorithms themselves and in make_classification(), which required changing the expected values in several of the tests.

  • I also address #231 ("When scikit-learn 0.18 comes out, we need to update our F1 metrics in __init__.py") by updating the F1-measure metrics in metrics.py and __init__.py.

  • Finally, there are some minor changes like using ruamel.yaml.safe_load() instead of ruamel.yaml.load(), since the latter is no longer considered safe and raises warnings (see the second sketch below).
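As a reference for the first bullet, here is a minimal sketch of the old-vs-new cross-validation interface; the dataset, fold count, and variable names are made up for illustration and are not SKLL's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold  # formerly in sklearn.cross_validation

X, y = make_classification(n_samples=20, n_features=5, random_state=0)

# scikit-learn 0.17.x: KFold(n, n_folds=5) was itself iterable.
# scikit-learn 0.18.x: instantiate first, then call .split() to get the
# train/test indices for each fold.
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```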

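And a small illustration of the ruamel.yaml change from the last bullet; the configuration string is invented for the example.

```python
import ruamel.yaml

config_text = "experiment_name: my_experiment\ntask: cross_validate\n"

# ruamel.yaml.load() without an explicit Loader is no longer considered safe
# and emits a warning, so safe_load() is used instead.
config = ruamel.yaml.safe_load(config_text)
print(config["task"])  # -> cross_validate
```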
@coveralls commented Dec 9, 2016


Coverage increased (+0.004%) to 91.476% when pulling fd3144c on feature/upgrade-sklearn-018-1 into 9a31fae on master.

@desilinguist self-assigned this Dec 9, 2016
@desilinguist added this to the 1.3 milestone Dec 9, 2016
@dan-blanchard (Contributor) left a comment


Other than some questionable test result changes, this looks good to me. I must admit I haven't been in the scikit-learn nitty-gritty for quite some time now, though.

(diff excerpt under review)
      not use_scaling else
-     [0.94930875576036866, 0.93989071038251359])
+     [0.65217391304347827, 0.70370370370370372])
Contributor commented:

These are pretty huge drops in f-measure. Should we be concerned about this? I honestly don't know.

@desilinguist (Member Author) commented Dec 9, 2016

Yeah, I was wondering about this myself. The changes in the other tests aren't as drastic, but I'm not sure what we can do about this. Note also that the numbers look more reasonable to me now, both with and without feature hashing/feature scaling, and the direction continues to be what we'd expect (hashing/scaling leads to better performance). Perhaps the previous implementation had bugs?

@mheilman any thoughts?

@desilinguist (Member Author) commented:

Over the next few days, I am going to try to nail down what caused these differences. From what I can tell right now, it could be due to differences in:

  • make_classification_data: perhaps the data being generated is actually different now?
  • SGDClassifier: maybe they changed something in the implementation?
  • FeatureHasher: maybe they changed something in the implementation?

@desilinguist (Member Author) commented:

It's looking like make_classification() generates different results in 0.18.1 than it used to with 0.17.1 even if we pass in the same random state. I have filed an issue with the scikit-learn folks. Let's see what they say.
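One way to verify this, sketched below with made-up parameters rather than the ones the tests actually use, is to run the identical make_classification() call in a 0.17.1 environment and a 0.18.1 environment and compare a digest of the output.

```python
import hashlib

from sklearn.datasets import make_classification

# Same arguments and random_state in both environments; if the printed
# digests differ across scikit-learn versions, the generated data changed.
X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
print(hashlib.md5(X.tobytes()).hexdigest())
print(hashlib.md5(y.tobytes()).hexdigest())
```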

@desilinguist (Member Author) commented:

Yeah, it looks like they changed the underlying function sample_without_replacement(), which causes the samples to be different for the same random state. So I think we can safely say that the differences in the results are because of that.
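The lower-level function can be checked the same way; this is just an illustrative call with arbitrary arguments, not the exact usage inside make_classification().

```python
from sklearn.utils.random import sample_without_replacement

# With a fixed random_state this call is deterministic within one
# scikit-learn version, but the sampling algorithm changed between
# 0.17.1 and 0.18.1, so the selected indices can differ across versions.
print(sample_without_replacement(n_population=1000, n_samples=10, random_state=42))
```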

@desilinguist (Member Author) commented:

@dmnapolitano can you finish your review when you have time?

@desilinguist (Member Author) commented:

Okay, to double-check that this is indeed the cause of the extreme changes, I generated the test data in a conda environment with scikit-learn 0.17.1 and serialized it to disk. Then, in a second conda environment with scikit-learn v0.18.1, I loaded the serialized data and ran this specific test. The numbers match. So the differences are all because of changes to make_classification().
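Roughly the check described above, assuming plain pickle serialization; the file name and generation parameters are placeholders rather than the actual test setup.

```python
import pickle

from sklearn.datasets import make_classification

# Environment A (scikit-learn 0.17.1): generate the data once and freeze it.
X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
with open("train_data_0171.pkl", "wb") as f:
    pickle.dump((X, y), f)

# Environment B (scikit-learn 0.18.1): load the frozen data and run the test
# against it, so only the estimator code (not the data) can change the result.
with open("train_data_0171.pkl", "rb") as f:
    X, y = pickle.load(f)
```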

Contributor commented:

Awesome. Thanks for doing that thorough investigation!

@@ -255,13 +255,13 @@ def check_scaling_features(use_feature_hashing=False, use_scaling=False):

     # these are the expected values of the f-measures, sorted
     if not use_feature_hashing:
-        expected_fmeasures = ([0.77319587628865982, 0.78640776699029125] if
+        expected_fmeasures = ([0.55276381909547745, 0.55721393034825872] if
Collaborator commented:

So for changes like this, what do you think this means for the reproducibility of experiments? And what do we tell people? "It's a new version, deal," or should people expect to be able to reproduce prior experiments?

I'm sure this comes up with literally every skll release, so pardon my ignorance of any existing discussion of this. 😅

@desilinguist (Member Author) commented Jan 12, 2017 via email
