Upgrade scikit-learn to v0.18.1 #330
Conversation
- scikit-learn has changed what the CV generators look like, so we have to adapt to the new interfaces (a sketch of the change follows this list).
- The numbers in the tests seem to be a little different under the new version as well.
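For context, here is a minimal sketch of the interface change (toy data, not SKLL code):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy feature matrix

# scikit-learn 0.17 (sklearn.cross_validation): generators were constructed
# from the data size and iterated over directly, e.g.
#     for train_idx, test_idx in KFold(n=10, n_folds=5): ...
#
# scikit-learn 0.18 (sklearn.model_selection): instantiate without the data,
# then call .split() to get the actual train/test indices:
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    print(train_idx, test_idx)
```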
Other than some questionable test-result changes, this looks good to me. I must admit I haven't been in the scikit-learn nitty-gritty for quite some time now, though.
                           not use_scaling else
-                          [0.94930875576036866, 0.93989071038251359])
+                          [0.65217391304347827, 0.70370370370370372])
These are pretty huge drops in f-measure. Should we be concerned about this? I honestly don't know.
Yeah, I was wondering about this myself. The changes in the other tests aren't as drastic, but I am not sure what we can do about this. Note also that the numbers look more reasonable to me now, both with and without feature hashing/feature scaling, and the direction continues to be what we'd expect (hashing/scaling leads to better performance). Perhaps the previous implementation had bugs?
@mheilman any thoughts?
Over the next few days, I am going to try and nail down what caused these differences. From what I can tell right now, it could be due to differences in:

- `make_classification_data`: perhaps the data being generated is actually different now?
- `SGDClassifier`: maybe they changed something in the implementation?
- `FeatureHasher`: maybe they changed something in the implementation?
It's looking like `make_classification()` generates different results in 0.18.1 than it used to with 0.17.1, even if we pass in the same random state. I have filed an issue with the scikit-learn folks. Let's see what they say.
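One quick way to check this (a sketch; the generator parameters are illustrative) is to fingerprint the generated data in each environment and compare:

```python
# Generate data with a fixed random_state and hash it; run this once under
# scikit-learn 0.17.1 and once under 0.18.1 and compare the printed digests.
import hashlib

import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
digest = hashlib.md5(np.ascontiguousarray(X).tobytes()).hexdigest()
print(digest)  # differs across versions if the generator's output changed
```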
Yeah, it looks like they changed the underlying function `sample_without_replacement()`, which causes the samples to be different for the same random state. So I think we can safely say that the differences in the results are because of that.
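For reference, that utility is importable directly, so the behavior change can be demonstrated in isolation (a sketch; the population and sample sizes are illustrative):

```python
# Draw 5 indices from a population of 100 with a fixed seed. The output
# differs between scikit-learn 0.17.1 and 0.18.1 despite the identical
# random_state, which is what shifts make_classification()'s samples.
from sklearn.utils.random import sample_without_replacement

print(sample_without_replacement(n_population=100, n_samples=5,
                                 random_state=1234))
```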
@dmnapolitano can you finish your review when you have time?
Okay, to double-check that this is indeed the cause of the extreme changes, I generated the test data in a conda environment with scikit-learn 0.17.1 and serialized it to disk. Then, in a second conda environment with scikit-learn v0.18.1, I loaded this serialized data and ran this specific test. The numbers match. So the differences are all because of changes to `make_classification()`.
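In outline, that check looks something like this (a sketch; file names and generator parameters are illustrative, not the actual test code):

```python
# Step 1 -- in the conda env with scikit-learn 0.17.1: freeze the data.
import pickle

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=1234)
with open("clf_data_0171.pkl", "wb") as f:
    pickle.dump((X, y), f)

# Step 2 -- in the conda env with scikit-learn 0.18.1: load the frozen data
# and run the test against it. Matching f-measures mean the estimators
# behave the same and only the freshly generated data differs.
with open("clf_data_0171.pkl", "rb") as f:
    X, y = pickle.load(f)
```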
Awesome. Thanks for doing that thorough investigation!
@@ -255,13 +255,13 @@ def check_scaling_features(use_feature_hashing=False, use_scaling=False):

     # these are the expected values of the f-measures, sorted
     if not use_feature_hashing:
-        expected_fmeasures = ([0.77319587628865982, 0.78640776699029125] if
+        expected_fmeasures = ([0.55276381909547745, 0.55721393034825872] if
So for changes like this, what do you think this means for the reproducibility of experiments? And what do we tell people? "It's a new version, deal," or should people expect to be able to reproduce prior experiments?

I'm sure this comes up with literally every skll release, so pardon my ignorance of any existing discussion of this. 😅
This should NOT affect people's own experiments significantly unless they are creating fake data using `make_classification()`.

Yes, some minor changes should be expected with a new version release. If people really want to reproduce older experiments exactly, they should be using pinned conda environments (a sketch follows).
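For instance, a pinned environment file might look like this (the file name and exact version pins are illustrative):

```yaml
# environment.yml -- pin everything that affects numeric results
name: skll-repro
dependencies:
  - python=2.7
  - numpy=1.11.2
  - scipy=0.18.1
  - scikit-learn=0.17.1
```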
The major change in this version is that most of the cross-validation and grid-search functionality has been moved to `sklearn.model_selection`, AND the interface of the cross-validation generators has changed. They are no longer iterable directly; rather, you have to instantiate them and then call the `.split()` method on them when you want the actual train/test indices. There are also some other major updates in the underlying algorithms themselves and in `make_classification()`, which required changing the expected values in several of the tests.

I also address #231 ("When scikit-learn 0.18 comes out, we need to update our F1 metrics in `__init__.py`") by updating the F1-measure metrics in `metrics.py` and `__init__.py`.
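As a hedged sketch (not necessarily SKLL's exact code), such F1 variants can be expressed as thin wrappers over scikit-learn's `f1_score`; the wrapper names below are hypothetical:

```python
# Illustrative only: F1-measure variants built on sklearn.metrics.f1_score.
# The `average` values are the standard ones f1_score supports.
from functools import partial

from sklearn.metrics import f1_score

f1_score_micro = partial(f1_score, average="micro")
f1_score_macro = partial(f1_score, average="macro")
f1_score_weighted = partial(f1_score, average="weighted")

# Usage: f1_score_macro(y_true, y_pred)
```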
Finally, there are some minor changes, like using `ruamel.yaml.safe_load()` instead of `ruamel.yaml.load()`, since the latter is no longer considered safe and raises warnings.
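A minimal sketch of that change (the config file name is hypothetical):

```python
# ruamel.yaml.load() can construct arbitrary Python objects from a YAML
# document and now emits a warning; safe_load() only builds plain Python
# types (dicts, lists, scalars).
import ruamel.yaml

with open("experiment.yaml") as f:      # hypothetical file name
    config = ruamel.yaml.safe_load(f)   # was: ruamel.yaml.load(f)
```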