Condense our copy of `DictVectorizer` to just the one method we still need.
#374
Delete our fork.
This reverts commit f0421aa.
- We need to add `sorted()` because by default when we instantiate the `DictVectorizer` for a `FeatureSet`, we set `sort` to be `False`.
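To illustrate the commit note above: with `sort=False`, scikit-learn's `DictVectorizer` keeps features in the order they are first seen, so two vectorizers fit on the same features in a different order expose different `feature_names_` lists. The sketch below is only an illustration with made-up feature names; where exactly `sorted()` was added in SKLL is not shown in this thread.

```python
from sklearn.feature_extraction import DictVectorizer

# Two rows with the same (made-up) features, listed in a different order.
row_ba = [{"b": 1.0, "a": 2.0}]
row_ab = [{"a": 2.0, "b": 1.0}]

# With sort=False, feature order follows the order in which keys are first seen.
vec_ba = DictVectorizer(sort=False).fit(row_ba)
vec_ab = DictVectorizer(sort=False).fit(row_ab)

names_ba = list(vec_ba.feature_names_)
names_ab = list(vec_ab.feature_names_)

# The raw lists differ, but their sorted forms agree.
same_raw_order = names_ba == names_ab
same_sorted = sorted(names_ba) == sorted(names_ab)
print(same_raw_order, same_sorted)
```

Wrapping the names in `sorted()` gives an order-independent comparison without changing how the vectorizer itself is configured.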
As part of reviewing, please also run this version of SKLL on an existing experiment you have access to and make sure that the results don't change.
Looks like the code coverage for |
Okay, I have figured out why we lost the two lines of coverage in this branch. Essentially, until this branch came along, our copy of
However, in scikit-learn's
However, I think if we include non-sparse FeatureSets in the test, we should be able to trigger those lines. That's what I am going to try next.
D'oh!
@dan-blanchard @aoifecahill thoughts? I am personally leaning towards the second option for this release and then removing the lines in a subsequent release once we are satisfied that nothing weird is happening.
I like the option of removing redundant lines of code better. What is the "just to be safe" scenario?
Hmm, so the decrease went from 0.2% to 0.1%. Ugh. Stay tuned :)
Ah, I think the decreased coverage is basically the result of getting rid of the 4 lines. I compared the coverage HTMLs for
So, @aoifecahill @dan-blanchard @bndgyawali this branch is now ready for review.
@dan-blanchard do you think you will have a chance to look at this? I really want your input since you filed the original issue :)
👍 This looks good to me. I double-checked the scikit-learn code to see how they're doing sorting now, and this all makes sense.
I must admit that I haven't touched SKLL since leaving ETS, so as more time goes on, I'm going to probably be less and less useful for reviews.
Thanks @dan-blanchard! I recognize that limitation and so I usually only request reviews from you for cases where I think you can have specific insight that none of us might have. Don't worry, I won't bug you too much :)
Since our `DictVectorizer` additions have been merged into scikit-learn, all we need is the `__eq__()` method. The rest of the code is unnecessary and has been removed.
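As a rough sketch of what a condensed copy could look like: a thin subclass of scikit-learn's `DictVectorizer` that adds nothing but `__eq__()`. This is not SKLL's actual code; the equality criterion used here (same `dtype` and same fitted `vocabulary_`) is an assumption made for illustration, as are the feature names.

```python
from sklearn.feature_extraction import DictVectorizer as SklearnDictVectorizer


class DictVectorizer(SklearnDictVectorizer):
    """Thin subclass that only adds equality (illustrative, not SKLL's code)."""

    def __eq__(self, other):
        if not isinstance(other, SklearnDictVectorizer):
            return NotImplemented
        # Assumption: two vectorizers are "equal" when they share a dtype
        # and map the same feature names to the same column indices.
        return (self.dtype == other.dtype
                and getattr(self, "vocabulary_", None)
                == getattr(other, "vocabulary_", None))


# Made-up data for the comparison.
data = [{"a": 1.0, "b": 2.0}]
v1 = DictVectorizer(sort=False).fit(data)
v2 = DictVectorizer(sort=False).fit(data)
v3 = DictVectorizer(sort=False).fit([{"c": 1.0}])

# The stock class has no __eq__, so separate instances compare by identity.
p1 = SklearnDictVectorizer(sort=False).fit(data)
p2 = SklearnDictVectorizer(sort=False).fit(data)

print(v1 == v2, v1 != v3, p1 == p2)
```

Everything else (fitting, transforming, feature naming) is inherited from scikit-learn, which is why a single overridden method can be enough.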