New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Replicate Private SKLearn Classes #2328
[FIX] Replicate Private SKLearn Classes #2328
Conversation
- The minimum version required is 0.17 since there is no module `sklearn.exceptions`
- sklearn will remove importing of private classes from v0.24 - we use the methods and need to implement them locally - serves the purpose of making us the proprietor of the code and only dependent on public API of sklearn which doesn't change very frequently
…ollar/vowpal_wabbit into fix_deprecated_imports
Generally this is bad practice, but I'm not sure if there's a better way to solve the problem. It might be worth holding off on this until 0.24 is closer to release to see if any official guidance (or something that doesn't require us to manage the code on our side) comes out |
@peterychang Sounds good to me. We can wait a few weeks then. |
sklearn is 3 clause bsd. I think we'll need to update VW's notice if we merge this. |
Yes, this will have to keep in mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change for now, but wanted to have this as a review item.
@thealphadollar We cannot duplicate the internal SKlearn classes to fix the deprecation. Is there direction from SKlearn about how to migrate? Can you come up with another way to solve this issue without duplicating their code. |
@peterychang @jackgerrits I was sick for the past two weeks and hence no update here. I'll use the method suggested by @peterychang and attempt the resolve the issue and update the same PR. |
…into fix_deprecated_imports
- This is required since we cannot copy code directly from SKLearn - This makes it easy for us to use the public API instead of copying everything over here. - `fit` method is removed from TLCM in order to use the `fit` method defined in VW
@jackgerrits @peterychang I've updated the Pull Request with new changes which do not require copying the source from SKLearn. I've used LinearSVC as it already inherits from BE, SCM, and LCM. SCM has two new methods ( UPDATE FOR BELOW QUOTED TEXT: I've reversed the order since I went through all of the methods in both TLCM (including inherited once, in old one) and figured that there was no overlap of method and hence a swap can be made without any issue.
By avoiding to call super init we also make sure that we do not invoke for the |
79f9257
to
aa13167
Compare
- We are inheriting first from VW and then TLCM, `fit` method from VM is used and hence it is not required to remove fit from TLCM.
@peterychang @jackgerrits @lalo Can you please review this? |
It looks ok to me, but I'm not sure if there are any unintended unintended consequences based on how python does inheritance. IIRC inherited function implementations are chosen left-to-right on the superclass list, so changing the order of inheritance on @gramhagen are you aware of any problems with modifying the order of inheritance here? Or with using |
this seems like a good approach to handling the api changes, although LinearSVC seems like an odd choice. Why not use LogisticRegression? |
@gramhagen One outright reason is that @peterychang suggested it and that created a bias towards it. But I also went forward with my own research about the same. As I saw the classes, everything but their number of methods was different. LinearSVC has two methods (which we would need to check for conflict). On the other hand, LogisticRegression has 4 and hence I outrightly chose the former. |
fair enough =). I am overhauling the sklearn_vw module a bit in #2368 though which will collide with these changes? would you mind if we pull in some of these modifications there? |
Sure. Please include it in the mentioned PR. |
Oh sorry, I only suggested LinearSVC because it looked similar to what we needed, I didn't see LogisticRegression when I was searching through the sklearn repo. If LogisticRegression makes more sense, then please use it instead. I'm still really new on the ML side of things so I'm mostly making suggestions and comments based on an engineering standpoint |
@thealphadollar should we close this and coordinate effort on #2368? |
Yes, sure. |
Closing this based on the comments |
As per #2240 , sklearn has deprecated the importing of
LinearClassifierMixin
andSparseCoefMixin
and they are now part of privatelinear_model/_base.py
and will be discontinued for importing from v0.24.This PR addresses the issue by replicating the objects and storing them within VowpalWabbit which gives us full control over the classes and along we use public API of sklearn to implement them which have very dormant rate of breaking changes.
The exhaustive list of changes in the PR as follows:
make sklearn_vw independent of sklearn private classes
set minimum version of sklearn as 0.18
sklearn.exceptions
in v0.17make python files adhere to VW yapf formatting
fixes #2240