BUG/ENH: Data processing refactor #88
Codecov Report
@@ Coverage Diff @@
## master #88 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 3 5 +2
Lines 526 552 +26
=========================================
+ Hits 526 552 +26
@stsievert please feel free to wait until I at least try to split up these changes into 2 PRs before reviewing the diff, but it would be helpful to get your input on at least the general approach before spending more time on it.
I successfully split up the PR; the diff is now very reasonable and all tests are passing (failure is due to lack of coverage on new
Got all of the tests for new errors implemented 😄
Why are attributes set inside preprocess_y? As a user I wouldn't think about setting attributes inside preprocess_y; I'd only be concerned with preprocessing the targets.
Same goes for the reset parameter – I don't understand why any preprocessing function needs to receive a reset parameter. Why does preprocessing need to be concerned about state?
By that measure, I think the interface you had earlier is cleaner (where preprocess_y also returned a dictionary of meta attributes), and I think I'm coming around to the idea of having BaseWrapper.__init__ accept feature_encoder and target_encoder keywords.
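To make the "return a dictionary of meta attributes" idea concrete, here is a minimal, dependency-free sketch. The names WrapperSketch, y_encoded_ and the exact keys of the returned dictionary are assumptions made for illustration; they are not taken from this PR's diff.

```python
from typing import Any, Dict, List, Tuple


def preprocess_y(y: List[Any]) -> Tuple[List[int], Dict[str, Any]]:
    """Hypothetical stateless variant: encode labels and report metadata.

    The function sets no attributes and needs no reset flag; it only
    returns the encoded targets plus a dictionary describing them.
    """
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    encoded = [index[label] for label in y]
    meta = {"classes_": classes, "n_classes_": len(classes)}
    return encoded, meta


class WrapperSketch:
    """Hypothetical caller that owns all state (stand-in for a wrapper class)."""

    def fit(self, y):
        encoded, meta = preprocess_y(y)
        # The wrapper, not the preprocessing function, decides what to store.
        for name, value in meta.items():
            setattr(self, name, value)
        self.y_encoded_ = encoded
        return self


wrapper = WrapperSketch().fit(["cat", "dog", "cat"])
print(wrapper.classes_, wrapper.n_classes_)
```

With this split, whether to overwrite or keep previously stored attributes is decided by the caller, so the preprocessing function itself never has to know about state.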
We need to set those attributes somewhere, do you think it would be better if we returned a dictionary and set them in
For re-use of transformers:
X1, y1 = np.array([[1, 2, 3]]).T, np.array([1, 2, 3])
est = KerasClassifier(loss="categorical_crossentropy")
est.fit(X1, y1)  # y1 gets passed through OneHotEncoder to match categorical_crossentropy
X2, y2 = X1, np.array([1, 2, 2])
est.partial_fit(X2, y2)  # y2 re-uses the same OneHotEncoder instance, giving 3 columns; a new instance would give only 2
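The column-count difference can be reproduced without Keras. The sketch below uses a small hand-written encoder, OneHotSketch, as a hypothetical stand-in for the real OneHotEncoder; it only illustrates why the fitted instance has to be kept between fit and partial_fit.

```python
class OneHotSketch:
    """Dependency-free stand-in for a one-hot encoder (not sklearn's class)."""

    def fit(self, y):
        # Remember the categories seen at fit time.
        self.categories_ = sorted(set(y))
        return self

    def transform(self, y):
        # One column per category learned in fit, regardless of what y contains.
        return [[1 if label == c else 0 for c in self.categories_] for label in y]


y1 = [1, 2, 3]
y2 = [1, 2, 2]

fitted = OneHotSketch().fit(y1)
reused = fitted.transform(y2)                 # encoder fitted on y1, applied to y2
fresh = OneHotSketch().fit(y2).transform(y2)  # new encoder fitted on y2 only

print(len(reused[0]))  # 3
print(len(fresh[0]))   # 2
```

A model whose output layer was built for 3 classes would no longer match the targets if the second batch were encoded into 2 columns.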
I'm not opposed to going back to returning a dictionary and storing it in
For now, let's work on the interface we have. It should be possible in the future to package it all up into transformers which could be passed as * That said, some of these parameters are checked by other things within the Scikit-Learn ecosystem, but I think if a user is overriding
Collecting the pending topics from last round of reviews: Adding the
The future is less than 12 hours away. I'll make a PR to this branch that cleans up the code. It simplifies the logic around initialization and only makes internal changes.
That is great, I look forward to it!
Yes, that API should be supported. I don't think SciKeras should break the API that TF implemented.
Thanks for your work over in #93 @stsievert. We should be up to date here now. I think the only pending thing is
Codecov Report
@@ Coverage Diff @@
## master #88 +/- ##
==========================================
- Coverage 99.80% 99.45% -0.36%
==========================================
Files 3 5 +2
Lines 526 548 +22
==========================================
+ Hits 525 545 +20
- Misses 1 3 +2
…der; remove lambda from BaseWrapper._initialize
I did a quick once-over on the diff and found a couple of typos and minor fixes. @stsievert, when you have a chance, can you take a look at my responses/fixes to the last couple of comments, and let me know if we can resolve those or if there is more work needed? Thanks!
Closes #78, Closes #100, Closes #83.
This PR implements a sklearn transformer based interface for data processing that replaces {pre,post}process_{X,y}.
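As a rough illustration of what a transformer-style replacement for a preprocess/postprocess pair can look like, here is a dependency-free sketch. LabelIndexTransformer is a hypothetical name, and the class only mirrors the fit/transform/inverse_transform convention used by scikit-learn transformers; it is not the implementation from this PR.

```python
class LabelIndexTransformer:
    """Hypothetical target transformer following the fit/transform convention."""

    def fit(self, y):
        # State learned from the data lives on the transformer itself.
        self.classes_ = sorted(set(y))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        # Plays the role that preprocess_y had: raw labels -> model targets.
        return [self._index[label] for label in y]

    def inverse_transform(self, y_encoded):
        # Plays the role that postprocess_y had: model outputs -> raw labels.
        return [self.classes_[i] for i in y_encoded]


encoder = LabelIndexTransformer().fit(["b", "a", "b"])
encoded = encoder.transform(["a", "b"])
decoded = encoder.inverse_transform(encoded)
print(encoded, decoded)
```

Because the fitted object carries its own state, the same instance can be reused for later calls, and the forward and inverse directions stay paired in one place.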