KFold(n_samples=n) not equivalent to LeaveOneOut() cv in CalibratedClassifierCV() #29000
Comments
So we could raise an error early for this strategy. However, I can also see other strategies leading to a single class being present when fitting the calibrator. I assume it would be safer to raise an error in that case as well; otherwise we get an ill-fitted calibrator anyway. ping @lucyleeow @ogrisel, who might have more insight into this part of the calibrator, to share their opinions.
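A guard of the sort being discussed would compare per-class sample counts against the requested number of folds. A minimal sketch of that idea (the function name, message, and data below are illustrative, not the library's actual code):

```python
import numpy as np

def check_enough_samples_per_class(y, n_folds):
    """Raise if some class has fewer samples than there are folds."""
    _, counts = np.unique(y, return_counts=True)
    if np.any(counts < n_folds):
        raise ValueError(
            f"Requested {n_folds}-fold CV, but at least one class has "
            f"fewer than {n_folds} samples."
        )

y = np.array([0, 1] * 5)  # 5 samples per class
check_enough_samples_per_class(y, n_folds=5)   # passes
try:
    check_enough_samples_per_class(y, n_folds=10)  # more folds than samples per class
except ValueError as exc:
    print(exc)
```

With `n_folds` equal to the number of samples (the `KFold(n_splits=n)` case from the issue), any dataset with more than one class trips such a check.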
I think I agree on both counts, but I have not checked the details in the code yet.
Isn't it the case that
Yeah, exactly. There are lots of ways to end up with poorly-fit calibrators, and I'm not sure the code's current check (even when it does apply) really covers that. |
LeaveOneOut does not have different groups like k-fold CV (https://www.cs.cmu.edu/~schneide/tut5/node42.html). More precisely, it treats each sample as its own 'fold': it trains on n-1 samples at a time (where n is the training set size), which makes it computationally expensive but very reliable. K-fold, on the other hand, divides the training data into k groups and trains the model k times, leaving out one group at a time. Perhaps this clarification was not the main issue, but I thought it might be helpful :)
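That said, with shuffling off, `KFold(n_splits=n)` on `n` samples produces exactly the splits `LeaveOneOut()` does, which is the equivalence the docs claim. A quick illustration (the toy data is arbitrary):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(6).reshape(-1, 1)  # 6 samples

loo_splits = [(train.tolist(), test.tolist())
              for train, test in LeaveOneOut().split(X)]
kf_splits = [(train.tolist(), test.tolist())
             for train, test in KFold(n_splits=len(X)).split(X)]

# Each sample is its own single-element test "fold" in both cases.
assert loo_splits == kf_splits
```

So the divergence reported in this issue comes from how `CalibratedClassifierCV` inspects the cv object, not from the splits themselves.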
Describe the bug
Calling `CalibratedClassifierCV()` with `cv=KFold(n_splits=n)` (where `n` is the number of samples) can give different results than using `cv=LeaveOneOut()`, but the docs for `LeaveOneOut()` say these should be equivalent.

In particular, the `KFold` class has an `n_splits` attribute, which means this branch runs when setting up sigmoid calibration, and then this error can be thrown. With `LeaveOneOut()`, `n_folds` is set to `None` and that error is never hit.

I'm not sure whether that error is correct/desirable in every case (see the code to reproduce for my use case, where I think(?) the error may be unnecessary), but either way, the two different `cv` values seem like they should behave equivalently.

Steps/Code to Reproduce
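A minimal way to exercise the discrepancy looks roughly like this (the data, the `LogisticRegression` base estimator, and the probing loop are illustrative; at the time of the report the `LeaveOneOut()` fit succeeded while the `KFold(n_splits=n)` fit raised, though behavior may vary across scikit-learn versions):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut

rng = np.random.RandomState(0)
X = rng.rand(10, 2)
y = np.tile([0, 1], 5)  # 2 classes, 5 samples each
n = len(X)

# Per the docs, these two cv values should be interchangeable.
results = {}
for name, cv in [("LeaveOneOut", LeaveOneOut()), ("KFold", KFold(n_splits=n))]:
    try:
        CalibratedClassifierCV(LogisticRegression(), cv=cv).fit(X, y)
        results[name] = "fit ok"
    except Exception as exc:
        results[name] = f"{type(exc).__name__}: {exc}"
print(results)
```

With `n_splits=n`, each class necessarily has fewer samples than there are folds, which is exactly the condition the `n_folds` check rejects.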
Expected Results

`pipeline` and `pipeline2` should function identically. Instead, `pipeline.fit()` succeeds and `pipeline2.fit()` throws.

Actual Results
Versions