Handle new oversampler nullable type incompatibility in X#4068
Handle new oversampler nullable type incompatibility in X#4068
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4068 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 349 349
Lines 37559 37583 +24
=======================================
+ Hits 37441 37465 +24
Misses 118 118
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
dccb304 to
219d78c
Compare
| ), | ||
| ], | ||
| ) | ||
| def test_oversampler_category_dtype_incompatibility( |
There was a problem hiding this comment.
This is actually the same bug as the nullable type incompatibility, and it's why we convert categorical columns to "object" in base_sampler. There is a fix in the works scikit-learn/scikit-learn#25814 (because of the wonderful @thomasjpfan who has been a key contributor to our ability to support nullable types so widely), so I wanted to add this xfailing test to remind us to also remove that logic at the same time.
There was a problem hiding this comment.
@eccabay converting caegory to object was a bug fix of yours from a while back, so I just want to make sure you get a chance to take a look at this test in case I'm missing something here.
|
Note - this doesn't need to get in for the upcoming release |
chukarsten
left a comment
There was a problem hiding this comment.
Seems like a solid approach to me. Thanks for being so thorough in the documentation of the tests and what we need to undo when it succeeds.
38671f8 to
70b7935
Compare
70b7935 to
19e53a2
Compare
closes #3974
This incompatibility only shows up when string categorical columns are present with the nullable pandas dtypes, but I chose not to add a special
_handle_nullable_typesimplementation since it's dataset dependent vs the time series imputer that needed a different handling but that would be applied to any dataset. This is a bit of a gray area, though, so I'm open to changing it.