Add dictionary support for Undersampler base class #2235
Codecov Report
```diff
@@           Coverage Diff           @@
##            main    #2235   +/-   ##
=======================================
+ Coverage    99.9%   99.9%   +0.1%
=======================================
  Files         280     280
  Lines       24294   24330     +36
=======================================
+ Hits        24266   24302     +36
  Misses         28      28
```
```python
y_dict = y.value_counts().to_dict()
new_dic = {}
for k, v in self.sampling_ratio_dict.items():
    new_dic[k] = max(y_dict[k] - v, 0)
```
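The loop above treats each dictionary value as a target count for its class and computes how many samples of that class would need to be removed, clamped at zero. A minimal, self-contained sketch of the same arithmetic follows. It assumes a plain list of labels in place of a pandas Series, and `samples_to_drop` is a hypothetical helper name, not part of the PR.

```python
from collections import Counter


def samples_to_drop(y, sampling_dict):
    """Hypothetical helper mirroring the loop in the PR.

    y: iterable of class labels (stand-in for a pandas Series).
    sampling_dict: maps class label -> target number of samples.
    Returns class label -> number of samples to remove (never negative).
    """
    # Stand-in for y.value_counts().to_dict(); note that a Counter returns 0
    # for an unseen label, whereas the plain dict in the PR would raise KeyError.
    y_counts = Counter(y)
    # Drop (current count - target count) samples per class, clamped at zero
    # so classes already at or below their target are left untouched.
    return {k: max(y_counts[k] - v, 0) for k, v in sampling_dict.items()}


y = ["a"] * 10 + ["b"] * 3
print(samples_to_drop(y, {"a": 4, "b": 5}))  # {'a': 6, 'b': 0}
```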
@bchen1116 The doc said that we'd accept ratios as the values of the dictionary, but it seems like we're still expecting counts. Is that coming in the part 2 PR?
Yep! I plan to add support for converting the ratios to numbers of samples in the `base_sampler` class, so that both the oversamplers and the undersamplers have access to it.
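A rough sketch of the ratio-to-count conversion described here is below. The thread does not define the ratio semantics, so this assumes each ratio is the minority class count divided by the desired count for that class: `1` keeps a class at the minority size and `0.5` allows twice as many samples. The helper name `ratios_to_counts` is hypothetical and is not the merged implementation.

```python
from collections import Counter


def ratios_to_counts(y, ratio_dict):
    """Hypothetical conversion from per-class ratios to target sample counts.

    Assumption (for illustration only): ratio = minority_count / target_count,
    so target_count = minority_count / ratio.
    """
    counts = Counter(y)
    minority = min(counts.values())
    targets = {}
    for label, ratio in ratio_dict.items():
        # Ratios outside (0, 1] would ask for fewer samples than the minority
        # class (or divide by zero), so reject them under this interpretation.
        if not 0 < ratio <= 1:
            raise ValueError(f"ratio for {label!r} must be in (0, 1], got {ratio}")
        targets[label] = round(minority / ratio)
    return targets


y = ["a"] * 10 + ["b"] * 3
print(ratios_to_counts(y, {"a": 0.5, "b": 1}))  # {'a': 6, 'b': 3}
```

Keeping the conversion free of any undersampling-specific clamping is what would let both the oversamplers and the undersamplers reuse it; each sampler can then decide whether a target above or below the current count means adding or removing samples.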
Got it. I think I understand the plan now!
freddyaboulton left a comment
I think this looks great @bchen1116 !
angela97lin left a comment
LGTM! Left a nit-picky comment about the docstring and how `sampling_ratio` interacts with `sampling_ratio_dict`, but otherwise 🚢
Part 1 of #2105
This PR adds support for accepting dictionary input in the undersampler base class. I will add dictionary support to the Undersampler itself in a separate PR, since I want to handle the dictionary processing (converting ratios to numbers of samples) in the sampler base class, which requires this doc to be approved and implemented first.