Dictionary support for Undersampler #2271

Merged
merged 23 commits into from
May 18, 2021

bchen1116
Contributor

fix #2105

@bchen1116 bchen1116 self-assigned this May 13, 2021

codecov bot commented May 14, 2021

Codecov Report

Merging #2271 (f7404a6) into main (59e9dd0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #2271     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         280      280             
  Lines       24336    24392     +56     
=========================================
+ Hits        24314    24370     +56     
  Misses         22       22             
| Impacted Files | Coverage Δ |
|---|---|
| ...s/components/transformers/samplers/base_sampler.py | 100.0% <100.0%> (ø) |
| ...s/components/transformers/samplers/undersampler.py | 100.0% <100.0%> (ø) |
| evalml/tests/component_tests/test_components.py | 100.0% <100.0%> (ø) |
| evalml/tests/component_tests/test_undersampler.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bchen1116 bchen1116 marked this pull request as ready for review May 14, 2021 18:20
Contributor

@chukarsten chukarsten left a comment

This is good work, Bryan. I think we need to chat about what it means to have a ratio for both the majority and minority class; that's a little confusing to me now that I'm looking at it. I think we also need to update the docstring and figure out what _dictionary_to_params() should do when 'sampling_ratio_dict' is not defined in the params.
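
For illustration only, here is one possible reading of a per-class ratio. It assumes each value is the desired ratio of the minority class count to that class's count after undersampling, so a ratio of 1 for the minority class leaves it untouched. The helper name `ratios_to_counts` and this interpretation are assumptions for the sketch, not code from this PR.

```python
from collections import Counter


def ratios_to_counts(sampling_ratio_dict, y):
    """Turn {class: minority/class ratio} into {class: target sample count}.

    Hypothetical helper: the name and the ratio interpretation are assumptions.
    """
    counts = Counter(y)
    # Require one ratio per class so no class is silently left out.
    if set(sampling_ratio_dict) != set(counts):
        raise ValueError("Dictionary keys must match the classes in y exactly.")
    minority_count = min(counts.values())
    targets = {}
    for label, ratio in sampling_ratio_dict.items():
        if ratio <= 0:
            raise ValueError("Ratios must be positive.")
        # Undersampling can only remove rows, so cap at the current class size.
        targets[label] = min(int(minority_count / ratio), counts[label])
    return targets


y = [0] * 100 + [1] * 25
print(ratios_to_counts({0: 0.5, 1: 1}, y))  # {0: 50, 1: 25}
```

Under this reading, the majority class (100 rows) is cut to 50 so that the minority class is half its size, while the minority class keeps all 25 rows.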

@bchen1116 bchen1116 requested a review from chukarsten May 18, 2021 13:50
Contributor

@chukarsten chukarsten left a comment

Solid work! Ship it!

Contributor

@freddyaboulton freddyaboulton left a comment

@bchen1116 This looks good to me!

@@ -59,6 +59,52 @@ def transform(self, X, y=None):
        y = infer_feature_types(y)
        return X, None

    def _convert_dictionary(self, sampling_dict, y):
Contributor

Should this method live in the base class? It would be different for undersamplers vs. oversamplers, right?

Contributor Author

I plan on adding an if-statement in the base class to handle the differences between over- and undersamplers! It should be a one-line difference.
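
A rough sketch of what such a branch could look like, assuming the only difference is whether the target count is capped (undersampling) or floored (oversampling) at the current class size. The function `target_count` and its `undersample` flag are hypothetical names for this example and are not taken from the PR.

```python
def target_count(minority_count, class_count, ratio, undersample=True):
    """Per-class target row count for a minority/class ratio.

    Hypothetical helper: the name, signature, and flag are assumptions.
    """
    desired = int(minority_count / ratio)
    if undersample:
        # Undersampling removes rows: never ask for more than the class has.
        return min(desired, class_count)
    # Oversampling adds rows: never ask for fewer than the class has.
    return max(desired, class_count)


# Majority class with 100 rows, minority class with 25 rows, ratio 0.5.
print(target_count(25, 100, 0.5))                    # 50
print(target_count(25, 25, 0.5, undersample=False))  # 50
```

Keeping the shared ratio arithmetic in one place and branching only on the cap is what would make the difference a single line.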

@bchen1116 bchen1116 merged commit c47a6a1 into main May 18, 2021
@chukarsten chukarsten mentioned this pull request May 24, 2021
@chukarsten chukarsten mentioned this pull request Jun 2, 2021
@freddyaboulton freddyaboulton deleted the bc_2105_undersampler branch May 13, 2022 15:02
Development

Successfully merging this pull request may close these issues.

Undersampler accept dictionary as sampling argument
3 participants