Fix k_neighbors issue with SMOTE oversamplers #2375

Merged: bchen1116 merged 20 commits into main on Jun 16, 2021

bchen1116 (Contributor) commented Jun 11, 2021

Fixes #2324.

Design doc here

bchen1116 self-assigned this Jun 11, 2021
codecov bot commented Jun 11, 2021

Codecov Report

Merging #2375 (d2f606e) into main (af00017) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2375     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        281     281             
  Lines      24934   24963     +29     
=======================================
+ Hits       24837   24866     +29     
  Misses        97      97             
Impacted Files Coverage Δ
...s/components/transformers/samplers/oversamplers.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_components.py 100.0% <ø> (ø)
...s/components/transformers/samplers/base_sampler.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_oversamplers.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update af00017...d2f606e.

      n_jobs=-1,
      random_seed=0,
-     **kwargs
+     **kwargs,
Contributor Author:

The trailing comma after **kwargs was added by black.

bchen1116 marked this pull request as ready for review June 15, 2021 14:28
chukarsten (Collaborator) left a comment

Love it: it's a quick, intuitive solution with a good test. Is there any way we can communicate what's happening to the user? I'm still not sure how we're using warnings or logging, but philosophically, I'd prefer that we let the user know every time we change the behavior they expect under the covers.

    f"Minority class needs more than 1 sample to use SMOTE!, received {min_counts} sample"
)
if min_counts <= neighbors:
    neighbors = min_counts - 1
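The clamping rule in the snippet above can be sketched as a standalone helper. This is a minimal sketch, not the evalml implementation: `clamp_k_neighbors` is a hypothetical name, and it assumes the labels arrive as a plain sequence. The rule follows from how SMOTE picks neighbours: the nearest-neighbour search over the minority class also returns the sample itself, so a class with `min_count` samples offers at most `min_count - 1` genuine neighbours.

```python
from collections import Counter
from typing import Hashable, Sequence


def clamp_k_neighbors(y: Sequence[Hashable], k_neighbors: int = 5) -> int:
    """Return a k_neighbors value SMOTE can use for the labels y.

    Hypothetical helper mirroring the review snippet: if the smallest
    class has min_count samples, at most min_count - 1 other samples
    can act as neighbours.
    """
    min_count = min(Counter(y).values())
    if min_count <= 1:
        # A lone minority sample has no neighbour to interpolate towards.
        raise ValueError(
            f"Minority class needs more than 1 sample to use SMOTE, received {min_count} sample"
        )
    if min_count <= k_neighbors:
        # Shrink k so that k + 1 (neighbours plus the sample itself) fits in the class.
        k_neighbors = min_count - 1
    return k_neighbors
```

For example, ten samples of one class and three of another yield `k_neighbors=2`, while a minority class of eight keeps the default of 5.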
Collaborator:

I like the solution; I'd just prefer that the user know what's happening. Is there a way we can tell a user who expects k_neighbors_default to be one value that it's being reduced because of their input?
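One way to let the user know, in the spirit of this comment, is to emit a warning whenever the configured value is reduced. This is a minimal sketch assuming Python's standard `warnings` module is an acceptable channel; `resolve_k_neighbors` is a hypothetical helper, and the thread does not say evalml adopted a warning.

```python
import warnings
from collections import Counter
from typing import Hashable, Sequence


def resolve_k_neighbors(y: Sequence[Hashable], k_neighbors_default: int = 5) -> int:
    """Pick the k_neighbors to pass to SMOTE, warning if it had to shrink.

    Hypothetical helper: evalml may surface this differently, or not at all.
    """
    min_count = min(Counter(y).values())
    if min_count <= 1:
        raise ValueError(
            f"Minority class needs more than 1 sample to use SMOTE, received {min_count} sample"
        )
    if min_count <= k_neighbors_default:
        k_neighbors = min_count - 1
        # Tell the caller that their configured value is not the one being used.
        warnings.warn(
            f"k_neighbors_default={k_neighbors_default} is too large for a minority "
            f"class of {min_count} samples; using k_neighbors={k_neighbors} instead.",
            UserWarning,
        )
        return k_neighbors
    return k_neighbors_default
```

A warning keeps the fit running while still flagging the silent change; raising instead would force the user to pick a smaller value up front.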

Contributor Author:

I could introduce a k_neighbors parameter that is set to the value we actually use and stored in the parameters dictionary. That way, users can see the effective value when they look at the parameters dict. That said, I don't think our other components currently log the changes they make internally; we don't surface any changes here and here.

Does this sound fine to you, @chukarsten?
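The parameters-dictionary proposal could look roughly like the sketch below. `SMOTEParamsSketch` is a hypothetical stand-in rather than evalml's oversampler class, and the `k_neighbors` key is the one proposed in the comment; the sketch only shows the bookkeeping, not the resampling itself.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Sequence


class SMOTEParamsSketch:
    """Hypothetical stand-in for an oversampler component.

    Keeps the user's k_neighbors_default and records the value actually
    used under k_neighbors, as proposed in the review thread.
    """

    def __init__(self, k_neighbors_default: int = 5) -> None:
        self.parameters: Dict[str, Any] = {
            "k_neighbors_default": k_neighbors_default,
            "k_neighbors": k_neighbors_default,
        }

    def fit(self, y: Sequence[Hashable]) -> "SMOTEParamsSketch":
        min_count = min(Counter(y).values())
        if min_count <= 1:
            raise ValueError("Minority class needs more than 1 sample to use SMOTE")
        neighbors = self.parameters["k_neighbors_default"]
        if min_count <= neighbors:
            neighbors = min_count - 1
        # The effective value is now visible to anyone who inspects parameters.
        self.parameters["k_neighbors"] = neighbors
        return self
```

After `SMOTEParamsSketch().fit([0] * 20 + [1] * 3)`, `parameters` holds `k_neighbors_default=5` alongside the effective `k_neighbors=2`, so the reduction is visible without any logging.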

angela97lin (Contributor) left a comment

LGTM! Left some nitpicks :]

  sampler_params = {
      k: v
-     for k, v in copy.copy(self.parameters).items()
      if k not in ["sampling_ratio", "sampling_ratio_dict"]
+     for k, v in self.parameters.items()
Contributor:

Do you know why we copied before, and why it's okay to remove it now?

Contributor Author:

This was something Karsten pointed out. Since the comprehension already builds a new dictionary from the parameters, I don't think we need to copy the parameters dictionary first.
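The point about the redundant copy can be checked directly: a dict comprehension always builds a new dictionary, so wrapping the source in `copy.copy` first changes nothing. The parameter values below are illustrative, not evalml defaults.

```python
import copy

# Illustrative parameters; the keys come from the thread, the values are made up.
parameters = {
    "sampling_ratio": 0.25,
    "sampling_ratio_dict": None,
    "k_neighbors_default": 5,
    "n_jobs": -1,
}
excluded = ["sampling_ratio", "sampling_ratio_dict"]

with_copy = {k: v for k, v in copy.copy(parameters).items() if k not in excluded}
without_copy = {k: v for k, v in parameters.items() if k not in excluded}

# Both forms yield an equal, brand-new dict...
assert with_copy == without_copy
assert without_copy is not parameters

# ...so rebinding a key in the result leaves the original parameters untouched.
without_copy["k_neighbors_default"] = 2
```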

Collaborator:

We do a lot of copying throughout the codebase. I suspect it comes from someone being (rightfully) scared of overwriting the originals.
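Whether a copy actually protects the originals depends on its depth. Both the comprehension and `copy.copy` are shallow: rebinding a top-level key is safe, but nested mutable values are still shared, and only `copy.deepcopy` isolates them. The nested `sampling_ratio_dict` contents here are hypothetical.

```python
import copy

# Hypothetical parameters with one nested, mutable value.
parameters = {"k_neighbors_default": 5, "sampling_ratio_dict": {"minority": 0.5}}

shallow = {k: v for k, v in parameters.items()}  # same depth of protection as copy.copy
deep = copy.deepcopy(parameters)

# Rebinding a top-level key in a shallow copy never touches the original.
shallow["k_neighbors_default"] = 2
# But the nested dict is the same object, so this edit leaks into parameters.
shallow["sampling_ratio_dict"]["minority"] = 0.9
# A deep copy owns its nested objects, so this edit stays local.
deep["sampling_ratio_dict"]["minority"] = 0.1
```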

evalml/tests/component_tests/test_oversamplers.py (outdated, resolved)
bchen1116 merged commit a53c7d2 into main Jun 16, 2021
chukarsten mentioned this pull request Jun 22, 2021
freddyaboulton deleted the bc_2324_kneighbors branch May 13, 2022 15:18
Development: successfully merging this pull request may close the issue "Fix SMOTE Oversampler k_neighbors arg".
3 participants