-
Notifications
You must be signed in to change notification settings - Fork 90
Fix k_neighbors issue with SMOTE oversamplers #2375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2375 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 281 281
Lines 24934 24963 +29
=======================================
+ Hits 24837 24866 +29
Misses 97 97
Continue to review full report at Codecov.
|
| n_jobs=-1, | ||
| random_seed=0, | ||
| **kwargs | ||
| **kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this got added by black
…c_2324_kneighbors
chukarsten
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it - it's quick, intuitive solution with a good test. Is there anyway we can communicate what's happening to the user? I'm still not sure how we're using warnings or logging, but I'd prefer, philosophically, for us to let the user know everytime we change the behavior they expect under the covers.
| f"Minority class needs more than 1 sample to use SMOTE!, received {min_counts} sample" | ||
| ) | ||
| if min_counts <= neighbors: | ||
| neighbors = min_counts - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the solution. I just would prefer if the user knew what was happening. Do we have a way that we can communicate to the user, who thinks that k_neighbors_default is one value that it's being reduced due to their input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could introduce a k_neighbors parameter that becomes the value we set it to in the parameters dictionary? This way, users can see the parameter when they look at the param dic, but I don't think our other components currently log any changes they make internally. We don't raise any changes here and here.
Does this sound fine to you @chukarsten?
angela97lin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! left some nitpicking :]
evalml/pipelines/components/transformers/samplers/base_sampler.py
Outdated
Show resolved
Hide resolved
| k: v | ||
| for k, v in copy.copy(self.parameters).items() | ||
| if k not in ["sampling_ratio", "sampling_ratio_dict"] | ||
| for k, v in self.parameters.items() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why we copied before and why its okay that we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was something Karsten pointed out, but since we're making a copied dictionary from the parameters, I don't think it's necessary for us to copy the dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do a lot of copying throughout the code base....I feel like it might be someone (rightfully) scared of overwriting the originals.
fix #2324
Design doc here