New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dictionary support for oversamplers #2288
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2288 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 280 280
Lines 24360 24490 +130
=======================================
+ Hits 24333 24463 +130
Misses 27 27
Continue to review full report at Codecov.
|
parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio} | ||
self._frozen_pipeline_parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio} | ||
elif self._sampler_name in parameters: | ||
parameters[self._sampler_name].update({"sampling_ratio": self.sampler_balanced_ratio}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to default to the sampler_balanced_ratio
value that was provided for AutoMLSearch to set the sampling_ratio
, which means it overrides the sampling_ratio
that a user might input to the pipeline_params via
pipeline_parameters = {sampler : {"sampling_ratio": 0.5}}
Updated the docstring above to mention this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I don't think you need an elif
here, it can just be an else
since I don't think we'd have another logic path but total nitpick.
Also I like the way you set the logic for the elif
block. You could do the same to the if
block and move setting the self._frozen_pipeline_parameters
after:
if self._sampler_name not in parameters:
parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio}
else:
parameters[self._sampler_name].update({"sampling_ratio": self.sampler_balanced_ratio})
self._frozen_pipeline_parameters[self._sampler_name] = parameters[self._sampler_name]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice Bryan, this is good work. I think you and I discussed a bunch of things in slack parallel to this and I like the addition of docs and the test for the string keys. I still personally think the binary case with the key/val in the dict for the majority class in the oversampler case or the minority class in the undersampler case is a little confusing that they don't do anything. Personally, I just struggle with the dictionary representation, but I think that's a personal problem X-D.
@@ -748,3 +748,87 @@ def test_automl_search_sampler_method(sampler_method, categorical_features, prob | |||
sampler_method = 'Undersampler' | |||
assert 'Could not import imblearn.over_sampling' in caplog.text | |||
assert all(any(sampler_method in comp.name for comp in pipeline.component_graph) for pipeline in pipelines) | |||
|
|||
|
|||
@pytest.mark.parametrize("sampling_ratio", [0.1, 0.2, 0.5, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I like this test. Very readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job with this, excellent test coverage
parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio} | ||
self._frozen_pipeline_parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio} | ||
elif self._sampler_name in parameters: | ||
parameters[self._sampler_name].update({"sampling_ratio": self.sampler_balanced_ratio}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I don't think you need an elif
here, it can just be an else
since I don't think we'd have another logic path but total nitpick.
Also I like the way you set the logic for the elif
block. You could do the same to the if
block and move setting the self._frozen_pipeline_parameters
after:
if self._sampler_name not in parameters:
parameters[self._sampler_name] = {"sampling_ratio": self.sampler_balanced_ratio}
else:
parameters[self._sampler_name].update({"sampling_ratio": self.sampler_balanced_ratio})
self._frozen_pipeline_parameters[self._sampler_name] = parameters[self._sampler_name]
fix #2142