-
Notifications
You must be signed in to change notification settings - Fork 92
Warn users when pipeline parameters aren't used when initializing pipelines #2564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2564 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 287 287
Lines 26338 26402 +64
=======================================
+ Hits 26304 26368 +64
Misses 34 34
Continue to review full report at Codecov.
|
| self.sampler_balanced_ratio, | ||
| ) | ||
| if self._sampler_name not in parameters: | ||
| if self._sampler_name not in parameters and self._sampler_name is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use this to make sure we don't add the sampling ratio to the param dictionary if the sampler_name is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton I think so. It wasn't doing anything except adding {None: {sampler_ratio: 0.25}} to the param dictionary, which doesn't do anything but also isn't necessary and could cause problems in the future (like with this PR). It should be fixed now!
| custom_hyperparameters=custom_hyperparameters, | ||
| ) | ||
|
|
||
| def _catch_warnings(self, warning_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows us to catch all warnings thrown during pipeline initialization and only throw the warnings for the component names that appear in all warnings. For instance, if we had pipelines:
p1 = ['Imputer', 'Standard Scaler', 'Logistic Regression Classifier']
p2 = ['Imputer', 'Random Forest Classifier']
params_passed_in = {"Logistic Regression Classifier": {"some value": 2},
{"Undersampler": {"another val": 1}}then we would only want to raise warning for Undersampler since it isn't used in any of the pipelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh when I first read the issue I didn't foresee we would need this in AutoMLSearch but I think it's the right call!
| "pipeline_parameters,set_values", | ||
| [ | ||
| ({"Logistic Regression Classifier": {"penalty": "l1"}}, {}), | ||
| ({"Logistic Regression": {"penalty": "l1"}}, {"Logistic Regression"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should issue a warning since the component name we've defined in the graph is Logistic Regression Classifier rather than Logistic Regression
| ) | ||
|
|
||
| def _catch_warnings(self, warning_list): | ||
| if len(warning_list) == len(self.allowed_pipelines) and len(warning_list) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 I think we can simplify this implementation a bit if we raise our own custom warning that subclasses UserWarning.
class ParameterNotUsedWarning(UserWarning):
def __init__(self, components):
self.components = components
msg = f"Parameters for components {components} will not be used to instantiate the pipeline since they don't appear in the pipeline"
super().__init__(msg)ComponentGraph now becomes:
diff = param_set.difference(set(self.component_instances.keys()))
if len(diff):
warnings.warn(ParameterNotUsedWarning(diff))_catch_warnings now becomes:
def _catch_warnings(self, warning_list):
if len(warning_list) == len(self.allowed_pipelines) and len(warning_list) > 0:
# we find the value(s) that we must throw the warning for
final_message = set([])
for idx, msg in enumerate(warning_list):
if idx == 0:
final_message = final_message.union(msg.message.components)
else:
final_message.intersection(msg.message.components)
warnings.warn(ParameterNotUsedWarning(list(final_message)))I think this is cleaner because it defines an api for accessing the components we want to warn about rather than messing with the string. I worry that'll be harder to maintain if we ever decide to change the message. I think the implementation is easier to understand as well. We can also explicitly filter for our custom warning in AutoMLSearch which seems cleaner too.
Something like:
with warnings.catch_warnings(record=True) as w:
warnings.filterwarnings("always", category=ParameterNotUsedWarning)Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea! Yeah, I can change the impl here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton added the suggestion you raised. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bchen1116 I will take a look!
| """Exception when a particular classification label for the 'positive' class cannot be found in the column index or unique values""" | ||
|
|
||
|
|
||
| class ParameterNotUsedWarning(UserWarning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows up on the documentation automatically! https://feature-labs-inc-evalml--2564.com.readthedocs.build/en/2564/autoapi/evalml/exceptions/index.html#evalml.exceptions.ParameterNotUsedWarning
freddyaboulton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 This looks great to me!
chukarsten
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, Bryan. I am sorry this scope creeded up so much. Thanks for being flexible and talking me through it.
| for estimator in allowed_estimators | ||
| ] | ||
| self._catch_warnings(w) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this addition is so confusing to me. This else branch is intended for if self.allowed_component_graphs is None, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!

fix #1839
I opted to use warnings rather than raise errors since pipeline initialization occurs during AutoMLSearch as well. In the case where users pass in a valid
pipeline_param, we don't want to error it out since we pass the entirety ofpipeline_parametersto every pipeline we initialize.It would also be too much logic to include in
AutoMLSearchto filter out only the appropriate components we need from pipeline parameters into the pipelines we are initializing. (ie determining which components are necessary in which pipelines beforemake_pipelinesso we can trim and pass the appropriate pipeline parameters)In this scenario, we don't want to raise an error since that would prevent
AutoMLSearchfrom happening.However, raising a warning allows the search to continue while also giving the user feedback on whether or not the component parameters are being used.