Update generate_pipeline_code to return pipeline instances without generating custom pipeline class #2227
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2227 +/- ##
=======================================
- Coverage 99.9% 99.9% -0.0%
=======================================
Files 280 280
Lines 24152 24089 -63
=======================================
- Hits 24127 24064 -63
Misses 25 25
@@ -720,15 +720,15 @@ def _get_baseline_pipeline(self):
        if self.problem_type == ProblemTypes.BINARY:
            baseline = BinaryClassificationPipeline(component_graph=["Baseline Classifier"],
                                                    custom_name="Mode Baseline Binary Classification Pipeline",
                                                    custom_hyperparameters={"strategy": ["mode"]})
This was the odd dask issue that came up--it wasn't happy with this line. This is incorrect anyway (it doesn't specify which component), but funny that this didn't come up in other tests... I think it's because the default is mode/mean anyway, so automl just did its thing and ignored this, but now that I'm adding custom hyperparameters to repr, it bugged out 😂
I think it makes sense to change this
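The shape mismatch is easy to see outside of evalml; a minimal sketch (the helper name and the check are illustrative, not evalml's actual validation code):

```python
# The buggy baseline passed hyperparameter ranges without naming a
# component, i.e. {"strategy": ["mode"]} instead of scoping the ranges
# under "Baseline Classifier".
flat = {"strategy": ["mode"]}
nested = {"Baseline Classifier": {"strategy": ["mode"]}}

def is_component_scoped(custom_hyperparameters):
    # A component-scoped mapping has a dict of parameter ranges under each
    # component name; a flat mapping has the ranges at the top level.
    return all(isinstance(ranges, dict) for ranges in custom_hyperparameters.values())
```

The flat form happened to be ignored by automl until repr started iterating over it.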
Do you have the dask stacktrace?
Here ya go!
def test_automl_max_iterations(self):
""" Making sure that the max_iterations parameter limits the number of pipelines run. """
X, y = self.X_y_binary
max_iterations = 4
par_automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary", engine=self.parallel_engine,
max_iterations=max_iterations)
> par_automl.search()
evalml\tests\automl_tests\test_automl_dask.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evalml\automl\automl_search.py:580: in search
self._add_baseline_pipelines()
evalml\automl\automl_search.py:753: in _add_baseline_pipelines
computation = self._engine.submit_evaluation_job(self.automl_config, baseline, self.X_train, self.y_train)
evalml\automl\engine\dask_engine.py:89: in submit_evaluation_job
logger=logger)
c:\users\runneradmin\miniconda3\envs\curr_py\lib\site-packages\distributed\client.py:1565: in submit
key = funcname(func) + "-" + tokenize(func, kwargs, *args)
c:\users\runneradmin\miniconda3\envs\curr_py\lib\site-packages\dask\base.py:795: in tokenize
return md5(str(tuple(map(normalize_token, args))).encode()).hexdigest()
c:\users\runneradmin\miniconda3\envs\curr_py\lib\site-packages\dask\utils.py:512: in __call__
return meth(arg, *args, **kwargs)
c:\users\runneradmin\miniconda3\envs\curr_py\lib\site-packages\dask\base.py:806: in normalize_dict
return normalize_token(sorted(d.items(), key=str))
evalml\pipelines\pipeline_base.py:528: in __repr__
custom_hyperparameters_repr = ', '.join([f"'{component}':{{{repr_component(hyperparameters)}}}," for component, hyperparameters in self.custom_hyperparameters.items()]) if self.custom_hyperparameters else None
evalml\pipelines\pipeline_base.py:528: in <listcomp>
custom_hyperparameters_repr = ', '.join([f"'{component}':{{{repr_component(hyperparameters)}}}," for component, hyperparameters in self.custom_hyperparameters.items()]) if self.custom_hyperparameters else None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
parameters = ['mode']
def repr_component(parameters):
> return ', '.join([f"'{key}': {safe_repr(value)}" for key, value in parameters.items()])
E AttributeError: 'list' object has no attribute 'items'
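The crash is `repr_component` receiving the raw list of ranges (`['mode']`) from the flat mapping, while the code assumes a dict. A sketch of a defensive variant (`safe_repr` here is a plain stand-in for evalml's helper, assumed to behave like `repr` for ordinary values):

```python
def safe_repr(value):
    # Stand-in for evalml's safe_repr helper (assumption: plain repr).
    return repr(value)

def repr_component(parameters):
    # The traceback shows parameters arriving as ['mode'] (a list), so
    # parameters.items() raises AttributeError. Guarding on the type
    # avoids the crash while keeping the dict behavior unchanged.
    if not isinstance(parameters, dict):
        return safe_repr(parameters)
    return ', '.join(f"'{key}': {safe_repr(value)}" for key, value in parameters.items())
```

With the baseline pipeline fixed to use component-scoped hyperparameters, the guard is only a safety net.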
Thanks! I wanted to know how dask ended up calling repr
'\n\nparameters = json.loads("""{\n\t"My Custom Estimator": {\n\t\t"random_arg": false\n\t}\n}""")' \
'\npipeline = MockAllCustom(parameters)'
pipeline = generate_pipeline_code(mockAllCustom)
mock_pipeline_with_custom_components = BinaryClassificationPipeline([CustomTransformer, CustomEstimator])
Deleted a lot here, I don't think we need to test one with transformer, one with estimator, one with both--just testing one with both.
@@ -591,7 +591,7 @@ class MockRegressionPipeline(RegressionPipeline):
        generate_pipeline_code([Imputer(), LogisticRegressionClassifier()])

-def test_generate_code_pipeline_json_errors():
+def test_generate_code_pipeline_json_with_objects():
No longer errors :)
With that, just kept one or two tests of different types of objects :)
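The round trip the renamed test exercises can be sketched with plain `json` and `exec` (the generated string here is a simplified version of the one in the diff above, with the `MockAllCustom` construction dropped to keep it self-contained):

```python
import json

# Simplified version of the generated code string: parameters are embedded
# as a JSON literal, so JSON's false becomes Python's False after json.loads.
generated = (
    'parameters = json.loads("""{\n'
    '\t"My Custom Estimator": {\n'
    '\t\t"random_arg": false\n'
    '\t}\n'
    '}""")'
)
namespace = {"json": json}
exec(generated, namespace)
parameters = namespace["parameters"]
```

Tabs and newlines are valid JSON whitespace, so the pretty-printed literal parses cleanly.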
@angela97lin I think there's a bug somewhere? I get a syntax error when I specify a pipeline with more than one key in custom_hyperparameters:
Just specifying the random forest classifier seems to work:
I also noticed that the skopt imports are not added to the code string which I believe would cause a stacktrace if you ran the code in a different environment.
Did we ever decide what the long-term plan for custom_hyperparameters is? I thought we had talked about deleting it, but I don't see it in the epic (that's been closed?).
@freddyaboulton Aiyah, thank you--I'll take a look at this and add a test. Good catch! 😅 I think it's okay that the imports aren't added and we require the user to add the external imports themselves. What do you think? Also, the long term for custom_hyperparameters is tracked here: #2130. We closed the epic since we figured it was just one-off improvement issues and didn't require an overarching epic :)
Sounds good @angela97lin! Thanks for pointing me to #2130!
@angela97lin I like this fix! I do think that it would be preferred by users to have formatted code returned, as this was an issue filed here. Functionality-wise, this looks good to me (beyond the bug @freddyaboulton found).
Let me know what you think!
@freddyaboulton @bchen1116 Fixed! lol, turns out I just had an extra floating comma lying around--thanks for bringing it up! Also updated the repr test to make sure, and added a statement in our notebook that external imports are necessary to execute the code :)) Also, formatted code is nice but functionality-wise not necessary. Since there's already an issue for it, happy to leave that as is and not address it here 😂
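The floating-comma bug reproduces in isolation; a sketch under the assumption that the repr built one comma-terminated entry per component and then also joined entries with `', '` (function names here are illustrative, not evalml's):

```python
import ast

def hyperparameters_repr_buggy(custom_hyperparameters):
    # Each f-string ends with "," AND join inserts ", ", so a second key
    # yields "},, '..." -- not valid Python. With a single key the stray
    # trailing comma is tolerated inside a dict literal, which is why one
    # component appeared to work.
    entries = ', '.join(f"'{c}': {h!r}," for c, h in custom_hyperparameters.items())
    return "{" + entries + "}"

def hyperparameters_repr_fixed(custom_hyperparameters):
    # Let join supply the only separator between entries.
    entries = ', '.join(f"'{c}': {h!r}" for c, h in custom_hyperparameters.items())
    return "{" + entries + "}"
```

This matches the symptom @freddyaboulton saw: one key worked, two or more keys produced a syntax error.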
Filed the reformatting issue here.
Solid tests and changes! LGTM!
Looks good to me @angela97lin ! Thanks for the fix!
Closes #2160.
This is done by updating our pipelines' repr (since it's a bit outdated and only contained parameters) and then using that updated repr in generate_pipeline_code :)
Also fixes a mini bug with baseline pipelines in AutoMLSearch.
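The overall approach can be sketched with a toy class whose repr is valid constructor code (`MiniPipeline` and this `generate_pipeline_code` are hypothetical stand-ins, not evalml's implementations):

```python
class MiniPipeline:
    # Hypothetical stand-in for evalml's pipeline classes.
    def __init__(self, component_graph, custom_hyperparameters=None):
        self.component_graph = component_graph
        self.custom_hyperparameters = custom_hyperparameters

    def __repr__(self):
        # Valid constructor code, now including custom_hyperparameters
        # (the PR extends repr beyond just parameters).
        return (f"MiniPipeline(component_graph={self.component_graph!r}, "
                f"custom_hyperparameters={self.custom_hyperparameters!r})")

def generate_pipeline_code(pipeline):
    # Return a code string that re-creates the instance when exec'd; any
    # external imports remain the caller's responsibility, as discussed
    # in this thread.
    return f"pipeline = {pipeline!r}"
```

Because the generated string is a plain constructor expression, running it yields a pipeline instance directly instead of defining a one-off custom class.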