Update pipeline API to accept component graph and other class attributes as parameters #2091
Conversation
@angela97lin I left a couple questions. Will review tests shortly and then approve!
```python
parameters = copy.copy(self.pipeline_parameters)
if self.problem_configuration:
    parameters.update({'pipeline': self.problem_configuration})
    self._frozen_pipeline_parameters.update({'pipeline': self.problem_configuration})
```
@angela97lin why is this necessary? `IterativeAlgorithm._transform_parameters` currently handles adding pipeline-level parameters. Unless I'm missing some details, I think we should use either that approach or this one, but not both. I'm not saying the solution in `_transform_parameters` is the best, haha, but I think it does the same thing as this.
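For context, the pipeline-level merge being discussed can be sketched roughly like this (a simplified stand-in, not the actual `IterativeAlgorithm._transform_parameters` implementation; names and signature are illustrative):

```python
def transform_parameters(proposed_parameters, pipeline_parameters, problem_configuration=None):
    """Overlay pipeline-level parameters onto the per-component parameters
    proposed by the tuner, without mutating the tuner's dict."""
    # Shallow-copy each component's dict so updates don't leak back to the caller.
    parameters = {name: dict(params) for name, params in proposed_parameters.items()}
    for component_name, component_params in pipeline_parameters.items():
        parameters.setdefault(component_name, {}).update(component_params)
    if problem_configuration:
        # Time-series settings like gap/max_delay live under the 'pipeline' key.
        parameters['pipeline'] = dict(problem_configuration)
    return parameters
```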
Here's the logic right now in main:

We pass the pipeline parameters, notably without the drop-columns parameters, to `make_pipelines`. This means that there are no custom hyperparameters set for the drop-columns component. Then, we use IterativeAlgo to handle all the wrangling of this specific case. In main, if we move the creation of `allowed_pipelines` from L297 to L302, we would error out:

```
ValueError: Default parameters for components in pipeline Decision Tree Classifier w/ Imputer + Drop Columns Transformer not in the hyperparameter ranges: Point (['most_frequent', 'mean', ['index_col'], 'gini', 'auto', 6]) is not within the bounds of the space ([('most_frequent',), ('mean', 'median', 'most_frequent'), ('index_col',), ('gini', 'entropy'), ('auto', 'sqrt', 'log2'), (4, 10)]).
```

(https://github.com/alteryx/evalml/blob/main/evalml/automl/automl_search.py)

Here, I had passed `parameters` (rather than a copy) in as the custom hyperparameters, which caused the same error. It's still a little odd to me that this logic depends on us not passing extra information for the drop-columns transformer here and then passing that information to the IterativeAlgorithm to handle later, and I was hoping to ease some of that: if we add more components that take lists as parameters, we'll have to hardcode more special cases... but I'm also okay with not doing this for now if it convolutes the logic even further!
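The copy-vs-reference pitfall mentioned above can be shown in isolation (plain Python, independent of evalml's actual classes): if the same dict object is handed to two consumers, an update made for one leaks into the other.

```python
import copy

pipeline_parameters = {'Imputer': {'impute_strategy': 'mean'}}

# Without a copy: both names alias the same dict object.
custom_hyperparameters = pipeline_parameters
custom_hyperparameters['pipeline'] = {'gap': 1}   # intended only for one consumer...
assert 'pipeline' in pipeline_parameters          # ...but it leaked into the other

# With a shallow copy: top-level updates no longer leak.
pipeline_parameters = {'Imputer': {'impute_strategy': 'mean'}}
custom_hyperparameters = copy.copy(pipeline_parameters)
custom_hyperparameters['pipeline'] = {'gap': 1}
assert 'pipeline' not in pipeline_parameters
```

Note that `copy.copy` is shallow, so mutating the nested per-component dicts would still leak; it protects only against top-level key additions like the `'pipeline'` entry here.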
EDIT: Actually, after running some more tests, I think this might still be a good idea. Before, `allowed_pipelines` created classes. Now, we create instances. For time-series problems, we need to pass in the parameters, which are found via `pipeline_parameters`. That's why this block of code was moved up: so that we can pass the appropriate parameters to the time-series pipeline instances. However, with the logic we currently have, `pipeline_parameters` also potentially contains smaller custom hyperparameter spaces, which will error out when treated as parameters.
I noticed this when `test_automl_pipeline_params_kwargs` failed 😓 Open to filing an issue to somehow clean this up, or maybe this is the cleanup we need to be aware of as we restructure and rethink `IterativeAlgorithm`.
Ok, understood. If this looks good to you, I am on board. Yep, I hope we can revisit this after this PR and find a good way to clarify how we pass parameters to the automl algo.
Yep, I'll take another pass at this before merging, and filed #2187 to revisit this!
```diff
@@ -173,6 +174,9 @@ def test_fast_permutation_importance_matches_sklearn_output(mock_supports_fast_i
 class PipelineWithDimReduction(BinaryClassificationPipeline):
```
Sounds good. For #2184 I'd suggest we set the `custom_name` field!
Closes #1956
Would also close #1984 and #652 :)
Pros:

Cons:

- `gap` and `max_delay` need to be specified for input pipelines.

General notes:

- `allowed_pipelines` now takes in pipeline instances.
- `_pipelines_searched` is stored on the AutoMLSearch object. Can't add it to `_results` because we deepcopy results often, and deepcopying LightGBM models causes a lot of printout 😬
- Added `frozen_parameters` to IterativeAlgorithm, representing parameters that should not be changed throughout AutoMLSearch.

Defining a pipeline class:
Without defining a new pipeline class:
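The two usage styles being contrasted might look roughly like the following (a minimal stand-in sketch with a made-up `Pipeline` base class, not evalml's actual API):

```python
# Stand-in base class; evalml's real BinaryClassificationPipeline has far more machinery.
class Pipeline:
    component_graph = []  # may be set as a class attribute...

    def __init__(self, component_graph=None, parameters=None):
        # ...or passed to the constructor when creating an instance directly.
        if component_graph is not None:
            self.component_graph = component_graph
        self.parameters = parameters or {}

# Style 1: define a new pipeline class with a class attribute.
class MyPipeline(Pipeline):
    component_graph = ['Imputer', 'Random Forest Classifier']

pipeline_a = MyPipeline(parameters={'Imputer': {'impute_strategy': 'median'}})

# Style 2: no new class; configure an instance of the base pipeline directly.
pipeline_b = Pipeline(component_graph=['Imputer', 'Random Forest Classifier'],
                      parameters={'Imputer': {'impute_strategy': 'median'}})

assert pipeline_a.component_graph == pipeline_b.component_graph
```

The instance-based style is what lets `allowed_pipelines` take instances, at the cost of having to pass time-series settings like `gap` and `max_delay` at construction time.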
Pickling:
Saving in one notebook:

Loading in another notebook:

To discuss:

- `pipeline_params` is passed in as `custom_hyperparameters` but is also used to instantiate the parameters the first pipeline should use.
- `generate_pipeline_code` won't work well with an overridden `__init__`; this is true even in main now, but overriding init makes a lot more sense for custom pipeline classes now. But do we still need something like `generate_pipeline_code`?
- `make_pipeline_from_components` is really no longer necessary, but we'll keep it for now; can file something else to remove it.

Post-discussion with @freddyaboulton @chukarsten @dsherry:
- It's odd to pass pipeline instances via `allowed_pipelines` when we don't use their parameters. We only use `component_graph`, `name`/`custom_name`, and `custom_hyperparameters`. Perhaps a better API would be to pass in component graphs to AutoMLSearch instead. This would make it clearer that we are not using the parameters from the pipeline instance passed in.
- There are two ways to specify `custom_hyperparameters` for pipelines in AutoML: via the pipeline class's `custom_hyperparameters` attribute and via `pipeline_params`. We will need to revisit whether it makes sense for the custom hyperparameter ranges to be stored on the pipeline; it does make sense for each component to be aware of its hyperparameter ranges.