PipelineBase Refactoring #421
Conversation
@@ -36,7 +36,6 @@ def test_init(X_y):
     assert isinstance(automl.rankings, pd.DataFrame)
     assert isinstance(automl.best_pipeline, PipelineBase)
     assert isinstance(automl.best_pipeline.feature_importances, pd.DataFrame)
-    assert automl.best_pipeline.n_jobs == 4
assert automl.best_pipeline.parameters.get('n_jobs') == 4
?
Good catch!
Ah, I remember why I deleted it. There's no guarantee that the best pipeline will have an n_jobs parameter (e.g. in the case of CatBoost and XGBoost). In test_pipelines.py::test_correct_parameters I test for correct parameters using a logistic regression pipeline; however, since n_jobs is neither stored in the components' parameter dictionaries nor exposed as an instance variable, I can't access it. #474 is supposed to add all of a component's arguments (like n_jobs) to its parameter dictionary. Do you think that should rather be done here? I'm more of a fan of doing it elsewhere, as this PR is huge as it is.
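For context, a minimal sketch of the idea behind #474: record every constructor argument (n_jobs included) in the component's parameters dict so callers can query it uniformly. MockEstimator and its attributes are hypothetical illustrations, not this repo's actual API.

class MockEstimator:
    """Hypothetical component illustrating the #474 idea."""

    def __init__(self, n_estimators=10, n_jobs=-1):
        # Store all constructor arguments in a single parameters dict
        # instead of (or in addition to) individual instance attributes.
        self.parameters = {"n_estimators": n_estimators, "n_jobs": n_jobs}

est = MockEstimator(n_jobs=4)
# .get() returns None instead of raising for components without n_jobs,
# which is what the deleted assertion would have needed.
assert est.parameters.get("n_jobs") == 4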
@@ -54,11 +54,11 @@ def test_describe_component():
     enc = OneHotEncoder()
     imputer = SimpleImputer("mean")
     scaler = StandardScaler()
-    feature_selection = RFClassifierSelectFromModel(n_estimators=10, number_features=5, percent_features=0.3, threshold=10)
+    feature_selection = RFClassifierSelectFromModel(n_estimators=10, number_features=5, percent_features=0.3, threshold=-np.inf)
What's going on here?
threshold=10 isn't actually part of the hyperparameters for that component, so I changed it for clarity. Sorry, I can fix this in a separate PR, but I changed it here since I stumbled upon it.
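For context, a small scikit-learn sketch of why -np.inf is the neutral threshold here (this assumes RFClassifierSelectFromModel wraps sklearn's SelectFromModel, as the name suggests): with threshold=-np.inf the importance cutoff is disabled, so exactly max_features features are kept by importance ranking alone.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
# threshold=-np.inf disables the importance cutoff; max_features alone
# decides how many features survive selection.
selector = SelectFromModel(RandomForestClassifier(n_estimators=10, random_state=0),
                           threshold=-np.inf, max_features=5)
selector.fit(X, y)
assert selector.transform(X).shape[1] == 5  # top 5 features by importance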
        parameters=parameters)

    with pytest.raises(ValueError, match="Error received when instantiating component *."):
        TestPipeline(parameters={}, objective='precision')
The error in this case is that parameter a wasn't provided with a value in the params dict? If so, perhaps you can add

with pytest.raises(ValueError, match="Error received when instantiating component *."):
    TestPipeline(parameters={'Mock Component': {}}, objective='precision')
TestPipeline(parameters={'Mock Component': {'a': 42}}, objective='precision')

because creation should succeed once a is provided, right?
        TestPipeline(parameters={}, objective='precision')


def test_initiate_components():
This checks that we can't pass a parameter to a component unless the component accepts that parameter, right? This is a constraint I hope we can relax one day :) I'd suggest renaming the test to test_init_components_invalid_parameters
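A rough sketch of what the renamed test could assert (TestPipeline and the 'Mock Component' name reuse the fixture from the diff above; the unknown parameter name is made up for illustration):

import pytest

def test_init_components_invalid_parameters():
    # A parameter the component's __init__ doesn't accept should be rejected.
    with pytest.raises(ValueError, match="Error received when instantiating component *."):
        TestPipeline(parameters={'Mock Component': {'not_a_real_param': 1}},
                     objective='precision')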
@jeremyliweishih this looks great! Nice going :) I don't have any more impl comments, just some test suggestions:
- PipelineBase._validate_problem_types: do we have a test which triggers the ValueError in that code?
- PipelineBase._instantiate_component: I think we're covered, but just confirming that we have tests which trigger both of the component error types which we rethrow.
- AutoBase._transform_parameters: I'm ok keeping the test you have in order for us to get this merged, but since it's a unit test of a private method, I'd love it if you could file a ticket to update it to use mocking or something (see the sketch after the replies below). Happy to brainstorm on that too. In general I think we should move away from unit-testing private methods directly. That ok?
- Custom pipeline definition: I think I saw we're covered on this front too, i.e. check all required fields, check that we can still use custom pipelines, etc.
Yes! It's under
Yes! One for not supplying parameters and one for incorrect parameters.
Sounds good. I'll file a ticket!
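On the mocking follow-up for AutoBase._transform_parameters, one possible shape for the ticketed test (a sketch only: the search() entry point, the fixtures, and the autospec spying pattern are assumptions, not this repo's confirmed API):

from unittest.mock import patch

def test_transform_parameters_via_search(X_y, automl):  # hypothetical fixtures
    X, y = X_y
    # Spy on the private method rather than calling it directly, so the test
    # drives it through the public entry point; autospec preserves the real
    # signature and side_effect delegates to the real implementation.
    with patch.object(AutoBase, "_transform_parameters", autospec=True,
                      side_effect=AutoBase._transform_parameters) as spy:
        automl.search(X, y)
    assert spy.called
    # From here, assert on spy.call_args instead of hand-building inputs
    # for the private method.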
Implements #414