Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PipelineBase Refactoring #421

Merged
merged 83 commits into from Mar 13, 2020
Merged

PipelineBase Refactoring #421

merged 83 commits into from Mar 13, 2020

Conversation

jeremyliweishih
Copy link
Contributor

@jeremyliweishih jeremyliweishih commented Feb 26, 2020

Implements #414

@@ -36,7 +36,6 @@ def test_init(X_y):
assert isinstance(automl.rankings, pd.DataFrame)
assert isinstance(automl.best_pipeline, PipelineBase)
assert isinstance(automl.best_pipeline.feature_importances, pd.DataFrame)
assert automl.best_pipeline.n_jobs == 4
Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert automl.best_pipeline.parameters.get(n_jobs) == 4 ?

Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I remember why I deleted it. There's no guarantee that the best pipeline would have a n_jobs parameter (in the case of CatBoost and XGBoost). In "test_pipelines.py.test_correct_parameters", I test for correct parameters using a logistic regression pipeline however since n_jobs instead in the parameter dictionary of the components nor exists as a instance variable I can't access it. #474 is supposed to add all the arguments of a component (like n_jobs) into the component's parameter dictionary. Do you think it should rather be done here? I'm more of a fan of doing it else where as this PR is huge as it is.

@@ -54,11 +54,11 @@ def test_describe_component():
enc = OneHotEncoder()
imputer = SimpleImputer("mean")
scaler = StandardScaler()
feature_selection = RFClassifierSelectFromModel(n_estimators=10, number_features=5, percent_features=0.3, threshold=10)
feature_selection = RFClassifierSelectFromModel(n_estimators=10, number_features=5, percent_features=0.3, threshold=-np.inf)
Copy link
Collaborator

@dsherry dsherry Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here?

Copy link
Contributor Author

@jeremyliweishih jeremyliweishih Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

threshold=10 isn't actually part of the hyperparameters for that component so I changed it for clarity. Sorry I can fix this in a separate PR but changed it since I stumbled upon it.

@jeremyliweishih jeremyliweishih requested a review from dsherry Mar 13, 2020
parameters=parameters)

with pytest.raises(ValueError, match="Error received when instantiating component *."):
TestPipeline(parameters={}, objective='precision')
Copy link
Collaborator

@dsherry dsherry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error in this case is that parameter a wasn't provided with a value in the params dict? If so, perhaps you can add

with pytest.raises(ValueError, match="Error received when instantiating component *."):
    TestPipeline(parameters={'Mock Component': {}}, objective='precision')
TestPipeline(parameters={'Mock Component': {'a': 42}}, objective='precision')

because creation should succeed once a is provided, right?

TestPipeline(parameters={}, objective='precision')


def test_initiate_components():
Copy link
Collaborator

@dsherry dsherry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks that we can't pass a parameter to a component unless the component accepts that parameter, right? This is a constraint I hope we can relax one day :) I'd suggest renaming the test to test_init_components_invalid_parameters

Copy link
Collaborator

@dsherry dsherry left a comment

@jeremyliweishih this looks great! Nice going :) I don't have any more impl comments, just some test suggestions:

  • PipelineBase._validate_problem_types: do we have a test which triggers the ValueError in that code?
  • PipelineBase._instantiate_component: I think we're covered but just confirming that we have tests which trigger both of the component error types which we rethrow
  • AutoBase. _transform_parameters: I'm ok keeping the test you have in order for us to get this merged, but since it's a unit test of a private method, I'd love it if you could file a ticket to update it to use mocking or something. Happy to brainstorm on that too. In general I think we should move away from unit-testing private methods directly. That ok?
  • Custom pipeline definition: I think I saw we're covered on this front too, i.e. check all required fields, check that we can still use custom pipelines, etc.

@jeremyliweishih
Copy link
Contributor Author

jeremyliweishih commented Mar 13, 2020

@dsherry

  • PipelineBase._validate_problem_types: do we have a test which triggers the ValueError in that code?

Yes! It's under test_pipelines.test_problem_types

  • PipelineBase._instantiate_component: I think we're covered but just confirming that we have tests which trigger both of the component error types which we rethrow

Yes! One for not supplying parameters and one for incorrect parameters.

  • AutoBase. _transform_parameters: I'm ok keeping the test you have in order for us to get this merged, but since it's a unit test of a private method, I'd love it if you could file a ticket to update it to use mocking or something. Happy to brainstorm on that too. In general I think we should move away from unit-testing private methods directly. That ok?

Sounds good. I'll file a ticket!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants