Update tests to use new pipeline API instead of defining custom pipeline classes #3172
Conversation
Codecov Report

@@           Coverage Diff           @@
##            main   #3172    +/-   ##
=======================================
+ Coverage    99.7%   99.7%   +0.1%
=======================================
  Files         326     326
  Lines       31390   31135    -255
=======================================
- Hits        31286   31039    -247
+ Misses        104      96      -8

Continue to review full report at Codecov.
@@ -1393,11 +1387,7 @@ def fit(self, X, y=None):
@pytest.mark.parametrize("component_class", all_components())
def test_component_equality_all_components(
    component_class,
    logistic_regression_binary_pipeline_class,
Removing because unused
@angela97lin Thank you for this! I think it's cool that we're using the new API for tests and that we can reduce the number of lines in the test files as well.
Left some minor comments on ways to remove some of the existing class-style API uses, as well as on some typos in the component names in some parameter dicts.
    }[problem_type_value]
    baseline_pipeline_class = {
        ProblemTypes.BINARY: "evalml.pipelines.BinaryClassificationPipeline",
        ProblemTypes.MULTICLASS: "evalml.pipelines.MulticlassClassificationPipeline",
        ProblemTypes.REGRESSION: "evalml.pipelines.RegressionPipeline",
        ProblemTypes.TIME_SERIES_REGRESSION: "evalml.pipelines.TimeSeriesRegressionPipeline",
    }[problem_type_value]
    pipeline_class = _get_pipeline_base_class(problem_type_value)

    class DummyPipeline(pipeline_class):
I think we can get rid of the old-style class api definition here?
@freddyaboulton Ah yeah I was trying to figure out a way to replace this but was having a hard time 😭
I think the reason was related to these lines:
mock_score_1 = MagicMock(return_value={objective.name: pipeline_scores[0]})
mock_score_2 = MagicMock(return_value={objective.name: pipeline_scores[1]})
Pipeline1.score = mock_score_1
Pipeline2.score = mock_score_2
If the pipeline classes are removed, it's hard to set a separate mock score method for each pipeline, gah. Will take another stab at it and see if there's still a way around this.
Hmm, yeah, I think it gets pretty tricky in some of these cases: we mock the searched pipelines to return a specific score and then compare them to the baseline scores. If we don't have a specific class to mock, the searched pipeline classes and the baseline class end up being the same, which makes them hard to mock independently.
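[Editor's note] For context, a minimal sketch of the pattern under discussion, with hypothetical class names and objective (the real test builds these from fixtures): defining one subclass per searched pipeline lets each class carry its own mocked score while the baseline class keeps its real score method, which is exactly what becomes hard to reproduce once every pipeline is a plain BinaryClassificationPipeline instance.

from unittest.mock import MagicMock

from evalml.pipelines import BinaryClassificationPipeline

# Hypothetical stand-ins for the two searched pipelines in the test.
class Pipeline1(BinaryClassificationPipeline):
    """First searched pipeline; exists only so its score can be mocked on its own."""

class Pipeline2(BinaryClassificationPipeline):
    """Second searched pipeline."""

# Because they are distinct classes, each can be made to return a different
# score, while the baseline pipeline class is left untouched.
Pipeline1.score = MagicMock(return_value={"Log Loss Binary": 0.3})
Pipeline2.score = MagicMock(return_value={"Log Loss Binary": 0.5})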
    X_y_binary,
    AutoMLTestEnv,
):
    # Test that percent-better-than-baseline is correctly computed when scores differ across folds
    X, y = X_y_binary

-    class DummyPipeline(dummy_binary_pipeline_class):
+    class DummyPipeline(BinaryClassificationPipeline):
Same here, I don't think we need DummyPipeline?
(Same as above with difficulty mocking pipeline vs baseline without a specific pipeline class)
evalml/tests/model_understanding_tests/test_partial_dependence.py (outdated review thread, resolved)
@@ -25,7 +25,6 @@

class DoubleColumns(Transformer):
    """Custom transformer for testing permutation importance implementation.
I think it's ok if we keep the class-style API for test_fast_permutation_importance_matches_slow_output. Just curious if you've tried this:
- Mark the test as needing non-core dependencies (pytest.mark.noncore_dependency)
- Refactor test_cases so that it's a tuple of (component_graph, parameters)
- Instantiate the pipeline class with the tuple in test_cases
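[Editor's note] A rough sketch of the tuple-based parametrization suggested above, assuming the pipeline is built inside the test body; the component graphs, parameters, and the X_y_regression fixture here are illustrative, not the real test_cases.

import pytest

from evalml.pipelines import RegressionPipeline

# Illustrative (component_graph, parameters) tuples; not the real test_cases.
test_cases = [
    (["Imputer", "Random Forest Regressor"], {}),
    (
        ["Imputer", "One Hot Encoder", "Random Forest Regressor"],
        {"Imputer": {"numeric_impute_strategy": "median"}},
    ),
]

@pytest.mark.parametrize("component_graph,parameters", test_cases)
def test_fast_permutation_importance_matches_slow_output(
    component_graph, parameters, X_y_regression
):
    X, y = X_y_regression
    # Building the pipeline inside the test body keeps component resolution out
    # of collection time, so non-core components only matter when the test runs.
    pipeline = RegressionPipeline(component_graph, parameters=parameters)
    pipeline.fit(X, y)
    # ... fast vs. slow permutation importance comparison would go here.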
@freddyaboulton I tried to do something along the lines of f480a93
I guess where it failed was that I was initializing the pipelines in the parametrization. I then tried to mark just the case with the Target Encoder as a noncore_dependency, rather than the whole test. I can go back and try to use a (component_graph, parameters) tuple instead. I chose pipeline classes because we also have a regression pipeline / graph that we test, which might be hard to capture using just the component graph.
Alas, I tried again, trying to keep the pipeline instances as parameters so that we don't have to determine which pipeline class to use to initialize the parameters and component graph. It doesn't work because of the non-core dependency on Target Encoder, so I'll just keep it as the class-style API :')
Love it! 🚢
    assert len(automl.rankings) == 2
    assert len(automl.full_rankings) == 2
    assert 0.1234 in automl.rankings["mean_cv_score"].values

    with env.test_context(score_return_value={"Log Loss Binary": 0.5678}):
-        test_pipeline_2 = dummy_binary_pipeline_class(
+        test_pipeline_2 = dummy_binary_pipeline.new(
Nice use of new!
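[Editor's note] For readers skimming the diff, a minimal sketch of what new does here, assuming the new(parameters=...) signature and using an illustrative pipeline rather than the actual dummy_binary_pipeline fixture: it returns a fresh, unfitted pipeline with the same component graph, so a test can derive a variant without mutating the fixture-provided instance.

from evalml.pipelines import BinaryClassificationPipeline

# Illustrative pipeline; not the fixture from conftest.py.
pipeline = BinaryClassificationPipeline(
    ["Imputer", "Random Forest Classifier"],
    parameters={"Random Forest Classifier": {"n_estimators": 50}},
)

# .new() builds a fresh, unfitted pipeline with the same component graph,
# here with different parameters, leaving the original instance untouched.
pipeline_2 = pipeline.new(
    parameters={"Random Forest Classifier": {"n_estimators": 100}},
)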
Solid work!
LGTM!
Closes #2184.
evalml/tests/conftest.py is a good place to look at the changes made, as all other changes are likely there to address the fixture changes. This PR also does general cleanup and removes fixtures from test parameters where they are not used. Happy to argue or discuss any place where we may favor the previous implementation!
In some cases, such as the time series pipeline classes and the dask pipelines in evalml/tests/automl_tests/dask_test_utils.py, I kept the test classes because I thought it was beneficial: time series pipeline classes require specific parameters for initialization that we may not always want as part of the fixture definition, and the dask classes override methods to test specific functionality.
There were a lot of classes in evalml/tests/model_understanding_tests/test_permutation_importance.py that I wanted to replace, but it was surprisingly difficult to get a clean implementation because one of the classes uses Target Encoder, which is only available with our non-core dependencies. In the current setup, the classes are not evaluated at collection time, so we don't run into an issue. However, if we replace these pipeline classes with instances, the pipeline initialization actually runs, and on a core-dependencies-only environment we get the error seen here: https://github.com/alteryx/evalml/runs/4710317986?check_suite_focus=true. I still wonder if there's a way to clean this up, but for now it's not urgent.
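[Editor's note] A schematic sketch of why instances trip this up while classes do not; the class body, component graphs, and test names are illustrative and differ from the real test file. A class body only stores component names, so nothing is resolved at import time, whereas an instance built inside a parametrize list is constructed while pytest collects the module, so a missing non-core component like Target Encoder raises before any skip marker can apply.

import pytest

from evalml.pipelines import BinaryClassificationPipeline

class TargetEncoderPipeline(BinaryClassificationPipeline):
    """Class-style sketch: the body only stores names; Target Encoder is not
    resolved until an instance is actually constructed inside a test."""

    component_graph = ["Target Encoder", "Logistic Regression Classifier"]

    def __init__(self, parameters=None, random_seed=0):
        super().__init__(
            self.component_graph, parameters=parameters, random_seed=random_seed
        )

# Instance-style sketch: this list is evaluated while pytest collects the module,
# so on a core-dependencies-only install the Target Encoder lookup fails here,
# before any noncore-dependency marker on the test can take effect.
pipelines = [
    BinaryClassificationPipeline(["Target Encoder", "Logistic Regression Classifier"]),
]

@pytest.mark.parametrize("pipeline", pipelines)
def test_fast_permutation_importance_matches_slow_output(pipeline):
    ...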
I have mild trepidation about using fixtures that return instances and then calling fit/transform in individual tests. We can always call .new() to create a duplicate of the pipeline instance, but it seems like the tests don't step on each other's toes. According to https://docs.pytest.org/en/6.2.x/fixture.html: "Fixtures are created when first requested by a test, and are destroyed based on their scope: function: the default scope, the fixture is destroyed at the end of the test."
That means each test function gets its own freshly created fixture instance, so we don't need to worry about the same instance being reused across tests!
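[Editor's note] To make the fixture-scope point concrete, a small sketch with a hypothetical fixture body and tests (not the actual conftest.py): with the default function scope, each test that requests the fixture gets its own newly built pipeline, so fitting in one test cannot leak state into another.

import pytest

from evalml.pipelines import BinaryClassificationPipeline

@pytest.fixture  # default "function" scope: rebuilt for every test that requests it
def dummy_binary_pipeline():
    return BinaryClassificationPipeline(["Imputer", "Logistic Regression Classifier"])

def test_can_fit(dummy_binary_pipeline, X_y_binary):
    X, y = X_y_binary
    # Fitting mutates only this test's copy of the fixture.
    dummy_binary_pipeline.fit(X, y)

def test_gets_a_fresh_instance(dummy_binary_pipeline):
    # This test receives a newly constructed pipeline; whatever test_can_fit
    # did to its own copy is irrelevant here.
    assert isinstance(dummy_binary_pipeline, BinaryClassificationPipeline)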