
Refactor and fix make_pipeline tests to assert correct order and address new Woodwork Unknown type #2572


Merged
angela97lin merged 23 commits into main from 2553_make_pipeline_tests on Aug 7, 2021

Conversation

@angela97lin (Contributor) commented Jul 29, 2021

Closes #2553.

  • Tests needed to be updated to handle Woodwork defaulting to "Unknown" rather than "NaturalLanguage". This happens when our input is a plain pandas type, since calling infer_feature_type will let Woodwork infer text columns as the Unknown type, which results in a different pipeline (see the sketch after this list).
  • Pipeline tests needed to account for the new LogTransformer component that can now be added to a pipeline.
  • Complete refactor of if statements and tests: many of the make_pipeline tests had similar logic but different data, so every change to the make_pipeline logic meant updating the same thing in many places. It may be hard to tell from the diff, but I refactored all the tests into one parameterized test and made a fixture for the data to use. Although having one large parameterized method may seem scary, it's definitely easier to add new test cases via new columns in the df. I primarily kept all of the existing tests as is with some touchups, but I'm open to changing / removing :)
  • Removed Time Series Baseline Estimator from the parameters dict passed in the tests. Thanks to @bchen1116's great work, a warning was raised indicating that the parameters for this component weren't used since it's not in the pipeline! 😁
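A minimal sketch of the type-inference behavior described in the first bullet. The column name and values are made up, and the exact inferred type depends on the installed Woodwork version, but this is the general shape of the issue:

import pandas as pd

from evalml.utils import infer_feature_type

# Plain pandas input with no Woodwork types set on it.
X = pd.DataFrame({"text_col": ["this is a string", "this is another string", "cats are gr8"]})

# infer_feature_type lets Woodwork infer logical types. On newer Woodwork versions a short
# free-text column like this tends to come back as Unknown rather than NaturalLanguage,
# so make_pipeline builds a pipeline without text featurization for it.
X = infer_feature_type(X)
print(X.ww.logical_types)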

angela97lin self-assigned this Jul 29, 2021
codecov bot commented Jul 29, 2021

Codecov Report

Merging #2572 (6afabd5) into main (5410bb5) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2572     +/-   ##
=======================================
- Coverage   99.9%   99.9%   -0.0%     
=======================================
  Files        295     295             
  Lines      27093   26811    -282     
=======================================
- Hits       27047   26765    -282     
  Misses        46      46             
Impacted Files Coverage Δ
evalml/pipelines/utils.py 99.2% <100.0%> (+0.1%) ⬆️
evalml/tests/pipeline_tests/test_pipeline_utils.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

angela97lin changed the title from "Fix make_pipeline tests to assert correct order and address new Woodwork Unknown type" to "Refactor and fix make_pipeline tests to assert correct order and address new Woodwork Unknown type" on Aug 2, 2021
component.name for component in expected_components
]


@pytest.mark.parametrize("input_type", ["pd", "ww"])
@pytest.mark.parametrize("problem_type", ProblemTypes.all_problem_types)
def test_make_pipeline_no_datetimes(input_type, problem_type):
Contributor Author (angela97lin):
I didn't see value in this compared to test_make_pipeline_text_columns so deleting :)

@@ -181,7 +180,7 @@ def make_pipeline(
problem_type,
parameters=None,
sampler_name=None,
extra_components=[],
extra_components=None,
Contributor Author (angela97lin):
I think it'd be a good idea to change this default value to None, rather than an empty list. Right now, we don't modify this parameter but if we did, it could lead to unexpected behavior:

From https://docs.python.org/3/reference/compound_stmts.html#function-definitions:

Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function...
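A small standalone illustration of the pitfall quoted above (generic Python, not evalml code), along with the None-default pattern the docs recommend:

def add_component_bad(component, components=[]):
    # The default list is created once at definition time, so it is shared across calls.
    components.append(component)
    return components

add_component_bad("Imputer")          # ["Imputer"]
add_component_bad("One Hot Encoder")  # ["Imputer", "One Hot Encoder"] -- state leaked between calls

def add_component_good(component, components=None):
    # Use None as the sentinel and build a fresh list inside the function body.
    if components is None:
        components = []
    components.append(component)
    return components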

Contributor:
Absolutely! Thankfully this didn't cause any bugs since it's not used

Comment on lines -584 to -590
if is_time_series(problem_type):
delayed_features = [DelayedFeatureTransformer]
if estimator_class.model_family == ModelFamily.LINEAR_MODEL:
estimator_components = [StandardScaler, estimator_class]
elif estimator_class.model_family == ModelFamily.CATBOOST:
estimator_components = [estimator_class]
else:
Contributor Author (angela97lin):
I found myself being confused by the different if/else statements (some of which are related, others which are not!) so I thought this would be a clearer breakdown but let me know if you have thoughts/opinions!

Contributor:
Hot take: We shouldn't support plain numpy arrays anymore.

if estimator_class.model_family == ModelFamily.ARIMA:
expected_components = [Imputer] + estimator_components
@pytest.fixture
def get_test_data_from_configuration():
Contributor Author (angela97lin):
Fixture that contains the data for the make_pipeline test!

Contributor:
Love this, anything that consolidates fixtures is good.

Contributor:
This is awesome!
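For readers without the diff open, a rough sketch of what a factory-style data fixture like get_test_data_from_configuration could look like. The column builders, row count, and signature here are illustrative only, and the real fixture also handles the "pd" vs "ww" input types and more column kinds:

import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def get_test_data_from_configuration():
    # Factory fixture: each test requests the columns it needs and gets a matching X, y back.
    def _get_test_data(problem_type, column_names):
        n_rows = 20
        builders = {
            "numerical": lambda: pd.Series(range(n_rows), dtype="float64"),
            "categorical": lambda: pd.Series(["a", "b"] * (n_rows // 2), dtype="category"),
            "all_null": lambda: pd.Series([np.nan] * n_rows),
            "dates": lambda: pd.Series(pd.date_range("2020-01-01", periods=n_rows)),
            "text": lambda: pd.Series(
                ["cats are gr8", "evalml should handle string input"] * (n_rows // 2)
            ),
        }
        X = pd.DataFrame({name: builders[name]() for name in column_names})
        y = pd.Series([0, 1] * (n_rows // 2))
        return X, y

    return _get_test_data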

if is_time_series(problem_type):
parameters = {
"pipeline": {"date_index": "soem dates", "gap": 1, "max_delay": 1},
"Time Series Baseline Estimator": {
Contributor Author (angela97lin):
Not using Time Series Baseline Estimator so this is unnecessary!

("email with other features", ["email", "numerical", "categorical"]),
],
)
def test_make_pipeline_master(
Contributor Author (angela97lin):
test_description isn't being used by the code but since we no longer have the title of tests to tell us what is going on, I thought this would be useful to quickly understand what each test case is for.

Contributor:
Can we use it in the assert error message?
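For example, something along these lines inside the test; the variable names stand in for whatever the test already computes and are not the PR's exact code:

# Hypothetical assertion that surfaces the case description when the comparison fails.
assert actual_component_names == expected_component_names, (
    f"Unexpected component order for case '{test_description}' "
    f"with estimator {estimator_class.name}"
)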

angela97lin marked this pull request as ready for review August 3, 2021 03:43
@chukarsten (Contributor) left a comment:
Solid, solid PR. I love the mass deletion and consolidation into the fixture. Very clean, very nice.

Comment on lines +167 to +174
[OneHotEncoder]
if estimator_class.model_family != ModelFamily.CATBOOST
and (
"categorical" in column_names
or (
any(ltype in column_names for ltype in ["url", "email"])
and input_type == "ww"
)
Contributor:
this formatting gave me a stroke!
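One possible way to flatten that expression for readability, as a sketch against the snippet above rather than a change made in this PR (the trailing else [] is assumed from the surrounding list construction):

# Name the intermediate conditions instead of nesting them inside the list expression.
has_categorical = "categorical" in column_names
has_url_or_email_ww = (
    any(ltype in column_names for ltype in ["url", "email"]) and input_type == "ww"
)
needs_one_hot_encoder = estimator_class.model_family != ModelFamily.CATBOOST and (
    has_categorical or has_url_or_email_ww
)
one_hot_encoder = [OneHotEncoder] if needs_one_hot_encoder else []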


"this is another string",
"this is just another string",
"evalml should handle string input",
"cats are gr8",
Contributor:
you're a woman of both taste and distinction

@freddyaboulton (Contributor) left a comment:

@angela97lin Thank you for this!


component.name for component in expected_components
]

@pytest.mark.parametrize(
Contributor:
This is awesome! Non-blocking food-for-thought comment hehe but did you consider including the expected answer in the parametrization rather than computing the answer within the test?

Pro: It'd be much easier to add future test cases that are independent of the other ones. I remember having to "fight" with the answer-generation-logic when I was adding test cases for URL and Email featurizers.

Con: There are a lot of test cases so it's a lot of data to specify. There will also be some duplicate answers across these tests.

I think this is great as-is but just wanted to put that out there.

Contributor Author (angela97lin):

Gosh, I love this idea a lot. The reason I decided to go with logic, though, is because the estimator also affects what the pipeline looks like, and we currently iterate through each of the estimators in the test. There's room to argue whether or not it's really necessary to loop through all of the estimators as we do or if we should just write out test cases for the special estimators such as CatBoost which can handle categorical cols natively and thus don't require a OneHotEncoder... but maybe this will be worth another iteration of refactoring.

I do agree though, I had to fight and wrangle the logic quite a bit to get all of these tests to pass, and if we add more logic, I'm not sure how well everything will continue to hold up 😂
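A rough illustration of the alternative being discussed, with the expected component order written directly into the parametrization. The cases, column names, and component names below are illustrative only, not the PR's actual parametrization:

import pytest

@pytest.mark.parametrize(
    "test_description, column_names, expected_component_names",
    [
        (
            "numeric only",
            ["numerical"],
            ["Imputer", "Random Forest Classifier"],
        ),
        (
            "with categorical",
            ["categorical", "numerical"],
            ["Imputer", "One Hot Encoder", "Random Forest Classifier"],
        ),
    ],
)
def test_make_pipeline_expected_components(test_description, column_names, expected_component_names):
    # The body would build X and y for column_names (for example with the fixture above),
    # call make_pipeline, and compare the resulting component names to the hand-written list.
    ...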


@ParthivNaresh (Contributor) left a comment:

Great work, I love the parametrization of test cases to reduce code and improve readability!

"test_description, column_names",
[
("all nan is not categorical", ["all_null", "numerical"]),
("all types", ["all_null", "categorical", "dates", "numerical"]),
Contributor:
Why doesn't all_types also include text, email, and url? They're all being tested anyways so definitely not a big deal, but should the test_description be changed?

@bchen1116 (Contributor) left a comment:

Love the cleanup you did here! The new test is very thorough. Left a comment on some extra simplification, but also left a comment just for my understanding.

]

@pytest.mark.parametrize(
"test_description, column_names",
Contributor:
Is test_description just to allow people who look at the test to understand what each param iteration is testing?

Contributor Author (angela97lin):
Yup! Now that we can't use the test names to understand, I thought having test_description would make it easier to see at a quick glance what each test case is for 😁

angela97lin merged commit f46b94d into main Aug 7, 2021
angela97lin deleted the 2553_make_pipeline_tests branch August 7, 2021 02:12
chukarsten mentioned this pull request Aug 12, 2021
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Our make_pipeline tests don't actually verify the pipeline is correct!
5 participants