Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polynomial Detrending Component #1992

Merged
merged 13 commits into from Mar 23, 2021
Merged

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fix #1944


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Mar 17, 2021

Codecov Report

Merging #1992 (e01ee05) into main (6eaa1dc) will decrease coverage by 0.1%.
The diff coverage is 98.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1992     +/-   ##
=========================================
- Coverage   100.0%   100.0%   -0.0%     
=========================================
  Files         276      278      +2     
  Lines       22571    22714    +143     
=========================================
+ Hits        22565    22705    +140     
- Misses          6        9      +3     
Impacted Files Coverage Δ
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/tests/utils_tests/test_dependencies.py 85.8% <50.0%> (-14.2%) ⬇️
evalml/pipelines/components/component_base_meta.py 100.0% <100.0%> (ø)
...alml/pipelines/components/transformers/__init__.py 100.0% <100.0%> (ø)
.../components/transformers/preprocessing/__init__.py 100.0% <100.0%> (ø)
...transformers/preprocessing/polynomial_detrender.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
...tests/component_tests/test_polynomial_detrender.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_utils.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 100.0% <100.0%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6eaa1dc...e01ee05. Read the comment docs.

@freddyaboulton freddyaboulton force-pushed the 1944-polynomial-detrending-component branch 2 times, most recently from b83517e to 5a6083b Compare March 18, 2021 18:35
@@ -19,6 +19,8 @@ def _check_for_fit(self, X=None, y=None):
klass = type(self).__name__
if not self._is_fitted and self.needs_fitting:
raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
elif method.__name__ == 'inverse_transform':
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a short-term fix to make the component metaclass aware of the inverse_transform method. The current implementation will not work because if y is None, the method will run with only the X argument which is not what we want.

I think we can change this implementation to something like:

def _check_for_fit(self, *args, **kwargs):
    klass = type(self).__name__
    if not self._is_fitted and self.needs_fitting:
        raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
    return method(self, *args, **kwargs)

I think this would be more maintainable in the long-term. I will file an issue for this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense! Just for my own edification, why would this new line be necessary? Why not raise a ValueError in transform for when y is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ParthivNaresh I think we're currently raising an exception in inverse_transform if y is None. The problem is that the meta class doesn't pass the y argument if it's None to inverse_transform in the current implementation. This causes an error because we're expecting a y argument, even if it's None.

@freddyaboulton freddyaboulton force-pushed the 1944-polynomial-detrending-component branch from 092a02e to f27b798 Compare March 19, 2021 14:23
@freddyaboulton freddyaboulton marked this pull request as ready for review March 19, 2021 16:59
y_series = _convert_woodwork_types_wrapper(y_dt.to_series())
y_t = self._component_obj.transform(y_series)
y_t = ww.DataColumn(pd.Series(y_t, index=y_series.index))
return X, y_t
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm returning X and y because our ComponentGraph assumes targets are returned as the second value in a tuple

@freddyaboulton
Copy link
Contributor Author

freddyaboulton commented Mar 19, 2021

FYI it's impossible to get 100% coverage on the patch:

        # We don't expect to install sktime in python 3.9. Let's verify it's not present:
        if 'sktime' in module and is_running_py_39_or_above:
            with pytest.raises(ModuleNotFoundError):
                import_module(module)
            continue

Those lines only run when python 3.9 is installed and we don't measure coverage for python 3.9. But the fact that the 3.9 tests pass means that those lines are run 😄

@freddyaboulton freddyaboulton force-pushed the 1944-polynomial-detrending-component branch from f27b798 to 29b973d Compare March 19, 2021 19:17
Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks great, nicely done!

If 2, quadratic model is fit, etc.
random_seed (int): Seed for the random number generator. Defaults to 0.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we error check for degree if passed in as a float? Perhaps casting it to an int and then adding a test for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!


Arguments:
X (ww.DataTable, pd.DataFrame, optional): Ignored.
y (ww.DataColumn, pd.Series): Target variable to detrend. Ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is y ignored here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 😂 Good catch!

@@ -490,7 +491,8 @@ def all_init(self, *args, **kwargs):
component = component_class(test_arg="test")
component_with_different_kwargs = component_class(diff_test_arg="test")
assert component.parameters['test_arg'] == "test"
assert component._component_obj.test_arg == "test"
if not isinstance(component, PolynomialDetrender):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is PolynomialDetrender left out of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raises an sklearn exception!

@@ -19,6 +19,8 @@ def _check_for_fit(self, X=None, y=None):
klass = type(self).__name__
if not self._is_fitted and self.needs_fitting:
raise ComponentNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
elif method.__name__ == 'inverse_transform':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense! Just for my own edification, why would this new line be necessary? Why not raise a ValueError in transform for when y is None?


@pytest.mark.parametrize("input_type", ['np', 'pd', 'ww'])
@pytest.mark.parametrize("use_int_index", [True, False])
@pytest.mark.parametrize("degree", [1, 2, 3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! The test coverage is solid.

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pretty darn cool stuff--the tests were great in understanding what's going on. Left some nitpicky comments per usual but 👍👍👍

"\n",
"## Python 3.9 support\n",
"\n",
"Evalml can still be installed with pip in python 3.9 but note that `sktime`, one of our dependencies, will not be installed because that library does not yet support python 3.9. This means the ``PolynomialDetrending`` component will not be usable in python 3.9. You can try to install `sktime` [from source](https://www.sktime.org/en/latest/installation.html#building-from-source) in python 3.9 to use the ``PolynomialDetrending`` component but be warned that we only test it in python 3.7 and 3.8."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍👍

def test_polynomial_detrender_raises_value_error_target_is_none(ts_data):
X, y = ts_data

with pytest.raises(ValueError, match="y cannot be None for PolynomialDetrender!"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realllly nit-picking now but do we want to add tests for fit/transform too for completeness :')

Comment on lines +50 to +70
if input_type == 'np':
X = X_input.values
y = y_input.values
elif input_type == 'ww':
X = ww.DataTable(X_input)
y = ww.DataColumn(y_input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the make_data_type fixture for this I believe? hehe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip! I didn't know about make_data_type! I'm not sure if it's worth it in this case because there is no pandas to numpy conversion and the woodwork conversion is straight-forward enough in this case. But I will keep it in mind in the future.

@freddyaboulton freddyaboulton force-pushed the 1944-polynomial-detrending-component branch from 29b973d to b9e825d Compare March 23, 2021 15:15
@@ -11,3 +11,4 @@ seaborn>=0.11.1
category_encoders>=2.0.0
statsmodels >= 0.12.2
imbalanced-learn>=0.7.0
sktime>=0.5.3;python_version<"3.9"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 yep good call putting this in requirements and note core-requirements, lgtm

@@ -99,6 +100,11 @@ def has_minimal_dependencies(pytestconfig):
return pytestconfig.getoption("--has-minimal-dependencies")


@pytest.fixture
def is_running_py_39_or_above():
return sys.version_info >= (3, 9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nifty!

if module == 'sktime' and is_running_py_39_or_above:
with pytest.raises(ModuleNotFoundError):
import_module(module)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. So the reason this doesn't appear in our coverage report is because we're currently only generating coverage on python 3.8. 👍

detrender = detrend.Detrender(trend.PolynomialTrendForecaster(degree=degree))

super().__init__(parameters=params,
component_obj=detrender,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to set component_obj for this component? It appears you're doing custom logic for fit/transform/fit_transform, and that was the main utility of passing component_obj down to the base class

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 !

@dsherry dsherry merged commit a1d04e0 into main Mar 23, 2021
@dsherry dsherry mentioned this pull request Mar 24, 2021
@freddyaboulton freddyaboulton deleted the 1944-polynomial-detrending-component branch June 3, 2021 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Polynomial detrending transformer
5 participants