Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get PolynomialDecomposer to Parity with STLDecomposer #3768

Merged
merged 4 commits into from
Oct 25, 2022

Conversation

chukarsten
Copy link
Contributor

@chukarsten chukarsten commented Oct 20, 2022

A follow up PR to move more of the tests over to test_decomposer.py and parametrize them between the two child decomposers in order to prepare for more decomposers.

@codecov
Copy link

codecov bot commented Oct 20, 2022

Codecov Report

Merging #3768 (791f4f3) into main (ed2a248) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3768     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        343     343             
  Lines      35680   35696     +16     
=======================================
+ Hits       35544   35560     +16     
  Misses       136     136             
Impacted Files Coverage Δ
...omponents/transformers/preprocessing/decomposer.py 99.4% <100.0%> (+0.2%) ⬆️
...ransformers/preprocessing/polynomial_decomposer.py 100.0% <100.0%> (ø)
...nents/transformers/preprocessing/stl_decomposer.py 100.0% <100.0%> (ø)
...omponent_tests/decomposer_tests/test_decomposer.py 99.7% <100.0%> (+0.2%) ⬆️
...sts/decomposer_tests/test_polynomial_decomposer.py 100.0% <100.0%> (ø)
...nent_tests/decomposer_tests/test_stl_decomposer.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 97.9% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -16,6 +16,10 @@
decomposer_list = [STLDecomposer, PolynomialDecomposer]


def get_trend_dataframe_format_correct(df):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved this up from the child class test modules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more a utility for the tests to use, just to make sure that get_trend_df() returns a dataframe with the proper columns. get_trend_dataframe_format_correct() will be used in the get_trend_dataframe() tests.

@@ -34,6 +38,21 @@ def test_set_time_index(decomposer_child_class):
assert isinstance(y_time_index.index, pd.DatetimeIndex)


@pytest.mark.parametrize(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all this up from child class test modules.

@@ -481,3 +500,125 @@ def test_decomposer_bad_target_index(
match="doesn't support target data with index of type",
):
dec._choose_proper_index(y)


@pytest.mark.parametrize(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all this up from child class test modules....

pd.Series(np.zeros(len(output_y_t))).set_axis(y_new.index),
output_y_t,
check_exact=False,
atol=0.1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have to loosen this tolerance (atol) here a bit. Part of the problem was that the different decomposers are able to more or less properly capture the trend. Based on how much of the trend they capture, the answers will be closer or further away from a horizontal line. Maybe this isn't a good way to test the transform...you tell me!

output_y,
check_dtype=False,
check_names=False,
atol=0.2,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tolerance was still good enough for both decomposers.

period=seasonal_period,
freq_str="D",
set_time_index=True,
seasonal_scale=0.05, # Increasing this value causes the decomposer to miscalculate trend
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to try and get the decomposer to perform a little better and do a better job capturing the trend. This just reduces the amplitude of the sinusoidal oscillations added to the synthetic signal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does increasing seasonal_scale cause the decomposer to miscalculate the trend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has yet to be determined. What we've noticed in some datasets is that there is a tendency for the periodic signal to "leak" into the trend. Loosely, I've noticed that a larger relative seasonal signal vs. trend exacerbates this to a degree.

)


def build_test_target(subset_y, seasonal_period, transformer_fit_on_data, to_test):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has been moved up to the base test module class unchanged.

@@ -348,25 +326,3 @@ def test_polynomial_decomposer_uses_time_index(
assert not isinstance(y.index, pd.DatetimeIndex)
else:
assert isinstance(y.index, pd.DatetimeIndex)


@pytest.mark.parametrize(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all this up to child class test modules in the base STLDecomposer PR. This just needed to be removed.

@chukarsten chukarsten force-pushed the polynomial_decomposer_parity branch from 797ea73 to 4084bf4 Compare October 24, 2022 15:10
@chukarsten chukarsten marked this pull request as ready for review October 25, 2022 04:32
@chukarsten chukarsten force-pushed the polynomial_decomposer_parity branch 3 times, most recently from 0902f77 to d535e7e Compare October 25, 2022 04:53
@@ -95,6 +95,7 @@ def fit(self, X: pd.DataFrame, y: pd.Series = None) -> PolynomialDecomposer:
"""
self.original_index = y.index if y is not None else None
X, y = self._check_target(X, y)
self._map_dt_to_integer(self.original_index, y.index)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent behind this function is to build a mapping between integer indices and datetime indices and store it so that it's easier to handle out of sample calls to inverse_transform() that have integer indices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_map_dt_to_integer is called in both STL and PD but _convert_int_index_to_dt_index is only called for the PD? What's the reason for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, to be fair, it's probably not needed in STL. Yet. It kind of depends on how we want to move forward with standardizing the fit functions and what attributes we want to be defined in the class. I sort of want to do an additional PR to completely standardize both fit functions and see if I can pass that up into.

Comment on lines +223 to +225
trend = self._component_obj.inverse_transform(
y_t_in_sample.set_axis(y_t_dt_ind),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been the main offender and is one of the main differences between the STLDecomposer and PolynomialDecomposer. With respect to the trend, the PD's underlying object has the same fit/transform/inverse_transform API that our components do, so it fits well. The STLD does not have this so requires a lot of code to do manual trend projection. I believe that both required the detrended target signal to have datetime indices, which is challenging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform and not depend on PD's native inverse_transform?

@chukarsten chukarsten force-pushed the polynomial_decomposer_parity branch 2 times, most recently from 222977d to 5ff4982 Compare October 25, 2022 15:03
Base automatically changed from stl_decomposer to main October 25, 2022 15:26
@chukarsten chukarsten force-pushed the polynomial_decomposer_parity branch from 0e17897 to 1853983 Compare October 25, 2022 15:52
@chukarsten chukarsten force-pushed the polynomial_decomposer_parity branch from 1853983 to 39e6913 Compare October 25, 2022 15:55
Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some random questions but I think overall it seems to be on track, nice work!

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM - seeing the tests pass like this gives me good confidence in parity 👍

Comment on lines +223 to +225
trend = self._component_obj.inverse_transform(
y_t_in_sample.set_axis(y_t_dt_ind),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform and not depend on PD's native inverse_transform?

@chukarsten chukarsten merged commit 8074870 into main Oct 25, 2022
@chukarsten chukarsten deleted the polynomial_decomposer_parity branch October 25, 2022 23:27
@chukarsten chukarsten mentioned this pull request Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants