
Improve automatic periodicity determination #3912

Merged
eccabay merged 19 commits into main from 388_determine_period on Jan 23, 2023

eccabay (Contributor) commented Jan 6, 2023

Simplifies our `decomposer.determine_period` method and improves its performance: rather than pre-fitting the model to detrend the target, it now detrends with a moving average.

codecov (bot) commented Jan 6, 2023
Codecov Report

Merging #3912 (c65da15) into main (62029a0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

```diff
@@           Coverage Diff           @@
##            main   #3912     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        347     347
  Lines      36755   36745     -10
=======================================
- Hits       36625   36619      -6
+ Misses       130     126      -4
```
| Impacted Files | Coverage Δ |
| --- | --- |
| ...sts/decomposer_tests/test_polynomial_decomposer.py | 100.0% <ø> (ø) |
| ...nent_tests/decomposer_tests/test_stl_decomposer.py | 100.0% <ø> (ø) |
| ...omponents/transformers/preprocessing/decomposer.py | 100.0% <100.0%> (+0.8%) ⬆️ |
| ...omponent_tests/decomposer_tests/test_decomposer.py | 100.0% <100.0%> (+0.4%) ⬆️ |
| evalml/tests/conftest.py | 98.2% <0.0%> (+0.3%) ⬆️ |


eccabay marked this pull request as ready for review on January 12, 2023 at 21:46

jeremyliweishih (Collaborator) left a comment

Some clarification questions and quick fixes but otherwise looks good to me!

```diff
 """Determines the relative maxima of the target's autocorrelation."""
 acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y)))
-filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))]
+filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))]
```

Collaborator

Curious why we need this change!

Contributor Author

Great question! We were seeing that datasets with no seasonality still got a period back because of some very small positive autocorrelations. This threshold filters those out so that we correctly report that these datasets have no seasonality. Datasets with a real seasonal period have autocorrelation peaks on the order of 0.1-0.6, so we aren't filtering out anything significant here.

Contributor

We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.
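With the reviewer's suggestion applied, the threshold filter might look like the sketch below. `ACF_PEAK_THRESHOLD` and `filter_acf` are hypothetical names; only the 0.01 value and the rule "keep values above the threshold, zero the rest" come from the diff. The toy autocorrelation values are made up to show how tiny positive bumps produced spurious relative maxima under the old `> 0` filter.

```python
import numpy as np
from scipy.signal import argrelmax

# Hypothetical module-level name; the 0.01 value is the one from the PR.
ACF_PEAK_THRESHOLD = 0.01


def filter_acf(acf: np.ndarray, threshold: float = ACF_PEAK_THRESHOLD) -> np.ndarray:
    """Zero out autocorrelations that are not strictly above the threshold."""
    # Vectorized equivalent of the list comprehension in the diff.
    return np.where(acf > threshold, acf, 0.0)


# Made-up ACF with only tiny positive bumps after the first few lags.
toy_acf = np.array([1.0, 0.3, 0.004, 0.0, 0.006, 0.002, 0.008, 0.001])

old_peaks = argrelmax(np.where(toy_acf > 0, toy_acf, 0.0))[0]  # array([4, 6])
new_peaks = argrelmax(filter_acf(toy_acf))[0]  # empty: no period detected
```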

Comment thread evalml/pipelines/components/transformers/preprocessing/decomposer.py Outdated

chukarsten (Contributor) left a comment

Just a few minor comments! Nice work!!

Comment thread evalml/tests/component_tests/decomposer_tests/test_decomposer.py
Comment thread evalml/pipelines/components/transformers/preprocessing/decomposer.py Outdated
```python
"""Uses a moving average to determine the target's trend and remove it."""
# A larger moving average will be less likely to remove the seasonal signal
# but we need to make sure we're not passing in a window that's larger than the data
moving_avg = min(51, len(y) // 3)
```

Contributor

The 51 and 3 seem a little arbitrary and kind of trigger my "should this be hardcoded" sense, but I think in this case it's probably fine.
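If the two values were named instead of hardcoded, the window computation could read as follows. `MAX_MOVING_AVG_WINDOW`, `SERIES_TO_WINDOW_RATIO` and `moving_avg_window` are hypothetical names; only the values 51 and 3 and the `min(...)` expression come from the diff.

```python
# Hypothetical names for the two constants flagged in review.
MAX_MOVING_AVG_WINDOW = 51  # widest window, to preserve the seasonal signal
SERIES_TO_WINDOW_RATIO = 3  # window may cover at most a third of the series


def moving_avg_window(n_obs: int) -> int:
    """Moving-average window for a series with n_obs observations."""
    return min(MAX_MOVING_AVG_WINDOW, n_obs // SERIES_TO_WINDOW_RATIO)
```

Series with 153 or more observations get the full 51-point window; shorter ones get a third of their length (for example, 60 observations give a window of 20). For fewer than three observations the expression evaluates to 0, so a real implementation would likely need a guard.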

eccabay enabled auto-merge (squash) January 20, 2023 22:35
eccabay merged commit 21e0baa into main Jan 23, 2023
eccabay deleted the 388_determine_period branch January 23, 2023 19:16
chukarsten mentioned this pull request Jan 25, 2023

3 participants