[timeseries] Implement missing value imputation for TimeSeriesDataFrame #2781

Merged (6 commits) on Feb 3, 2023

@shchur shchur commented Jan 30, 2023

Description of changes:

  • Two new methods for imputing missing values. These should be called before passing the data to the TimeSeriesPredictor.
    • to_regular_index(freq) - fills gaps in an irregularly sampled time series with NaNs
    • fill_missing_values(method) - drops leading NaNs and replaces the remaining NaNs (middle/trailing) using the chosen method (forward fill or interpolation)
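
The two steps listed above can be sketched with plain pandas. This is only an illustration of the described behavior, not the AutoGluon implementation; the sample series and the daily frequency are assumptions made for the example.

```python
import pandas as pd

# Assumed sample data: Jan 3 is absent from the index and Jan 1 is NaN.
ts = pd.Series(
    [float("nan"), 2.0, 4.0, 5.0],
    index=pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05"]
    ),
)

# Step 1, analogous to to_regular_index(freq):
# timestamps missing from the regular daily grid appear as NaN rows.
regular = ts.asfreq("D")

# Step 2, analogous to fill_missing_values("ffill") as described above:
# drop the leading NaNs, then forward-fill the middle/trailing NaNs.
first_valid = regular.first_valid_index()
filled = regular.loc[first_valid:].ffill()

print(regular)
print(filled)
```

Note that in this sketch the second step shortens the series, because the leading NaN row is dropped rather than filled.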

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur requested a review from tonyhoo January 30, 2023 15:29
@github-actions

Job PR-2781-9c1586a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2781/9c1586a/index.html

----------
method : {"ffill", "interpolate"}, default = "ffill"
Method used to impute missing values.
"ffill" - propagate last valid observation forward.
Collaborator

We should support all the filling methods available in pandas.

Collaborator Author

@shchur shchur Jan 30, 2023

I see two potential problems with bfill/backfill:

  • it doesn't fill the trailing NaNs, and these are the ones that actually matter for ensuring that all models can generate predictions over the forecast horizon. E.g., after we bfill [1, 1, NaN, NaN], we again get [1, 1, NaN, NaN], which cannot be processed by TimeSeriesPredictor. We cannot drop the trailing NaNs as easily as we can drop the leading NaNs with ffill.
  • it introduces information leakage from the test/val set, which might affect model selection.

Do you think there is a strong potential use case for bfill that we should support & find a way around these problems?
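
Both concerns raised in this comment can be reproduced with plain pandas. The snippet below is a small sketch using made-up values; it does not use any AutoGluon code.

```python
import pandas as pd

nan = float("nan")

# Problem 1: bfill has no later value to copy into trailing NaNs,
# so they stay missing, while ffill fills them.
s = pd.Series([1.0, 1.0, nan, nan])
after_bfill = s.bfill()
after_ffill = s.ffill()

# Problem 2: bfill copies a later observation into an earlier time step,
# so the filled value at position 1 comes from the future (position 2).
gap = pd.Series([1.0, nan, 5.0])
leaky = gap.bfill()

print(after_bfill.tolist())
print(after_ffill.tolist())
print(leaky.tolist())
```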

Collaborator

Information leakage is a critical point; agreed that we should drop bfill for now. I'm curious about interpolate: could linear interpolation cause leakage as well?
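
To make the interpolation question concrete, here is a minimal pandas sketch with assumed values. Two series share the same past but differ in a later observation, and the linearly interpolated value changes with that later observation, whereas the forward-filled value does not.

```python
import pandas as pd

nan = float("nan")

# Same history up to the gap, different observation after it (assumed data).
a = pd.Series([1.0, nan, 3.0])
b = pd.Series([1.0, nan, 9.0])

# Linear interpolation uses the value on both sides of the gap.
interp_a = a.interpolate(method="linear")
interp_b = b.interpolate(method="linear")

# Forward fill only uses the value before the gap.
ffill_a = a.ffill()
ffill_b = b.ffill()

print(interp_a.tolist(), interp_b.tolist())
print(ffill_a.tolist(), ffill_b.tolist())
```

In this example the imputed value at position 1 depends on the observation at position 2, which is the sense in which interpolation can leak future information.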

Collaborator Author

Good point regarding interpolation. I've checked how other libraries (sktime, darts) handle this and updated the functionality to be more in line with them.

  • Do not change the index (never drop the leading NaNs)
  • By default, use ffill to fill gaps + trailing NaNs, then use bfill to fill the leading NaNs.
  • Added options constant and bfill.
  • Added warnings for bfill and interpolate that these may lead to data leakage.
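
The updated behavior listed above could look roughly like the following sketch. The function name `fill_missing`, the `"default"` method label, and the `value` parameter are hypothetical names chosen for illustration; they are not taken from the merged code.

```python
import warnings

import pandas as pd


def fill_missing(series: pd.Series, method: str = "default", value: float = 0.0) -> pd.Series:
    """Hypothetical helper mirroring the options listed above (not the AutoGluon API)."""
    if method == "default":
        # ffill covers gaps and trailing NaNs; bfill then covers leading NaNs.
        return series.ffill().bfill()
    if method == "ffill":
        return series.ffill()
    if method == "constant":
        return series.fillna(value)
    if method in ("bfill", "interpolate"):
        # Both options use later observations to fill earlier time steps.
        warnings.warn(
            f"method={method!r} uses future values and may lead to data leakage",
            stacklevel=2,
        )
        return series.bfill() if method == "bfill" else series.interpolate()
    raise ValueError(f"Unknown method: {method!r}")


nan = float("nan")
example = pd.Series([nan, 1.0, nan, 3.0, nan])
print(fill_missing(example).tolist())
print(fill_missing(example, "constant", 0.0).tolist())
```

None of the branches drops rows, so the index of the returned series always matches the input, in line with the first bullet above.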

@github-actions

Job PR-2781-faa668b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2781/faa668b/index.html

@github-actions

Job PR-2781-6b6a994 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2781/6b6a994/index.html

@github-actions

Job PR-2781-7ab5a94 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2781/7ab5a94/index.html

@shchur shchur force-pushed the ffill branch 2 times, most recently from 605b03b to 4acf1a8, on February 1, 2023 17:05
@github-actions

github-actions bot commented Feb 1, 2023

Job PR-2781-44bf527 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2781/44bf527/index.html

@shchur shchur requested a review from tonyhoo February 2, 2023 17:04
@shchur shchur merged commit 746f796 into autogluon:master Feb 3, 2023
@shchur shchur deleted the ffill branch February 3, 2023 06:22