[timeseries] Fix pandas groupby bug + GluonTS index bug #2420

Merged: 7 commits merged Nov 23, 2022


@shchur (Collaborator) commented Nov 16, 2022

Description of changes:

  • Fix a bug where pandas groupby would sort item ids even when sort=False (related to BUG: df.groupby(sort=False) sorts multi-index-frames pandas-dev/pandas#17537)
  • Fix incorrect forecast index generated by GluonTS and Sktime models in some cases.
    • For example, if the data has timestamps "2022-01-01 12:00", "2022-01-02 12:00" (daily frequency), GluonTS will produce forecasts with timestamps "2022-01-01 00:00", "2022-01-02 00:00", which results in an AssertionError inside the TimeSeriesEvaluator.
  • Add tests covering these cases
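Both problems listed above can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not AutoGluon's actual implementation: the helper names `item_ids_in_original_order` and `forecast_index` are hypothetical, and the data is assumed to use a MultiIndex with levels named `item_id` and `timestamp`. It does not reproduce the pandas groupby bug itself (whether it triggers depends on the pandas version); it shows an ordering approach that does not rely on `groupby(sort=False)`, and contrasts a period-based forecast step with one anchored on the last observed timestamp.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def item_ids_in_original_order(df: pd.DataFrame) -> pd.Index:
    """Hypothetical helper: item ids in order of first appearance, never sorted.

    Index.unique() preserves order of appearance, so the result does not
    depend on how a given pandas version handles groupby(sort=False).
    """
    return df.index.get_level_values("item_id").unique()


def forecast_index(last_timestamp: pd.Timestamp, prediction_length: int, freq: str) -> pd.DatetimeIndex:
    """Hypothetical helper: forecast timestamps stepped forward from the last observation.

    Anchoring on the actual last timestamp keeps its time of day (e.g. 12:00),
    whereas converting a daily pd.Period back to a timestamp yields 00:00.
    """
    offset = to_offset(freq)
    return pd.date_range(start=last_timestamp + offset, periods=prediction_length, freq=offset)


# Two items stored in non-alphabetical order: "B" comes before "A"
timestamps = pd.to_datetime(["2022-01-01 12:00", "2022-01-02 12:00"] * 2)
df = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_arrays([["B", "B", "A", "A"], timestamps], names=["item_id", "timestamp"]),
)
print(list(item_ids_in_original_order(df)))  # ['B', 'A']

last = df.loc["B"].index[-1]  # Timestamp 2022-01-02 12:00
# Period-based step: the time of day is lost
print((pd.Period(last, freq="D") + 1).to_timestamp())  # 2022-01-03 00:00:00
# Timestamp-based step: the time of day is kept (2022-01-03 12:00, 2022-01-04 12:00)
print(list(forecast_index(last, prediction_length=2, freq="D")))
```

A forecast index built the second way lines up with held-out data that shares the original time of day, which is the kind of index agreement an evaluator comparing forecasts against ground truth needs.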

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur added the labels module: timeseries and bug on Nov 16, 2022
@github-actions

Job PR-2420-28d8f49 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2420/28d8f49/index.html

@Innixma (Contributor) left a comment


LGTM!

@github-actions

Job PR-2420-9523ddf is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2420/9523ddf/index.html

@github-actions

Job PR-2420-35c3780 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2420/35c3780/index.html

@github-actions

Job PR-2420-e992f1b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2420/e992f1b/index.html

@canerturkmen (Contributor) left a comment


LGTM

@@ -76,7 +77,7 @@ def __iter__(self) -> Iterator[Dict[str, Any]]:
    df = self.target_df.loc[item_id]
    time_series = {
        FieldName.ITEM_ID: item_id,
        FieldName.TARGET: df.squeeze().to_numpy(dtype=self.float_dtype),
Contributor


Is this a performance-related change?

@shchur (Collaborator, Author) commented Nov 23, 2022


The current code may lead to a bug. If df has length 1, then df.squeeze() returns a scalar (a float) instead of a pd.Series, and calling to_numpy on this scalar raises an exception.

There is no clean way to convert a single-column DataFrame to a Series (squeeze is the only function recommended for this purpose), so instead we convert to NumPy first and then flatten the result into 1D with ravel.
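A minimal sketch of the failure and of the ravel-based approach described in this reply. The helper name `target_to_1d_array` and its `float32` default dtype are assumptions for illustration (the PR code uses `self.float_dtype`), and the exact line added in the PR is not shown in this excerpt.

```python
import numpy as np
import pandas as pd


def target_to_1d_array(df: pd.DataFrame, dtype=np.float32) -> np.ndarray:
    """Hypothetical helper: flatten a single-column target DataFrame to 1D.

    Converting to NumPy first yields an (n, 1) array for any n >= 1,
    and ravel() flattens it to shape (n,), so a single row is not special.
    """
    return df.to_numpy(dtype=dtype).ravel()


one_row = pd.DataFrame({"target": [1.5]})

# squeeze() collapses a 1x1 frame to a NumPy scalar, which has no to_numpy method
print(type(one_row.squeeze()))
try:
    one_row.squeeze().to_numpy()
except AttributeError as err:
    print("squeeze() path failed:", err)

# The ravel() path works for one row and for many rows alike
print(target_to_1d_array(one_row))  # [1.5]
print(target_to_1d_array(pd.DataFrame({"target": [1.0, 2.0, 3.0]})).shape)  # (3,)
```

The design choice is to let NumPy handle the shape: `to_numpy()` on a DataFrame always returns a 2D array, so the one-element case cannot silently change the return type the way `squeeze()` does.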

@shchur merged commit 3b62931 into autogluon:master on Nov 23, 2022
@shchur deleted the ts-bugfix branch on November 23, 2022 09:18
4 participants