Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken aggregate function in main branch #211

Closed
candalfigomoro opened this issue Jun 15, 2023 · 4 comments · Fixed by #232
Closed

Broken aggregate function in main branch #211

candalfigomoro opened this issue Jun 15, 2023 · 4 comments · Fixed by #232
Labels

Comments

@candalfigomoro
Copy link

candalfigomoro commented Jun 15, 2023

What happened + What you expected to happen

If I use hierarchicalforecast v0.3.0 there's a problem with "unbalanced" time series, see: #189

However, if you exclude this specific issue, it works fine.

If I use the latest version from the main branch (maybe after this commit? c107217) I have huge issues.

I have unbalanced time series.

If I set the new "is_balanced" parameter to False (the default behavior), the returned dataframe has many missing dates (ds). For some unique_ids I expect many years of data but I just get 2 rows.

If I set the new "is_balanced" parameter to True, I get a reshaping error.

The same code works with version 0.3.0. I had to revert to the released 0.3.0 version because the version in the main branch is not usable for me.

Versions / Dependencies

hierarchicalforecast main/master branch
Python 3.10
Linux
pandas 2.0.2

Reproduction script

Y_df, S_df, tags = aggregate(Y_df, spec)

Unfortunately, I can't share my data set.

Issue Severity

None

@candalfigomoro
Copy link
Author

Just to add some context:

I have time series with the same ending date but different starting dates.

I think it's ok for reconciliation to fill missing dates with zeros.

However, for model training data I don't want to add leading zeros to my time series.

@NudnikShpilkis
Copy link

Doesn't aggregate have a is_balanced parameter not a is_unbalanced? The function assumes the data is not balanced by default.

@candalfigomoro
Copy link
Author

Doesn't aggregate have a is_balanced parameter not a is_unbalanced? The function assumes the data is not balanced by default.

Yes I misspelled in the message above. The parameter is called is_balanced and the default value is False. I edited the message above. The issue is the same.

@patleg78
Copy link

patleg78 commented Aug 3, 2023

I was looking into the source code for new version and old version
here I find some the order of code logic
new version with sparse (say snippet1 )
" Y_bottom_df = Y_bottom_df.groupby(['unique_id', 'ds'])['y'].sum().reset_index()
Y_bottom_df.unique_id = Y_bottom_df.unique_id.astype('category')
Y_bottom_df.unique_id = Y_bottom_df.unique_id.cat.set_categories(S_df.columns)
"

here we are aggregating first and then getting column names as index for Y_bottom_df.

but is old version (say snippet2 )

" Y_bottom_df.unique_id = Y_bottom_df.unique_id.astype('category')
Y_bottom_df.unique_id = Y_bottom_df.unique_id.cat.set_categories(S_df.columns)
Y_bottom_df = Y_bottom_df.groupby(['unique_id', 'ds'])['y'].sum().reset_index()
"
we were first creating all the unique_id's first that is helping us to get the complete time series for all the unique_id's

this small change has resolved this issue for me
just change (say snippet1 ) with (say snippet2 ) in new updated version
thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants