calculate_seasonal_error bug for multivariate data #2494

Open
gorold opened this issue Dec 14, 2022 · 5 comments
Labels
bug (Something isn't working), multivariate (This concerns multivariate models)

Comments

@gorold
Contributor

gorold commented Dec 14, 2022

Description

calculate_seasonal_error gives wrong results when the inputs are multivariate data.
past_data = extract_past_data(time_series, forecast) returns a (dim, time) shaped array rather than a (time,) shaped array, because extract_past_data ends with the following code snippet. That array is then passed into calculate_seasonal_error.

date_before_forecast = forecast.index[0] - forecast.freq
return np.atleast_1d(
    np.squeeze(time_series.loc[:date_before_forecast].transpose())
)

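Because the first axis of past_data is then the dimension rather than time, the slicing below takes differences between the multivariate dimensions instead of between time steps, so the result is not the seasonal error.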

y_t = past_data[:-forecast_freq]
y_tm = past_data[forecast_freq:]
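
To make the effect concrete, here is a minimal NumPy-only sketch. It does not call GluonTS; it assumes past_data has shape (dim, time) with every dimension holding the same ramp, and the names across_dims and across_time are made up for illustration.

```python
import numpy as np

dim, past, seasonality = 5, 40, 1

# Assumed layout (dim, time): every row is the ramp 0, 1, ..., past - 1.
past_data = np.tile(np.arange(past, dtype=float), (dim, 1))

# Slicing the first axis, as in the snippet above, compares neighbouring
# dimensions. All rows are identical, so every difference is zero.
y_t = past_data[:-seasonality]
y_tm = past_data[seasonality:]
across_dims = float(np.mean(np.abs(y_t - y_tm)))

# Slicing the last axis compares neighbouring time steps instead.
y_t = past_data[..., :-seasonality]
y_tm = past_data[..., seasonality:]
across_time = float(np.mean(np.abs(y_t - y_tm)))

print(across_dims, across_time)
```

Under these assumptions the first value is 0.0 and the second is 1.0, which matches the reported and the desired seasonal_error respectively.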

To Reproduce


from datetime import timedelta
import pandas as pd
import numpy as np
from gluonts.evaluation import MultivariateEvaluator
from gluonts.model.forecast import QuantileForecast

dim = 5
time = 20
past = 40

forecast = QuantileForecast(
    forecast_arrays=np.random.randn(3, dim, time),
    forecast_keys=['mean', '0.95', '0.05'],
    start_date=pd.to_datetime('today').to_period('H')
)

# Every column is the same ramp 0, 1, ..., past + time - 1.
time_series = pd.DataFrame(
    zip(*[np.arange(past + time) for _ in range(dim)]),
    pd.period_range(
        start=pd.to_datetime('today') - timedelta(hours=past),
        periods=past + time,
        freq='H',
    ),
)
evaluator = MultivariateEvaluator(seasonality=1)
evaluator.get_metrics_per_ts(time_series, forecast)

Error message or code output

'seasonal_error': 0.0

Desired message or code output

'seasonal_error': 1.0

Environment

  • GluonTS version: '0.11.4'
@gorold added the bug label Dec 14, 2022
@lostella added this to the v0.12 milestone Dec 15, 2022
@lostella self-assigned this Dec 16, 2022
@lostella
Contributor

@gorold I believe the issue is that the layout of the forecast array should be different: (num_stat, time, dimensions) instead of (num_stat, dimensions, time) as you have it. This can be seen on the dev branch by how the copy_dim method was implemented:

def copy_dim(self, dim: int) -> "QuantileForecast":
    if len(self.forecast_array.shape) == 2:
        forecast_array = self.forecast_array
    else:
        # The target dimension is read from the last axis.
        target_dim = self.forecast_array.shape[2]
        assert dim < target_dim, (
            f"must set 0 <= dim < target_dim, but got dim={dim},"
            f" target_dim={target_dim}"
        )
        forecast_array = self.forecast_array[:, :, dim]
    return QuantileForecast(
        forecast_arrays=forecast_array,
        start_date=self.start_date,
        forecast_keys=self.forecast_keys,
        item_id=self.item_id,
        info=self.info,
    )

Still, it seems that even swapping the dimensions in your MWE results in shape issues on both the v0.11.x and dev branches, so something is off in the multivariate case anyway. So I think this has to be addressed as follows:

  • Clarify in the docstring for the QuantileForecast what the required layout should be
  • Make sure the example you gave runs fine when forecast_arrays=np.random.randn(3, time, dim)
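
For reference, a small NumPy-only sketch of what the two layouts mean for the [:, :, dim] indexing used in copy_dim. It only illustrates the array indexing and says nothing about how the evaluator behaves; the variable names are made up for illustration.

```python
import numpy as np

num_stat, dim, time = 3, 5, 20

# Layout used in the MWE: (num_stat, dim, time).
arr_dim_first = np.random.randn(num_stat, dim, time)

# Layout that the [:, :, dim] indexing expects: (num_stat, time, dim).
arr_time_first = np.swapaxes(arr_dim_first, 1, 2)

k = 2
# On the MWE layout the last axis is time, so this picks time step k.
wrong = arr_dim_first[:, :, k]
# On the swapped layout the last axis is the dimension, so this picks dimension k.
right = arr_time_first[:, :, k]

print(wrong.shape, right.shape)
```

With the MWE layout the slice has one entry per dimension, while with the swapped layout it has one entry per time step, which is what a univariate forecast needs.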

@gorold if you don't mind the question: did you uncover this while working on your own quantile-regression-based multivariate model? I ask because I don't think there are any in GluonTS as of now.

@gorold
Contributor Author

gorold commented Dec 16, 2022

@lostella Thanks for getting back. The issue doesn't seem to arise from the QuantileForecast object: extract_past_data seems to depend on the forecast only to get the date index. seasonal_error is the in-sample error of the ground truth time series (calculate_seasonal_error is a function of past_data only, not of the forecast).

Regarding the dimensions of the QuantileForecast object, I was following the dim method shown below. It seems the convention changed between v0.11.x and the dev branch.

def dim(self) -> int:
    if self._dim is not None:
        return self._dim
    else:
        if (
            len(self.forecast_array.shape) == 2
        ):  # 1D target. shape: (num_samples, prediction_length)
            return 1
        else:
            # 2D target. shape: (num_samples, target_dim,
            # prediction_length)
            return self.forecast_array.shape[1]

I was combining multiple univariate quantile forecasters on a multivariate time series and using the MultivariateEvaluator to aggregate the metrics over the dimensions.

@gorold
Contributor Author

gorold commented Dec 16, 2022

One more thing: while MASE and MSIS may not be defined for multivariate time series, the straightforward extension would be to calculate the seasonal error independently for each dimension. So seasonal_error should be multivariate, but it seems to be averaged across dimensions here:

return np.mean(abs(y_t - y_tm))
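
A rough sketch of the per-dimension variant, in plain NumPy. The helper name seasonal_error_per_dim is hypothetical and not part of GluonTS, it assumes past_data is laid out as (time,) or (dim, time), and it does not handle series shorter than the seasonality.

```python
import numpy as np

def seasonal_error_per_dim(past_data, seasonality=1):
    """Hypothetical helper: mean absolute seasonal difference along time.

    past_data has shape (time,) or (dim, time); the result has one value
    per dimension (shape (dim,)), or a single value for univariate input.
    """
    past_data = np.asarray(past_data, dtype=float)
    y_t = past_data[..., :-seasonality]
    y_tm = past_data[..., seasonality:]
    # Reduce over the time axis only, so dimensions stay separate.
    return np.mean(np.abs(y_t - y_tm), axis=-1)

# Three dimensions with slopes 1, 2 and 3 over ten time steps.
past_data = np.outer(np.arange(1, 4), np.arange(10))
per_dim = seasonal_error_per_dim(past_data)

# Averaging afterwards collapses the per-dimension values into one number.
averaged = float(np.mean(per_dim))
print(per_dim, averaged)
```

Keeping the reduction on the last axis means the same function also works for a univariate series, where it returns a single value.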

@gorold
Contributor Author

gorold commented Jan 1, 2023

Sorry, I just realised that MultivariateEvaluator.get_metrics_per_ts is not meant to be used in this manner. It seems that get_metrics_per_ts has a precondition that the time series and forecasts are univariate. MultivariateEvaluator calls this function only after selecting one particular dimension:

agg_metrics, metrics_per_ts = super().__call__(
    self.extract_target_by_dim(ts_iterator_set[dim], dim),
    self.extract_forecast_by_dim(fcst_iterator_set[dim], dim),
)

Perhaps an assertion can be made in the get_metrics_per_ts function that inputs are univariate.
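
Something along these lines could work as the check, sketched here as a standalone NumPy helper rather than a patch to GluonTS. The name check_univariate is hypothetical, and it raises ValueError instead of using assert so that the check still runs under python -O; that choice is an assumption, not what the library does.

```python
import numpy as np

def check_univariate(values, name="time_series"):
    """Hypothetical helper: reject inputs with more than one non-trivial axis."""
    arr = np.squeeze(np.asarray(values))
    if arr.ndim > 1:
        raise ValueError(
            f"{name} must be univariate, but got shape {np.shape(values)}"
        )
    return np.atleast_1d(arr)

# A single-column (time, 1) target is accepted and flattened to (time,).
ok = check_univariate(np.arange(10).reshape(10, 1))

# A (time, dim) target with dim > 1 is rejected with a clear message.
try:
    check_univariate(np.zeros((40, 5)))
    rejected = False
except ValueError as err:
    rejected = True
    message = str(err)
```

Failing early like this would turn the silent 0.0 seasonal error into an explicit error about the input shape.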

There is still an error in 0.11.x with a multivariate QuantileForecast, namely that copy_dim is not implemented for it, but this seems to be fixed in dev (#2352).

from datetime import timedelta
import pandas as pd
import numpy as np
from gluonts.evaluation import MultivariateEvaluator
from gluonts.model.forecast import SampleForecast, QuantileForecast
from itertools import tee

dim = 5
time = 20
past = 40

quantile_forecast = QuantileForecast(
    forecast_arrays=np.random.randn(3, dim, time),  # following 0.11.x convention
    forecast_keys=['mean', '0.95', '0.05'],
    start_date=pd.to_datetime('today').to_period('H')
)

sample_forecast = SampleForecast(
    samples=np.random.randn(100, time, dim),
    start_date=pd.to_datetime('today').to_period('H')
)

# Every column is the same ramp 0, 1, ..., past + time - 1.
time_series = pd.DataFrame(
    zip(*[np.arange(past + time) for _ in range(dim)]),
    pd.period_range(
        start=pd.to_datetime('today') - timedelta(hours=past),
        periods=past + time,
        freq='H',
    ),
)
evaluator = MultivariateEvaluator(seasonality=1)

# ok
sample_agg, sample_item = evaluator([time_series], [sample_forecast])

# not ok
quantile_agg, quantile_item = evaluator([time_series], [quantile_forecast])

Raises error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[115], line 31
     28 sample_agg, sample_item = evaluator([time_series], [sample_forecast])
     30 # not ok
---> 31 quantile_agg, quantile_item = evaluator([time_series], [quantile_forecast])

File /usr/local/lib/python3.8/dist-packages/gluonts/evaluation/_base.py:720, in MultivariateEvaluator.__call__(self, ts_iterator, fcst_iterator, num_series)
    715 fcst_iterator_set = tee(
    716     fcst_iterator, target_dimensionality + len(self.target_agg_funcs)
    717 )
    719 for dim in eval_dims:
--> 720     agg_metrics, metrics_per_ts = super().__call__(
    721         self.extract_target_by_dim(ts_iterator_set[dim], dim),
    722         self.extract_forecast_by_dim(fcst_iterator_set[dim], dim),
    723     )
    725     all_metrics_per_ts.append(metrics_per_ts)
    727     for metric, value in agg_metrics.items():

File /usr/local/lib/python3.8/dist-packages/gluonts/evaluation/_base.py:230, in Evaluator.__call__(self, ts_iterator, fcst_iterator, num_series)
    228         mp_pool.join()
    229     else:
--> 230         for ts, forecast in it:
    231             rows.append(self.get_metrics_per_ts(ts, forecast))
    233 assert not any(
    234     True for _ in ts_iterator
    235 ), "ts_iterator has more elements than fcst_iterator"

File /usr/local/lib/python3.8/dist-packages/tqdm/std.py:1195, in tqdm.__iter__(self)
   1192 time = self._time
   1194 try:
-> 1195     for obj in iterable:
   1196         yield obj
   1197         # Update and possibly print the progressbar.
   1198         # Note: does not call self.update(1) for speed optimisation.

File /usr/local/lib/python3.8/dist-packages/gluonts/evaluation/_base.py:597, in MultivariateEvaluator.extract_forecast_by_dim(forecast_iterator, dim)
    592 @staticmethod
    593 def extract_forecast_by_dim(
    594     forecast_iterator: Iterator[Forecast], dim: int
    595 ) -> Iterator[Forecast]:
    596     for forecast in forecast_iterator:
--> 597         yield forecast.copy_dim(dim)

File /usr/local/lib/python3.8/dist-packages/gluonts/model/forecast.py:456, in Forecast.copy_dim(self, dim)
    447 def copy_dim(self, dim: int):
    448     """
    449     Returns a new Forecast object with only the selected sub-dimension.
    450 
   (...)
    454         The returned forecast object will only represent this dimension.
    455     """
--> 456     raise NotImplementedError()

NotImplementedError: 

@lostella
Contributor

lostella commented Jan 4, 2023

> Perhaps an assertion can be made in the get_metrics_per_ts function that inputs are univariate.

Thanks for the further inspection. I agree, that would be the fix here.

@lostella removed this from the v0.12 milestone Jan 20, 2023
@lostella added the multivariate label Mar 21, 2023
@lostella removed their assignment Mar 21, 2023