Multivariate time series forecasting question #652

Closed
CMobley7 opened this issue Feb 19, 2020 · 17 comments
Labels
question Further information is requested

Comments

@CMobley7

My apologies in advance for the ignorant questions. While I’m not necessarily new to deep learning, I’m fairly new to time series forecasting, especially when using deep learning techniques for it.

Because gluon-ts uses DL-based approaches, dealing with non-stationarity in training datasets is not necessary, unlike with AR/MA- and VAR-based models, correct? This appears to be outlined here.

Also, I am working with a multivariate time series dataset in which the target/dependent variable is related and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important; consequently, this leads to two questions.

First, since the relationship between the target variable and other features is important, are the most applicable models deepvar and gpvar or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?

Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which include my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?

Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.

@CMobley7 CMobley7 added the question Further information is requested label Feb 19, 2020
@ehsanmok
Contributor

DL-based methods can handle non-stationary, multivariate time series with missing values and categorical features. In the multivariate case, the target is at least 2-dimensional, where one dimension is the number of variates (the number of time series). Normally, when you use any of the provided *Estimators, such as DeepVAREstimator, the requirements are checked and the built-in transformations create the required features automatically.

Note that in the multivariate case, you can use MultivariateGrouper to group the target into 2 dimensions, like this:

from gluonts.dataset.artificial import constant_dataset
from gluonts.dataset.common import TrainDatasets
from gluonts.dataset.multivariate_grouper import MultivariateGrouper

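def load_multivariate_constant_dataset():
    # Load the artificial constant dataset: metadata plus univariate train/test sets.
    metadata, train_ds, test_ds = constant_dataset()
    # Each grouper stacks the univariate series into a single 2D target,
    # restricted to at most 10 dimensions.
    grouper_train = MultivariateGrouper(max_target_dim=10)
    grouper_test = MultivariateGrouper(max_target_dim=10)
    # Wrap the grouped datasets in the usual TrainDatasets container.
    return TrainDatasets(
        metadata=metadata,
        train=grouper_train(train_ds),
        test=grouper_test(test_ds),
    )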

dataset = load_multivariate_constant_dataset()
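
To make the grouping idea concrete, here is a minimal, library-free sketch of what "grouping the target into 2 dimensions" means. The function `group_univariate` and the sample values are hypothetical and are not part of GluonTS; the sketch assumes all series share the same start and length, whereas the real MultivariateGrouper handles more general cases.

```python
def group_univariate(entries):
    """Stack aligned univariate entries into one multivariate entry.

    Each input entry is a dict with a "start" timestamp and a 1D "target".
    The result has a 2D target of shape (num_series, time).
    """
    starts = {entry["start"] for entry in entries}
    lengths = {len(entry["target"]) for entry in entries}
    # Simplifying assumption of this sketch: identical start and length.
    if len(starts) != 1 or len(lengths) != 1:
        raise ValueError("this sketch assumes aligned, equal-length series")
    return {
        "start": entries[0]["start"],
        "target": [list(entry["target"]) for entry in entries],
    }


# Two hypothetical univariate series observed over the same three time steps.
univariate_entries = [
    {"start": "2020-01-01 00:00:00", "target": [1.0, 2.0, 3.0]},
    {"start": "2020-01-01 00:00:00", "target": [4.0, 5.0, 6.0]},
]

grouped = group_univariate(univariate_entries)
print(grouped["target"])  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```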

@mbohlkeschneider
Contributor

Hi @CMobley7,

as @ehsanmok wrote already, you can use the MultivariateGrouper to convert any univariate time series dataset into multivariate time series.

Which model is the right one for your task depends. If you know the values of your related time series in the future (because they are time series indicators of holidays or known promotions), using these as dynamic features in univariate models (like DeepAR) does a fine job.

If this is not the case, I would recommend using gpvar, as this is the multivariate time series model so far for which we have the most empirical evidence that it works well (see this paper).

Hope that helps.
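
As a rough illustration of the first option (known future values used as dynamic features in a univariate model), the sketch below builds plain dataset entries. The values, the `holiday_flag` covariate, and `prediction_length` are hypothetical. The field names `start`, `target`, and `feat_dynamic_real` are the ones used elsewhere in this thread. The assumption made here is that the covariate matches the target length for training and extends `prediction_length` steps past it for prediction.

```python
prediction_length = 3  # hypothetical forecast horizon

# Hypothetical history of the series to forecast (6 time steps).
target = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0]

# A covariate known in advance, e.g. a holiday indicator.
# It covers the 6 historical steps plus the 3 steps to be forecast.
holiday_flag = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]

start = "2020-01-01 00:00:00"

# Training entry: the covariate is cut to the same length as the target.
# feat_dynamic_real has shape (num_features, time).
train_entry = {
    "start": start,
    "target": target,
    "feat_dynamic_real": [holiday_flag[: len(target)]],
}

# Prediction entry: the covariate also covers the forecast horizon.
predict_entry = {
    "start": start,
    "target": target,
    "feat_dynamic_real": [holiday_flag],
}

print(len(train_entry["feat_dynamic_real"][0]))    # 6
print(len(predict_entry["feat_dynamic_real"][0]))  # 9
```

Entries like these would then typically be wrapped in a GluonTS dataset (for example `ListDataset`) and the estimator told to use the dynamic features; the exact arguments depend on the GluonTS version in use.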

@CMobley7
Author

@ehsanmok and @mbohlkeschneider, thank you for your advice thus far, I really appreciate it. Unfortunately, I’m still slightly confused regarding which model I should choose and consequently how to create the training and test sets.

I planned to recreate the following notebook, but instead of using straight gluon or keras, I'd use gluon-ts. The author creates a model to forecast pollution given previous pollution, as well as other factors like rain, wind speed and temperature. So, which model do you think best fits this type of data, and what is the best method to take the dataframe in cell 8 and turn it into both a training and test set given the model chosen? In addition, while dealing with non-stationarity may not be a problem with the DL-based approaches in gluon-ts, I’m assuming scaling the features still is. Are there methods inside gluon-ts to deal with this, or should I just use scikit-learn or a similar library to do this prior to creating the dataframe in cell 8? Thank you again in advance.

@CMobley7
Author

@ehsanmok and @mbohlkeschneider, I've looked through gluon-ts's extended tutorial and understand how to make a traditional dataset, but I'm still not sure exactly how to create a dataset for gpvar. The MultivariateGrouper is only useful for converting univariate datasets to multivariate, right? After looking at gpvar, it seems like it won't use any feature besides the target. It looks like I need to group the target and all features into the target field. Or am I mistaken, and I should use the traditional dataset with the target in the target field and all features in their appropriate fields (feat_static_cat, feat_static_real, feat_dynamic_cat, feat_dynamic_real)?

@mbohlkeschneider
Contributor

Hi @CMobley7 ,

you are correct, this is what the MultivariateGrouper does. Essentially, multivariate time series should have target fields that look like this. Then, the data should be loadable with our standard loaders.

You are right that GPVar is not using additional features atm. Let me break down why:

feat_static_cat: In our paper, we addressed the use-case of having a single multivariate dataset with shape (time, dim). Thus, the concept of a feat_static_cat (which is a way to mark different time series) does not make sense because every time series is "the same".

feat_static_real: We have not looked into this in the paper, but this could be implemented.

feat_dynamic_cat: Currently, I think GluonTS provides the functionality to pass feat_dynamic_cat to models but no model is using this so far. Feel free to experiment and share your findings!

feat_dynamic_real: We have not looked into this in the paper. This could be quite challenging depending on what your data looks like. The two cases are:

  • Dynamic features are the same for all (marginal) time series: This is the case we have for our standard time features here. We don't really have the infrastructure for loading such data from files atm, I think. This case is straightforward.

  • Dynamic features are different for all (marginal) time series: This comes with a lot of practical issues: What values should the features have if the time series are not the same length (time series could be longer or shorter)? Also, every feature introduced this way will add target_dim inputs to the model, so my gut feeling is that this blows up fairly quickly and becomes hard to train.
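
The difference in input growth between the two cases can be sketched with a small calculation. The helper `extra_inputs` and the example sizes are hypothetical; the only assumption taken from the description above is that a shared feature adds one input per time step, while a per-series feature adds target_dim inputs.

```python
def extra_inputs(num_features, target_dim, shared):
    """Number of additional model inputs per time step.

    shared=True:  every marginal series sees the same feature values.
    shared=False: each of the target_dim series has its own copy.
    """
    return num_features if shared else num_features * target_dim


# Hypothetical sizes: 5 dynamic features, 100 marginal time series.
print(extra_inputs(num_features=5, target_dim=100, shared=True))   # 5
print(extra_inputs(num_features=5, target_dim=100, shared=False))  # 500
```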

@jaschau

jaschau commented Feb 26, 2020

I had the same issue: my dataset contained feat_dynamic_real.
Although gpvar and deepvar ignore feat_dynamic_real in principle, my training runs initially still crashed. I figured out that the root cause was that the TrainingDataLoader tried to batch the feat_dynamic_real, which, however, had not been cut to the appropriate length by InstanceSplitter in the default transformation.
I fixed this by replacing the code in https://github.com/awslabs/gluon-ts/blob/master/src/gluonts/model/gpvar/_estimator.py#L253,

VstackFeatures(
    output_field=FieldName.FEAT_TIME,
    input_fields=[FieldName.FEAT_TIME],
)

by

VstackFeatures(
    output_field=FieldName.FEAT_TIME,
    input_fields=[FieldName.FEAT_TIME, FieldName.FEAT_DYNAMIC_REAL],
)

This works because VstackFeatures will by default drop the input_fields from the dataset.
Maybe this is of help.
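
The drop behaviour described above can be mimicked with a small stand-in. The function `vstack_features` below is a simplified, hypothetical imitation written with plain lists; it is not the GluonTS implementation, and the entry values are made up. The string field names are assumed to correspond to FieldName.FEAT_TIME and FieldName.FEAT_DYNAMIC_REAL.

```python
def vstack_features(entry, output_field, input_fields, drop_inputs=True):
    """Stack the rows of several (num_features, time) fields into one field.

    With drop_inputs=True, every input field other than the output field
    is removed from the returned entry.
    """
    stacked = []
    for name in input_fields:
        stacked.extend(entry[name])  # append each feature row in order
    result = dict(entry)
    result[output_field] = stacked
    if drop_inputs:
        for name in input_fields:
            if name != output_field:
                del result[name]
    return result


# Hypothetical entry with one time feature and one dynamic real feature.
entry = {
    "target": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "time_feat": [[0.0, 0.5, 1.0]],
    "feat_dynamic_real": [[7.0, 8.0, 9.0]],
}

out = vstack_features(entry, "time_feat", ["time_feat", "feat_dynamic_real"])
print(sorted(out))       # ['target', 'time_feat']
print(out["time_feat"])  # [[0.0, 0.5, 1.0], [7.0, 8.0, 9.0]]
```

In this sketch the uncut feat_dynamic_real field no longer exists after the step, so nothing downstream tries to batch it; its rows live on inside time_feat instead.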

@CMobley7
Author

Thanks @mbohlkeschneider and @jaschau. Unfortunately, feature engineering is taking longer than I anticipated, so it will probably be another week before I'm able to test gluon-ts with my dataset. I'll close this issue now since I believe all my questions have been answered, and post back later with results or potentially additional questions. Thanks again.

@vblagoje

@mbohlkeschneider can any other model be used for multivariate series prediction or just gpvar?

@mbohlkeschneider
Contributor

Technically, DeepAR and DeepVAR should work as well. However, GPVAR is the model I would recommend.

@pratikgehlott

@mbohlkeschneider do you have an example notebook on how to make multivariate time series forecasting using gluon-ts?

@mbohlkeschneider
Contributor

@Pratik325, I don't have a notebook, but this test does show the setup. Let me know if you have questions.

@pratikgehlott

@mbohlkeschneider I need to know how to use this on custom datasets. It would be beneficial if you performed on the simple dataset and shared the notebook because none of the platforms have any good explanation of gluon-ts for multivariate. It would help many of the learners. Thank you.

@mbohlkeschneider
Contributor

mbohlkeschneider commented Apr 27, 2021

Hi @Pratik325,

Basically, the data preparation is the same as for all other models. The only difference is that the target field becomes a 2D array. So instead of target=[1,2,3,4,5] you would have target=[[1,2,3,4,5],[6,7,8,9,10]]. Does this help?
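
For a custom dataset that starts out as a table with one row per time step, a minimal sketch of building such a 2D target could look as follows. The column meanings, values, start timestamp, and `prediction_length` are hypothetical, and holding out the last `prediction_length` steps for training is a common convention assumed here rather than something stated in this thread.

```python
prediction_length = 2  # hypothetical forecast horizon

# Hypothetical rows, one per time step: (pollution, rain, wind_speed, temperature).
rows = [
    (129.0, 0.0, 1.8, -4.0),
    (148.0, 0.0, 2.7, -4.0),
    (159.0, 0.0, 3.6, -5.0),
    (181.0, 1.0, 5.4, -5.0),
    (138.0, 2.0, 6.3, -5.0),
]

# Transpose (time, dim) rows into a (dim, time) target: one inner list per variable.
target = [list(column) for column in zip(*rows)]

start = "2020-01-01 00:00:00"

# Test entry: the full series.
test_entry = {"start": start, "target": target}

# Training entry: every variable with the last prediction_length steps held out.
train_entry = {
    "start": start,
    "target": [series[:-prediction_length] for series in target],
}

print(len(target), len(target[0]))  # 4 5
print(train_entry["target"][0])     # [129.0, 148.0, 159.0]
```

Here the number of inner lists (4) is the dimension a multivariate estimator would need to be told about, and the entries would then be loaded in the same way as for any other model.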

@pratikgehlott

No sir, @mbohlkeschneider

@pratikgehlott

Hi @mbohlkeschneider , can you please help me..?

@jaschau

jaschau commented Apr 27, 2021

A complete example #382 can be found here. I am not sure it's entirely up to date but it sure demonstrates the basic setup.

@pratikgehlott

@jaschau it's outdated!!!
