
Question on WindowsDataset / TimeSeriesLoader #256

Closed

david-waterworth opened this issue Jun 7, 2022 · 7 comments

@david-waterworth

I'm keen to use neuralforecast for my own work. I'm interested in anomaly detection, which is slightly different from forecasting - generally you try to reconstruct a window rather than forecast the next n points - but I think I can make it work.

I'm a little confused about the TimeSeriesLoader, though. I've been working through my own example, but I'll refer to the getting-started notebook to make things easier.

The WindowsDataset is indexed by the number of unique_ids in the original dataset.

train_dataset = nf.data.tsdataset.WindowsDataset(
    Y_df=Y_df_train, 
    X_df=X_df_train,
    f_cols=[f'ex_{i}' for i in range(1, 5)],
    input_size=input_size,
    output_size=output_size,
    mask_df=train_mask_df
)

So in this case there are 7 series, each containing 774 windows of length 144, i.e.

len(train_dataset) == 7
train_dataset[0]['Y'].shape == [774, 144]

Where I'm really confused is the TimeSeriesLoader. When I set batch_size=32 I expect 7 * 774 // 32 batches of data per epoch, each with shape [32, 144], but no matter what parameters I try I only seem to get a single batch, of size either batch_size or n_windows, i.e.

train_loader = nf.data.tsloader.TimeSeriesLoader(train_dataset, batch_size=32, eq_batch_size=True, shuffle=True)
for batch in train_loader:
    print(batch['Y'].shape)

torch.Size([32, 144])

It only returns a single batch?

I'm expecting something like the following

import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy tensor whose first dim is number of series x number of windows per series
train_dataset = TensorDataset(torch.zeros(size=(774*7,144)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

for batch in train_loader:
    print(batch[0].shape)

torch.Size([32, 144])
torch.Size([32, 144])
torch.Size([32, 144])
...
torch.Size([32, 144])

Is there something wrong here? I want to train a model on multiple long (1 year) series with 15T frequency and 1 day windows, and I'm confused as to why the DataLoader only returns one batch per epoch.

@david-waterworth
Author

I've also noticed that if I change batch_size to 1 and use n_windows=32, I get what I assume is one batch of windows per time series per epoch, i.e.

train_loader = nf.data.tsloader.TimeSeriesLoader(train_dataset, batch_size=1, n_windows=32, shuffle=True)
for batch in train_loader:
    print(batch[0].shape)

But I can only ever get either 1 or n_series batches per epoch - is it possible to sample many windows from multiple series in each epoch, or is that not supported?

@kdgutier
Collaborator

kdgutier commented Jun 7, 2022

Hi @david-waterworth ,

Anomaly detection, as you mentioned, is more of an interpolation task than an extrapolation task. Forecasting models tend to be a good baseline, but I recommend using a dedicated model.

Our team has studied the anomaly detection task; we have this paper:
Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection.

The DGHL code is available via a link in the paper. We have not yet integrated it into this repository.

Hope this helps.

@david-waterworth
Author

Thanks @kdgutier, I understand the difference - I've implemented a number of time series VAE models. I'm just looking for a good general PyTorch-based framework I can use for deep time series models, rather than writing datasets / dataloaders myself.

My question/confusion is that you don't appear to use all the windows in each epoch, for either training or validation. The TimeSeriesLoader seems to sample a (potentially very small) subset of the windows from each series. From what I can see, this means each epoch trains the model on only a small subset of the data, and then validates it on a subset of the validation data. Is this true?

It seems that if all the series are the same length and you set batch_size=1 and n_windows to the number of windows per series, you get one batch per series containing all of its windows - but for long series that may not fit on the GPU?

In my own modelling I represent the data as a tensor of shape (series, windows, features) and then reshape it to (series x windows, features) to ensure the dataloader returns all the windows every epoch, roughly as in the sketch below.
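
A minimal sketch of that reshape (dummy data with the shapes from this thread; plain PyTorch, not neuralforecast API):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 7 series, 774 windows each, window length 144 (treated here as features).
data = torch.randn(7, 774, 144)          # (series, windows, features)
flat = data.reshape(-1, data.shape[-1])  # (series * windows, features)

loader = DataLoader(TensorDataset(flat), batch_size=32, shuffle=True)
for (batch,) in loader:
    print(batch.shape)                   # torch.Size([32, 144]), 169 full batches plus a final partial one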

@kdgutier
Collaborator

kdgutier commented Jun 8, 2022

Yes, the WindowsDataset uses a two-stage sampling strategy: first it samples a series, then it creates that series' rolling windows and samples windows from them.
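
Roughly, in plain PyTorch (an illustrative sketch of the idea, not the actual implementation):

import torch

def sample_batch(series_list, window_len, n_windows):
    # Stage 1: sample one series at random.
    idx = torch.randint(len(series_list), (1,)).item()
    series = series_list[idx]
    # Stage 2: build all rolling windows, then sample n_windows of them.
    windows = series.unfold(0, window_len, 1)        # [n_total, window_len]
    pick = torch.randint(windows.shape[0], (n_windows,))
    return windows[pick]                             # [n_windows, window_len]

# 7 dummy series of length 917 -> 774 rolling windows of length 144 each.
series_list = [torch.randn(917) for _ in range(7)]
print(sample_batch(series_list, window_len=144, n_windows=32).shape)
# torch.Size([32, 144])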

Answering your questions, @david-waterworth:

  1. We don't use all the windows during optimization because subsampling breaks the gradient correlations between windows of the same series; also, window creation happens on the GPU, so in most cases it does not add much computation time.

  2. During validation we normally use the last window of horizon size before the test set to obtain the cross-validation signal for model selection, and we use the entire validation dataset for this signal (not a subset). See the declaration of valid_dataset/test_dataset here.

  3. We use the setting batch_size=1 and n_windows_per_series=n_windows during model inference (see the sketch after this list). Long series are not a problem for N-BEATS-like models because they already trim the data to a limited number of lags; a full RNN, however, can run into GPU memory limitations.
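
For example, something along these lines (parameter names as used earlier in this thread; the exact signature may differ between versions):

# Hedged sketch of the inference setting: one series per batch, all (or a
# fixed number of) windows of that series sampled together.
valid_loader = nf.data.tsloader.TimeSeriesLoader(
    valid_dataset,       # a WindowsDataset over the validation split
    batch_size=1,        # one series at a time
    n_windows=774,       # all windows of that series in one batch
    shuffle=False,
)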

We are interested in extending NeuralForecast towards time series anomaly detection, and we would value your past experience on the topic.

@david-waterworth
Author

david-waterworth commented Jun 8, 2022

Thanks. I'm interested in your first point, "that breaks the gradient correlations" - can you expand on that, or point me to literature I can read?

I don't have a lot of experience yet; I've implemented a few different VAEs. The most obvious thing you'd need to change: I don't see a way of implementing multivariate anomaly detection with the current design. Univariate could be fairly easily accommodated by using y_df with x_df=None, although technically it should probably be the other way around. But in places you rely on having a y_df, and that's univariate at the moment.

@kdgutier
Collaborator

kdgutier commented Jun 8, 2022

The loss signals are correlated for windows of the same time series.
Averaging over many correlated windows reduces the variance of the SGD updates, in the same way that increasing the SGD batch size does.

The relationship between SGD's variance and the algorithm's generalization is still largely an open research question, but a general belief is that more variance in the gradients leads to better optima.
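
As a rough illustration (standard mini-batch SGD analysis, not specific to this library): for a batch of B gradient samples with per-sample variance \sigma^2 and pairwise correlation \rho, the variance of the batch-mean gradient is

\mathrm{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} g_i\right) = \frac{\sigma^2}{B}\bigl(1 + (B-1)\rho\bigr)

For iid samples (\rho = 0) this shrinks as \sigma^2 / B; with positive correlation the reduction saturates at \rho\sigma^2, but averaging over more windows still lowers the variance of each update, pushing towards the low-variance regime the paper below argues generalizes worse.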

Here is the "On Large-Batch Training for Deep Learning" paper that won an award on the topic.

This has not been carefully studied in the neural forecasting domain.

@david-waterworth
Author

Thanks, yes, that makes sense. I've noticed that the performance of a VAE is very poor if you train with strided windows where the stride equals the window size; but, as you imply, with a stride of one each window is almost perfectly correlated with the previous and next windows, so there is a lot of redundancy/duplication.

@kdgutier kdgutier closed this as completed Jun 9, 2022