
Question on WindowsDataset / TimeSeriesLoader #256

Closed

david-waterworth opened this issue Jun 7, 2022 · 7 comments

@david-waterworth

I'm keen to use neuralforecast for my own work. I'm interested in anomaly detection, which is slightly different from forecasting - generally you try to reconstruct a window rather than forecast the next n points - but I think I can make it work.

I'm a little confused about the TimeSeriesLoader, though. I've been working through my own example, but I'll refer to the getting-started notebook to make things easier.

The WindowsDataset is indexed by the number of unique_ids in the original dataset.

train_dataset = nf.data.tsdataset.WindowsDataset(
    Y_df=Y_df_train, 
    X_df=X_df_train,
    f_cols=[f'ex_{i}' for i in range(1, 5)],
    input_size=input_size,
    output_size=output_size,
    mask_df=train_mask_df
)

So in this case there are 7 series, each containing 774 windows of length 144, i.e.

len(train_dataset) == 7
train_dataset[0]['Y'].shape == [774, 144]

Where I'm really confused is the TimeSeriesLoader. When I set batch_size=32 I expect 7 * 774 // 32 batches of data per epoch, each with shape [32, 144], but no matter what parameters I try I only seem to get a single batch, of size either batch_size or n_windows, i.e.

train_loader = nf.data.tsloader.TimeSeriesLoader(train_dataset, batch_size=32, eq_batch_size=True, shuffle=True)
for batch in train_loader:
    print(batch['Y'].shape)

torch.Size([32, 144])

It only returns a single batch?

I'm expecting something like the following

import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy tensor whose first dim is number of series x number of windows per series
train_dataset = TensorDataset(torch.zeros(size=(774*7,144)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

for batch in train_loader:
    print(batch[0].shape)

torch.Size([32, 144])
torch.Size([32, 144])
torch.Size([32, 144])
...
torch.Size([32, 144])

Is there something wrong here? I want to train a model on multiple long (1 year) series with 15T frequency and 1 day windows, and I'm confused as to why the DataLoader only returns one batch per epoch.

@david-waterworth
Author

I've also noticed that if I change batch_size to 1 and use n_windows=32, I get what I assume is one batch of windows per time series per epoch, i.e.

train_loader = nf.data.tsloader.TimeSeriesLoader(train_dataset, batch_size=1, n_windows=32, shuffle=True)
for batch in train_loader:
    print(batch[0].shape)

But I can only ever get either 1 or n_series batches per epoch - is it possible to sample many windows from multiple series in each epoch, or is that not supported?

@kdgutier
Collaborator

kdgutier commented Jun 7, 2022

Hi @david-waterworth ,

Anomaly detection, as you mentioned, is more of an interpolation task than an extrapolation task. Forecasting models tend to be a good baseline, but I recommend using a dedicated model.

Our team has studied the anomaly detection task; we have this paper:
Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection.

The DGHL code is available via a link in the paper. We have not yet integrated it into this repository.

Hope this helps.

@david-waterworth
Author

Thanks @kdgutier, I understand the difference - I've implemented a number of time series VAE models. I'm just looking for a good general PyTorch-based framework I can use for deep time series models, rather than writing datasets / dataloaders myself.

My question/confusion is that you don't appear to use all the windows in each epoch, for either training or validation. The TimeSeriesLoader seems to sample a (potentially very small) subset of the windows from each series. From what I can see, this means each epoch trains the model on only a small subset of the data, and then validates it on a subset of the validation data. Is this true?

It seems that if all the series are the same length and you set batch_size=1 and n_windows to the number of windows per series, you get one batch per series containing all of its windows - but for long series that may not fit on the GPU?

In my own modelling I represent the data as a tensor of shape (series, windows, features) and then reshape it to (series x windows, features) to ensure the dataloader returns all the windows every epoch, roughly as in the sketch below.
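
A minimal sketch of that reshape (dummy data with the shapes from this thread; plain PyTorch, not neuralforecast API):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 7 series, 774 windows each, window length 144 (treated here as features).
data = torch.randn(7, 774, 144)          # (series, windows, features)
flat = data.reshape(-1, data.shape[-1])  # (series * windows, features)

loader = DataLoader(TensorDataset(flat), batch_size=32, shuffle=True)
for (batch,) in loader:
    print(batch.shape)                   # torch.Size([32, 144]), 169 full batches plus a final partial one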

@kdgutier
Collaborator

kdgutier commented Jun 8, 2022

Yes, the WindowsDataset uses a two-stage sampling strategy: first it samples a series, then it creates that series' rolling windows and samples windows from them.
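
Roughly, in plain PyTorch (an illustrative sketch of the idea, not the actual implementation):

import torch

def sample_batch(series_list, window_len, n_windows):
    # Stage 1: sample one series at random.
    idx = torch.randint(len(series_list), (1,)).item()
    series = series_list[idx]
    # Stage 2: build all rolling windows, then sample n_windows of them.
    windows = series.unfold(0, window_len, 1)        # [n_total, window_len]
    pick = torch.randint(windows.shape[0], (n_windows,))
    return windows[pick]                             # [n_windows, window_len]

# 7 dummy series of length 917 -> 774 rolling windows of length 144 each.
series_list = [torch.randn(917) for _ in range(7)]
print(sample_batch(series_list, window_len=144, n_windows=32).shape)
# torch.Size([32, 144])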

Answering your questions, @david-waterworth:

  1. We don't use all the windows during optimization because subsampling breaks the gradient correlations between windows of the same series; also, window creation happens on the GPU, so in most cases it does not add much computation time.

  2. During validation we normally use the last window of horizon size before the test set to obtain the cross-validation signal for model selection, and we use the entire validation dataset for this signal (not a subset). See the declaration of valid_dataset/test_dataset here.

  3. We use the setting batch_size=1 and n_windows_per_series=n_windows during model inference (see the sketch after this list). Long series are not a problem for N-BEATS-like models because they already trim the data to a limited number of lags; a full RNN, however, can run into GPU memory limitations.
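
For example, something along these lines (parameter names as used earlier in this thread; the exact signature may differ between versions):

# Hedged sketch of the inference setting: one series per batch, all (or a
# fixed number of) windows of that series sampled together.
valid_loader = nf.data.tsloader.TimeSeriesLoader(
    valid_dataset,       # a WindowsDataset over the validation split
    batch_size=1,        # one series at a time
    n_windows=774,       # all windows of that series in one batch
    shuffle=False,
)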

We are interested in extending NeuralForecast towards time series anomaly detection, and we would value your past experience on the topic.

@david-waterworth
Author

david-waterworth commented Jun 8, 2022

Thanks. I'm interested in your first point, "that breaks the gradient correlations" - can you expand on that, or point me to literature I can read?

I don't have a lot of experience yet; I've implemented a few different VAEs. The most obvious thing you'd need to change: I don't see a way of implementing multivariate anomaly detection with the current design. Univariate could be fairly easily accommodated by using y_df with x_df=None, although technically it should probably be the other way around. But in places you rely on having a y_df, and that's univariate at the moment.

@kdgutier
Collaborator

kdgutier commented Jun 8, 2022

The loss signals are correlated for windows of the same time series.
Averaging over many correlated windows reduces the variance of the SGD updates, in the same way that increasing the SGD batch size does.

The relationship between SGD's variance and the algorithm's generalization is still largely an open research question, but a general belief is that more variance in the gradients leads to better optima.
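
As a rough illustration (standard mini-batch SGD analysis, not specific to this library): for a batch of B gradient samples with per-sample variance \sigma^2 and pairwise correlation \rho, the variance of the batch-mean gradient is

\mathrm{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} g_i\right) = \frac{\sigma^2}{B}\bigl(1 + (B-1)\rho\bigr)

For iid samples (\rho = 0) this shrinks as \sigma^2 / B; with positive correlation the reduction saturates at \rho\sigma^2, but averaging over more windows still lowers the variance of each update, pushing towards the low-variance regime the paper below argues generalizes worse.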

Here is the "On Large-Batch Training for Deep Learning" paper that won an award on the topic.

This has not been carefully studied in the neural forecasting domain.

@david-waterworth
Author

Thanks, yes, that makes sense. I've noticed that the performance of a VAE is very poor if you train with strided windows where the stride equals the window size; but, as you imply, with a stride of one each window is almost perfectly correlated with the previous and next windows, so there is a lot of redundancy/duplication.

@kdgutier kdgutier closed this as completed Jun 9, 2022