
Fix off-by-one in torch DeepAR #2618

Merged Mar 9, 2023 (6 commits)

Conversation

@lostella (Contributor) commented Feb 6, 2023

Issue #, if available: Fixes #2616

Description of changes: The RNN in torch DeepAR was getting dynamic features shifted by one as input.

  • Split prepare_rnn_input out of unroll_lagged_rnn
  • Add test to verify alignment of RNN inputs (target values vs dynamic features, test is failing prior to fix)
  • Change is breaking in that previously serialized models will not work as before (network changes)
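
The alignment that the new test verifies can be illustrated with a standalone toy (this is not the actual GluonTS code; `context_length` and the feature values are made up). Using the integer time index as the only dynamic feature makes any off-by-one in the slicing immediately visible:

```python
import torch

# Toy illustration of the feature/target alignment: with data up to time t,
# the RNN step that predicts time t+1 should see the time features of t+1.
context_length = 4
prediction_length = 2

# Time features are just the integer time index here, for readability.
past_time_feat = torch.arange(0, 6, dtype=torch.float).reshape(1, 6, 1)    # t = 0..5
future_time_feat = torch.arange(6, 8, dtype=torch.float).reshape(1, 2, 1)  # t = 6..7

# Features for the steps being predicted: the last context_length - 1 past
# steps plus all future steps (the slicing pattern from the fix).
time_feat = torch.cat(
    (past_time_feat[:, -context_length + 1 :, :], future_time_feat),
    dim=1,
)
print(time_feat.squeeze())  # tensor([3., 4., 5., 6., 7.])
assert time_feat.shape[1] == context_length - 1 + prediction_length
```

One feature vector per predicted step, for times t = 3..7, each predicted from data up to t - 1; shifting the slice by one would pair each prediction with the wrong time step.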

Todo:

  • Take the opportunity to improve the code
  • Test effect of bug/fix (e.g. simple indicator that zeroes-out a series, model should not get it right prior to fix)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

@lostella added the BREAKING, bug fix, and torch labels on Feb 6, 2023.
    time_feat = torch.cat(
        (
            past_time_feat[..., -self.context_length + 1 :, :],
            future_time_feat,
Contributor:

Is there a need to append this when there is no future_target?

@lostella (Contributor, Author) replied Feb 7, 2023:

Yes: future_time_feat is no longer an Optional argument (it probably never should have been). The assumption is that at least one "future output" will be produced, for which the corresponding dynamic features are needed. In fact, it is assumed that there is one less future_target entry than there are future_time_feat entries, if any; see lines 211-215 above. Let me improve this a bit.
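
A minimal sketch of that invariant, with illustrative names and shapes (this is not the actual GluonTS signature):

```python
import torch

# Hedged sketch: future_time_feat must cover every step to be predicted,
# while future_target, when present, covers one step fewer, since at least
# one "future output" is produced. Shapes assumed: future_time_feat is
# (batch, T, num_features), future_target is (batch, T - 1).
def check_future_inputs(future_time_feat, future_target=None):
    if future_target is not None:
        assert future_time_feat.shape[1] == future_target.shape[1] + 1

check_future_inputs(torch.zeros(2, 5, 3), torch.zeros(2, 4))  # ok: 5 == 4 + 1
```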

@kashif (Contributor) commented Feb 8, 2023:

BTW @lostella do you think the same issue also affects the transformer model? https://github.com/awslabs/gluonts/blob/dev/src/gluonts/mx/model/transformer/_network.py#L172-L178

@lostella lostella marked this pull request as ready for review February 15, 2023 12:50
@lostella (Contributor, Author) replied:

> BTW @lostella do you think the same issue also affects the transformer model? https://github.com/awslabs/gluonts/blob/dev/src/gluonts/mx/model/transformer/_network.py#L172-L178

It's hard to say. The easiest way to verify it, I think, is to set up a small unit test checking what the input to the relevant layer will be: in this case the tensor is split into encoder and decoder inputs, so one could have a method that prepares those two directly, and then test that lagged target values, dynamic features, and so on are aligned as expected.
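
That testing pattern can be sketched with a toy (all names and the split logic here are illustrative, not the actual transformer network): feed a recognizable series into the input-preparation step and assert the alignment directly.

```python
import torch

# Illustrative input-preparation method: the encoder sees the context,
# the decoder sees the targets shifted by one (teacher forcing).
def prepare_inputs(target, context_length):
    enc_input = target[:, :context_length]
    dec_input = target[:, context_length - 1 : -1]  # one step behind predictions
    return enc_input, dec_input

# A ramp makes misalignment obvious: each value equals its time index.
target = torch.arange(10, dtype=torch.float).reshape(1, 10)
enc, dec = prepare_inputs(target, context_length=6)

predicted = target[:, 6:]            # the steps the decoder predicts
assert torch.equal(dec + 1, predicted)  # each input is exactly one step behind
```

An off-by-one in `prepare_inputs` would break the final assertion, which is the kind of cheap check being suggested for the transformer code as well.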

Comment on lines 180 to 181:

    .unsqueeze(0)
    .unsqueeze(-1),
Contributor:
I think I prefer reshape(1, history_length, 1) instead of unsqueeze; it makes the output shape clearer, in my view.

Contributor:

or even .view()
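
For the case discussed, the three options are interchangeable; a quick check (history_length is illustrative):

```python
import torch

history_length = 7
x = torch.arange(history_length, dtype=torch.float)

# Three equivalent ways to add leading and trailing singleton dimensions.
a = x.unsqueeze(0).unsqueeze(-1)
b = x.reshape(1, history_length, 1)
c = x.view(1, history_length, 1)

assert a.shape == b.shape == c.shape == (1, history_length, 1)
assert torch.equal(a, b) and torch.equal(b, c)
```

The trade-off: .view() never copies but requires contiguous memory, while .reshape() behaves like view when possible and silently copies otherwise; spelling out the shape, as both reviewers note, documents the intent either way.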

@lostella lostella force-pushed the fix-deepar-off-by-one branch 5 times, most recently from 53bedf4 to b110bb2 Compare February 24, 2023 10:27
Comment on lines 70 to 77
# lambda dataset: DeepNPTSEstimator(
# freq=dataset.metadata.freq,
# prediction_length=dataset.metadata.prediction_length,
# context_length=2 * dataset.metadata.prediction_length,
# batch_size=4,
# num_batches_per_epoch=3,
# epochs=2,
# ),
@lostella (Contributor, Author) commented:

For some reason this test started failing again with

E       RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

../.pyenv/versions/3.8.13/envs/gluonts/lib/python3.8/site-packages/torch/autograd/__init__.py:197: RuntimeError

I think we need to understand this better and fix it separately.

@lostella (Contributor, Author) commented:

This kind of error kept #2496 on hold for a while, then it disappeared on its own. But something must be off.

@lostella lostella force-pushed the fix-deepar-off-by-one branch 2 times, most recently from 058a9eb to 550af57 Compare February 27, 2023 14:29
@lostella lostella added this to the v0.13 milestone Feb 27, 2023
@@ -132,6 +132,7 @@ def __init__(
else [min(50, (cat + 1) // 2) for cat in cardinality]
)
self.lags_seq = lags_seq or get_lags_for_frequency(freq_str=freq)
self.lags_seq = [l - 1 for l in self.lags_seq]
Contributor:
Maybe I'm completely missing the issue here, but if we do l - 1 for the lags, don't we feed in the current time stamp during training when l = 1, thereby introducing a data leak? Maybe I'm wrong, though.

@lostella (Contributor, Author) replied:

It depends on what time index the lag is applied to: the way the RNN input preparation is written, the lag index gets subtracted from t when we have data up to time t and want a prediction for time t+1. The test_rnn_input test function aims at verifying precisely this. Let me see whether the test can be made clearer.
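
The "no leak" claim above can be checked with simple index arithmetic: with observations available up to index t, predicting index t+1 with lag l reads the value at index t + 1 - l, which for l >= 1 never reaches beyond t.

```python
# Toy check of the indexing convention described above (values illustrative).
t = 9                  # observations available at indices 0..t
predict_at = t + 1     # the step being predicted

for lag in [1, 2, 7]:  # any lag >= 1
    used_index = predict_at - lag
    # the smallest lag (l = 1) reads exactly index t, never t + 1
    assert used_index <= t, "reading past t would be a data leak"
```

Under this convention, subtracting the lag from the index being predicted (rather than from the last observed index) is what makes the l - 1 shift in lags_seq safe.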

@kashif (Contributor) commented Mar 6, 2023:

@lostella one issue I found with the lagged_sequence_values helper is that with multivariate inputs it doesn't return the lags flattened into the feature dimension, e.g.

    # prior_input.shape: torch.Size([1, 167, 137])
    # input.shape: torch.Size([1, 24, 137])

    lags = lagged_sequence_values([0, 167], prior_input, input, dim=1)
    # lags.shape: torch.Size([1, 24, 137, 2])

where I would have expected torch.Size([1, 24, 137*2]).
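
If the helper does return the per-lag axis stacked at the end, the expected shape is one reshape away; a hedged sketch mirroring the shapes above (this does not call the actual GluonTS helper):

```python
import torch

# Shapes from the example: batch, predicted steps, channels, number of lags.
batch, steps, channels, num_lags = 1, 24, 137, 2

lags = torch.zeros(batch, steps, channels, num_lags)    # what the helper returns
flat = lags.reshape(batch, steps, channels * num_lags)  # the expected layout

assert flat.shape == (1, 24, 137 * 2)
```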

@lostella lostella enabled auto-merge (squash) March 9, 2023 14:15
@lostella lostella disabled auto-merge March 9, 2023 14:22
@lostella lostella merged commit cd3ddae into awslabs:dev Mar 9, 2023
@lostella lostella deleted the fix-deepar-off-by-one branch March 9, 2023 14:22
Labels: BREAKING (This is a breaking change), bug fix, torch (This concerns the PyTorch side of GluonTS)
Successfully merging this pull request may close these issues:
Off-by-one bug in PyTorch DeepAR implementation
4 participants