Fix off-by-one in torch DeepAR #2618
Conversation
```python
time_feat = torch.cat(
    (
        past_time_feat[..., -self.context_length + 1 :, :],
        future_time_feat,
```
Is there a need to append this when there is no `future_target`?
Yes, `future_time_feat` is not an `Optional` argument anymore (it never should have been, probably). The assumption is that at least one "future output" will be produced, for which the corresponding dynamic features are needed. In fact, it is assumed that there's one less `future_target` than `future_time_feat`, if any; see lines 211-215 above. Let me improve this a bit.
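The alignment assumption described here (features kept for the last `context_length - 1` past steps plus every future step) can be sketched with small tensors. This is an illustrative sketch only; the shapes and variable names are made up, not taken from the model:

```python
import torch

context_length = 4
prediction_length = 3
num_feat = 2

# past features may cover a longer history than the context window
past_time_feat = torch.zeros(1, 10, num_feat)
future_time_feat = torch.zeros(1, prediction_length, num_feat)

# keep the last context_length - 1 past steps and append all future steps,
# so the RNN step that predicts time t + 1 sees the features for t + 1
time_feat = torch.cat(
    (past_time_feat[..., -context_length + 1 :, :], future_time_feat),
    dim=-2,
)

# resulting length: (context_length - 1) + prediction_length
assert tuple(time_feat.shape) == (1, 6, 2)
```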
BTW @lostella do you think the same issue also affects the transformer model? https://github.com/awslabs/gluonts/blob/dev/src/gluonts/mx/model/transformer/_network.py#L172-L178
It's hard to say. The easiest way to verify this, I think, is to set up a small unit test of what the input to whatever layer will be: in this case the tensor is split into encoder + decoder input, so one could have a method that prepares those two directly, and then test that lagged target values, dynamic features, and so on are aligned as expected.
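A sketch of the kind of unit test described, assuming a hypothetical `prepare_inputs` helper that returns the encoder and decoder inputs (the helper name and signature are illustrative, not the actual gluonts API):

```python
import torch

def prepare_inputs(past_values, future_values, context_length):
    """Hypothetical helper (not the gluonts API): split a series into
    encoder input (last context_length past steps) and decoder input."""
    return past_values[..., -context_length:], future_values

def test_alignment():
    # encode the time index as the value itself so alignment is checkable
    past_values = torch.arange(10.0).unsqueeze(0)          # times 0..9
    future_values = torch.arange(10.0, 13.0).unsqueeze(0)  # times 10..12
    enc, dec = prepare_inputs(past_values, future_values, context_length=4)
    # the encoder must end exactly one step before the decoder begins
    assert enc[0, -1].item() == 9.0
    assert dec[0, 0].item() == 10.0

test_alignment()
```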
```python
.unsqueeze(0)
.unsqueeze(-1),
```
I think I prefer `reshape(1, history_length, 1)` instead of `unsqueeze`. It makes it clearer what the output shape is, in my view.
Or even `.view()`.
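For reference, the three alternatives produce the same result here; `reshape` and `view` just spell out the target shape explicitly. A standalone sketch (not the model code, and `history_length` is an illustrative value):

```python
import torch

history_length = 5
x = torch.arange(history_length, dtype=torch.float32)

a = x.unsqueeze(0).unsqueeze(-1)     # shape built up one dim at a time
b = x.reshape(1, history_length, 1)  # target shape spelled out
c = x.view(1, history_length, 1)     # same, but requires a contiguous input

assert a.shape == b.shape == c.shape == (1, history_length, 1)
assert torch.equal(a, b) and torch.equal(b, c)
```

`view` never copies (it fails on non-contiguous tensors), while `reshape` falls back to a copy when needed; on a freshly created contiguous tensor like this one they are equivalent.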
test/torch/model/test_estimators.py
```python
# lambda dataset: DeepNPTSEstimator(
#     freq=dataset.metadata.freq,
#     prediction_length=dataset.metadata.prediction_length,
#     context_length=2 * dataset.metadata.prediction_length,
#     batch_size=4,
#     num_batches_per_epoch=3,
#     epochs=2,
# ),
```
For some reason this test started failing again with

```
E   RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

../.pyenv/versions/3.8.13/envs/gluonts/lib/python3.8/site-packages/torch/autograd/__init__.py:197: RuntimeError
```
I think we need to understand this better and fix it separately.
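For context, this `RuntimeError` is the one PyTorch raises when `.backward()` is called on a tensor that is detached from the autograd graph (for example, a loss computed entirely from tensors with `requires_grad=False`). A minimal reproduction, unrelated to the estimator itself:

```python
import torch

# a tensor created without requires_grad is outside the autograd graph
x = torch.zeros(3)
loss = x.sum()  # loss has no grad_fn

try:
    loss.backward()
    raised = False
except RuntimeError as err:
    raised = True
    message = str(err)

assert raised and "does not require grad" in message
```

In a training loop this typically points at a model output that was accidentally detached (e.g. via `.detach()`, `.item()`, `torch.no_grad()`, or parameters that were never registered), which may be why the failure comes and goes with unrelated changes.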
This kind of error kept #2496 on hold for a while, then it disappeared on its own. But something must be off.
```diff
@@ -132,6 +132,7 @@ def __init__(
             else [min(50, (cat + 1) // 2) for cat in cardinality]
         )
         self.lags_seq = lags_seq or get_lags_for_frequency(freq_str=freq)
+        self.lags_seq = [l - 1 for l in self.lags_seq]
```
Maybe I'm completely not getting the issue here, but if we do `l - 1` for the lags, don't we feed in the current time stamp during training when `l = 1`, thereby introducing a data leak? Maybe I'm wrong, though.
It depends on what time index the lag is applied to: the way the RNN input preparation is written, the lag index gets subtracted from `t` when we have data up to time `t` and want a prediction for time `t + 1`. The `test_rnn_input` test function aims at verifying precisely this. Let me try to see whether the test can be made clearer.
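The point about where the lag index is applied can be made concrete with a simplified sketch (this is an illustration of the indexing convention, not the actual gluonts implementation):

```python
# values observed at times 0..9, so the last observed time is t = 9
target = list(range(10))
t = 9

def lagged_value(target, t, lag):
    """With the l - 1 shift, a lag index selects target[t - lag]
    when the model is producing a prediction for time t + 1."""
    return target[t - lag]

# lag index 0 (originally l = 1) points at time t itself: the most recent
# *observed* value, not the value at t + 1 being predicted, so no leakage
assert lagged_value(target, t, 0) == 9
assert lagged_value(target, t, 1) == 8
```

A leak would only occur if the lag were subtracted from `t + 1` instead of `t`, in which case index 0 would select the value being predicted.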
@lostella one issue I found with the following:

```python
# prior_input.shape torch.Size([1, 167, 137])
# input.shape torch.Size([1, 24, 137])
lags = lagged_sequence_values([0, 167], prior_input, input, dim=1)
# lags.shape torch.Size([1, 24, 137, 2]) where i would have expected
```
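To make the shape in question concrete, here is a minimal stand-in for what a `lagged_sequence_values`-style function could do, stacking one slice per lag index along a new trailing dimension. This is an illustrative reimplementation under that assumption, not the actual gluonts function:

```python
import torch

def lagged_values(indices, prior_seq, seq, dim):
    """For each lag index, gather a window of length seq.shape[dim],
    shifted back by that lag, from the concatenated sequence; stack
    the windows along a new trailing dimension."""
    full = torch.cat((prior_seq, seq), dim=dim)
    t = seq.shape[dim]
    slices = [full.narrow(dim, full.shape[dim] - lag - t, t) for lag in indices]
    return torch.stack(slices, dim=-1)

prior_input = torch.zeros(1, 167, 137)
input_ = torch.zeros(1, 24, 137)
out = lagged_values([0, 167], prior_input, input_, dim=1)

# one window per lag index, stacked last: (batch, time, features, num_lags)
assert tuple(out.shape) == (1, 24, 137, 2)
```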
Issue #, if available: Fixes #2616
Description of changes: The RNN in torch DeepAR was getting dynamic features shifted by one as input.
`prepare_rnn_input` out of `unroll_lagged_rnn`
Todo:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup