MQ-CNN: Bound context_length by the max_ts_len - prediction_length #1037
Conversation
…unnecessary zero padding
@@ -196,7 +170,7 @@ def batchify(
     dtype: DType,
     multi_processing: bool,
     single_process_ctx: Optional[mx.Context] = None,
-    variable_length: bool = False,
+    variable_length: bool = True,
Is the change of default intentional? Why is that?
I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense. It looks like `variable_length` is only used on line 128:

    if variable_length and not _is_stackable(data):
        data = pad_arrays(data, axis=0)

So updating the default to `True` only results in calling `_is_stackable()`, which doesn't seem computationally expensive because it just looks at the shapes. For all algorithms it would then left pad ragged tensors whose shapes mismatch because the time series have different lengths; if they all have the same shape, `not _is_stackable(data)` evaluates to `False` and `pad_arrays()` won't be called. Is there a way I can pass it as `True` to `batchify` for MQ-CNN without updating the default?
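One generic way to pass `variable_length=True` for a single model without changing the library-wide default is to bind the keyword with `functools.partial` wherever MQ-CNN builds its loader. This is only a sketch: the import path and how the loader actually receives the `batchify` callable are assumptions and may differ in the codebase.

```python
from functools import partial

# Assumed import path for the batchify function shown in the diff above.
from gluonts.dataset.parallelized_loader import batchify

# Bind variable_length=True only for the MQ-CNN loader; every other model
# keeps the existing default.
mqcnn_batchify = partial(batchify, variable_length=True)
```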
I also wanted to check with you on the prior `pad_arrays` implementation. I combined it with the one from MQ-CNN and moved it to `util.py`, so other models could use the padding, but it seems that the data loader was previously right padding with zeros instead of left padding, and I updated it to left padding. I wasn't sure why right padding was the default?
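For reference, a minimal NumPy sketch of the two conventions being discussed: left padding prepends the zeros so the most recent observations stay right-aligned, while right padding appends them. This only illustrates the idea and is not the actual `pad_arrays`/`pad_to_size` implementation; the helper name here is made up.

```python
import numpy as np
from typing import List


def pad_ragged(arrays: List[np.ndarray], left: bool = True) -> np.ndarray:
    """Zero-pad 1-D arrays of different lengths to a common length and stack them."""
    max_len = max(a.shape[0] for a in arrays)
    padded = []
    for a in arrays:
        pad = max_len - a.shape[0]
        # (before, after) widths: left padding puts the zeros before the series,
        # right padding puts them after it.
        width = (pad, 0) if left else (0, pad)
        padded.append(np.pad(a, width, mode="constant"))
    return np.stack(padded)


batch = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
print(pad_ragged(batch, left=True))   # rows [1, 2, 3] and [0, 4, 5]
print(pad_ragged(batch, left=False))  # rows [1, 2, 3] and [4, 5, 0]
```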
> I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense.
As opposed to the instance splitter, you mean?
I think the padding helper function (as well as the `variable_length` option) was there for the specific use case of temporal point processes, so this change may significantly affect those. I would ask @canerturkmen or @shchur to chip in here.
I think we should avoid relying on the data loader, as much as possible, for model-specific behavior. TPPs constitute a necessary exception here, I think, because of the intrinsic nature of the data they deal with; but in the case of MQCNN the time length is pre-determined, right?
Yes, I removed the left padding from the instance splitter. And yes, it makes sense that the ragged tensors were for TPP; we should confirm with @canerturkmen whether the desired behavior for TPP was really right padding with zeros instead of left padding. I think we should still keep the padding function in the `util` file so that it can be used in both the loader and the MQ-CNN instance splitter, rather than duplicating code.
Yes, I can move the left padding back to the MQ-CNN instance splitter. It's simple for the past target and dynamic features: we simply pad to `enc_len = context_length` in the encoder. I was experimenting with the `forking_decoder`, where we compute the sliding windows with `as_strided`. It seems that `as_strided` is efficient, but accessing the 3D array `future_feature_dynamic[skip:]` can be slow, so I was testing whether padding might be faster; I think updating the zero array in place is faster, though. I will move this back to the instance splitter. Thanks!
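For context, a small sketch of the kind of sliding-window view `as_strided` gives for the forking decoder. The function and variable names are placeholders, not the actual GluonTS code; in the PR's terms the outer dimension would correspond to `num_forking - skip`.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def forking_windows(x: np.ndarray, dec_len: int) -> np.ndarray:
    """Return overlapping windows x[i : i + dec_len] as a (num_windows, dec_len) view."""
    num_windows = x.shape[0] - dec_len + 1
    stride = x.strides[0]
    # A view into x, so building it is essentially free; the cost shows up later
    # when the view is sliced or copied, which matches the observation above.
    return as_strided(x, shape=(num_windows, dec_len), strides=(stride, stride))


target = np.arange(8.0)
print(forking_windows(target, dec_len=3))  # rows [0,1,2], [1,2,3], ..., [5,6,7]
```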
… in instance splitter, updated pad_to_size util function to support right and left padding
…wslabs#1037)

* Bound context_length by the max_ts_len - prediction_length to remove unnecessary zero padding
* Fixing test_accuracy
* Reverted variable_length=False in data_loader and did update in place in instance splitter, updated pad_to_size util function to support right and left padding
* Updating the comments for right padding
* Revert back to from_hyperparameters in the tests
* Reverting data loader changes

Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Issue #, if available: MQ-CNN `start_idx` being updated in the loop in the decoder and not reset in the encoder

Description of changes:
- Bound `context_length <= max_ts_len - prediction_length = max_pad_len` to avoid unnecessary zero padding
- `context_length = 4 * prediction_length` by default, which may be longer than `max_ts_len - prediction_length`, and then we are doing unnecessary left zero padding in the encoder and decoder, where the most time is spent in the forking decoder
- `num_forking` for the decoder is bounded above by `context_length`
- `max_ts_len` is computed in the `calculate_dataset_statistics` method and is returned in `from_inputs`
- Use `as_strided()` directly, where the outer dimension is `num_forking - skip`
- Added the `num_forking` hp to `MQRNNEstimator` as well
- Added `pad_to_size` to the `util` file; it can be combined with `pad_arrays` from the parallelized data loader
- Removed the `context_length` hyper-parameter from the `forking_network`, since the past arrays should already have the `axis=1` dim equal to `enc_len = context_length` and slicing to `context_length` is unnecessary
- `num_forking` is still needed, but then `batch_transform` is called on the new container to remove `num_forking`; the validation error from the `forking_network` with `batch_transform` is handled by making it an `optional` hyperparameter inside the `forking_network` and setting it to the default `context_length`
- If `prediction_length > max_ts_len`, we leave `context_length` equal to its inputted value or the default `4 * prediction_length`, so that it will not be negative from `max_pad_len = max_ts_len - prediction_length`
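A minimal sketch of the bounding rule described above; the function is illustrative and not the estimator code (the `>=` guard for the equal-length edge case is an assumption, the description only discusses the strictly greater case).

```python
def bound_context_length(
    context_length: int, prediction_length: int, max_ts_len: int
) -> int:
    """Cap context_length at max_pad_len = max_ts_len - prediction_length."""
    if prediction_length >= max_ts_len:
        # Leave context_length at its inputted value (or the 4 * prediction_length
        # default) so the bound cannot go negative.
        return context_length
    max_pad_len = max_ts_len - prediction_length
    return min(context_length, max_pad_len)


# The default 4 * prediction_length can exceed the available history:
print(bound_context_length(context_length=28, prediction_length=7, max_ts_len=20))  # 13
```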
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.