MQ-CNN: Bound context_length by the max_ts_len - prediction_length #1037

Merged
merged 8 commits into awslabs:master from mqcnn_v2 on Sep 17, 2020

Conversation

@dcmaddix dcmaddix commented Sep 16, 2020

Issue #, if available:

  • Removes unnecessary additional zero padding in MQ-CNN
  • Fixes a bug where start_idx was updated in the decoder loop but never reset in the encoder

Description of changes:

  • Bounded context_length <= max_ts_len - prediction_length = max_pad_len to avoid unnecessary zero padding (see the sketch after this list)
  • By default, context_length = 4 * prediction_length, which may be longer than max_ts_len - prediction_length; in that case we were doing unnecessary left zero padding in the encoder and decoder, with most of the time spent in the forking decoder
  • num_forking for the decoder is bounded above by context_length
  • Computed max_ts_len in the calculate_dataset_statistics method; it is returned in from_inputs
  • Call as_strided() directly, where the outer dimension is num_forking - skip
  • Added the num_forking hyper-parameter to MQRNNEstimator as well
  • Updated shapes in comments
  • Moved pad_to_size to the util file, so it can be combined with pad_arrays from the parallelized data loader
  • Removed the context_length hyper-parameter from the forking_network, since the past arrays should already have their axis=1 dimension equal to enc_len = context_length, making slicing to context_length unnecessary
  • Added backwards compatibility for batch transform: when a model trained on the old container (without num_forking) has batch_transform called on the new container, the num_forking validation error from the forking_network is avoided by making num_forking an optional hyper-parameter inside the forking_network and defaulting it to context_length
  • In case prediction_length > max_ts_len, we leave context_length at its input value (or the default 4 * prediction_length), since max_pad_len = max_ts_len - prediction_length would be negative
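
A minimal sketch of the bounding logic above (the helper name bound_context_length is hypothetical; the estimator applies equivalent logic at construction time):

    from typing import Optional

    # Hypothetical helper illustrating the context_length bound from this PR.
    def bound_context_length(
        context_length: Optional[int],
        prediction_length: int,
        max_ts_len: int,
    ) -> int:
        if context_length is None:
            # Default noted in the description above.
            context_length = 4 * prediction_length
        max_pad_len = max_ts_len - prediction_length
        # Apply the bound only when positive; if prediction_length > max_ts_len,
        # keep the given (or default) context_length so it never goes negative.
        if max_pad_len > 0:
            context_length = min(context_length, max_pad_len)
        return context_length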

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -196,7 +170,7 @@ def batchify(
dtype: DType,
multi_processing: bool,
single_process_ctx: Optional[mx.Context] = None,
-    variable_length: bool = False,
+    variable_length: bool = True,
Contributor

Is the change of default intentional? Why is that?

Contributor Author

I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense. It looks like variable_length is only used on line 128:

    if variable_length and not _is_stackable(data):
        data = pad_arrays(data, axis=0)

So updating the default to True will only result in calling _is_stackable(), which doesn't seem computationally expensive because it just looks at the shapes. All algorithms would then left pad ragged tensors whose shapes mismatch due to time series of different lengths; if they all have the same shape, not _is_stackable(data) returns False and pad_arrays() is never called. Is there a way I can pass it as True to batchify for MQ-CNN without updating the default?
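
For reference, a plausible sketch of that shape check (assuming _is_stackable only compares shapes, as described; the real helper may differ):

    import numpy as np
    from typing import List

    # Sketch: arrays are directly stackable when they all share one shape,
    # so the check is a cheap shape comparison, not a pass over the data.
    def _is_stackable(data: List[np.ndarray], axis: int = 0) -> bool:
        shapes = {arr.shape[axis] for arr in data}
        return len(shapes) <= 1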

I also wanted to check with you on the prior pad_arrays implementation. I combined it with the one from MQ-CNN and moved it to util.py so other models could use the padding, but it seems the data loader was previously right padding with zeros instead of left padding, and I updated it to left padding. I wasn't sure why right padding was the default?
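
A minimal sketch of a combined left/right padding helper along these lines (the signature is illustrative; the merged pad_to_size may differ):

    import numpy as np

    # Illustrative helper: zero-pad an array along axis 0 up to `size`,
    # on the left (prepend, older history) or the right (append).
    def pad_to_size(x: np.ndarray, size: int, pad_left: bool = True) -> np.ndarray:
        pad_len = size - x.shape[0]
        if pad_len <= 0:
            return x
        pad_widths = [(0, 0)] * x.ndim
        pad_widths[0] = (pad_len, 0) if pad_left else (0, pad_len)
        return np.pad(x, pad_widths, mode="constant", constant_values=0)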

Contributor

I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense.

As opposed to the instance splitter, you mean?

I think the padding helper function (as well as the variable_length option) was put there for the specific use case of temporal point processes, so this change may significantly affect those. I would ask @canerturkmen or @shchur to chip in here.

I think we should avoid relying on the data loader, as much as possible, for model-specific behavior. TPPs constitute a necessary exception here, I think, because of the intrinsic nature of the data they deal with; but in the case of MQCNN the time length is pre-determined, right?

Contributor Author

Yes, I removed the left padding from the instance splitter. Yes, it makes sense that the ragged tensors were for TPP. We should confirm with @canerturkmen whether the desired behavior for TPP was really right padding instead of left padding with zeros. I think we should still keep the padding function in the util file so that it can be used in both the loader and the MQ-CNN instance splitter, rather than duplicating code.

Yes, I can move the left padding back to the MQ-CNN instance splitter. It's simple for the past target and dynamic features: we simply pad to enc_len = context_length in the encoder. I was experimenting with the forking decoder, where we compute the sliding windows with as_strided. It seems that as_strided is efficient, but accessing the 3D array future_feature_dynamic[skip:] can be slow, so I was testing whether padding might be faster; I think updating the zero array in place is faster, though. I will move this back to the instance splitter. Thanks!
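
For context, a minimal sketch of building the forking windows with as_strided (shapes and names are illustrative, not the exact instance-splitter code; assumes len(target) >= num_forking + prediction_length - 1):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    # Illustrative: overlapping decoder windows over a 1-D target array,
    # one window of length prediction_length per forking point.
    def forking_windows(
        target: np.ndarray, prediction_length: int, num_forking: int
    ) -> np.ndarray:
        stride = target.strides[0]
        return as_strided(
            target,
            shape=(num_forking, prediction_length),
            strides=(stride, stride),  # consecutive windows shift by one step
            writeable=False,  # views share the buffer; keep them read-only
        )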

@dcmaddix dcmaddix merged commit c3150eb into awslabs:master Sep 17, 2020
@dcmaddix dcmaddix deleted the mqcnn_v2 branch September 17, 2020 17:23
kashif pushed a commit to kashif/gluon-ts that referenced this pull request Oct 10, 2020
MQ-CNN: Bound context_length by the max_ts_len - prediction_length (awslabs#1037)

* Bound context_length by the max_ts_len - prediction_length to remove unnecessary zero padding

* Fixing test_accuracy

* Reverted variable_length=False in data_loader and did update in place in instance splitter, updated pad_to_size util function to support right and left padding

* Updating the comments for right padding

* Revert back to from_hyperparameters in the tests

* Reverting data loader changes

Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>