MQ-CNN: Bound context_length by the max_ts_len - prediction_length #1037
Conversation
…unnecessary zero padding
@@ -196,7 +170,7 @@ def batchify(
     dtype: DType,
     multi_processing: bool,
     single_process_ctx: Optional[mx.Context] = None,
-    variable_length: bool = False,
+    variable_length: bool = True,
Is the change of default intentional? Why is that?
I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense. It looks like `variable_length` is only used on line 128:

    if variable_length and not _is_stackable(data):
        data = pad_arrays(data, axis=0)

So updating the default to `True` only results in calling `_is_stackable()`, which doesn't seem computationally expensive because it just looks at the shapes. For all algorithms it would then left pad ragged tensors whose shapes mismatch because the time series have different lengths; if they all have the same shape, `not _is_stackable(data)` evaluates to `False` and `pad_arrays()` won't be called. Is there a way I can pass it as `True` to `batchify` for MQ-CNN without updating the default?
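One generic way to pass `variable_length=True` for a single model without changing the library-wide default is to bind the keyword with `functools.partial` wherever MQ-CNN builds its loader. This is only a sketch: the import path and how the loader actually receives the `batchify` callable are assumptions and may differ in the codebase.

```python
from functools import partial

# Assumed import path for the batchify function shown in the diff above.
from gluonts.dataset.parallelized_loader import batchify

# Bind variable_length=True only for the MQ-CNN loader; every other model
# keeps the existing default.
mqcnn_batchify = partial(batchify, variable_length=True)
```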
I also wanted to check with you on the prior `pad_arrays` implementation. I combined it with the one from MQ-CNN and moved it to `util.py`, so other models could use the padding, but it seems that the data loader was previously right padding with zeros instead of left padding, and I updated it to left padding. I wasn't sure why right padding was the default?
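For reference, a minimal NumPy sketch of the two conventions being discussed: left padding prepends the zeros so the most recent observations stay right-aligned, while right padding appends them. This only illustrates the idea and is not the actual `pad_arrays`/`pad_to_size` implementation; the helper name here is made up.

```python
import numpy as np
from typing import List


def pad_ragged(arrays: List[np.ndarray], left: bool = True) -> np.ndarray:
    """Zero-pad 1-D arrays of different lengths to a common length and stack them."""
    max_len = max(a.shape[0] for a in arrays)
    padded = []
    for a in arrays:
        pad = max_len - a.shape[0]
        # (before, after) widths: left padding puts the zeros before the series,
        # right padding puts them after it.
        width = (pad, 0) if left else (0, pad)
        padded.append(np.pad(a, width, mode="constant"))
    return np.stack(padded)


batch = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
print(pad_ragged(batch, left=True))   # rows [1, 2, 3] and [0, 4, 5]
print(pad_ragged(batch, left=False))  # rows [1, 2, 3] and [4, 5, 0]
```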
> I changed the left padding of the ragged tensors in MQ-CNN to be done by the data loader, which I thought made more sense.
As opposed to the instance splitter, you mean?
I think the padding helper function (as well as the `variable_length` option) was there for the specific use case of temporal point processes, so this change may significantly affect those. I would ask @canerturkmen or @shchur to chip in here.
I think we should avoid relying on the data loader, as much as possible, for model-specific behavior. TPPs constitute a necessary exception here, I think, because of the intrinsic nature of the data they deal with; but in the case of MQCNN the time length is pre-determined, right?
Yes, I removed the left padding from the instance splitter. And yes, it makes sense that the ragged tensors were for TPP; we should confirm with @canerturkmen whether the desired behavior for TPP was really right padding with zeros instead of left padding. I think we should still keep the padding function in the `util` file so that it can be used in both the loader and the MQ-CNN instance splitter, rather than duplicating code.
Yes, I can move the left padding back to the MQ-CNN instance splitter. It's simple for the past target and dynamic features: we simply pad to `enc_len = context_length` in the encoder. I was experimenting with the `forking_decoder`, where we compute the sliding windows with `as_strided`. It seems that `as_strided` is efficient, but accessing the 3D array `future_feature_dynamic[skip:]` can be slow, so I was testing whether padding might be faster; I think updating the zero array in place is faster, though. I will move this back to the instance splitter. Thanks!
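For context, a small sketch of the kind of sliding-window view `as_strided` gives for the forking decoder. The function and variable names are placeholders, not the actual GluonTS code; in the PR's terms the outer dimension would correspond to `num_forking - skip`.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def forking_windows(x: np.ndarray, dec_len: int) -> np.ndarray:
    """Return overlapping windows x[i : i + dec_len] as a (num_windows, dec_len) view."""
    num_windows = x.shape[0] - dec_len + 1
    stride = x.strides[0]
    # A view into x, so building it is essentially free; the cost shows up later
    # when the view is sliced or copied, which matches the observation above.
    return as_strided(x, shape=(num_windows, dec_len), strides=(stride, stride))


target = np.arange(8.0)
print(forking_windows(target, dec_len=3))  # rows [0,1,2], [1,2,3], ..., [5,6,7]
```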
… in instance splitter, updated pad_to_size util function to support right and left padding
…wslabs#1037)

* Bound context_length by the max_ts_len - prediction_length to remove unnecessary zero padding
* Fixing test_accuracy
* Reverted variable_length=False in data_loader and did update in place in instance splitter, updated pad_to_size util function to support right and left padding
* Updating the comments for right padding
* Revert back to from_hyperparameters in the tests
* Reverting data loader changes

Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Issue #, if available: MQ-CNN `start_idx` being updated in the loop in the decoder and not reset in the encoder

Description of changes:
- Bound `context_length <= max_ts_len - prediction_length = max_pad_len` to avoid unnecessary zero padding
- `context_length = 4 * prediction_length` by default, which may be longer than `max_ts_len - prediction_length`, and then we are doing unnecessary left zero padding in the encoder and decoder, where the most time is spent in the forking decoder
- `num_forking` for the decoder is bounded above by `context_length`
- `max_ts_len` is computed in the `calculate_dataset_statistics` method and is returned in `from_inputs`
- Use `as_strided()` directly, where the outer dimension is `num_forking - skip`
- Added the `num_forking` hp to `MQRNNEstimator` as well
- Added `pad_to_size` to the `util` file; it can be combined with `pad_arrays` from the parallelized data loader
- Removed the `context_length` hyper-parameter from the `forking_network`, since the past arrays should already have the `axis=1` dim equal to `enc_len = context_length` and slicing to `context_length` is unnecessary
- `num_forking` is still needed, but then `batch_transform` is called on the new container to remove `num_forking`; the validation error from the `forking_network` with `batch_transform` is handled by making it an `optional` hyperparameter inside the `forking_network` and setting it to the default `context_length`
- If `prediction_length > max_ts_len`, we leave `context_length` equal to its inputted value or the default `4 * prediction_length`, so that it will not be negative from `max_pad_len = max_ts_len - prediction_length`
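A minimal sketch of the bounding rule described above; the function is illustrative and not the estimator code (the `>=` guard for the equal-length edge case is an assumption, the description only discusses the strictly greater case).

```python
def bound_context_length(
    context_length: int, prediction_length: int, max_ts_len: int
) -> int:
    """Cap context_length at max_pad_len = max_ts_len - prediction_length."""
    if prediction_length >= max_ts_len:
        # Leave context_length at its inputted value (or the 4 * prediction_length
        # default) so the bound cannot go negative.
        return context_length
    max_pad_len = max_ts_len - prediction_length
    return min(context_length, max_pad_len)


# The default 4 * prediction_length can exceed the available history:
print(bound_context_length(context_length=28, prediction_length=7, max_ts_len=20))  # 13
```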
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.