Use categorical and dynamic features by default in DeepAR #144
Comments
I think in the case of GluonTS, we aspire to build a scientific library. Thus, I think the algorithms should fail if there are issues in the data. That tells the user that something is not right. Otherwise, you are left wondering why your results are not as good as you expected, especially if something is silently not used, discarded, or filtered. I think this behavior should be avoided throughout the code base.
I have been looking at this issue for the last two days, and it is a combination of addressing the input format question and defining the correct behaviour of the transformations. I agree with Michael that we should not do things silently: if something is wrong, we should throw an error instead of trying to filter it internally. However, this opens more questions:
For the |
I think the ideal solution would be to drive what is being used from the data (and therefore expected in the data) using schema-like structures like the following:
This could be used, among other things, to configure the transformation chain: the keys in such a dictionary tell you which fields are expected to be in the data. Using this schema-like dictionary,
Constructing such a schema from the training data would require a full pass through the dataset, not only looking at which fields are there, but also looking for the maximum of all categorical features (to get the cardinality of their domain). But this doesn't seem too bad to me. There are some structures in the codebase that aim at something similar I think (cfr. |
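To make the full-pass idea concrete, here is a minimal sketch of how such a schema could be derived from the training data. It is not code from GluonTS: `infer_schema` and the shape of the returned dictionary are hypothetical, and it assumes each dataset entry is a plain dictionary whose static categorical values are non-negative integers stored under `feat_static_cat`.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def infer_schema(dataset: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: one full pass over the dataset.

    Collects the fields present in every entry and the cardinality of each
    static categorical feature (largest observed value plus one).
    """
    common_fields: Optional[Set[str]] = None
    max_cat: Optional[List[int]] = None

    for index, entry in enumerate(dataset):
        keys = set(entry)
        # Keep only the fields that every entry seen so far provides.
        common_fields = keys if common_fields is None else common_fields & keys

        cats = list(entry.get("feat_static_cat", []))
        if max_cat is None:
            max_cat = cats
        elif len(cats) != len(max_cat):
            # Fail loudly instead of silently dropping the feature.
            raise ValueError(
                f"entry {index} has {len(cats)} static categorical features, "
                f"expected {len(max_cat)}"
            )
        else:
            max_cat = [max(a, b) for a, b in zip(max_cat, cats)]

    if common_fields is None or max_cat is None:
        raise ValueError("cannot infer a schema from an empty dataset")

    return {
        "fields": sorted(common_fields),
        "cardinality": [m + 1 for m in max_cat],
    }


train = [
    {"start": "2020-01-01", "target": [1.0, 2.0, 3.0], "feat_static_cat": [0, 2]},
    {"start": "2020-01-01", "target": [4.0, 5.0], "feat_static_cat": [1, 0]},
]
schema = infer_schema(train)
```

The cost is a single extra iteration over the training set, and the resulting cardinalities could then be passed on to the estimator instead of being supplied by hand.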
@lostella Can you please confirm if you were able to complete the POC?
Currently DeepAR does not use categorical and dynamic features by default even though they are present in the dataset. The flags `use_feat_dynamic_real` and `use_feat_static_cat` are set to `False` by default in `DeepAREstimator`. This is very bad for us in terms of results, since people usually run methods with their default options or forget to set these flags explicitly.

There were a couple of arguments against using them by default, but there are better remedies for those concerns than silently running DeepAR with incorrect options and returning less accurate results:
- Issue with not setting `cardinality` and still using `feat_static_cat`: we can make the `cardinality` argument compulsory (or ideally derive it from the data).
- If not all time series have the same features: there is no harm in failing with a suitable error when the data is not consistent.
Which case is the priority for us right now: running smoothly even if the data is not properly formatted, or running with the correct options and succeeding only when the data is consistent?
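As an illustration of the second option (fail when the data is inconsistent), the check below is a rough sketch only. `validate_dataset` and its parameters are hypothetical and are not part of GluonTS; the sketch assumes entries are dictionaries, `feat_static_cat` is a list of integers, and `feat_dynamic_real` is a list with one row per dynamic feature.

```python
from typing import Any, Dict, Iterable, Sequence


def validate_dataset(
    dataset: Iterable[Dict[str, Any]],
    cardinality: Sequence[int] = (),
    num_feat_dynamic_real: int = 0,
) -> int:
    """Hypothetical check: raise ValueError on the first inconsistent entry.

    Returns the number of entries that were checked.
    """
    count = 0
    for index, entry in enumerate(dataset):
        if cardinality:
            cats = entry.get("feat_static_cat")
            if cats is None:
                raise ValueError(f"entry {index} is missing 'feat_static_cat'")
            if len(cats) != len(cardinality):
                raise ValueError(
                    f"entry {index} has {len(cats)} categorical features, "
                    f"expected {len(cardinality)}"
                )
            for position, (value, size) in enumerate(zip(cats, cardinality)):
                if not 0 <= value < size:
                    raise ValueError(
                        f"entry {index}: category {value} at position "
                        f"{position} is outside [0, {size})"
                    )
        if num_feat_dynamic_real:
            dynamic = entry.get("feat_dynamic_real")
            if dynamic is None or len(dynamic) != num_feat_dynamic_real:
                raise ValueError(
                    f"entry {index} does not provide "
                    f"{num_feat_dynamic_real} dynamic real features"
                )
        count += 1
    return count


good = [
    {"target": [1.0, 2.0], "feat_static_cat": [0, 1], "feat_dynamic_real": [[0.1, 0.2]]},
    {"target": [3.0, 4.0], "feat_static_cat": [1, 2], "feat_dynamic_real": [[0.3, 0.4]]},
]
checked = validate_dataset(good, cardinality=[2, 3], num_feat_dynamic_real=1)
```

With a check of this kind, a dataset that lacks a declared feature stops the run with an explicit message rather than training a model that quietly ignores it.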