[ENH] Reducer prototype rework - experimental #2833
FYI @KishManani
Correct me if I'm wrong, but with this implementation you specify whether you want all exogenous features to be fitted concurrently (i.e., we know the values of the features in the future) or shifted (i.e., what we would use in most cases, when we don't know the values of the features in the future). In practice, we'll have a mix of these types of features. I think this is one reason why the Darts API distinguishes between, in the language of Darts, future covariates (when you know the future values of features) and past covariates (where you only know the past values of features and hence tend to use lagged versions of them).
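To make the distinction concrete, here is a minimal, library-free sketch of how the training rows for a single horizon could differ between the two modes. The function name `make_direct_table` and the toy data are hypothetical and are not part of the PR; the actual reducer also works with lags of `y`, which this sketch leaves out.

```python
def make_direct_table(y, x, horizon, concurrent):
    """Pair one exogenous feature with the target for a single forecast horizon.

    concurrent=True : target y[t + horizon] is paired with x[t + horizon],
                      i.e. the future value of x is assumed to be known.
    concurrent=False: target y[t + horizon] is paired with x[t],
                      i.e. only the past value of x is used ("shifted").
    """
    rows = []
    for t in range(len(y) - horizon):
        feature = x[t + horizon] if concurrent else x[t]
        rows.append((feature, y[t + horizon]))
    return rows


# Toy data, purely illustrative.
y = [10, 11, 12, 13, 14]
x = [0, 1, 2, 3, 4]

concurrent_rows = make_direct_table(y, x, horizon=2, concurrent=True)
shifted_rows = make_direct_table(y, x, horizon=2, concurrent=False)
```

With `horizon=2`, the concurrent rows align each target with the feature value at the same time point, while the shifted rows align it with the feature value two steps earlier.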
Yes, @KishManani - all the exogenous features that reach this estimator. You can deal with that by composition, i.e., you use the concurrent reducer, but before that you apply a lag to the past variables only:

```python
pipe = ForecastingPipeline(
    ColumnTransformer(
        ("past_vars", Lag(42), past_vars),
        ("future_vars", "passthrough", future_vars),
    ),
    DirectReducer(foo),
)
pipe.fit(y=y, X=X, fh=42)
```

Or would you want a mix, i.e., past features being "shifted" and "future" ones being concurrent? For a single element in the forecasting horizon, you can still get that. Alternatively, we could add arguments that specify "concurrent" and "shifted" features?
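As a rough illustration of what the composition idea amounts to on the feature table, the following library-free sketch lags the "past" columns and passes the "future" columns through unchanged. The helpers `lag` and `build_features` and the column names are hypothetical; the sketch assumes a lag of `k` moves each value `k` steps forward in time and leaves the first `k` entries empty, which may differ in detail from the `Lag` transformer in the PR.

```python
def lag(values, k):
    """Shift a series forward by k steps; the first k entries have no source value."""
    return ([None] * k + list(values))[: len(values)]


def build_features(columns, past_vars, future_vars, k):
    """Lag the past-only columns by k steps, pass the future-known columns through."""
    out = {}
    for name in past_vars:
        out[name] = lag(columns[name], k)
    for name in future_vars:
        out[name] = list(columns[name])
    return out


# Toy data, purely illustrative.
X = {
    "temperature": [20, 21, 22, 23],  # only known up to the present
    "holiday": [0, 0, 1, 1],          # known in advance
}

features = build_features(X, past_vars=["temperature"], future_vars=["holiday"], k=2)
```

A concurrent reducer fitted on `features` would then see the lagged temperature next to the same-time holiday flag, which is the mix of "shifted" and "concurrent" features discussed above.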
I feel there is a problem with distinguishing the "past" and "future" covariates in the API, namely that your composition options and syntax proliferate when you build pipelines. Do you want to apply something to "past", "future", or "target"? How do you specify cleanly in a pipeline that you want to apply a transformer to one and move it across to the other? It is already difficult to cover the case where you separate into "endogenous" and "exogenous" by separating arguments. For example, compare these two cases:
My current conceptualization is that, from the data, you already know which features are "past" or "future": the ones which are observed only in the "past" are, well, "past". You do not need to specify this in addition, or even manually separate your data. If you want to deal with these separately (and often you do), you can use … Would that not always work? Or do you have examples where it is more convenient, in composition, to deal with them separately interface-wise?
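The claim that the data already tells you which features are "past" and which are "future" could be sketched as follows. This is only an illustration under stated assumptions: `split_past_future` is a hypothetical helper, each column is assumed to span both the observed and the forecast periods, and unknown values are assumed to be marked with `None`.

```python
def split_past_future(columns, n_observed):
    """Classify columns by whether they have values beyond the observed period.

    columns    : dict mapping a column name to a list that covers the observed
                 period followed by the forecast period, with None where a
                 value is unknown.
    n_observed : number of leading entries that belong to the observed period.
    """
    past, future = [], []
    for name, values in columns.items():
        future_part = values[n_observed:]
        if future_part and all(v is not None for v in future_part):
            future.append(name)  # values are available over the whole forecast period
        else:
            past.append(name)    # observed only in the past
    return past, future


# Toy data, purely illustrative: 3 observed periods, 2 forecast periods.
X = {
    "temperature": [20, 21, 22, None, None],
    "holiday": [0, 0, 1, 1, 0],
}

past_vars, future_vars = split_past_future(X, n_observed=3)
```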
Prototype rework of the direct reducer, to showcase how a larger refactor could work and solicit feedback.
Reworks the direct reducer in order to:
- support exogenous `X`, up to and including the setting where only `X` at the same time is used to predict `y`
- pass `pandas` input to `sklearn` estimators and associated features, e.g., inspecting fitted model parameters with feature names
- delegate the logic of lagging to the transformer `Lag`, and of imputing to the transformer `Imputer`. The code logic specific to the reducer … `Lag`
Depends on fix #2832 since `ForecastingHorizon` yields numpy integers which apparently break the `Lag` transformer.

FYI @GrzegorzRut, @danbartl