
[ENH] Reducer prototype rework - experimental #2833

Merged
merged 25 commits into from Aug 10, 2022

Conversation

fkiraly
Collaborator

@fkiraly fkiraly commented Jun 19, 2022

Prototype rework of the direct reducer, to showcase how a larger refactor could work and solicit feedback.

Reworks the direct reducer in order to:

  • allow concurrent use of exogenous data X, up to and including the setting where only X at the same time stamp is used to predict y
  • support pandas input to sklearn estimators and associated features, e.g., inspecting fitted model parameters with feature names
  • factor out the complexity of lagging and windowing to the transformer Lag, and of imputing to the transformer Imputer, leaving in the reducer only the code logic specific to reduction
  • support time stamp-like lags, through Lag
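To make the third bullet concrete, here is a minimal pandas sketch of the lag-and-window feature construction that the rework delegates to the Lag transformer (the function and column names are illustrative, not the actual sktime implementation):

```python
import pandas as pd

def make_lagged_features(y: pd.Series, lags=(1, 2, 3)) -> pd.DataFrame:
    """Build a lag matrix of the kind the reducer delegates to Lag."""
    # each lag k becomes one feature column, shifted k steps back
    cols = {f"lag_{k}": y.shift(k) for k in lags}
    return pd.DataFrame(cols, index=y.index)

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
Xt = make_lagged_features(y)
# rows with incomplete history contain NaN; under the reworked design,
# an Imputer transformer (not reducer-internal logic) handles these
```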

Depends on fix #2832 since ForecastingHorizon yields numpy integers which apparently break the Lag transformer.
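A minimal illustration of why numpy integers can trip a transformer (the actual failure mode is the subject of #2832; this check is a hypothetical stand-in):

```python
import numpy as np

# A numpy integer is not an instance of Python's built-in int,
# so a strict `isinstance(x, int)` check inside a transformer
# rejects values of the kind ForecastingHorizon yields.
fh_element = np.int64(2)

print(isinstance(fh_element, int))         # False: strict check fails
print(isinstance(fh_element, np.integer))  # True: the numpy-aware check
```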

FYI @GrzegorzRut, @danbartl

@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting refactor Restructuring without changing its external behavior. Neither fixing a bug nor adding a feature. enhancement Adding new functionality labels Jun 19, 2022
@fkiraly
Collaborator Author

fkiraly commented Jun 19, 2022

FYI @KishManani

@KishManani
Contributor

Correct me if I'm wrong, but with this implementation you specify whether you want all exogenous features to be fitted concurrently (i.e., we know the values of the features in the future) or shifted (i.e., the common case where we do not know the values of the features in the future). In practice, we'll have a mix of these types of features. I think this is one reason why the Darts API distinguishes between, in the language of Darts, future covariates (when you know the future value of features) and past covariates (where you only know the past value of features and hence tend to use lagged versions of them).

@fkiraly
Collaborator Author

fkiraly commented Jun 20, 2022

Correct me if I'm wrong, but with this implementation you specify whether you want all exogenous features to be fitted concurrently

Yes, @KishManani - all the exogenous features that reach this estimator.

You can deal with that by composition. I.e., you use the concurrent reducer, but before it you apply a ColumnTransformer with a suitable Lag applied to the "past features" only. Would that not work? For example:

pipe = ForecastingPipeline(
    steps=[
        ColumnTransformer(
            transformers=[
                ("past_vars", Lag(42), past_vars),
                ("future_vars", "passthrough", future_vars),
            ]
        ),
        DirectReducer(foo),
    ]
)

pipe.fit(y=y, X=X, fh=42)

Or, would you want a mix, i.e., past features being "shifted", "future" being concurrent? For a single element in the forecasting horizon, you can still get that.

Alternatively, we could add arguments that specify "concurrent" and "shifted" features?

In practice, we'll have a mix of these types of features. I think this is one reason why the Darts API distinguishes between, in the language of Darts, future covariates (when you know the future value of features) and past covariates (where you only know the past value of features and hence tend to use lagged versions of them).

I feel there is a problem with distinguishing "past" and "future" covariates in the API, namely that your composition options and syntax proliferate when you build pipelines.

Do you want to apply something to "past", "future", or "target"? How do you specify cleanly in a pipeline that you want to apply a transformer to one and move the result across to the other? It is already difficult to cover the case where you separate into "endogenous" and "exogenous" by separating arguments.

Example, compare these two cases:

  • you want to lag the "past" variables and now consider them "future"
  • you want to lag the "past" variables but still consider them "past"

My current conceptualization is: from the data, you already know which features are "past" or "future" - the ones that are observed only in the "past" are, well, "past". You do not need to specify this in addition, or even manually separate your data.
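The "you can tell from the data" point can be sketched as follows, with a hypothetical X whose column names and values are purely illustrative: a future covariate is observed over the full horizon, while a past covariate simply has no values beyond the cutoff.

```python
import numpy as np
import pandas as pd

idx = pd.period_range("2022-01", periods=6, freq="M")
X = pd.DataFrame(
    {
        # future covariate: values known over the whole horizon
        "promo": [0, 1, 0, 1, 1, 0],
        # past covariate: unobserved (NaN) beyond the cutoff
        "competitor_sales": [10.0, 12.0, 11.0, np.nan, np.nan, np.nan],
    },
    index=idx,
)

# no manual labelling needed - the observation pattern identifies each kind
future_vars = X.columns[X.notna().all()]
past_vars = X.columns[X.isna().any()]
```

These column selections could then feed the ColumnTransformer composition from the earlier snippet.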

If you want to deal with these separately (and often you do), you can use ColumnTransformer.

Would that not always work?

Or do you have examples where it is more convenient, in composition, to deal with them separately interface-wise?

@fkiraly fkiraly marked this pull request as ready for review August 9, 2022 16:00
@fkiraly fkiraly requested a review from aiwalter as a code owner August 9, 2022 16:00
@fkiraly fkiraly changed the title DRAFT: [ENH] Reducer prototype rework [ENH] Reducer prototype rework - experimental Aug 9, 2022
@fkiraly fkiraly merged commit 188dac8 into main Aug 10, 2022
@fkiraly fkiraly deleted the reduce_rework branch August 10, 2022 20:11