[ENH] extending reducers to hierarchical data #2396

Merged
merged 19 commits into from Apr 10, 2022

danbartl
Collaborator

@danbartl danbartl commented Apr 5, 2022

This is the final step to enable global forecasting via an efficient application of make_reduction, using its new transformers argument.

I still plan to refactor and add checks, but I am posting this now to discuss the implementation strategy; see below.

Example use case

# -*- coding: utf-8 -*-
"""Test extraction of features across (shifted) windows."""
__author__ = ["danbartl"]

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_airline
from sktime.datatypes import get_examples
from sktime.forecasting.compose import make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.transformations.series.summarize import WindowSummarizer

# Load data that will be the basis of tests
y = load_airline()
y_pd = get_examples(mtype="pd.DataFrame", as_scitype="Series")[0]
y_series = get_examples(mtype="pd.Series", as_scitype="Series")[0]
y_multi = get_examples(mtype="pd-multiindex", as_scitype="Panel")[0]
# y_train is a univariate data set
y_train, y_test = temporal_train_test_split(y)

# Create Panel sample data
mi = pd.MultiIndex.from_product([[0], y.index], names=["instances", "timepoints"])
y_group1 = pd.DataFrame(y.values, index=mi, columns=["y"])

mi = pd.MultiIndex.from_product([[1], y.index], names=["instances", "timepoints"])
y_group2 = pd.DataFrame(y.values, index=mi, columns=["y"])

y_grouped = pd.concat([y_group1, y_group2])

# Get different WindowSummarizer parameter sets
kwargs = WindowSummarizer.get_test_params()[0]
kwargs_alternames = WindowSummarizer.get_test_params()[1]
kwargs_variant = WindowSummarizer.get_test_params()[2]

regressor = make_pipeline(
    RandomForestRegressor(),
)

forecaster = make_reduction(
    regressor,
    scitype="tabular-regressor",
    transformers=[WindowSummarizer(**kwargs)],
    window_length=10,
)

forecaster.fit(y_grouped, fh=1)

y_pred = forecaster.predict(fh=1)
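
To make "grouped application" concrete, here is a minimal pandas-only sketch of computing a lag and a rolling mean separately for each instance of a pd-multiindex frame. It does not use WindowSummarizer and is not how this PR implements the feature; the helper `grouped_window_features`, the toy data, and the output column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two instances, five time points each.
idx = pd.MultiIndex.from_product(
    [[0, 1], range(5)], names=["instances", "timepoints"]
)
y_panel = pd.DataFrame({"y": np.arange(10, dtype=float)}, index=idx)


def grouped_window_features(y, lag=1, window=3):
    """Compute a lag and a rolling mean of past values, separately per instance."""
    grouped = y.groupby(level="instances")["y"]
    feats = pd.DataFrame(index=y.index)
    # Shift within each instance so values never leak across series.
    feats[f"y_lag_{lag}"] = grouped.shift(lag)
    # Rolling mean over the `window` values preceding each time point.
    feats[f"y_mean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window).mean()
    )
    return feats


features = grouped_window_features(y_panel)
```

The first rows of each instance are NaN because no earlier observations exist within that series, which is the behaviour a per-group transformer needs to preserve.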

Open questions:

What kinds of transformers do we want to accept in the transformers argument? So far I can only think of WindowSummarizer, but we could of course apply any function that should be applied per group. Currently only the first transformer in the list is applied (I will extend this once we have resolved the open implementation question).

What is the current best practice for applying an Imputer across the different y time series grouped together in pd-multiindex? Should that also be covered here?
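
As a point of reference for the imputation question, the following pandas-only sketch fills gaps within each instance so that values are never carried over from one series into another. It is an assumption about what per-series imputation could look like, not sktime's Imputer and not a statement of best practice; `impute_per_instance` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel with gaps, including a gap at the start of instance 1.
idx = pd.MultiIndex.from_product(
    [[0, 1], range(4)], names=["instances", "timepoints"]
)
y_gappy = pd.DataFrame(
    {"y": [1.0, np.nan, 3.0, 4.0, np.nan, 20.0, np.nan, 40.0]}, index=idx
)


def impute_per_instance(y):
    """Forward-fill, then back-fill, separately within each instance."""
    filled = y.groupby(level="instances").ffill()
    # Leading gaps have no earlier value; back-fill them within the instance.
    return filled.groupby(level="instances").bfill()


y_imputed = impute_per_instance(y_gappy)
```

Without the grouping, a plain forward fill would copy the last value of instance 0 into the leading gap of instance 1.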

@fkiraly
Collaborator

fkiraly commented Apr 5, 2022

The notebook is failing. Is this related to the bug in get_time_index fixed in #2380?

@danbartl danbartl requested a review from mloning as a code owner April 9, 2022 15:56
@fkiraly fkiraly changed the title from "ReducerCheck" to "[ENH] extending reducers to hierarchical data" Apr 10, 2022
@fkiraly fkiraly merged commit 999810f into sktime:main Apr 10, 2022
srggrs added a commit to Gridsight/sktime that referenced this pull request Apr 11, 2022
* upstream/main: (34 commits)
  Update codecov uploader from deprecated version and cosmetic improvements of CI scripts. (sktime#2389)
  bump version (sktime#2445)
  Fix typo in PULL_REQUEST_TEMPLATE.md (sktime#2446)
  [BUG] Incorrect indices returned by make_reduction on hierarchical data fixed (sktime#2438)
  [BUG] fix erroneous direct passthrough in ColumnEnsembleForecaster (sktime#2436)
  [BUG] forecasting pipeline dunder fix (sktime#2431)
  [BUG] temp workaround for unnamed levels in hierarchical X passed to aggregator (sktime#2432)
  Release v0.11.1 (sktime#2428)
  [ENH] extending reducers to hierarchical data, add transform-on-y functionality (sktime#2396)
  [ENH] interface to statsmodels SARIMAX interface (sktime#2400)
  [BUG] fixed fitting logic for postprocessing in `TransformedTargetForecaster` (sktime#2426)
  [BUG] `TransformedTargetForecaster` inverses were not working for univariate transformers and more than one quantile (sktime#2425)
  [BUG] fixing proba predict methods of forecasting tuning estimators (sktime#2423)
  [ENH] suppressing deprecation messages in `all_estimators` estimator retrieval, address `dtw` import message (sktime#2418)
  [BUG] fixed `score_average` parameter of proba metrics, docstrings (sktime#2401)
  [BUG] Sets "can handle missing value" tag in ARIMA and AutoARIMA (sktime#2420)
  [ENH] tests for `check_estimator` tests passing (sktime#2408)
  [ENH] post-processing in `TransformedTargetForecaster`, dunder method for (transformed `y`) forecasting pipelines (sktime#2404)
  [ENH] extend `_HeterogeneousMetaEstimator` estimator to allow mixed tuple/estimator list (sktime#2406)
  [BUG] fixed get_time_index for most mtypes (sktime#2380)
  ...