
Normalize M Competition Basemodels (Point Forecast) #53

Open
4 of 11 tasks
AntonioGonzalezSuarez opened this issue Oct 24, 2023 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@AntonioGonzalezSuarez (Contributor) commented Oct 24, 2023

The following is a list of base models used in the M-Competitions. I think it would be a good idea to normalize how they are implemented, so that generating these models can be automated.

Base Models

Regular series

  1. Naive: Random walk model $$\hat{Y}_{n+i} = Y_n \;, i=1,2,...,h$$
  2. Seasonal Naive (sNaive): Like Naive, but takes a possible seasonal component into account. It uses the last known observation of the same period (weekly, monthly, annually, ...). For example, if the prediction frequency is daily, use the same day of the week from the previous week.
  3. Moving Averages (MA): Forecasts are computed as the average of the last $k$ points
$$\hat{Y}_t=\frac{\sum_{i=1}^{k} Y_{t-i}}{k}$$
  4. Simple Exponential Smoothing (SES): The smoothing parameter $a$ has to be decided. It is selected from the range $[0.1, 0.3]$ by minimizing the in-sample mean squared error of the model.
$$\hat{Y}_{t+1} = aY_t+a(1-a)Y_{t-1}+a(1-a)^2Y_{t-2}+ ...$$
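The SES weighted sum above is equivalent to the recursive update $\hat{Y}_{t+1} = aY_t + (1-a)\hat{Y}_t$, which is what pandas computes with `ewm(alpha=a, adjust=False)`. A minimal sketch checking the equivalence on a made-up series:

```python
import pandas as pd

# Toy series (values are made up)
y = pd.Series([10.0, 12.0, 11.0, 13.0, 12.5])
a = 0.2

# Recursive SES update: level <- a*Y_t + (1-a)*level,
# initialized with the first observation
level = y.iloc[0]
for value in y.iloc[1:]:
    level = a * value + (1 - a) * level

# pandas equivalent: exponentially weighted mean with a fixed alpha
ewm_level = y.ewm(alpha=a, adjust=False).mean().iloc[-1]

print(level, ewm_level)  # identical one-step-ahead forecasts
```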

Intermittent series

  1. Croston's method: The method decomposes the series into the non-zero demand sizes $z_t$ and the inter-demand intervals $p_t$:
$$\hat{Y}_t = \frac{\hat{z}_t}{\hat{p}_t}$$

where $z_t$ and $p_t$ are predicted using SES. In this case both SES models use $a=0.1$. The first observation of each component is used as the initialization for its SES model.

  2. Optimized Croston's method (optCro): Same as Croston's, but the smoothing parameter is selected from the range $[0.1, 0.3]$, as in the SES model.
  3. Syntetos-Boylan Approximation (SBA): Introduces a constant to debias the model:
$$\hat{Y}_t =0.95 \frac{\hat{z}_t}{\hat{p}_t}$$
  4. Teunter-Syntetos-Babai method (TSB): A modification of Croston's method that replaces the inter-demand intervals component with the demand probability $d_t$, which is $1$ if demand occurs at time $t$ and $0$ otherwise. As in Croston's method, $d_t$ is forecast using SES with an independent smoothing parameter:
$$\hat{Y}_t =\hat{d}_t\hat{z}_t$$
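The Croston/SBA decomposition above can be sketched in a few lines. The helper names, the toy series, and the interval convention (first interval counted from the start of the series) are assumptions for illustration; both components use SES with $a=0.1$ initialized at the first observation, as described above:

```python
import numpy as np

def ses(values, a):
    # Simple exponential smoothing, initialized with the first observation
    level = values[0]
    for v in values[1:]:
        level = a * v + (1 - a) * level
    return level

def croston_forecast(y, a=0.1, bias_correction=1.0):
    # Croston's method: SES on the non-zero demand sizes z_t and on the
    # inter-demand intervals p_t; pass bias_correction=0.95 for SBA
    y = np.asarray(y, dtype=float)
    positions = np.flatnonzero(y)
    if positions.size == 0:
        return 0.0
    sizes = y[positions]
    # Interval convention (an assumption): the first interval is counted
    # from the start of the series to the first non-zero demand
    intervals = np.diff(positions, prepend=-1).astype(float)
    return bias_correction * ses(sizes, a) / ses(intervals, a)

demand = [0, 3, 0, 0, 5, 4, 0, 0, 0, 2]
print(croston_forecast(demand))                         # Croston
print(croston_forecast(demand, bias_correction=0.95))   # SBA
```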

Both series

  1. AutoRegressive: All models in the autoregressive family: ARIMA, ARIMAX, SARIMA, SARIMAX, ...
  2. Multi-Layer Perceptron (MLP): 14 input nodes (the last 2 weeks of data), 28 hidden nodes, one output. Activations: logistic for the hidden layer, linear for the output. Ten MLPs with different random initializations are trained to reduce variance, and their forecasts are ensembled using the median.
  3. RandomForest (RF)
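The MLP ensemble described above could be sketched with scikit-learn's `MLPRegressor`; the synthetic series and the lag-matrix construction here are illustrative, not the exact M-Competition setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative synthetic series with a weekly pattern
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) * 2 * np.pi / 7) + rng.normal(0, 0.1, 200)

# Supervised matrix: 14 lagged values as inputs, the next value as target
n_lags = 14
X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

# Train 10 MLPs that differ only in their random initialization,
# then ensemble the one-step-ahead forecasts with the median
preds = []
for seed in range(10):
    mlp = MLPRegressor(hidden_layer_sizes=(28,), activation="logistic",
                       max_iter=1000, random_state=seed)
    mlp.fit(X[:-1], target[:-1])
    preds.append(mlp.predict(X[-1:])[0])

forecast = np.median(preds)
print(forecast)
```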

Example of implementation

Here is a code snippet showing how to normalize the models to the scikit-learn API (fit, predict, ...):

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import mean_squared_error
import numpy as np

class MovingAverage(BaseEstimator, RegressorMixin):
    def __init__(self, window_size=5):
        self.window_size = window_size

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None, horizon=1):
        # Moving average: shift by the horizon so the forecast at time t
        # only uses observations up to t - horizon, then take the rolling mean
        return y.shift(horizon).rolling(self.window_size).mean()

    def score(self, X, y, metric=None, sample_weight=None, horizon=1):
        preds = self.predict(X, y, horizon)
        if metric is None:
            metric = mean_squared_error
        assert callable(
            metric
        ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"
        # Drop the leading NaNs produced by shift/rolling before scoring
        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)

if __name__ == "__main__":
    # Example usage:
    from mango.data import get_ts_dataset

    df = get_ts_dataset()
    df = df.set_index("date")
    df["exog"] = (df.index.dayofweek == 5).astype(int)

    X_train = df[-200:-40]
    X_test = df[-40:]
    # Create the moving average with a window size of 5
    kwargs = {"y": X_test["target"], "horizon": 1}
    ma = MovingAverage(window_size=5)
    ma = ma.fit(X_train.drop(columns=["target"]), y=X_train["target"])
    preds = ma.predict(X_test.drop(columns=["target"]), **kwargs)
    print(ma.score(X_test.drop(columns=["target"]), **kwargs))
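The shift-then-aggregate pattern in `predict` is the key trick: shifting by the horizon before averaging guarantees the forecast at time $t$ only uses observations up to $t - \text{horizon}$. A toy illustration with made-up values:

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Naive one-step-ahead forecast: the previous observation
naive = y.shift(1)

# Moving average (window 3, horizon 1): mean of the three values
# ending one step before t; leading positions are NaN
ma = y.shift(1).rolling(3).mean()

print(naive.tolist())  # [nan, 1.0, 2.0, 3.0, 4.0, 5.0]
print(ma.tolist())     # [nan, nan, nan, 2.0, 3.0, 4.0]
```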

TODO list of implementations

  • Naive
  • Seasonal Naive
  • Moving averages
  • SES
  • Croston's
  • Optimized Croston's
  • Syntetos-Boylan Approximation (SBA)
  • Teunter-Syntetos-Babai method (TSB)
  • AutoRegressive
  • Multi-Layer Perceptron (MLP)
  • RandomForest (RF)
@AntonioGonzalezSuarez added the enhancement and good first issue labels on Oct 24, 2023
@ggsdc (Member) commented Oct 24, 2023

I have added a todo list to keep track of the implementations.

@AntonioGonzalezSuarez (Contributor, Author) commented:

Code for:

  • Naive
  • sNaive
  • SES
  • Moving average
from scipy.optimize import minimize_scalar
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import mean_squared_error

metric_dict = {
    "mse": mean_squared_error,
    "rmse": lambda y_true, y_pred, sample_weight=None: mean_squared_error(
        y_true, y_pred, sample_weight=sample_weight, squared=False
    ),
}

class Naive(BaseEstimator, RegressorMixin):
    def __init__(self, horizon=1):
        self.horizon = horizon

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None):
        return y.shift(self.horizon)

    def score(self, X, y, metric=None, sample_weight=None):
        preds = self.predict(X, y)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class ExponentialSmoothing(BaseEstimator, RegressorMixin):
    def __init__(self, alpha=None):
        self.alpha = alpha

    def fit(self, X=None, y=None, horizon=1):
        if self.alpha:
            return self
        # Optimize alpha to minimize the RMSE
        self.alpha = minimize_scalar(
            lambda alpha: self.score(X, y, alpha=alpha, horizon=horizon),
            bounds=(0, 1),
            method="bounded",
        ).x
        return self

    def predict(self, X=None, y=None, alpha=None, horizon=1):
        # Implement the exponential smoothing
        alpha = alpha or self.alpha
        if not alpha:
            alpha = 0.2
        return y.shift(horizon).ewm(alpha=alpha, adjust=False).mean()

    def score(
        self, X=None, y=None, metric=None, sample_weight=None, alpha=None, horizon=1
    ):
        preds = self.predict(X, y, alpha, horizon)
        if metric is None:
            metric = lambda y_true, y_pred, sample_weight=None: mean_squared_error(
                y_true, y_pred, sample_weight=sample_weight, squared=False
            )
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"
        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class SeasonalNaive(BaseEstimator, RegressorMixin):
    def __init__(self, period=7):
        self.period = period

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None):
        # Implement the seasonal naive using the period
        return y.shift(self.period)

    def score(self, X, y, metric=None, sample_weight=None):
        preds = self.predict(X, y)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class MovingAverage(BaseEstimator, RegressorMixin):
    def __init__(self, window_size=5):
        self.window_size = window_size

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None, horizon=1):
        # Implement the moving average using the window size and the horizon
        # Hint: use the pandas rolling function
        return y.shift(horizon).rolling(self.window_size).mean()

    def score(self, X, y, metric=None, sample_weight=None, horizon=1):
        preds = self.predict(X, y, horizon)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)

@ggsdc (Member) commented Oct 31, 2023

Could we have this in a branch and a PR?

@AntonioGonzalezSuarez (Contributor, Author) commented:

Where should they be placed within the repo? @ggsdc

@ggsdc (Member) commented Nov 2, 2023

> Where should they be placed within the repo? @ggsdc

Place it wherever you want; during the PR review we can move it.
