
Normalize M Competition Basemodels (Point Forecast) #53

Open
4 of 11 tasks
AntonioGonzalezSuarez opened this issue Oct 24, 2023 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@AntonioGonzalezSuarez (Contributor) commented Oct 24, 2023

The following is a list of base models used in the M-Competitions. I think it would be a good idea to normalize how they are implemented, so that generating these models can be automated.

Base Models

Regular series

  1. Naive: Random walk model $$\hat{Y}_{n+i} = Y_n \;, i=1,2,...,h$$
  2. Seasonal Naive (sNaive): Like Naive, but takes a possible seasonal component into account. It uses the last known observation of the same period (weekly, monthly, annually, ...). For example, if the prediction frequency is daily, use the same day of the week from the previous week.
  3. Moving Averages (MA): Forecasts are computed as the average of the last $k$ points
$$\hat{Y}_t=\frac{\sum_{i=1}^{k} Y_{t-i}}{k}$$
  4. Simple Exponential Smoothing (SES): The smoothing parameter $a$ has to be decided. It is selected from the range $[0.1, 0.3]$ by minimizing the in-sample mean squared error of the model.
$$\hat{Y}_{t+1} = aY_t+a(1-a)Y_{t-1}+a(1-a)^2Y_{t-2}+ ...$$
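The SES weighted sum above is equivalent to the recursive update $\hat{Y}_{t+1} = aY_t + (1-a)\hat{Y}_t$, which is what pandas computes with `ewm(alpha=a, adjust=False)`. A minimal sketch checking the equivalence on a made-up series:

```python
import pandas as pd

# Toy series (values are made up)
y = pd.Series([10.0, 12.0, 11.0, 13.0, 12.5])
a = 0.2

# Recursive SES update: level <- a*Y_t + (1-a)*level,
# initialized with the first observation
level = y.iloc[0]
for value in y.iloc[1:]:
    level = a * value + (1 - a) * level

# pandas equivalent: exponentially weighted mean with a fixed alpha
ewm_level = y.ewm(alpha=a, adjust=False).mean().iloc[-1]

print(level, ewm_level)  # identical one-step-ahead forecasts
```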

Intermittent series

  1. Croston's method: The method decomposes the series into the non-zero demand sizes $z_t$ and the inter-demand intervals $p_t$:
$$\hat{Y}_t = \frac{\hat{z}_t}{\hat{p}_t}$$

where $z_t$ and $p_t$ are predicted using SES. In this case both SES models use $a=0.1$. The first observation of each component is used as the initialization for its SES model.

  2. Optimized Croston's method (optCro): Same as Croston's, but the smoothing parameter is selected from the range $[0.1, 0.3]$, as in the SES model.
  3. Syntetos-Boylan Approximation (SBA): Introduces a constant to debias the model:
$$\hat{Y}_t =0.95 \frac{\hat{z}_t}{\hat{p}_t}$$
  4. Teunter-Syntetos-Babai method (TSB): A modification of Croston's method that replaces the inter-demand intervals component with the demand probability $d_t$, which is $1$ if demand occurs at time $t$ and $0$ otherwise. As in Croston's method, $d_t$ is forecast using SES with an independent smoothing parameter:
$$\hat{Y}_t =\hat{d}_t\hat{z}_t$$
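The Croston/SBA decomposition above can be sketched in a few lines. The helper names, the toy series, and the interval convention (first interval counted from the start of the series) are assumptions for illustration; both components use SES with $a=0.1$ initialized at the first observation, as described above:

```python
import numpy as np

def ses(values, a):
    # Simple exponential smoothing, initialized with the first observation
    level = values[0]
    for v in values[1:]:
        level = a * v + (1 - a) * level
    return level

def croston_forecast(y, a=0.1, bias_correction=1.0):
    # Croston's method: SES on the non-zero demand sizes z_t and on the
    # inter-demand intervals p_t; pass bias_correction=0.95 for SBA
    y = np.asarray(y, dtype=float)
    positions = np.flatnonzero(y)
    if positions.size == 0:
        return 0.0
    sizes = y[positions]
    # Interval convention (an assumption): the first interval is counted
    # from the start of the series to the first non-zero demand
    intervals = np.diff(positions, prepend=-1).astype(float)
    return bias_correction * ses(sizes, a) / ses(intervals, a)

demand = [0, 3, 0, 0, 5, 4, 0, 0, 0, 2]
print(croston_forecast(demand))                         # Croston
print(croston_forecast(demand, bias_correction=0.95))   # SBA
```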

Both series

  1. AutoRegressive: All models in the autoregressive family: ARIMA, ARIMAX, SARIMA, SARIMAX, ...
  2. Multi-Layer Perceptron (MLP): 14 input nodes (the last 2 weeks of data), 28 hidden nodes, one output. Activations: logistic for the hidden layer, linear for the output. Ten MLPs with different random initializations are trained to reduce variance, and their forecasts are ensembled using the median.
  3. RandomForest (RF)
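The MLP ensemble described above could be sketched with scikit-learn's `MLPRegressor`; the synthetic series and the lag-matrix construction here are illustrative, not the exact M-Competition setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative synthetic series with a weekly pattern
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) * 2 * np.pi / 7) + rng.normal(0, 0.1, 200)

# Supervised matrix: 14 lagged values as inputs, the next value as target
n_lags = 14
X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

# Train 10 MLPs that differ only in their random initialization,
# then ensemble the one-step-ahead forecasts with the median
preds = []
for seed in range(10):
    mlp = MLPRegressor(hidden_layer_sizes=(28,), activation="logistic",
                       max_iter=1000, random_state=seed)
    mlp.fit(X[:-1], target[:-1])
    preds.append(mlp.predict(X[-1:])[0])

forecast = np.median(preds)
print(forecast)
```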

Example of implementation

Here is a code snippet showing how to normalize the models to the scikit-learn API (fit, predict, ...):

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import mean_squared_error
import numpy as np

class MovingAverage(BaseEstimator, RegressorMixin):
    def __init__(self, window_size=5):
        self.window_size = window_size

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None, horizon=1):
        # Moving average: shift by the horizon so the forecast at time t
        # only uses observations up to t - horizon, then take the rolling mean
        return y.shift(horizon).rolling(self.window_size).mean()

    def score(self, X, y, metric=None, sample_weight=None, horizon=1):
        preds = self.predict(X, y, horizon)
        if metric is None:
            metric = mean_squared_error
        assert callable(
            metric
        ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"
        # Drop the leading NaNs produced by shift/rolling before scoring
        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)

if __name__ == "__main__":
    # Example usage:
    from mango.data import get_ts_dataset

    df = get_ts_dataset()
    df = df.set_index("date")
    df["exog"] = (df.index.dayofweek == 5).astype(int)

    X_train = df[-200:-40]
    X_test = df[-40:]
    # Create the moving average with a window size of 5
    kwargs = {"y": X_test["target"], "horizon": 1}
    ma = MovingAverage(window_size=5)
    ma = ma.fit(X_train.drop(columns=["target"]), y=X_train["target"])
    preds = ma.predict(X_test.drop(columns=["target"]), **kwargs)
    print(ma.score(X_test.drop(columns=["target"]), **kwargs))
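The shift-then-aggregate pattern in `predict` is the key trick: shifting by the horizon before averaging guarantees the forecast at time $t$ only uses observations up to $t - \text{horizon}$. A toy illustration with made-up values:

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Naive one-step-ahead forecast: the previous observation
naive = y.shift(1)

# Moving average (window 3, horizon 1): mean of the three values
# ending one step before t; leading positions are NaN
ma = y.shift(1).rolling(3).mean()

print(naive.tolist())  # [nan, 1.0, 2.0, 3.0, 4.0, 5.0]
print(ma.tolist())     # [nan, nan, nan, 2.0, 3.0, 4.0]
```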

TODO list of implementations

  • Naive
  • Seasonal Naive
  • Moving averages
  • SES
  • Croston's
  • Optimized Croston's
  • Syntetos-Boylan Approximation (SBA)
  • Teunter-Syntetos-Babai method (TSB)
  • AutoRegressive
  • Multi-Layer Perceptron (MLP)
  • RandomForest (RF)
@AntonioGonzalezSuarez added the enhancement and good first issue labels on Oct 24, 2023
@ggsdc (Member) commented Oct 24, 2023

I have added a todo list to keep track of the implementations.

@AntonioGonzalezSuarez (Contributor, Author) commented:

Code for:

  • Naive
  • sNaive
  • SES
  • Moving average
from scipy.optimize import minimize_scalar
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import mean_squared_error

metric_dict = {
    "mse": mean_squared_error,
    "rmse": lambda y_true, y_pred, sample_weight=None: mean_squared_error(
        y_true, y_pred, sample_weight=sample_weight, squared=False
    ),
}

class Naive(BaseEstimator, RegressorMixin):
    def __init__(self, horizon=1):
        self.horizon = horizon

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None):
        return y.shift(self.horizon)

    def score(self, X, y, metric=None, sample_weight=None):
        preds = self.predict(X, y)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class ExponentialSmoothing(BaseEstimator, RegressorMixin):
    def __init__(self, alpha=None):
        self.alpha = alpha

    def fit(self, X=None, y=None, horizon=1):
        if self.alpha:
            return self
        # Optimize alpha to minimize the RMSE
        self.alpha = minimize_scalar(
            lambda alpha: self.score(X, y, alpha=alpha, horizon=horizon),
            bounds=(0, 1),
            method="bounded",
        ).x
        return self

    def predict(self, X=None, y=None, alpha=None, horizon=1):
        # Implement the exponential smoothing
        alpha = alpha or self.alpha
        if not alpha:
            alpha = 0.2
        return y.shift(horizon).ewm(alpha=alpha, adjust=False).mean()

    def score(
        self, X=None, y=None, metric=None, sample_weight=None, alpha=None, horizon=1
    ):
        preds = self.predict(X, y, alpha, horizon)
        if metric is None:
            metric = lambda y_true, y_pred, sample_weight=None: mean_squared_error(
                y_true, y_pred, sample_weight=sample_weight, squared=False
            )
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"
        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class SeasonalNaive(BaseEstimator, RegressorMixin):
    def __init__(self, period=7):
        self.period = period

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None):
        # Implement the seasonal naive using the period
        return y.shift(self.period)

    def score(self, X, y, metric=None, sample_weight=None):
        preds = self.predict(X, y)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)


class MovingAverage(BaseEstimator, RegressorMixin):
    def __init__(self, window_size=5):
        self.window_size = window_size

    def fit(self, X=None, y=None):
        return self

    def predict(self, X=None, y=None, horizon=1):
        # Implement the moving average using the window size and the horizon
        # Hint: use the pandas rolling function
        return y.shift(horizon).rolling(self.window_size).mean()

    def score(self, X, y, metric=None, sample_weight=None, horizon=1):
        preds = self.predict(X, y, horizon)
        if metric is None:
            metric = mean_squared_error
        if isinstance(metric, str):
            metric = metric_dict[metric]
        else:
            assert callable(
                metric
            ), "metric must be a callable with the signature metric(y_true, y_pred, sample_weight=None)"

        y = y[preds.notna()]
        preds = preds[preds.notna()]
        return metric(y, preds, sample_weight=sample_weight)

@ggsdc (Member) commented Oct 31, 2023

Could we have this in a branch and a PR?

@AntonioGonzalezSuarez (Contributor, Author) commented:

Where should they be placed within the repo? @ggsdc

@ggsdc (Member) commented Nov 2, 2023

> Where should they be placed within the repo? @ggsdc

Place it wherever you want; during the PR review we can move it.
