Native Time Series and Forecasting Support (Sequence Learning) #49

andrewdalpino · 2019-11-11T22:35:43Z

Time series analysis is a popular machine learning technique for forecasting trends of time-dependent variables such as stock price, GDP, and quarterly sales. Given the popularity (#35, #38, #40) and current lack of tooling within the PHP ecosystem, I propose adding native time series support as well as a new type of estimator class for forecasting time series datasets. This includes the following ...

A data structure extending Dataset for time series datasets that includes an additional index for timestamps
An additional estimator type, "Forecaster", to predict the next k values in a series

There should be no need to modify any of the public interfaces to integrate these features into the current architecture.

Proposed initial Forecaster implementations:

ARIMA - AutoRegressive Integrated Moving Average (univariate)
VARMAX - Vector AutoRegressive Moving Average with eXogenous regressors (multivariate)

Open to comments

BasvanH · 2019-11-12T06:59:43Z

Yes, I would very much like those additions to the library. Thank you!

andrewdalpino · 2019-11-13T22:32:28Z

Thanks for the input @BasvanH

Expanding on the aforementioned design outline ...

The TimeSeries dataset object will have additional sorting, filtering, etc. methods that operate on the timestamp column. These will be similar to how Labeled provides additional methods that operate on labels. The timestamp column will allow either homogeneous integer or DateTime object elements.

Since time series estimation often diverges between the univariate and multivariate cases, the TimeSeries dataset object will handle both simply by keeping track of the number of target variables (as the numColumns() method on the Dataset class already does). For example, a univariate TimeSeries dataset object has a single column, whereas a multivariate one has more than one column. It will be the responsibility of the estimator to check whether the incoming dataset is compatible.
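To make the dataset idea above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `TimeSeriesSketch`, `isUnivariate()`, `sortByTimestamp()`, and `assertUnivariate()` are hypothetical names, the class does not extend the real Dataset class, and timestamps are restricted to integers for brevity. Only `numColumns()` is borrowed from the description above.

```php
<?php

// Hypothetical sketch only: not part of the Rubix ML API.
final class TimeSeriesSketch
{
    private array $samples;    // list of rows, one column per target variable
    private array $timestamps; // list of integer timestamps, one per row

    public function __construct(array $samples, array $timestamps)
    {
        if (count($samples) !== count($timestamps)) {
            throw new InvalidArgumentException('Each sample needs exactly one timestamp.');
        }

        $this->samples = array_values($samples);
        $this->timestamps = array_values($timestamps);
    }

    public function samples(): array
    {
        return $this->samples;
    }

    public function timestamps(): array
    {
        return $this->timestamps;
    }

    // Number of target variables, taken from the width of the first row.
    public function numColumns(): int
    {
        return isset($this->samples[0]) ? count($this->samples[0]) : 0;
    }

    public function isUnivariate(): bool
    {
        return $this->numColumns() === 1;
    }

    // Return a new dataset whose rows are ordered by ascending timestamp.
    public function sortByTimestamp(): self
    {
        $order = array_keys($this->timestamps);

        usort($order, fn (int $a, int $b): int => $this->timestamps[$a] <=> $this->timestamps[$b]);

        $samples = [];
        $timestamps = [];

        foreach ($order as $i) {
            $samples[] = $this->samples[$i];
            $timestamps[] = $this->timestamps[$i];
        }

        return new self($samples, $timestamps);
    }
}

// The kind of compatibility check a univariate estimator could run on incoming data.
function assertUnivariate(TimeSeriesSketch $dataset): void
{
    if (!$dataset->isUnivariate()) {
        throw new InvalidArgumentException('This estimator only supports univariate series.');
    }
}

$series = new TimeSeriesSketch([[3.0], [1.0], [2.0]], [30, 10, 20]);
$sorted = $series->sortByTimestamp();

assertUnivariate($sorted); // passes, since each row has a single column
```

Returning a new object from `sortByTimestamp()` rather than sorting in place is a design choice of this sketch, not something the proposal specifies.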

As mentioned previously, the public Estimator API will not change with the introduction of the new estimator type. In the case of forecasters, the output of the predict() method will be the estimate of the next value given the last value in a series. The interpretation of the dataset is therefore slightly different at inference than during training, in which the dataset is interpreted as both contiguous and atomic. During inference, each sample will be considered independently and the value will be interpreted as either the empirical or theoretical last value of a time series the user would like to start inferring from. Since forecasters are estimators at heart, they benefit from all the additional tooling such as meta-estimators and the cross validation framework.

In addition, we will add the Forecaster interface, allowing estimators to implement the forecast() method which, unlike predict(), will estimate the next k values starting at a given offset. It is assumed that most forecaster types will implement the Forecaster interface, as prediction (as defined above) is only a special case of forecasting where k=1. There are currently two prototypes for the forecast() method signature to consider. The first borrows the idea of start and end from the statsmodels library (see their predict API). The second uses the timestamp of the TimeSeries dataset object as the start and then outputs the next k values. The differences look like this ...

public function forecast(TimeSeries $dataset, $start, $end) : array

vs.

public function forecast(TimeSeries $dataset, int $k) : array

So far, I personally prefer the latter.

As with the Learner, Probabilistic, and Ranking interfaces, the Forecaster interface will also include the forecastSample() method to handle inference on a single sample at a time.
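The relationship between one-step prediction and k-step forecasting described above can be sketched as follows. This is only an illustration under stated assumptions: `ForecasterSketch`, `DriftForecaster`, and `predictNext()` are hypothetical names, the series is passed as a plain array rather than a TimeSeries object, and the model is a simple random walk with drift standing in for ARIMA or VARMAX, which are far more involved.

```php
<?php

// Hypothetical interface loosely mirroring the second proposed signature;
// it is not the actual Rubix ML API.
interface ForecasterSketch
{
    // Return the next $k estimated values following the end of $series.
    public function forecast(array $series, int $k): array;
}

// Random walk with drift: each next value is the previous value plus the
// average step observed during training.
final class DriftForecaster implements ForecasterSketch
{
    private float $drift = 0.0;

    public function train(array $series): void
    {
        $series = array_values($series);
        $n = count($series);

        if ($n < 2) {
            throw new InvalidArgumentException('Need at least two observations.');
        }

        // The mean of the first differences simplifies to (last - first) / (n - 1).
        $this->drift = ($series[$n - 1] - $series[0]) / ($n - 1);
    }

    // One-step prediction: estimate the next value given the last value.
    public function predictNext(float $last): float
    {
        return $last + $this->drift;
    }

    // k-step forecast: feed each one-step prediction back in as the new last value.
    public function forecast(array $series, int $k): array
    {
        if ($k < 1 || $series === []) {
            throw new InvalidArgumentException('Need a non-empty series and k >= 1.');
        }

        $series = array_values($series);
        $value = (float) $series[count($series) - 1];
        $forecasts = [];

        for ($i = 0; $i < $k; ++$i) {
            $value = $this->predictNext($value);
            $forecasts[] = $value;
        }

        return $forecasts;
    }
}

$history = [1.0, 2.0, 3.0, 4.0];

$forecaster = new DriftForecaster();
$forecaster->train($history);

$nextThree = $forecaster->forecast($history, 3); // [5.0, 6.0, 7.0]
```

With k=1 the forecast reduces to a single call to `predictNext()`, which is the special case mentioned above.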

Open to comments

andrewdalpino · 2020-04-11T00:54:33Z

Update:

Since we are in a feature freeze, this enhancement will be moved over to the Extras package for the time being and may be integrated into the main package afterward.

LasseRafn · 2021-02-12T23:57:08Z

Hi! Sorry for commenting on a closed issue.

The comment said that it's moved to the Extras package, understandably. However, is it that the idea will be moved there, or is it already there?

Regardless, I much appreciate all the hard work that has been put into RubixML, just curious. 😄

Rello · 2021-03-17T12:29:07Z

Hello,
I would also like to know the status here.
I would like to test forecasting for an idea on my side

thank you

andrewdalpino · 2021-03-17T22:45:12Z

Hello @LasseRafn and @Rello, thanks for commenting. I'll give an update and we'll reopen this issue to keep the discussion going.

We haven't got around to implementing time series in ML or Extras yet. Although we have plenty of research planned in regard to sequence learning, we have no immediate plans to implement features at this time. That said, we're seeing an uptick in contributions, so it's possible that someone from the community can take on this effort.

mindaugasdi · 2021-08-06T11:29:20Z

Could a simpler sequence implementation be faster to implement first?

For example, dataset:

[0,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,0,0,0,1,1,1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,1,1,1,1,0,1,0,0,0,0]

I see in this data that 1 is more likely to be followed by 1, and 0 is more likely to be followed by 0. The more 1s or 0s there are in a row, the more likely the next value is to be the same. Maybe there are other patterns too. If a human can see this pattern, maybe ML could too (and state the confidence).
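One simple way to quantify the first observation (a value tends to be followed by the same value) is to count transitions between consecutive elements, which amounts to fitting a first-order Markov chain. The sketch below is an illustration only: `transitionProbabilities()` is a hypothetical helper, not a Rubix ML function, and it runs on a short made-up sequence rather than the data above. It also would not capture the second observation about run length, which would need a higher-order model.

```php
<?php

/**
 * Estimate P(next = b | current = a) from consecutive pairs in a sequence.
 * Hypothetical helper for illustration; not part of any library.
 *
 * @param int[] $sequence
 * @return array<int, array<int, float>> probabilities indexed as [current][next]
 */
function transitionProbabilities(array $sequence): array
{
    $sequence = array_values($sequence);
    $counts = [];

    // Count how often each value is followed by each other value.
    for ($i = 0, $n = count($sequence) - 1; $i < $n; ++$i) {
        $current = $sequence[$i];
        $next = $sequence[$i + 1];

        $counts[$current][$next] = ($counts[$current][$next] ?? 0) + 1;
    }

    // Normalize each row of counts so that it sums to 1.
    $probabilities = [];

    foreach ($counts as $current => $nextCounts) {
        $total = array_sum($nextCounts);

        foreach ($nextCounts as $next => $count) {
            $probabilities[$current][$next] = $count / $total;
        }
    }

    return $probabilities;
}

// Short made-up example sequence.
$probabilities = transitionProbabilities([0, 0, 0, 1, 1, 0, 0, 1, 1, 1]);

// Here 0 is followed by 0 in 3 of 5 cases and 1 is followed by 1 in 3 of 4 cases,
// so $probabilities[0][0] is 0.6 and $probabilities[1][1] is 0.75.
```

The estimated probability of the most likely next value can double as the confidence mentioned above.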

itrack · 2022-09-13T09:38:23Z

Hi guys!
Any news about this feature?

Thank you!

andrewdalpino · 2022-09-13T22:03:35Z

Hi @itrack. There's still talk about implementing VAR (vector autoregression) and LSTM. Nothing material has come about yet though. It's not that there's not enough want for sequence learning but that we really don't have the resources right now. Hopefully, we can attract more interest from the community.

ThomasW69 · 2023-06-15T06:51:39Z

Are there any new developments here in the meantime? I would also be interested in time series forecasting.

andrewdalpino added the enhancement New feature or request label Nov 11, 2019

andrewdalpino self-assigned this Nov 11, 2019

andrewdalpino added this to Backlog in Roadmap via automation Nov 11, 2019

andrewdalpino mentioned this issue Nov 11, 2019

Logger and persistence problem #42

Closed

andrewdalpino moved this from Backlog to In progress in Roadmap Nov 11, 2019

andrewdalpino mentioned this issue Nov 16, 2019

Determine the range of expected values #40

Closed

andrewdalpino moved this from In progress to Backlog in Roadmap Feb 4, 2020

andrewdalpino closed this as completed Apr 11, 2020

Roadmap automation moved this from Backlog to Completed Apr 11, 2020

andrewdalpino removed this from Completed in Roadmap Apr 11, 2020

andrewdalpino reopened this Mar 17, 2021

andrewdalpino changed the title ~~Native Time Series and Forecasting Support~~ Native Time Series and Forecasting Support (Sequence Learning) Mar 17, 2021
