# mlforecast

> Scalable machine learning based time series forecasting.

**mlforecast** is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

## Install

`pip install mlforecast`

## How to use

### Programmatic API

In [None]:
#hide
import os

from IPython.display import display, Markdown


os.chdir('..')


def display_df(df):
    display(Markdown(df.to_markdown()))

Store your time series in a pandas dataframe with an index named **unique_id** that identifies each series, a column **ds** that contains the datestamps, and a column **y** with the values.

In [None]:
from mlforecast.utils import generate_daily_series

series = generate_daily_series(20)
display_df(series.head())
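If your data does not come from the helper, the same layout can be assembled directly with pandas. The following is a minimal sketch; the series identifiers (`id_0`, `id_1`), the start date, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two hypothetical daily series of different lengths.
lengths = {'id_0': 5, 'id_1': 3}

frames = []
for uid, n in lengths.items():
    frames.append(
        pd.DataFrame(
            {
                'unique_id': uid,  # scalar is broadcast to every row
                'ds': pd.date_range('2000-01-01', periods=n, freq='D'),
                'y': np.arange(n, dtype=float),
            }
        )
    )

# The identifier goes in the index; ds and y stay as regular columns.
manual_series = pd.concat(frames).set_index('unique_id')
print(manual_series)
```

Series of different lengths are simply stacked on top of each other; the index is what tells them apart.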

Then you define your flow configuration. This includes lags, transformations on the lags, and date features. The transformations are defined as `numba` jitted functions that transform an array. If they take additional arguments, you supply a tuple (`transform_func`, `arg1`, `arg2`, ...).

In [None]:
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

flow_config = dict(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
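To make the configuration more concrete, the sketch below reproduces with plain pandas what these settings are understood to compute for a single series. It assumes that a lag transform is applied to the series after shifting it by its lag, and that each date feature is read as the attribute of the same name from the datestamps. The column names are hypothetical and are not the names the library generates.

```python
import numpy as np
import pandas as pd

# A single toy daily series: y = 0, 1, 2, ..., 29.
ds = pd.date_range('2000-01-01', periods=30, freq='D')
y = pd.Series(np.arange(30, dtype=float), index=ds)

features = pd.DataFrame(index=ds)

# lags=[7, 14]: the value observed 7 and 14 days earlier.
features['lag7'] = y.shift(7)
features['lag14'] = y.shift(14)

# lag_transforms={1: [expanding_mean]}: expanding mean of the series lagged by 1.
features['expanding_mean_lag1'] = y.shift(1).expanding().mean()

# lag_transforms={7: [(rolling_mean, 7), (rolling_mean, 14)]}:
# rolling means with windows 7 and 14 over the series lagged by 7.
features['rolling_mean_lag7_w7'] = y.shift(7).rolling(7).mean()
features['rolling_mean_lag7_w14'] = y.shift(7).rolling(14).mean()

# date_features=['dayofweek', 'month']: attributes of the datestamps.
features['dayofweek'] = ds.dayofweek
features['month'] = ds.month

print(features.tail())
```

Rows near the start of the series contain missing values because the lagged windows are not yet full.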

Next, define a model. If you're on a single machine, this can be any regressor that follows the scikit-learn API. For distributed training there are `LGBMForecast` and `XGBForecast`.

In [None]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
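Following the scikit-learn API means, at a minimum, exposing `fit(X, y)` and `predict(X)`. The class below is a hypothetical illustration of that interface written with NumPy only; it is not part of mlforecast or scikit-learn. Whether `Forecast` needs further methods (for example `get_params` for cloning) is not stated here, so inheriting from `sklearn.base.BaseEstimator` is likely the safer route in practice.

```python
import numpy as np


class LeastSquaresRegressor:
    """Hypothetical ordinary least squares regressor exposing fit/predict."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a column of ones so an intercept is learned as the last parameter.
        X1 = np.column_stack([X, np.ones(len(X))])
        self.params_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self  # returning self mirrors the scikit-learn convention

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        X1 = np.column_stack([X, np.ones(len(X))])
        return X1 @ self.params_


# Toy check on data generated from y = 2 * x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
toy_model = LeastSquaresRegressor().fit(X, y)
preds = toy_model.predict(np.array([[4.0]]))
print(preds)
```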

Now instantiate your forecast object with the model and the flow configuration. There are two types of forecasters, `Forecast` and `DistributedForecast`. Since this is a single-machine example, we'll use the first.

In [None]:
from mlforecast.forecast import Forecast

fcst = Forecast(model, flow_config)

To compute the transformations and train the model on the data you call `.fit` on your `Forecast` object.

In [None]:
fcst.fit(series)

To get the forecasts for the next 14 days you just call `.predict(14)` on the forecaster.

In [None]:
predictions = fcst.predict(14)

display_df(predictions.head())

### CLI

If you want to compute quick baselines, avoid some boilerplate, or simply prefer working from the command line, you can use the `mlforecast` binary with a configuration file like the following:

In [None]:
!cat sample_configs/local.yaml

This will use the data in `prefix/input` and write the results to `prefix/output`.

In [None]:
#hide
!mkdir -p data
series = generate_daily_series(20)
series.to_parquet('data/train')

In [None]:
!mlforecast sample_configs/local.yaml

In [None]:
!ls data/outputs/

In [None]:
#hide
!rm -rf data