
# Nixtla &nbsp; [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Statistical%20Forecasting%20Algorithms%20by%20Nixtla%20&url=https://github.com/Nixtla/statsforecast&via=nixtlainc&hashtags=StatisticalModels,TimeSeries,Forecasting) &nbsp;[![Slack](https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white)](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ)

<div align="center">
<center>
<img src="https://raw.githubusercontent.com/Nixtla/neuralforecast/main/nbs/imgs_indx/logo_mid.png">
</center>
<h1 align="center">Machine Learning 🤖 Forecast</h1>
<h3 align="center">Scalable machine learning for time series forecasting</h3>
    
[![CI](https://github.com/Nixtla/mlforecast/actions/workflows/ci.yaml/badge.svg)](https://github.com/Nixtla/mlforecast/actions/workflows/ci.yaml)
[![Python](https://img.shields.io/pypi/pyversions/mlforecast)](https://pypi.org/project/mlforecast/)
[![PyPi](https://img.shields.io/pypi/v/mlforecast?color=blue)](https://pypi.org/project/mlforecast/)
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/mlforecast?color=blue)](https://anaconda.org/conda-forge/mlforecast)
[![License](https://img.shields.io/github/license/Nixtla/mlforecast)](https://github.com/Nixtla/mlforecast/blob/main/LICENSE)
    
**mlforecast** is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

</div>

## Install

### PyPI

`pip install mlforecast`

### conda-forge

`conda install -c conda-forge mlforecast`

For more detailed instructions you can refer to the [installation page](docs/getting-started/install.html).

## Quick Start

**Minimal Example**

```python
import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],
    freq='M',
)
mlf.fit(df)
mlf.predict(12)
```

**Get Started with this [quick guide](docs/getting-started/quick_start_local.html).**

**Follow this [end-to-end walkthrough](docs/getting-started/end_to_end_walkthrough.html) for best practices.**

### Sample notebooks
* [m5](https://www.kaggle.com/code/lemuz90/m5-mlforecast-eval)
* [m4](https://www.kaggle.com/code/lemuz90/m4-competition)
* [m4-cv](https://www.kaggle.com/code/lemuz90/m4-competition-cv)

## Why? 

Current Python alternatives for forecasting with machine learning models are slow, inaccurate and don't scale well, so we created a library that can be used to forecast in production environments. `MLForecast` includes efficient feature engineering to train any machine learning model that exposes `fit` and `predict` methods (such as those in [`sklearn`](https://scikit-learn.org/stable/)) on millions of time series.

## Features

* Fastest implementations of feature engineering for time series forecasting in Python. 
* Out-of-the-box compatibility with Spark, Dask, and Ray.
* Probabilistic Forecasting with Conformal Prediction.
* Support for exogenous variables and static covariates.
* Familiar `sklearn` syntax: `.fit` and `.predict`.


Missing something? Please open an issue or write to us on [![Slack](https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white)](https://join.slack.com/t/nixtlaworkspace/shared_invite/zt-135dssye9-fWTzMpv2WBthq8NK0Yvu6A)

## Examples and Guides

📚 [End to End Walkthrough](docs/getting-started/end_to_end_walkthrough.html): model training, evaluation and selection for multiple time series.

🔎 [Probabilistic Forecasting](docs/how-to-guides/prediction_intervals.html): use Conformal Prediction to produce prediction intervals.

👩‍🔬 [Cross Validation](docs/how-to-guides/cross_validation.html): robust evaluation of model performance.

🔌 [Predict Demand Peaks](docs/tutorials/electricity_peak_forecasting.html): electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 [Transfer Learning](docs/how-to-guides/transfer_learning.html): pretrain a model using a set of time series and then predict another one using that pretrained model. 

🌡️ [Distributed Training](docs/getting-started/quick_start_distributed.html): use a Dask, Ray or Spark cluster to train models at scale.


## How to use

The following provides a very basic overview. For a more detailed description, see the [documentation](https://nixtla.github.io/mlforecast/).

### Data setup

Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp.

```python
from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series.head()
```

|   | unique_id | ds | y | static_0 |
|---|---|---|---|---|
| 0 | id_00 | 2000-01-01 | 17.519167 | 72 |
| 1 | id_00 | 2000-01-02 | 87.799695 | 72 |
| 2 | id_00 | 2000-01-03 | 177.442975 | 72 |
| 3 | id_00 | 2000-01-04 | 232.70411 | 72 |
| 4 | id_00 | 2000-01-05 | 317.510474 | 72 |
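
If your data is in wide format (one column per series), it can be reshaped into the long layout described above. The following is a minimal pandas sketch: the wide dataframe and its values are made up for illustration, and the column names `unique_id`, `ds` and `y` are taken from the sample output shown here.

```python
import pandas as pd

# Hypothetical wide-format data: one column per series, one row per timestamp.
wide = pd.DataFrame(
    {
        "ds": pd.date_range("2000-01-01", periods=3, freq="D"),
        "id_00": [1.0, 2.0, 3.0],
        "id_01": [10.0, 20.0, 30.0],
    }
)

# Reshape to long format: one row per (series, timestamp) observation.
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```

Each series now occupies a contiguous block of rows identified by its `unique_id`.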


### Models

Next define your models. If you want to use the local interface this can be any regressor that follows the scikit-learn API. For distributed training there are `LGBMForecast` and `XGBForecast`.

```python
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(verbosity=-1),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
```

### Forecast object

Now instantiate an `MLForecast` object with the models and the features that you want to use. The features can be lags, transformations on the lags and date features. The lag transformations are defined as [numba](http://numba.pydata.org/) *jitted* functions that transform an array. If they take additional arguments, you can either supply a tuple (`transform_func`, `arg1`, `arg2`, ...) or define new functions that fix the arguments. You can also define differences to apply to the series before fitting; these are restored when predicting.

```python
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean


@njit
def rolling_mean_28(x):
    return rolling_mean(x, window_size=28)


fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_28]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
```
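
To build intuition for what lags and lag transformations represent, here is a rough pandas sketch. It assumes that a lag transformation is applied to the already lagged series, as feature names such as `expanding_mean_lag1` suggest. The toy data, the small window size and the `rolling_mean_2_lag1` name are made up for illustration, and this is not how the library computes the features internally.

```python
import pandas as pd

# Toy long-format data with two short series (made-up values).
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 5 + ["b"] * 5,
        "ds": list(pd.date_range("2000-01-01", periods=5, freq="D")) * 2,
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
    }
)
grouped = df.groupby("unique_id")["y"]

# Plain lag: the value observed 2 steps earlier within the same series.
df["lag2"] = grouped.shift(2)

# Lag transformation: shift first, then aggregate the shifted values, so each
# row's feature only uses observations strictly before that row.
lag1 = grouped.shift(1)
by_series = lag1.groupby(df["unique_id"])
df["expanding_mean_lag1"] = by_series.transform(lambda s: s.expanding().mean())
df["rolling_mean_2_lag1"] = by_series.transform(lambda s: s.rolling(2).mean())

print(df)
```

The leading rows of each series are `NaN` because there is not enough history to compute the feature there.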

### Training

To compute the features and train the models, call `fit` on your `MLForecast` object.

```python
fcst.fit(series)
```

```
MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
```

### Predicting

To get the forecasts for the next `n` days call `predict(n)` on the forecast object. This will automatically handle the updates required by the features using a recursive strategy.

```python
predictions = fcst.predict(14)
predictions
```

|   | unique_id | ds | LGBMRegressor | XGBRegressor | RandomForestRegressor |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-04-04 | 299.923771 | 309.664124 | 298.424164 |
| 1 | id_00 | 2000-04-05 | 365.424147 | 382.150085 | 365.816014 |
| 2 | id_00 | 2000-04-06 | 432.562441 | 453.373779 | 436.360620 |
| 3 | id_00 | 2000-04-07 | 495.628000 | 527.965149 | 503.670100 |
| 4 | id_00 | 2000-04-08 | 60.786223 | 75.762299 | 62.176080 |
| ... | ... | ... | ... | ... | ... |
| 275 | id_19 | 2000-03-23 | 36.266780 | 29.889120 | 34.799780 |
| 276 | id_19 | 2000-03-24 | 44.370984 | 34.968884 | 39.920982 |
| 277 | id_19 | 2000-03-25 | 50.746222 | 39.970238 | 46.196266 |
| 278 | id_19 | 2000-03-26 | 58.906524 | 45.125305 | 51.653060 |
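
The recursive strategy and the restoring of differences can be illustrated with a small NumPy sketch. This shows only the general idea (each prediction is fed back as the lag feature for the next step, then the differencing is undone). It is not the library's implementation, and the toy series and the single lag-1 linear model are assumptions made for illustration.

```python
import numpy as np

# Toy series (made-up values) whose first differences alternate between 2 and 4.
y = np.array([0.0, 2.0, 6.0, 8.0, 12.0, 14.0, 18.0])

# Target transform: model the first differences instead of the raw values.
diffs = np.diff(y)

# Fit diff[t] = intercept + slope * diff[t-1] by least squares (one lag feature).
X = np.column_stack([np.ones(len(diffs) - 1), diffs[:-1]])
coef, *_ = np.linalg.lstsq(X, diffs[1:], rcond=None)

# Recursive forecasting: each prediction becomes the lag feature of the next step.
horizon = 3
last_diff = diffs[-1]
pred_diffs = []
for _ in range(horizon):
    next_diff = coef[0] + coef[1] * last_diff
    pred_diffs.append(next_diff)
    last_diff = next_diff

# Restore the original scale by cumulatively adding the predicted differences
# to the last observed value.
forecast = y[-1] + np.cumsum(pred_diffs)
print(forecast)
```

Because later steps depend on earlier predictions rather than on observed values, errors can accumulate as the horizon grows.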


### Visualize results

```python
from utilsforecast.plotting import plot_series

fig = plot_series(series, predictions, max_ids=4, plot_random=False)
fig.savefig('figs/index.png', bbox_inches='tight')
```

![](https://raw.githubusercontent.com/Nixtla/mlforecast/main/nbs/figs/index.png)

## How to contribute
See [CONTRIBUTING.md](https://github.com/Nixtla/mlforecast/blob/main/CONTRIBUTING.md).