Since **Version 0.4.0**, only pandas Series and DataFrames are allowed as input (although NumPy arrays are used internally for performance). Based on the type of pandas index, the following rules are applied:

+ If index is not of type DatetimeIndex, a RangeIndex is created.

+ If index is of type DatetimeIndex but has no frequency, a RangeIndex is created.

+ If index is of type DatetimeIndex and has frequency, nothing is changed.
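
The three rules above can be sketched with plain pandas. The helper `resulting_index_type` is hypothetical and written only for illustration; it is not skforecast's internal code, it simply mirrors the rules as stated.

```python
import pandas as pd


def resulting_index_type(index: pd.Index) -> str:
    """Hypothetical helper: name the index type implied by the rules above."""
    # Only a DatetimeIndex that carries a frequency is kept unchanged.
    if isinstance(index, pd.DatetimeIndex) and index.freq is not None:
        return "DatetimeIndex"
    # Any other index (non-datetime, or datetime without frequency)
    # is replaced by a RangeIndex.
    return "RangeIndex"


values = [1.0, 2.0, 3.0, 4.0]

# DatetimeIndex with frequency (month start)
with_freq = pd.Series(values, index=pd.date_range("2020-01-01", periods=4, freq="MS"))

# DatetimeIndex without frequency: pd.to_datetime on a list does not set one
no_freq = pd.Series(
    values,
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Default integer index
no_dates = pd.Series(values)

for name, series in [("with_freq", with_freq), ("no_freq", no_freq), ("no_dates", no_dates)]:
    print(name, "->", resulting_index_type(series.index))
```

Note that `no_freq` holds exactly the same dates as `with_freq`; the only difference is that its index has no `freq` attribute set, which is enough to fall under the second rule.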

There is nothing wrong with using data that does not have an associated date/time index. However, if a pandas Series with an associated frequency is used, the predictions are returned with a date index, which is more useful than integer positions.

In [2]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

In [3]:
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
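
The `asfreq('MS')` call is what attaches a frequency to the index. A minimal sketch on synthetic data (the values and dates below are made up for illustration) shows the effect, including the fact that `asfreq` inserts `NaN` rows for any missing dates, so it may be worth checking for them before fitting.

```python
import pandas as pd

# Synthetic monthly data with March missing (illustrative values only)
dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"])
s = pd.Series([1.0, 2.0, 4.0], index=dates)

print(s.index.freq)          # no frequency is set on this index

# Conform the series to month-start frequency
s_ms = s.asfreq("MS")

print(s_ms.index.freqstr)    # frequency is now attached to the index
print(s_ms.isna().sum())     # number of rows created for missing dates
```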

In [4]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags = 5
             )

forecaster.fit(y=data['y'])

# Predictions
# ==============================================================================
forecaster.predict(steps=5)

2008-07-01    0.714526
2008-08-01    0.789144
2008-09-01    0.818433
2008-10-01    0.845027
2008-11-01    0.914621
Freq: MS, Name: pred, dtype: float64

In [5]:
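# Data without a datetime index
# ==============================================================================
data = data.reset_index(drop=True)
print(data.head().to_markdown())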

|    |        y |
|---:|---------:|
|  0 | 0.429795 |
|  1 | 0.400906 |
|  2 | 0.432159 |
|  3 | 0.492543 |
|  4 | 0.502369 |


In [6]:
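# Fit and predict using data with a RangeIndex
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)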

204    0.714526
205    0.789144
206    0.818433
207    0.845027
208    0.914621
Name: pred, dtype: float64
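
If the dates are known separately, integer-indexed predictions can be mapped back to dates afterwards. The following is a sketch using only pandas: the values are copied from the output above, and the start date `2008-07-01` is taken from the first prediction of the date-indexed example, where it corresponds to position 204.

```python
import pandas as pd

# Predictions as returned when the training data has a RangeIndex
pred = pd.Series(
    [0.714526, 0.789144, 0.818433, 0.845027, 0.914621],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Rebuild a month-start date index of the same length (start date assumed known)
pred_dated = pred.copy()
pred_dated.index = pd.date_range(start="2008-07-01", periods=len(pred), freq="MS")

print(pred_dated)
```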
