# Input data 

Since **version 0.4.0**, only pandas Series and DataFrames are accepted as input (internally, numpy arrays are used for performance). Depending on the type of the pandas index, the following rules apply:

+ If the index is not of type DatetimeIndex, a RangeIndex is created.

+ If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.

+ If the index is of type DatetimeIndex and has a frequency, nothing is changed.
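
The three rules above can be checked by hand before fitting. The following is a minimal sketch using only pandas. `index_rule` is a hypothetical helper written for illustration; it is not part of skforecast and only mirrors the rules as listed, not the library's internal implementation. The series values are made up.

```python
import pandas as pd


def index_rule(y: pd.Series) -> str:
    """Hypothetical helper: report which of the three listed rules applies."""
    if not isinstance(y.index, pd.DatetimeIndex):
        return "not a DatetimeIndex: a RangeIndex is created"
    if y.index.freq is None:
        return "DatetimeIndex without frequency: a RangeIndex is created"
    return "DatetimeIndex with frequency: nothing is changed"


values = [1.0, 2.0, 3.0, 4.0]  # illustrative values only

# Plain positional index (pandas assigns a RangeIndex by default)
no_dates = pd.Series(values)

# Datetime index built from strings: pandas does not set a frequency here
no_freq = pd.Series(
    values,
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Datetime index built with an explicit month-start frequency
with_freq = pd.Series(
    values, index=pd.date_range("2020-01-01", periods=4, freq="MS")
)

for name, series in [("no_dates", no_dates), ("no_freq", no_freq), ("with_freq", with_freq)]:
    print(f"{name}: {index_rule(series)}")
```

Note that `no_freq` and `with_freq` hold the same dates. The only difference is whether the `freq` attribute of the index is set.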

<script src="https://kit.fontawesome.com/d20edc211b.js" crossorigin="anonymous"></script>

<div class="admonition note" name="html-admonition" style="background: rgba(0,184,212,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px; border-left: 8px solid #00b8d4;">

<p class="title">
    <i class="fa-circle-exclamation fa" style="font-size: 18px; color:#00b8d4;"></i>
    <b> &nbsp; Note</b>
</p>

Data without an associated date/time index works fine. However, when a pandas Series with a DatetimeIndex and a frequency is used, the predictions are also returned with a datetime index, which is more useful.

</div>
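
If a series has dates but no frequency, one way to attach a frequency is `asfreq`, the same pandas method used in the Data section of this page. The sketch below assumes monthly data starting on the first day of each month (`'MS'`); the values are made up for illustration.

```python
import pandas as pd

# Dates parsed from strings: the resulting DatetimeIndex has no frequency
y = pd.Series(
    [1.0, 2.0, 3.0],  # illustrative values only
    index=pd.to_datetime(["1991-07-01", "1991-08-01", "1991-09-01"]),
)
freq_before = y.index.freq  # None

# Conform the series to a month-start frequency
y = y.asfreq("MS")
freq_after = y.index.freqstr  # 'MS'

print(freq_before, freq_after)
```

Keep in mind that `asfreq` inserts rows with `NaN` for any date missing from the original index, so it is worth checking for missing values afterwards.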

## Libraries

In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

## Data

In [2]:
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data

| date | y |
| --- | --- |
| 1991-07-01 | 0.429795 |
| 1991-08-01 | 0.400906 |
| 1991-09-01 | 0.432159 |
| 1991-10-01 | 0.492543 |
| 1991-11-01 | 0.502369 |
| ... | ... |
| 2008-02-01 | 0.761822 |
| 2008-03-01 | 0.649435 |
| 2008-04-01 | 0.827887 |
| 2008-05-01 | 0.816255 |


## Train and predict using input with a datetime index and frequency

In [3]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags = 5
             )

forecaster.fit(y=data['y'])

# Predictions
# ==============================================================================
forecaster.predict(steps=5)

2008-07-01    0.714526
2008-08-01    0.789144
2008-09-01    0.818433
2008-10-01    0.845027
2008-11-01    0.914621
Freq: MS, Name: pred, dtype: float64

## Train and predict using input without a datetime index

In [4]:
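# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data.head()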

|  | y |
| --- | --- |
| 0 | 0.429795 |
| 1 | 0.400906 |
| 2 | 0.432159 |
| 3 | 0.492543 |
| 4 | 0.502369 |


In [5]:
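# Fit forecaster and predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)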

204    0.714526
205    0.789144
206    0.818433
207    0.845027
208    0.914621
Name: pred, dtype: float64
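
The predicted values are identical in both cases; only the index differs. If the start date and frequency of the series are known, positional predictions can be given a datetime index afterwards. The following is a minimal sketch using only pandas. It copies the predictions shown above into a series by hand and assumes the first forecast corresponds to 2008-07-01 with month-start frequency, as in the datetime example on this page.

```python
import pandas as pd

# Predictions with a positional index, copied from the output above
predictions = pd.Series(
    [0.714526, 0.789144, 0.818433, 0.845027, 0.914621],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Assumption: first forecast date and frequency are known to the user
predictions.index = pd.date_range(
    start="2008-07-01", periods=len(predictions), freq="MS"
)

print(predictions)
```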
